What is an affine transformation?
How are affine transformations implemented?
Some related transforms
Shearing an image
What is an affine transformation?

An affine transformation is the most general linear transformation on an image:
x' = a*x + b*y + c
y' = d*x + e*y + f        (1)

or in (transposed) matrix notation:

[x' y'] = [x y 1] T        (2)

where T is a 3x2 matrix of coefficients:

T = [a d
     b e        (3)
     c f]
There are a couple of ways this can be visualized geometrically. If you look at a two-dimensional surface (coordinate system) from a great distance with arbitrary orientation in the third dimension, you will see that any object in the two-dimensional surface appears to be distorted by a combination, in general, of translation, shear and scaling. So an affine transformation takes any coordinate system in a plane into another coordinate system that can be found from such a projection. Under affine transformation, parallel lines remain parallel and straight lines remain straight.
Consider this transformation of coordinates. A coordinate system (or coordinate space) in two dimensions is defined by an origin, two non-parallel axes (they need not be perpendicular), and two scale factors, one for each axis. This can be described by 3 points, one for the origin and one for the unit distance along each of the two axes. It takes six numbers to specify three points. Suppose we have these three points: (x1, y1), (x2, y2) and (x3, y3). Then given an affine transformation as above, we can find the corresponding three transformed points from:
x1' = a*x1 + b*y1 + c
y1' = d*x1 + e*y1 + f
x2' = a*x2 + b*y2 + c        (4)
y2' = d*x2 + e*y2 + f
x3' = a*x3 + b*y3 + c
y3' = d*x3 + e*y3 + f
Conversely, if we are given the three points in the transformed (primed) coordinate space that correspond to the three unprimed points, we could solve the set of 6 equations in (4) for the six coefficients. These coefficients could then be used in (1) to transform any point in the original coordinate space to its location in the primed coordinate space.
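To make this concrete, here is a standalone sketch that solves the two 3x3 systems in (4), which share the same system matrix, using Cramer's rule. The function name and the plain double arrays are our own choices for illustration; Leptonica has its own routine for this in affine.c.

/* Compute the affine coefficients (a, b, c, d, e, f) of eq. (1)
 * from three src points (x[i], y[i]) and the corresponding dest
 * points (xp[i], yp[i]), by applying Cramer's rule to the two
 * 3x3 systems in eq. (4).
 * Returns 0 on success, 1 if the three src points are collinear. */
int
affineCoeffsFrom3Points(const double x[3], const double y[3],
                        const double xp[3], const double yp[3],
                        double coeffs[6])
{
        /* Determinant of the shared system matrix
         * [x1 y1 1; x2 y2 1; x3 y3 1] */
    double det = x[0] * (y[1] - y[2]) - y[0] * (x[1] - x[2])
                 + x[1] * y[2] - x[2] * y[1];
    if (det == 0.0)  /* collinear points: no unique solution */
        return 1;

    for (int k = 0; k < 2; k++) {  /* k = 0: (a,b,c); k = 1: (d,e,f) */
        const double *r = (k == 0) ? xp : yp;
        coeffs[3 * k + 0] = (r[0] * (y[1] - y[2]) - y[0] * (r[1] - r[2])
                             + r[1] * y[2] - r[2] * y[1]) / det;
        coeffs[3 * k + 1] = (x[0] * (r[1] - r[2]) - r[0] * (x[1] - x[2])
                             + x[1] * r[2] - x[2] * r[1]) / det;
        coeffs[3 * k + 2] = (x[0] * (y[1] * r[2] - y[2] * r[1])
                             - y[0] * (x[1] * r[2] - x[2] * r[1])
                             + r[0] * (x[1] * y[2] - x[2] * y[1])) / det;
    }
    return 0;
}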
How are affine transformations implemented?

There are two very different approaches to implementing an affine transformation on an image: sequentially, as a cascade of translations, scalings and shears; or pointwise, computing each dest pixel directly from the corresponding src location.

In practice, the pointwise approach is the most useful. For each dest pixel, the corresponding src pixel (or pixels) is located, and the dest pixel value is computed.
For binary images, the pointwise transform is about 3 times slower than the sequential transform, whereas for grayscale images the pointwise transform is an order of magnitude faster! More importantly, the results, particularly on text, are far better with the pointwise transform. The sequential transform involves a number of shears, which cause visible dislocation lines through characters. After such a set of sequential transforms, edges that were initially smooth become irregular, significantly degrading the visual quality of the text. The pointwise transform has a minimal number of such artifacts, and any shear lines that do develop tend to have shorter spatial correlation. It is thus strongly urged that the pointwise transform be used in any situation where the quality of the resulting text image is important.
The following image fragments show the image quality resulting from the same affine transformation. The first fragment was done pointwise, whereas the second one was done sequentially.
See the documentation in affine.c, where it is shown that the pointwise transformation should be performed backwards, so that for every point in the (primed) dest, you find the corresponding point in the (unprimed) src. For binary images, the implementation is slower in the backwards direction, because you have to find the corresponding point in the src for every point in the dest, whereas going in the forward direction, you only write to the dest when the src pixel is ON. However, the forward direction implementation can miss some pixels entirely in the dest. When a solid black region is transformed, this will in general result in a regular pattern of white pixels within it, even when no overall scaling is done.
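To make the backwards scheme concrete, here is a minimal sampled version written against Leptonica's public per-pixel accessors. It is an illustration only: the actual routines in affine.c operate on raw image lines and are much faster. It assumes the coefficients c[0..5] already describe the inverse (dest-to-src) transform, in the form of (1).

#include "allheaders.h"

/* Minimal backwards, sampled affine transform: for every dest pixel,
 * find the corresponding src pixel and copy its value.  c[0..5] are
 * the coefficients of the dest-to-src (inverse) transform. */
PIX *
affineSampledBackwards(PIX *pixs, const float c[6])
{
    l_int32   w, h, xp, yp, x, y;
    l_uint32  val;
    PIX      *pixd;

    pixGetDimensions(pixs, &w, &h, NULL);
    pixd = pixCreateTemplate(pixs);  /* same size/depth, cleared */
    for (yp = 0; yp < h; yp++) {
        for (xp = 0; xp < w; xp++) {
                /* Map the dest point back into the src */
            x = (l_int32)(c[0] * xp + c[1] * yp + c[2] + 0.5);
            y = (l_int32)(c[3] * xp + c[4] * yp + c[5] + 0.5);
            if (x < 0 || x >= w || y < 0 || y >= h)
                continue;  /* outside the src: leave dest at background */
            pixGetPixel(pixs, x, y, &val);
            pixSetPixel(pixd, xp, yp, val);
        }
    }
    return pixd;
}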
With the affine transform, as with the projective and bilinear transforms (see below), the pointwise transform can be done for all pixel depths by sampling. For 8 bpp and RGB images, better results are obtained, particularly on document images that have sharp edges, using interpolation. Interpolation involves subdividing the src pixels into subpixels (we divide each into 16 x 16 subpixels), and using the subpixel location to generate weighting factors for the four neighboring src pixels. Interpolation is about 5x slower than sampling the nearest src pixel. When there is relatively little scale change, interpolation is nearly equivalent to area mapping, as we have shown with rotation.
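The interpolation step can be sketched as follows. This is a simplified standalone version for an 8 bpp image held in a plain byte array, not Leptonica's actual inner loop; it quantizes the fractional src location to 1/16 of a pixel and blends the four neighbors with weights that sum to 16 x 16 = 256.

/* Interpolate an 8 bpp gray image at the fractional src location
 * (xs, ys).  'data' is the image as a row-major byte array of
 * width w and height h. */
unsigned char
interpolateGray(const unsigned char *data, int w, int h,
                double xs, double ys)
{
    int x = (int)xs;
    int y = (int)ys;
    if (x < 0 || y < 0 || x >= w - 1 || y >= h - 1)
        return 255;  /* off the image: assume a white background */

    int xf = (int)(16.0 * (xs - x));  /* subpixel position: 0 .. 15 */
    int yf = (int)(16.0 * (ys - y));
    const unsigned char *p = data + y * w + x;
    int val = (16 - xf) * (16 - yf) * p[0]       /* upper left  */
            + xf        * (16 - yf) * p[1]       /* upper right */
            + (16 - xf) * yf        * p[w]       /* lower left  */
            + xf        * yf        * p[w + 1];  /* lower right */
    return (unsigned char)((val + 128) >> 8);    /* round; divide by 256 */
}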
We now describe the situation where the affine transform is performed as a sequence of translations, scalings, shears and, optionally, rotations. Because a rotation is equivalent to three shears, we need in principle only translations, scalings and shears. We have already demonstrated that there are 6 independent parameters in the affine transformation. This can also be seen from the transformation from one coordinate space to another: there are 2 parameters to align the origins, 2 parameters for the scaling of the two axes, and 2 parameters describing the change in angle of each axis. Now, suppose you are given the two coordinate spaces, and want to transform from one to the other. How do we do this with the image operations in Leptonica?
These operations are translation, scaling (with independent scale factors in x and y), horizontal and vertical shear and, optionally, rotation.
To make use of these transforms to align our two coordinate spaces, we use the following construction, described also in affine.c. The problem to be solved is to take an image with one coordinate space (unprimed) and transform it by an affine transformation to coincide with another coordinate space (primed). We use only shears parallel to the x and y axes, orthogonal scaling with scaling factors in x and y, and translation.
A typical application is to align the image with a second image in another coordinate space, related by the affine transformation. This suggests the following model for performing the affine transform. Imagine that the unprimed coordinate space is in our original image (image 1) and the corresponding primed coordinate space is in a second image (image 2). We can transform both images so that the coordinate spaces coincide and are aligned with the x and y axes. After both the unprimed and primed coordinate spaces are coincident, we finally shear the unprimed coordinate space to coincide with the original primed space. Thus, all operations really happen on a single image.
In all this, it is only necessary to keep track of the shear angles and translations of points during the shears. What has been accomplished is a general affine transformation on the image. See affine.c for the specifics of the implementation.
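As an aside, the equivalence of a rotation to three shears, used above, follows from a standard matrix identity (written here with points as column vectors and translation ignored):

[1  -tan(theta/2)]   [1           0]   [1  -tan(theta/2)]   [cos(theta)  -sin(theta)]
[0        1      ] * [sin(theta)  1] * [0        1      ] = [sin(theta)   cos(theta)]

The first and third factors are shears along x by -tan(theta/2), and the middle factor is a shear along y by sin(theta); this identity is the basis of shear-based rotation.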
Rotation, translation and scaling are described in more detail in their own sections of this documentation; shear is described below.
Some related transforms

A special case of the affine transform, with only four free parameters, has a coefficient matrix of the form:

T = [a -b
     b  a        (5)
     c  d]

The four parameters correspond to homogeneous scaling, rotation, and x and y translation: writing a = s*cos(phi) and b = s*sin(phi), the transform is a rotation combined with uniform scaling by s, followed by a translation by (c, d).
The projective transform is nonlinear, dividing by a linear function of the coordinates:

x' = (a*x + b*y + c) / (g*x + h*y + 1)
y' = (d*x + e*y + f) / (g*x + h*y + 1)        (6)

It takes 4 points, or eight coefficients, to describe this transformation. The projective transform can equivalently be described by the (transposed) matrix equations:

x' = u / w
y' = v / w        (7)

where

[u v w] = [x y 1] T        (8)

and T is the 3x3 matrix:

T = [a d g
     b e h        (9)
     c f 1]

Compared with the affine transform, the extra point (or 2 parameters) allows specification of a keystone warp by the relative distances between pairs of points. Because the projective transform is not linear, it cannot be composed as a sequence of translations, shears, scalings and (optionally) rotations.
The eight coefficients in (6) can be computed from four corresponding pairs of points (8 equations); no three of the points may be collinear. Projective transforms, and their effects on images, are implemented in projective.c. As with the affine transform, both sampling and interpolation implementations of the transform are given, with the interpolation method being about 5x slower.
It is possible to speed up the interpolation by a small amount by dividing the pixels into 4 x 4 subpixels (instead of 16 x 16 subpixels), and inlining an approximation for each of the 16 cases. (See the low-level implementation of pixRotateAMColorFast() for the method.) This can also be done for affine and bilinear transforms. The speed increase is less than 20%, so in most cases it is not worth the increase in complexity.
The bilinear transform has a nonlinear cross-term in the transformation equation:
x' = a*x + b*y + c*x*y + d
y' = e*x + f*y + g*x*y + h        (10)

and can equivalently be described by the matrix equation

[x' y'] = [x y xy 1] T        (11)

where T is the 4x2 matrix:

T = [a e
     b f
     c g        (12)
     d h]
Like the projective transform, the bilinear transform cannot be composed as a sequence of translations, scalings and shears. Bilinear transforms of images are implemented in bilinear.c, for both sampling and interpolation of the src image.
Shearing an image

Formally, a horizontal shear of angle theta about a line y = b, with the origin at the UL corner and a cw rotation taken to be positive, is
[x' y'] = [x y 1] T        (13)

where T is the 3x2 matrix:

T = [1            0
     -tan(theta)  1        (14)
     b*tan(theta) 0]

Likewise, a vertical shear of angle theta about a line x = a is given by (13) with

T = [1  -tan(theta)
     0   1        (15)
     0   a*tan(theta)]
All interfaces to implementations of shear in Leptonica are given at a high level, using the PIX data structure. Two-image shear, where the src is unchanged, is implemented by pixRasterop(). For example, a horizontal shear requires moving full-width blocks of pixels horizontally by varying amounts, depending on the vertical location of the block. The height of the block is inversely proportional to the shear angle, appropriately integerized. We express angles in radians, which are a natural unit, because for small angles the height of each block is approximately equal to the inverse of the shear angle.
We provide special cases where the image is sheared about the upper-left corner and also about the center. When shearing by very small angles, the block height (for horizontal shear) is large. When shearing about the upper-left corner, the height of the block that is not moved is only half the block height, whereas when shearing about the center of the image, the "dead zone" is the full block height.
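The block-copy mechanism can be sketched as follows, using the public pixRasterop(). This is a simplified illustration of a horizontal shear about the UL corner; it is not the code in shear.c, and it ignores the choice of color brought into the exposed region.

#include <math.h>
#include "allheaders.h"

/* Simplified horizontal shear about the UL corner by angle 'radang'
 * (radians).  Full-width blocks of height ~1/tan(radang) are copied
 * into the dest, each shifted horizontally by one more pixel than
 * the block above it.  Exposed background is left cleared. */
PIX *
shearHorizontalUL(PIX *pixs, float radang)
{
    l_int32  w, h, bh, nblocks, i, shift;
    PIX     *pixd;

    if (radang == 0.0)
        return pixCopy(NULL, pixs);  /* no shear */
    pixGetDimensions(pixs, &w, &h, NULL);
    pixd = pixCreateTemplate(pixs);  /* cleared dest, same size/depth */
    bh = (l_int32)(1.0 / fabs(tan(radang)) + 0.5);  /* block height */
    if (bh < 1) bh = 1;
    if (bh > h) bh = h;
    nblocks = (h + bh - 1) / bh;
    for (i = 0; i < nblocks; i++) {
        shift = (radang > 0.0) ? -i : i;  /* one more pixel per block */
        pixRasterop(pixd, shift, i * bh, w, bh, PIX_SRC,
                    pixs, 0, i * bh);     /* clips at the boundaries */
    }
    return pixd;
}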
We also provide in-place versions of shear, implemented by block shearing using the in-place rasterop functions pixRasteropHip() and pixRasteropVip(). The higher-level two-image horizontal and vertical shear functions pixHShear() and pixVShear() call the in-place shear functions pixHShearIP() and pixVShearIP() when the src and dest images are the same.
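As a usage sketch, assuming a hypothetical 1 bpp input file test.tif (the angle of 0.05 radians is about 3 degrees; L_BRING_IN_WHITE selects the color brought in at the exposed edge):

#include "allheaders.h"

int main(void)
{
    l_int32  w, h;
    PIX     *pix, *pixd;

    pix = pixRead("test.tif");  /* hypothetical 1 bpp input */
    pixGetDimensions(pix, &w, &h, NULL);

        /* Two-image shear about the center line: pix is unchanged */
    pixd = pixHShear(NULL, pix, h / 2, 0.05, L_BRING_IN_WHITE);

        /* In-place shear: src and dest are the same image, so
         * pixHShear() dispatches to pixHShearIP() */
    pixHShear(pix, pix, h / 2, 0.05, L_BRING_IN_WHITE);

    pixWrite("sheared.tif", pixd, IFF_TIFF_G4);
    pixDestroy(&pix);
    pixDestroy(&pixd);
    return 0;
}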