 
 
How do you rotate an image by an arbitrary angle?
Rotation by sampling
Rotation by shear
Rotation by area mapping
Fast rotation by area mapping
Special rotations by 90, 180 or 270 degrees
We provide three different methods for image rotation by an arbitrary angle. The first is rotation by sampling (RSamp), which chooses the value of each dest pixel to be that of the source pixel closest to the location the dest pixel came from (i.e., before rotation). The second is rotation by shear (RShear), which, depending on the implementation, is an approximation to rotation by sampling. The third is rotation by area mapping (RAM), which computes the value of each dest pixel from the 4 source pixels from which it was derived, suitably weighted by the actual overlap. Rotation by area mapping is identical to linear interpolation on the four closest src pixels.
RSamp and RShear have flexibility and speed advantages over RAM. First, RSamp and RShear can be applied to images of arbitrary pixel depth, with no change in the algorithm. Second, for 1 bpp images, RShear is faster than RSamp because it is implemented with horizontal and vertical rasterops. And RShear is about 10 to 15 times faster than RAM in our implementations. And third, RShear is easily implemented as an in-place operation, changing the pixel values of the source image directly.
Of course, that's not the entire story. For grayscale and color images, RAM produces far better image quality than RSamp and RShear. RAM naturally blurs sharp edges (an effect often called "anti-aliasing") so that stair-step "jaggie" artifacts are not introduced, as they can be in RSamp and RShear. Also, for large angles where you must use 3 shears, RShear causes some extra clipping of pixels near the corners if the rotation is about the center and the dest image is the same size as the source.
For applications where it is important not to lose any pixels due to rotation, we provide a high-level interface, pixRotate(), which embeds the src in a minimum-sized dest image so that no pixels will be lost.
The mathematics of the rotation of a continuous image (i.e., an image with infinitesimally small pixels) using horizontal and vertical shears is simple. A horizontal shear moves a row of image pixels a horizontal distance that is proportional to its vertical distance from some reference point. A succession of 3 alternating horizontal and vertical shears can be used to rotate an image perfectly by any angle. For rotation by small angles of 0.05 radians (about 3 degrees) or less, rotation can be performed using only 2 shears. Expressing the rotation angle in radians, this 2-shear rotation induces a distortion in the rotated image that is a change in the length to width ratio of the amount:
      delta(length/width) = 0.5 * angle2
For an angle of 0.05 radians, this is about 1 part in a thousand,
which is typically not something you'd worry about.  For
larger angles, 3 shears are required.  The details of
the shear angles required for 2 and 3 shear rotation are given
in rotateshear.c.
In-place rotation is performed by in-place shears. For a horizontal in-place shear, each pixel row is translated to the left or right by some number of pixels. Typically, several rows are translated together, and the pixels that are not over-written are filled in. A similar up or down translation of pixel columns is used for vertical in-place shears. For shear, we give two choices for the color of the filling pixels: white and black. (Henry Ford only offered black.) The "jaggies" at sharp boundaries are visible because images are composed of pixels of finite size, arranged on a square lattice. They arise because the two and three shear rotations are nearest-pixel approximations to continuous rotations.
An interesting result is obtained using a sequence of in-place rotations with the same angle. The distortions introduced due to the finite pixel size accumulate, resulting in a nearly random pixel diffusion if repeated many times. For example, suppose you rotate an image six full revolutions (2160 degrees) using 180 rotations of 12 degrees each. The result is an image that is "encrypted" because of the pixel diffusion; everything is blurred and a text image is not readable. However, if you then unwind the image with 180 rotations of 12 degrees in the opposite direction, you obtain the original image. Well, not quite. You get a part of the image -- a circular subimage that is equal in size to the largest inscribed circle, with the circle center at the center of rotation, that fits within the original boundaries of the image. You can try this yourself, using prog/rotatetest1.c with the above parameters on the image test24.jpg. See what you get. You can then recover the central part of the original image by unscrambling the "encrypted" image using an angle of -12.0 degrees. If you want to regain the entire image, it is necessary before "encrypting" to first embed the original in a larger rectangle that would contain the full rotated image at every angle it is rotated through.
Because the underlying operation is rasterops, RShear works on images of all depths, with or without color mapping. Speed plus flexibility, plus the ability to rotate in-place, make RShear of ubiquitous importance. The number of pixels/sec that can be rotated by RShear is inversely proportional to the pixel depth. RShear rotates bits at a rate of about 1 bit in 3 machine cycles. On a 3 GHz machine, a 1 bpp image is thus rotated at about 1 billion pixels/sec, and an 8 bpp image is rotated at about 120 million pixels/sec.
Area mapping works as follows. For each dest pixel you find the 4 source pixels that it partially covers. You then compute the dest pixel value as the area-weighted average of those 4 source pixels. We make two simplifying approximations:
 
 
    Dest pixel value = (1/N2) [(N - x)(N - y)fi,j + x(N - y)fi,j+1 + y(N - x)fi+1,j + xyfi+1,j+1]
This is the same digital filter that is obtained for linear interpolation when arbitrarily scaling grayscale images!
For most applications, the loss in accuracy (relative to the 16 x 16 subpixel version) is minimal, and the rotation speed for RGB images on a 3 GHz processor is about 18 million pixels/second.
Let's look at the low-level part of the left-right shift operation in more detail. For image depth < 8 bpp, it is more efficient to extract bytes and use a lookup table to reverse the pixels in the byte. This is particularly important for binary images. However, because each raster line begins on a 32-bit word boundary, but does not necessarily end on one, before selecting the bytes for reversal, we must right-justify the raster lines on the 32-bit word boundaries. This is most easily done by shifting the entire image as a block, using an in-place horizontal rasterop. The code for a binary image is:
    extra = w & 31;
    if (extra)
        shift = 32 - extra;
    else
        shift = 0;
    if (shift)
        rasteropHipLow(datas, w, h, d, wpls, 0, h, shift);
    databpl = (w + 7) / 8;
    bpl = 4 * wpls;
    for (i = 0; i < h; i++) {
        lines = datas + (h - 1 - i) * wpls;
        lined = datad + i * wpld;
        for (j = 0; j < databpl; j++) {
            if (val = GET_DATA_BYTE(lines, bpl - 1 - j))
                SET_DATA_BYTE(lined, j, tab[val]);
        }
    }
    if (shift)
        rasteropHipLow(datas, w, h, d, wpls, 0, h, -shift);
The second rasterop restores the source to its normal
left-justification on word boundaries.
The table supplied here reverses the bits in a byte.
For a 2 bpp image, a different table that reverses the
four 2-bit pixels within a byte must be used.
Note that as mentioned above,
if the value of the byte is 0, we do not bother to
write its reversed value to the dest.  Such
tests before setting are particularly useful with binary images of
scanned text, where most of the image is background (white) and
the majority of bytes have zero value.  This rotates each
binary pixel in about 5 machine cycles.
The 90 degree rotation, like the 180 rotation, is conceptually trivial: we scan pixels in the source and copy them to the dest. As with the 180 degree rotation, the dest must be cleared in advance. In our high-level code, this is implicit because the dest is always calloc'd. It's usually convenient to organize the two loops by considering the scan for the dest pixels, and one needs to decide in which direction is to be the fast scan. If we're simply plucking pixels from the source, it doesn't really matter. Either the source or the dest will have a fast scan direction jumping through the image data. This will cause secondary cache misses, resulting in main memory fetches, but it's not too serious and takes some effort to ameliorate.
Choosing the fast direction in the dest to be horizontal, for cw rotation we have the following simple algorithm:
    for (i = 0; i < hd; i++) {
        lined = datad + i * wpld;
        lines = datas + (wd - 1) * wpls;
        for (j = 0; j < wd; j++) {
            if (GET_DATA_BIT(lines, i))
                SET_DATA_BIT(lined, j);
            lines -= wpls;
        }
    }
In the following drawing, the arrows show the first line
scanned in the fast scan direction
in the source and dest images.  I always let the index i
label the dest rows; i is in the outer loop because it
is the slow scan direction.  The fast scan direction, j,
labels the dest columns and the source rows. 
 
 However, particularly for binary images that have mostly OFF pixels, the fast scan direction through the source image matters, because we can eliminate all computation related to any source byte or word that is 0. Therefore, we want the fast scan in the source to be horizontal (i.e., by rows); this requires the fast scan in the dest to be by columns:
    for (j = 0; j < wd; j++) {
        lined = datad;
        lines = datas + (wd - 1 - j) * wpls;
        for (i = 0; i < hd; i++) {
            if (GET_DATA_BIT(lines, i))
                SET_DATA_BIT(lined, j);
            lined += wpld;
        }
    }
This is illustrated as follows:
 
 To take advantage of situations where most pixels in binary images are OFF, we must scan through the source in units larger than bits so that we can easily test them. Two obvious choices are bytes and 32-bit words. With bytes, it takes more work to make a decision, but because of the smaller granularity, more bytes (in total) will be eliminated. However, with typical images, it is more efficient to use 32-bit chunks, and we do that as follows:
    nswords = hd / 32;
    for (j = 0; j < wd; j++) {
        lined = datad;
        lines = datas + (wd - 1 - j) * wpls;
        for (k = 0; k < nswords; k++) {
            word = lines[k];
            if (!word) {
                lined += 32 * wpld;
                continue;
            }
            else {
                iend = 32 * (k + 1);
                for (m = 0, i = 32 * k; i < iend; i++, m++) {
                    if ((word << m) & 0x80000000)
                        SET_DATA_BIT(lined, j);
		    lined += wpld;
                }
            }
        }
        for (i = 32 * nswords; i < hd; i++) {
            if (GET_DATA_BIT(lines, i))
                SET_DATA_BIT(lined, j);
            lined += wpld;
        }
    }
This is just an elaboration of the previous pixel-plucking code,
but it is typically about 4 times faster,
rotating each pixel in about 7 machine cycles on average.
We find the number of full 32-bit words in the source line,
and march over them.  Each source word is checked.  If 0 it is
skipped in its entirely; this alone reduces computation by
about a factor of 2.5.  Otherwise each of the 32 bits
is tested and, if 1, is set in the dest.  We get another reduction
factor of about 1.5 by shifting and masking the word, rather
than using GET_DATA_BIT(lines, i).  Finally, for each source
image line, a scan is made over any source bits not contained
in a full 32-bit word.
We can actually make the binary 90 degree rotation arbitrarily complicated. For example, to avoid pixel-plucking, by brute force you can write an inner loop with lookup tables to rotate each 8x8 block of pixels separately. Using 8 bit lookup tables, for each 8x8 block of pixels, you need to extract 8 bytes from the source, construct 8 addresses from these bytes, perform 8 table lookups (each one generating a 32 bit word corresponding to 4 of the 8 dest bytes), XOR the 32-bit words in groups of 4 to get the 8 rotated destination bytes in two 32-bit words, and then set these bytes in the proper location in the dest image. It's not at all evident that such shenanigans are worth the effort!
In summary, there are a few "angles" to rotation.  RSamp and RShear have
flexibility and speed.  RAM has accuracy, and for efficiency
requires specific implementations for each depth.  Fortunately,
RAM can be made surprisingly fast with relatively simple implementations.
The orthogonal rotations (90, 180, 270 degrees) have special
implementations for each depth.  The 180 rotation is a combination
of left-right and top-bottom flips, with the latter being trivial
and depth-independent.
