[Leptonica Home Page]

Supplemental notes on using the library


Updated: Aug 23, 2022
Forever under construction

I/O: File read/write
Pix data structure
Rasterops
Scaling
Shear
Affine, projective and bilinear transforms
Binary morphology
Grayscale morphology
Block convolution
Connected components

First, read the README to get an overview of what is available and how to use it.

This supplements that information in particular areas.

  1. I/O: File read/write (ref: section 12 of the README)

    When building programs, it is often important to look at images. The function

    	    pixWrite(char *filename, PIX *pix, l_int32  format)
            
    writes an image to file. See any of the programs in the prog directory for examples.

    You can display this image with a variety of programs, such as xv (which scales the image automatically to fit on the screen) display (which displays at full resolution), gqview (which displays at full resolution and allows easy zooming), and gimp (which is set up for image manipulation and displays by default at low resolution). For programmatic display with xv, we provide a function

    	    pixDisplay(PIX  *pix, l_int32 x, l_int32 y)
    	
    which scales the image to fit on the screen if necessary and then displays it with the UL corner at (x, y).

    Images are read into memory from file using

    	    PIX   *pixRead(char *filename)
    	

    For the file formats that are supported (PNG, JFIF_JPEG, TIFF (various compressions), PNM and BMP), the extension (if any) is ignored and the format type is determined from the file itself.

    Regression test: prog/ioformats.c

  2. The Pix data structure

    The Pix data structure is the internal (memory) representation of images in this library. It is very simple, and is described in pix.h, along with some of the flags and other data structures that are associated with it. The field accessors for Pix, provided in pix1.c, should ALWAYS be used. The pixClone() function is used to get a new handle (pointer) to the same Pix data structure, without actually copying the image data to a new array. The pixDestroy(PIX **) function should be used on every handle you have -- see the comments in pix1.c at the pixClone() definition.

    Throughout we use these definitions:

    	   bpp      bits/pixel  (leptonica supports 1, 2, 4, 8, 16 and 32)
    	   ppi      pixels/inch  (resolution of image relative to original
    			  scanned page)
    	   src      source image in image processing operation
    	   dest     destination image in image processing operation
    	

    A Pix can also have a colormap, and we support a number of operations on colormaps in colormap.c. Colormapped images can have depths of 1, 2, 4 or 8 bpp. Where appropriate, functions will handle both colormapped and non-colormapped Pix. Functions that use interpolation, such as grayscale or color area-mapping rotation, will make a temporary image without the colormap, and use that to compute the dest Pix, which will then not have a colormap. It should be noted that except for in-place functions, the src Pix is never altered.

    Except for RGB images, all pixels in a Pix are packed (the pixels are represented as compactly as possible without compression). Each raster line is 32-bit aligned. See the comments in pix.h that describe the constraints and conventions for the image data representation. RGB images are packed into 32 bits, leaving 8 bits for an alpha channel that is not used.

  3. Rasterops

    A fundamental imaging operation, this is an operation that takes a rectangular region of one image and combines it with a rectangular region of a second image, using one of 12 boolean operations, and writing the result into the second image. The 12 operations between two images are described in detail in rop.c.

    There are also in-place rasterops, where a rectangular region of a single image is painted according to its (shifted) values. The in-place rasterops can be used to translate a full image, or a vertical or horizontal band of the image. The latter are used to shear the image; e.g., a horizontal shear is implemented by shifting full-width bands horizontally, as described in shear.c. With in-place rasterops, one must be careful not to overwrite data that will be used later.

    All rasterops operate on images of any depth, and they are automatically clipped to the respective images to avoid illegal reads and writes. They have a large number of uses, including a relatively fast implementation of binary morphology (for 1 bpp images). For examples and details, see also the writeup at http://www.leptonica.com/rasterops.html

  4. Scaling

    A large variety of efficient scaling functions can be found in scale.c, many of which are described in http://www.leptonica.com/scaling.html. The generic function, pixScale(), does the best job given the image type and the scaling factors. The best upscaling is typically done with linear interpolation, and the best downscaling is done either with a lowpass filter followed by subsamling, or by area mapping. The former is a fast anti-aliased approximation, particularly for small scaling factors (i.e., large downscaling). The area mapping method integrates with subpixel accuracy over the region of the src image that corresponds to each dest pixel.

    Some of the other fast scaling operations given in scale.c are:

    Special fast scaling on binary images is also available, and is useful for image analysis of scanned binary text. Examples are:

    Some scaling scripts:

    Regression test: prog/scaletest3.c

  5. Rotation

    Rotation seems mundane, but there are in fact a large number of ways of doing it, some of which are described in. http://www.leptonica.com/rotation.html.

    The top-level general rotator is pixRotate() in rotate.c. Here's the description from the source file:

            The general rotation pixRotate() does the best job for
            rotating about the image center.  For 1 bpp, it uses shear;
            for others, it uses either shear or area mapping.
            If requested, it expands the output image so that no pixels are
            lost in the rotation, and this can be done on multiple
            successive shears without expanding beyond the maximum
            necessary size.
            

    There are three other top-level rotation source files, each of which uses different methods for different purposes:

    Some rotation scripts:

    Regression tests:


  6. Shear

    Image shear is another special linear transform in the plane. It can be used to approximate a continuous rotation, using either 2 shears (for small angles) or 3 shears. Because it is implemented with rasterops, it is both very fast and it works for all depths. For its use in rotation, see http://www.leptonica.com/rotation.html.

    Shear can be performed either with src and dest, or in-place. The latter uses in-place rasterops. Vertical shear is used in the implementation of the skew angle finder. The definition of the shear transform is given in http://www.leptonica.com/affine.html.

    Some scripts using shear:


  7. Affine, projective and bilinear transforms

    Affine transforms are the most general linear transforms in a plane. They are specified by 3 corresponding points (i.e., 6 coefficients) in the two coordinate spaces. They can be implemented both in a pointwise fashion (with or without interpolation) and as a set of successive special linear transforms (translation, scaling, shear). We provide an example of the latter, but its use in applications is deprecated; in all situations you should use the pointwise transforms. See the code in affine.c for details.

    Projective and bilinear transforms are more general, nonlinear, 4-point transforms in the plane, and they are specified by 8 coefficients. The implementations are in projective.c and bilinear.c, respectively. Whereas affine transforms keep straight lines straight and preserve parallel lines, projective transforms only keep straight lines straight. And bilinear transforms do not even preserve straight lines. Affine transforms project a 3-dimensional scene onto a plane at infinity, whereas projective transforms view the 3-D scene at a finite distance, so that lines that are parallel in the affine transform all meet at a 'vanishing point' in the projective transform. For example, projective transforms can remove "keystoning" in an object imaged by a camera at close range. See http://www.leptonica.com/affine.html for details.

    Some scripts using 3- and 4-point transforms:


  8. Binary morphology

    Bin morph ...

  9. Grayscale morphology

    Bin morph ...

  10. Block convolution

    Block convolution is my term for a convolution, applied to a grayscale image, using a rectangular kernel with constant value. For this case, the so-called "integral image" formulation can be used to compute the convolution in a time that is independent of the size of the convolving kernel. To do this, it is necessary to precompute an accumulation matrix from which each value in the dest can be computed by adding (or subtracting) four entries in the matrix. For details, see range. See http://www.leptonica.com/convolution.html. Using the same technique, it is also possible to apply a rank order filter with a rectangular kernel to binary images, again in a time independent of the size of the kernel.

  11. Connected components

    Conn comps ...



[Leptonica Home Page]

Creative Commons License
This documentation is licensed by Dan Bloomberg under a Creative Commons Attribution 3.0 United States License.

© Copyright 2001-2023, Leptonica