[Leptonica Home Page]
Border Representations of Connected Components
Updated: Aug 31, 2006
Overview of connected component borders
The generation of connected component (c.c.) borders is a relatively
simple application in Leptonica. There are many rendering
machines that can convert outlines to binary images.
PostScript and other page description languages use outline
fonts for scalable rendering, to get smooth pixellated curves
at any output device resolution. There are several
different font formats in use, such as the original PostScript
outline fonts, Microsoft's TrueType fonts, and the open
source FreeType fonts.
Borders of connected components can be generated by following
either the border pixels or the "cracks" between border pixels
of different colors. The first type are called chain codes;
the second are crack codes. It is relatively simple
to convert crack codes to chain codes, but we will use
chain codes throughout. Borders must be followed consistently
so that the inside of the c.c. is on either the left
or right. We choose having the inside on the right side,
so that outer borders are traversed cw and inner (hole) borders
are traversed ccw.
After the chain code for the border(s) of a c.c. are determined,
the following constructions are of interest:
Let's look at each of the operations in some detail. The
source is in ccbord.c and ccbord.h, which
should be consulted for further details.
- Finding a compressed, serialized format for writing to file.
- Linking the outer border with all inner borders to form
a single, connected chain for each c.c.
- Rendering the outline as the original raster.
Generating chain codes for connected components
We first find the bitmaps and bounding boxes for each 8-connected
component. The border of an 8-connected component is represented
by an 8-connected chain of pixels.
Then for each c.c., the outer border is found by
first adding a one-pixel border of OFF pixels to the PIX
for the c.c., finding a first border pixel in the fg (i.e., an
ON pixel next to an OFF pixel), and sequentially finding
the next pixel on the border. We use a method described by
Rosenfeld and Kak in Digital Picture Processing, Vol 2,
pp. 219ff, Academic Press, 1982. The next pixel in an
8-connected border is found by looking in the 8 directions
from the current pixel, starting with the pixel next in
cw rotation from the previous border pixel. The search
sweeps around until an ON pixel is found. The chain is
terminated after the first pixel is reached again AND the
next pixel would be the second pixel in the chain. This
is necessary because pixels can be traversed on the chain
multiple times (up to 4 times; consider a 3x3 plus sign).
Remember, the background to 8-connected foreground is 4-connected.
The holes are found as follows. We take each minimum-sized
image of an 8 c.c., and use 4-filling from the border. This
fill stops when you hit the outside of the c.c.
Then inverting the result,
you get an image of the holes (if any) as 4-connected fg objects.
Run the c.c. finder looking for these 4-connected objects.
Then for each hole, the
hole border in the fg of the original c.c. is found by
first finding a pixel within the
hole, then searching in the original c.c. for a border pixel,
and then following the border to generate the chain code. Doing
it this way, we are guaranteed to find all holes within
components, and not to run into trouble in very complicated
situations where there can be components within holes
within components, etc, nested to any arbitrary depth.
Serializing the data for a compressed file format
We show a simple way to do this, that gives a typical
compression on text images that is about 25% better on
borders of connected components than using
png on the original image. If there are halftones,
you can expect worse compression than png. These
outline representations should generally not be used
on halftones, but they work well with text and unstippled
The chain code can be represented as a sequence of directions
for each step. There are 8 directions, so this requires 3 bits
of information. Pack 2 steps into each byte. You can do
better, putting 8 steps into 3 bytes, but gzip, which works
on bytes, will be less successful at compressing such data,
and we use gzip on the sequential step data. For each
c.c. you also need to store the UL corner coordinates,
and for each border (both outer and hole) you need to store
the coordinates of the starting pixel. We also store the
width and height of each c.c., but this is not necessary
because it can be determined from the outer border.
This data is collected in memory, using the byte buffer
utility in bbuffer.c. It is then compressed in memory
with gzip and written out to file. It can be read back,
gunzip'd in memory, and parsed, with the construction of
the CCBORDA data structure.
There are other ways to do this. For example,
we can store run-lengths at each direction rather than
single steps. We can compress using a Huffman code
on run-lengths, rather than a universal code on bytes
composed of 2-step direction codes. I adopted the
approach here because it's easy to implement using the
byte buffer utility and the in-memory gzip functions.
Forming a single border for each connected component
Kris Popat suggested that it is useful to connect the hole
borders with the outer border, so that there is a single
border for each c.c. (In fact, this whole chain code
representation is Kris's suggestion.) To avoid having
artifacts generated by the rendering machine,
each cut path between inner and
outer borders is traversed twice, in opposite directions,
along exactly the same pixel path. The renderers are usually
smart enough to figure out what is inside and what is outside.
The implementation is most simple if we find a path from
each hole border to the outer border, making sure that
it is fully contained within the fg of the c.c.
We use again a bitmap of the c.c. under study.
For each hole, we start at the center of the hole, making
sure it is an OFF pixel, and march in one of 4 directions
until we find an ON pixel, which must be on the hole border.
We then continue in the same direction until we hit either
an OFF pixel or an ON pixel at the edge of the bitmap.
Check if the last ON pixel is on the outer border of the c.c.
If it is, we have our cut path; if not, choose another of
the four directions and repeat. This way we have four
chances. Because the application is for text, where the
most likely joining of characters is horizontal, we pick
the first two directions vertically and the last two
horizontally. If we fail in all four directions, we just lose a hole.
Then, once a cut path is located for each hole, we start
on the outer border. When we come to a cut path, we
take it, go around the hole, come back on the cut path
to the outer border, and continue there. After the
outer border has been completely traversed, we are finished.
For completeness, I note that we offer the following
representations of the chain code in the CCBORDA
- Separate outer and hole borders, in local coords in the c.c.
- Separate outer and hole borders, in global coords
- Separate outer and hole borders, in step (direction) codes
- Single border with cut paths, in local coords in the c.c.
- Single border with cut paths, in global coords.
For this, you can have either all the points on the border,
or just the turning points, so that a straight line
between turning points intersects all the actual border pixels.
For a typical image, using turning points reduces the
number of stored points by a factor of about 2.
Rendering the outline as a filled raster
The standard method for filling outlines is scan line
conversion. This sweeps a line across the image, noting
when it goes through end points of oriented line segments, and
keeping track of how many lines (and, optionally, their orientations)
that it cuts.
You can understand how this works intuitively
by tracking across just one raster line, starting at the left edge.
Suppose you start in bg. The first line you cross will be
oriented up, and after crossing it you will be in fg. (This
follows our convention that inside is on the right
as you traverse the path.) For
simple shapes, each line you cross will be oriented oppositely
to the preceeding line, and you will toggle from fg <--> bg.
In the general case there can be any number of lines oriented up
or down, and a rule is needed to determine what to do when
crossing. There are two common rules:
For simple shapes, these two rules give the same result, but
for complex paths they will differ. Using the nonzero winding
number rule, if you pass 2 lines both oriented up, the winding
number is 2; you are still in fg and will remain there until
the winding number has gone back to 0.
Scan line conversion has not yet been implemented in Leptonica.
- Nonzero winding number rule. Sum the crossings,
using +1 if the path crosses oriented up and -1 if it
is going down. If the sum is 0, you are in bg;
otherwise, you are in fg.
- Even-odd rule. Sum the crossings, independent
of path orientation. If the sum is even, you are in bg;
otherwise, you are in fg.
We provide instead two topological algorithms for filling
the outlines, which are represented as separate outer and
hole borders for each c.c. These are the functions
ccbaDisplayImage1() and ccbaDisplayImage2().
Both algorithms are described in some detail at the top
of ccbord.c. The algorithms are very similar, but
Method 2 is a little simpler, so I describe it here.
Each 8-connected component is filled separately, and then
rasterop'd into the destination image. For each c.c.,
This is relatively fast, requiring only seedfill and rasterop
as the basic bit-parallel functions.
- Make a PIX with the border pixels in the c.c. in
the fg, and with an added 1 pixel border of bg
pixels. If w and h are the width and height of the
c.c., the PIX has width w+2 and height h+2.
This is the clipping mask.
- Make a seed image of the same size, and for each border
(both outer and holes) put one seed pixel outside
the border. "Outside" is determined by our right-hand
convention for borders. The 1 pixel border was added
to each image to simplify the procedure for finding
a seed outside the outer border of the c.c.; namely, by
guaranteeing that those pixels are accessible as seeds.
- Fill the seed image, using the border pixels as a clipping
mask to stop each fill at the borders. This is done
with pixSeedfillBinary(), inverting the clipping
mask to make it a filling mask, and using 4-connected
filling for the bg. We have now filled both the holes
and the region outside the outer border of the c.c.
- Invert the seed image, to get the properly filled c.c,
still centered in the oversize (by 2 pixels) PIX.
- Rasterop using XOR the filled c.c. (but not including
the outer 1 pixel boundary) into the full dest image.
Putting it together
You can run all these operations with prog/ccbordtest.c.
Let R be the ratio of 8-c.c. to 4-c.c. For typical text, R is
very close to 1. However, if halftones are present, R can
be much smaller than 1.
The following three page images were used to test different aspects of
the border finding and rendering routines:
- feyn.png. This is a typical scan of a text
page, with R = 4305/4452. Many of
the c.c. are multiple characters within a word that
have been joined.
- witten.png. This is a high quality scanned
page, with R = 4972/5208. It has very few touching
characters. However, it has
a large stippled "O"
that has many small
holes, and our algorithm for joining interior hole
borders with the outer border misses three of them.
- rabi.png. This has a large, topologically
complicated halftone pattern on the page. One of the
c.c. in the halftone contains 351 holes!
Most of the c.c. in the page are in the halftone:
the ratio R = 21748/98981 is much less than 1.
To reduce output when the outer and hole borders are
joined into a single path, we set a maximum number of
holes for any c.c, and if this number is exceeded, we assume the
component is not text and no hole borders are connected.
It will consequently be rasterized (filled) without holes.
(By contrast, the topological renderer, which uses separate hole
and outer borders, rasterizes the image correctly,
regardless of the number of holes.)
[Leptonica Home Page]
Leptonica by Dan Bloomberg is
licensed under a Creative Commons
Attribution 3.0 United States License.