Page Layout Detection Tools is a project that aims to automate layout
detection in scanned page images, a necessary step in OCR processing.
The goals are to detect the orientation of the text, to determine the
bounding box(es) of the text and graphics, to deskew the page images
if necessary, and to remove scanning artifacts (dirt, speckles,
shadows).

The entire code will be distributed under the terms of the GPL.

The initial implementation works with black-and-white images in TIFF or
PBM format. The first application in the project is a program that
determines the skew angle of the text. It uses an original algorithm
based on a fast implementation of the Radon transform. (The fast Radon
code was received from an anonymous contributor who has allowed us to
publish it under the GPL.)
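
The idea behind Radon-based skew detection can be illustrated with a
minimal sketch. This is not the project's algorithm and not the fast
Radon code mentioned above: it is a brute-force projection-profile
search, which evaluates the Radon transform of the black pixels at a
handful of candidate angles and keeps the angle whose profile is the
most sharply peaked. The function name estimate_skew, its parameters,
and the list-of-rows image representation (1 = black, 0 = white) are
all assumptions made for this example.

```python
import math


def estimate_skew(pixels, max_angle=5.0, step=0.5):
    """Estimate the skew of text lines in a bilevel image.

    pixels is a list of rows, each a list of 0/1 values (1 = black).
    Returns the angle in degrees, within +/- max_angle, such that text
    lines follow y = y0 + x * tan(angle) in image coordinates
    (x to the right, y downward).
    """
    # Collect the coordinates of all black pixels once.
    black = [(x, y)
             for y, row in enumerate(pixels)
             for x, value in enumerate(row) if value]

    best_angle, best_score = 0.0, -1
    steps = int(round(max_angle / step))
    for i in range(-steps, steps + 1):
        angle = i * step
        theta = math.radians(angle)
        sin_t, cos_t = math.sin(theta), math.cos(theta)

        # Projection profile at this angle: count the black pixels that
        # fall on each line y*cos(theta) - x*sin(theta) = r.
        profile = {}
        for x, y in black:
            r = int(round(y * cos_t - x * sin_t))
            profile[r] = profile.get(r, 0) + 1

        # The sum of squared counts is largest when the black pixels
        # concentrate in few bins, i.e. when the projection direction
        # is parallel to the text lines.
        score = sum(n * n for n in profile.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

The cost of this sketch grows with the number of black pixels times the
number of candidate angles, which is the cost a fast Radon transform is
meant to avoid. A page can then be deskewed by resampling it so that
the detected lines become horizontal.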