
RFC: allow flexible or better binarization #3083

Open
@bertsky

Description


Tesseract has always included its own internal binarization – which is not based on Leptonica and is of rather poor quality (a custom global Otsu implementation without normalization). Leptonica does have lots of nice adaptive local normalization and thresholding implementations, but they are not utilized.
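For illustration only – Tesseract's actual thresholder is C++ and differs in detail – a generic global Otsu picks the single threshold that maximizes between-class variance of the grey histogram, with no local adaptation at all (this sketch is mine, not Tesseract's code):

```python
def otsu_threshold(grey):
    """Global Otsu: pick the threshold maximizing between-class
    variance of the grey-level histogram. `grey` is a flat iterable
    of 8-bit values. A sketch of the classic algorithm, not
    Tesseract's actual implementation."""
    hist = [0] * 256
    for v in grey:
        hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg = 0.0   # running sum of grey values in the background class
    w_bg = 0       # running pixel count of the background class
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

One global threshold for the whole page is exactly why this fails on uneven illumination or bleed-through: a value that separates ink from paper in one region smears both together in another.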

Since Tesseract 4.0, recognition does not (under normal circumstances) use that binarized image, but uses the greyscale-converted raw image. If the input image was already bitonal, then it still works. However, what the LSTM model expects depends on what data it was trained on. (Original tessdata models were all trained on artificial/clean greyscale IIRC.)

But the binarized image is still needed for all segmentation and layout analysis (OSD, separator detection, picture detection). And to make matters worse, segmentation also makes use of the greyscale image and threshold values from the binarizer – in order to get a better approximation/interpolation of blob outlines (ComputeEdgeOffsets). If the input image was already bitonal, then a fallback method is used which is not as accurate (ComputeBinaryOffsets).
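The idea behind the grey-based edge refinement can be illustrated in one dimension: given an inside (ink) pixel and an outside (paper) pixel, the true edge position is estimated by interpolating the grey profile against the threshold. This is a hypothetical simplification of what ComputeEdgeOffsets does, with names and details of my own, not Tesseract's code:

```python
def edge_offset(grey_inside, grey_outside, threshold):
    """Estimate the subpixel position of an edge between an inside
    (darker) and an outside (lighter) pixel by linear interpolation
    of the grey profile against the binarization threshold. Returns
    a fraction in [0, 1] measured from the inside pixel. A 1-D
    illustration of the idea, not Tesseract's actual implementation."""
    if grey_outside == grey_inside:
        return 0.5  # flat profile: no information, assume midpoint
    frac = (threshold - grey_inside) / (grey_outside - grey_inside)
    return min(1.0, max(0.0, frac))  # clamp to the pixel pair
```

With bitonal input there is no grey profile to interpolate, so only the coarser fallback from the binary neighbourhood (ComputeBinaryOffsets) remains – hence the accuracy loss on pre-binarized images.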

Now the user (both CLI and API) is in a dilemma:

  1. Present the original/raw image:
    • Segmentation on some images may be suboptimal, because good binarization is hard.
    • Recognition might expect a (colour/contrast-) normalized or even binarized (e.g. from tesstrain) image and thus be suboptimal.
  2. Present an externally binarized image:
    • Segmentation on other images may be suboptimal, because the blob outlines are inaccurate.
    • Recognition might expect a greyscale image and thus be suboptimal.

So what do we do? Allow delegating to Leptonica's pixContrastNorm, pixSauvolaBinarizeTiled, etc. via a parameter variable? Or extend the API to allow passing the threshold values of an external binarization? How do we inform/document this and encapsulate an LSTM model's expectations?
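To make the adaptive alternative concrete: Sauvola thresholds each pixel against T = m · (1 + k · (s/R − 1)), where m and s are the local mean and standard deviation. The following is a naive plain-Python sketch of that formula – Leptonica's pixSauvolaBinarizeTiled computes it far more efficiently over tiles in C, and the parameter names here are mine:

```python
def sauvola_binarize(img, whsize=15, k=0.35, R=128.0):
    """Naive Sauvola adaptive binarization: for each pixel, compute
    local mean m and stddev s over a (2*whsize+1)^2 window clipped
    to the image, then threshold at T = m * (1 + k * (s/R - 1)).
    `img` is a list of rows of 8-bit grey values; returns rows of
    0 (ink) / 1 (background). A sketch of the formula behind
    Leptonica's pixSauvolaBinarizeTiled, not its implementation."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - whsize), min(h, y + whsize + 1))
                    for i in range(max(0, x - whsize), min(w, x + whsize + 1))]
            m = sum(vals) / len(vals)
            s = (sum((v - m) ** 2 for v in vals) / len(vals)) ** 0.5
            t = m * (1.0 + k * (s / R - 1.0))
            row.append(0 if img[y][x] < t else 1)
        out.append(row)
    return out
```

Because the threshold follows the local statistics, dark text survives on locally dark backgrounds where a single global Otsu value would not. The per-pixel threshold image such a method produces is also exactly the kind of data that an extended API could accept from an external binarizer.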

There are many more aspects, but I just wanted to open the discussion.

If requested, I can provide example images (of bad segmentation due to internal Otsu or bad segmentation due to inaccurate interpolation from bitonal input; of recognition perplexed by background masked to white), as well as pointers to code.
