Not able to Train Assamese language #144

Alok31555 · 2025-05-14T12:54:13Z

I am trying to train the Tesseract OCR engine to recognize the Assamese language. I have created the following files:

.tif – image of the text
.txt – text file with correct text
.box – box file with character positions
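A quick way to rule out an incomplete file set is to check that every sample has all three files. The following is a minimal sketch, assuming each sample's `.tif`, `.txt`, and `.box` files share one base name and sit in a single directory; the function name `find_incomplete_samples` and the directory layout are hypothetical and not taken from any Tesseract tooling.

```python
from pathlib import Path

# The three file types listed above; assumed to share a base name per sample.
REQUIRED_SUFFIXES = (".tif", ".txt", ".box")


def find_incomplete_samples(data_dir):
    """Return {base_name: [missing suffixes]} for samples lacking a file."""
    data_dir = Path(data_dir)
    # Collect the base name of every file that belongs to the training set.
    stems = {
        p.stem
        for p in data_dir.iterdir()
        if p.is_file() and p.suffix in REQUIRED_SUFFIXES
    }
    incomplete = {}
    for stem in sorted(stems):
        # A sample is incomplete if any of the three files is absent.
        missing = [
            suffix
            for suffix in REQUIRED_SUFFIXES
            if not (data_dir / (stem + suffix)).exists()
        ]
        if missing:
            incomplete[stem] = missing
    return incomplete
```

Calling `find_incomplete_samples("path/to/ground-truth")` (a placeholder path) returns an empty dictionary when every sample is complete.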

I followed the training steps, but the resulting model performs poorly: its accuracy is very low, and it makes many mistakes even on the training image.
I am training with 8,000 text samples. I am not sure whether this is enough data or whether I need to add more to improve the model's accuracy.
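To put a number on "accuracy is very low", one common measure is character error rate (CER): the edit distance between the correct text and the OCR output, divided by the length of the correct text. The sketch below is plain Python with hypothetical function names; it assumes the OCR output has already been obtained as a string, and it is not part of Tesseract.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, 1):
        curr = [i]
        for j, ch_b in enumerate(b, 1):
            curr.append(
                min(
                    prev[j] + 1,                   # deletion
                    curr[j - 1] + 1,               # insertion
                    prev[j - 1] + (ch_a != ch_b),  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance / length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Python strings are compared per Unicode code point here, so an Assamese base letter and its combining vowel sign count as separate characters. Normalizing both strings the same way (for example with `unicodedata.normalize`) before comparing keeps the numbers consistent.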

Can someone help me understand what went wrong?
I want to improve the accuracy and make the model work for Assamese.
