
Not able to Train Assamese language #144


Open
Alok31555 opened this issue May 14, 2025 · 0 comments

@Alok31555

I am trying to train the Tesseract OCR engine to recognize the Assamese language. I have created the following files:

  • .tif – image of the text
  • .txt – text file with correct text
  • .box – box file with character positions
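A quick thing to rule out before looking at accuracy is an incomplete training set, where some samples lack one of the three files listed above. The sketch below is a hypothetical helper (`find_incomplete_samples` is not part of Tesseract). It assumes each sample's files share one base name, for example `page001.tif`, `page001.txt` and `page001.box`, and it reports any sample that is missing a file.

```python
from collections import defaultdict
from pathlib import Path

# The three file types per sample, as listed in the issue.
REQUIRED_EXTS = {".tif", ".txt", ".box"}


def find_incomplete_samples(filenames):
    """Return {base_name: sorted missing extensions} for incomplete samples.

    Hypothetical helper: assumes files of one sample share a base name,
    e.g. page001.tif / page001.txt / page001.box. Other files are ignored.
    """
    found = defaultdict(set)
    for name in filenames:
        path = Path(name)
        if path.suffix in REQUIRED_EXTS:
            found[path.stem].add(path.suffix)
    return {
        base: sorted(REQUIRED_EXTS - exts)
        for base, exts in sorted(found.items())
        if exts != REQUIRED_EXTS
    }


if __name__ == "__main__":
    # Run from inside the folder that holds the training files.
    problems = find_incomplete_samples(p.name for p in Path(".").iterdir())
    for base, missing in problems.items():
        print(f"{base}: missing {', '.join(missing)}")
```

An empty result means every `.tif` has its `.txt` and `.box` partner. It does not check that the box coordinates or text contents are correct.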

I followed the training steps, but the resulting model performs poorly: accuracy is very low, and it makes many mistakes even on the training image.
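"Very low accuracy" is easier to discuss with a number. A common OCR metric is the character error rate (CER), which is the edit distance between the recognized text and the ground-truth `.txt`, divided by the length of the ground truth. The minimal sketch below uses hypothetical function names. It compares Unicode code points, so an Assamese conjunct or a vowel sign counts as several characters.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ocr_text: str, ground_truth: str) -> float:
    """Edit distance divided by ground-truth length, counted in code points."""
    if not ground_truth:
        raise ValueError("ground truth must be non-empty")
    return levenshtein(ocr_text, ground_truth) / len(ground_truth)
```

Here `ocr_text` would be the model's output for an image, for example from `tesseract image.tif out -l <model>`, and `ground_truth` is the matching `.txt`. A CER of 0.0 means a perfect match. A high CER on the training images themselves usually points to a problem with the data or the training setup rather than to a lack of data.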
I am training with 8,000 text samples and am not sure whether that is enough, or whether I need more data to improve the model's accuracy.
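Results on the training image alone do not say much about whether the data is sufficient. Measuring error on samples the model never trained on is more informative. The sketch below makes a reproducible hold-out split of sample IDs. The 10% evaluation fraction, the seed and the function name `split_samples` are illustrative assumptions, not Tesseract settings.

```python
import random


def split_samples(sample_ids, eval_fraction=0.1, seed=42):
    """Shuffle sample IDs reproducibly and split them into (train, eval) lists."""
    if not 0.0 < eval_fraction < 1.0:
        raise ValueError("eval_fraction must be between 0 and 1")
    ids = sorted(sample_ids)      # sort first so the split ignores input order
    rng = random.Random(seed)     # private RNG: same seed gives the same split
    rng.shuffle(ids)
    n_eval = max(1, round(len(ids) * eval_fraction))
    return ids[n_eval:], ids[:n_eval]
```

With 8,000 samples this holds out 800 for evaluation and trains on 7,200. If the error rate on the held-out set is much worse than on the training set, more or more varied data may help. If both are bad, the training files or setup are the more likely cause.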

Can someone help me understand what went wrong?
I want to improve the accuracy so that the model works reliably for Assamese.
