Running lstmtraining for the frk language with 50000 iterations terminated with a failed assertion right after training reported that it had finished.
$ lstmtraining -U /home/stweil/src/github/tesseract-ocr/tesseract/frk/train/frk.unicharset --script_dir ~/src/github/tesseract-ocr/langdata --net_spec '[1,36,0,1 Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx256 O1c105]' --model_output /home/stweil/src/github/tesseract-ocr/tesseract/frk/output/base --train_listfile /home/stweil/src/github/tesseract-ocr/tesseract/frk/train/frk.training_files.txt --eval_listfile /home/stweil/src/github/tesseract-ocr/tesseract/frk/train/frk.training_files.txt --max_iterations 50000
...
At iteration 15778/49900/49900, Mean rms=0.37%, delta=0.112%, char train=0.381%, word train=1.506%, skip ratio=0%, wrote checkpoint.
At iteration 15788/50000/50000, Mean rms=0.363%, delta=0.104%, char train=0.346%, word train=1.387%, skip ratio=0%, wrote checkpoint.
Finished! Error rate = 0.26
num_docs > 0:Error:Assert failed:in file ../../../../ccstruct/imagedata.cpp, line 648
I used the latest Tesseract sources, a slightly modified font list, and a longer training text for the frk training.
A previous run with 10000 iterations under nearly the same conditions did not trigger the assertion:
...
2 Percent improvement time=807, best error was 3.911 @ 8211
At iteration 9018/10000/10000, Mean rms=0.835%, delta=0.465%, char train=1.729%, word train=6.095%, skip ratio=0%, New best char error = 1.729Deserialize failed wrote best model:/home/stweil/src/github/tesseract-ocr/tesseract/tutorial/frkoutput/base1.729_9018.lstm wrote checkpoint.
Finished! Error rate = 1.729