Dropout layers for Tesseract

### Your Feature Request

I am trying to implement dropout layers for Tesseract. For now, the goal is to add a layer specifier such as `Dr0.2` to the [VGSLSpecs](https://tesseract-ocr.github.io/tessdoc/tess4/VGSLSpecs.html) syntax. I have implemented some of the code but have run into a few issues, and this seems like the right place to discuss them.
1. The files I have edited are:

```
 Changes to be committed:
   (use "git restore --staged <file>..." to unstage)
 	new file:   ../src/lstm/dropout.cpp
 	new file:   ../src/lstm/dropout.h
 
 Changes not staged for commit:
   (use "git add <file>..." to update what will be committed)
   (use "git restore <file>..." to discard changes in working directory)
 	modified:   ../Makefile.am
 	modified:   ../configure.ac (for my own environment and irrelevant to the new dropout feature)
 	modified:   ../src/lstm/fullyconnected.cpp
 	modified:   ../src/lstm/network.cpp
 	modified:   ../src/lstm/network.h
 	modified:   ../src/training/common/networkbuilder.cpp
 	modified:   ../src/training/common/networkbuilder.h
```

2. The code compiles, but `make training` aborts at run time with a missing symbol:

```
  ~/Documents/OCR/tesstrain_units_6 (main*) » make training
  make[1]: Entering directory '~/Documents/OCR/tesstrain_units_6'
  ~/Documents/OCR/tesseract_dr/build/combine_lang_model \
 	--input_unicharset data/units/unicharset \
 	--script_dir data/langdata \
 	--numbers data/units/units.numbers \
 	--puncs data/units/units.punc \
 	--words data/units/units.wordlist \
 	--output_dir data \
 	 \
 	--lang units
  dyld[91402]: symbol not found in flat namespace '__ZN9tesseract7Network11DeSerializeEPNS_5TFileE'
  make[1]: *** [dr_training.mk:40: data/units/units.traineddata] Abort trap: 6
  make[1]: Leaving directory '~/Documents/OCR/tesstrain_units_6'
  make: *** [Makefile:17: training] Error 2
```

The missing symbol demangles to `tesseract::Network::DeSerialize(tesseract::TFile*)`. This is not surprising, as I am sure other parts of the codebase still need essential modifications.

3. Dropout must be disabled for deployed `.traineddata` models, for which I may need to modify `network.cpp` further. I would like to ask the community what the best practice is for adding a new flag or switch for this purpose.

4. Ideally, when continuing training from a checkpoint, I want to be able to change the dropout rate(s) to different value(s), including setting them to 0 (perhaps once training is converging). There is probably more than one way to do this, so I would like to ask the community for the best practice.

5. Let me know if you would like to go over the modifications I have implemented so far (they do not work yet).
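
To make items 3 and 4 concrete, here is a minimal standalone sketch of the behaviour I have in mind. It uses inverted dropout (kept activations are scaled by 1 / (1 - rate) during training), so the layer becomes a plain pass-through at inference time. It is not the code from my branch and does not use the Tesseract `Network` API; the class `DropoutSketch` and its methods `set_rate`, `set_training`, and `Forward` are hypothetical names chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical standalone sketch; not part of the Tesseract codebase.
class DropoutSketch {
 public:
  explicit DropoutSketch(float rate, unsigned seed = 42) : rng_(seed) {
    set_rate(rate);
  }

  // Item 4: the rate can be changed after construction, for example when
  // resuming from a checkpoint. A rate of 0 turns dropout off.
  void set_rate(float rate) {
    assert(rate >= 0.0f && rate < 1.0f);
    rate_ = rate;
  }
  float rate() const { return rate_; }

  // Item 3: a switch that disables dropout for deployed models.
  void set_training(bool training) { training_ = training; }

  // Inverted dropout: during training, each unit is zeroed with probability
  // rate_ and the survivors are scaled by 1 / (1 - rate_). Otherwise the
  // input is returned unchanged.
  std::vector<float> Forward(const std::vector<float>& input) {
    if (!training_ || rate_ == 0.0f) return input;
    const float keep = 1.0f - rate_;
    std::bernoulli_distribution keep_unit(keep);
    std::vector<float> output(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
      output[i] = keep_unit(rng_) ? input[i] / keep : 0.0f;
    }
    return output;
  }

 private:
  float rate_ = 0.0f;
  bool training_ = true;
  std::mt19937 rng_;
};
```

With this shape, a deployed model would only need `set_training(false)`, and a resumed training run could call `set_rate(0.0f)`. A real layer would also have to remember the mask from the forward pass and apply it to the gradients in the backward pass, which this sketch omits.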

Dropout layers for Tesseract #4252