Hi @gemma team,
I’m building a fine-tuning dataset for the biomedical/medical domain by converting PDFs to Markdown. The content includes:
• Images (medical diagrams, charts)
• Complex tables (clinical data)
• Mathematical formulas (drug dosage equations, statistical models)
Could the Gemma team advise on the following?
1. Best practices for handling multimodal elements (images, tables, formulas) in Markdown during dataset preparation.
2. Recommended preprocessing steps for preserving the semantic relationships between text and non-text elements.
3. Any existing tutorials or documentation for fine-tuning on similar technical/academic domains.
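A minimal sketch of one possible preprocessing step for question 2, assuming the PDF converter emits standard Markdown image syntax (`![alt](path)`), pipe tables, and `$$`-delimited display math. It splits the document into typed blocks so tables and formulas are never cut mid-way when chunking, then pairs each non-text block with the nearest preceding paragraph as a rough stand-in for its context. The names `Block`, `split_blocks`, and `attach_context` are hypothetical, the nearest-paragraph heuristic is an assumption rather than a documented Gemma practice, and `SAMPLE` is toy input, not real clinical data.

```python
import re
from dataclasses import dataclass

# Assumed converter output: ![alt](path) images, pipe tables, $$...$$ display math.
# Inline $...$ math is left inside the surrounding text block.
IMAGE_RE = re.compile(r"^\s*!\[[^\]]*\]\([^)]*\)\s*$")
TABLE_RE = re.compile(r"^\s*\|.*\|\s*$")


@dataclass
class Block:
    kind: str  # "text", "image", "table", or "math"
    content: str


def _is_special(line: str) -> bool:
    """True if the line starts a non-text block."""
    return (
        line.strip().startswith("$$")
        or TABLE_RE.match(line) is not None
        or IMAGE_RE.match(line) is not None
    )


def split_blocks(markdown: str) -> list[Block]:
    """Split Markdown into typed blocks so tables and formulas stay whole."""
    blocks: list[Block] = []
    lines = markdown.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        stripped = line.strip()
        if not stripped:
            i += 1  # blank lines only separate blocks
            continue
        if stripped.startswith("$$"):
            buf = [line]
            i += 1
            # "$$ x = y $$" closes on the same line; otherwise read to the closing $$.
            if not (len(stripped) > 2 and stripped.endswith("$$")):
                while i < len(lines):
                    buf.append(lines[i])
                    i += 1
                    if buf[-1].strip().endswith("$$"):
                        break
            blocks.append(Block("math", "\n".join(buf)))
        elif TABLE_RE.match(line):
            buf = []
            while i < len(lines) and TABLE_RE.match(lines[i]):
                buf.append(lines[i])
                i += 1
            blocks.append(Block("table", "\n".join(buf)))
        elif IMAGE_RE.match(line):
            blocks.append(Block("image", stripped))
            i += 1
        else:
            # Consecutive non-blank, non-special lines form one paragraph.
            buf = []
            while i < len(lines) and lines[i].strip() and not _is_special(lines[i]):
                buf.append(lines[i])
                i += 1
            blocks.append(Block("text", "\n".join(buf)))
    return blocks


def attach_context(blocks: list[Block]) -> list[tuple[str, Block]]:
    """Pair each image/table/math block with the closest preceding text block.

    Heuristic assumption: the paragraph before a figure, table, or formula describes it.
    """
    pairs: list[tuple[str, Block]] = []
    last_text = ""
    for block in blocks:
        if block.kind == "text":
            last_text = block.content
        else:
            pairs.append((last_text, block))
    return pairs


# Toy input for illustration only.
SAMPLE = """Intro paragraph.

$$
D = k * w
$$

| a | b |
|---|---|
| 1 | 2 |

![Figure 1](fig1.png)
"""

if __name__ == "__main__":
    for context, block in attach_context(split_blocks(SAMPLE)):
        print(block.kind, "<-", context)
```

Keeping each `(context, block)` pair in the same chunk is one way to avoid separating a table or formula from the text that explains it.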
Any help would be greatly appreciated!
Yang