feat: render math equations in .docx documents #1160
@microsoft-github-policy-service agree |
@afourney FYI: Since issue #289 has been waiting unresolved for a long time, I went ahead and worked on this. There is another PR created for this purpose, but it has been inactive for a long period. Also, I took a different path to resolve math equations (an all-Python approach), which is used by a couple of major Python document transformation libs. |
Thanks for this! That is correct. There was a big refactor from 0.0.x to 0.1.x. I will kick off the CI tests tonight, and try to test/review tomorrow. This is an important feature. Let's also invite/credit the original PR author to review. It's not their fault I haven't merged it yet... it's just a matter of timing. |
Sure, happy to have more eyes on this to make sure everything looks good. FYI @marromlam But just to be clear, this approach uses Python for rendering, while the other PR that worked on this issue used |
First of all, this is super cool. And I'm inclined to merge it ASAP. But the adapted code is under the Apache 2 license: https://github.com/xiilei/dwml/blob/master/LICENSE I need to figure out what I need to do to distribute a modification here (until now, we've not hosted any 3rd party-derived content directly). Probably I need to add another acknowledgments file or something. Let me look into it. |
@sathinduga Can you please fill in the TODO in this file: https://gist.github.com/afourney/4ae6af3d5b3aaf329705d04c6cf182b4 and add it to the root of the MarkItDown repo? I tried, but do not have permissions to push to your fork. Once that's in place, I think we're good to go. |
@afourney added as a .md file. let me know if you prefer it to be .txt. |
Looks good to me. I want to do a little more testing before merge -- as soon as I get a chance -- but I think it's basically all good otherwise. |
Reformatted following with black, because of
Added reformatting modifications to the ThirdPartyNotices. |
Merged! Thanks for the contribution. |
Nice approach @sathinduga! However this would require a lot of effort to get it to convert as many formulae as mine. For example with this docx:
|
This PR addresses issue #289: Add support for mathematical formulas in DOCX conversion.
Before this work, math equations in a .docx file were rendered as blank spaces on conversion. This is a serious problem when working with engineering, mathematics, and other scientific documents.
With this update, OMML math equations present in a .docx document are converted to LaTeX and wrapped in $ / $$ as appropriate, so that both inline and block equations are represented in Markdown format.
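To illustrate the delimiter choice, here is a minimal sketch, not the code in this PR. It assumes that display equations are identified by the OMML `m:oMathPara` element and inline ones by a bare `m:oMath`; the helper names `wrap_latex` and `is_block_equation` are hypothetical, and the OMML-to-LaTeX conversion itself is out of scope here.

```python
import xml.etree.ElementTree as ET

# OMML namespace used by Word for math markup.
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"


def wrap_latex(latex: str, block: bool) -> str:
    """Wrap an already-converted LaTeX string in Markdown math delimiters.

    Hypothetical helper: block equations get $$...$$, inline ones get $...$.
    """
    latex = latex.strip()
    return f"$${latex}$$" if block else f"${latex}$"


def is_block_equation(element: ET.Element) -> bool:
    """Assumption: m:oMathPara marks a display equation, m:oMath an inline one."""
    return element.tag == f"{{{M_NS}}}oMathPara"
```

For example, `wrap_latex("x^2", block=False)` gives `$x^2$`, while the same string with `block=True` gives `$$x^2$$`.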
This is done by pre-processing the .docx document before sending it to mammoth to get the HTML. I structured it so that other pre-processing steps can be added in the future if needed.
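The pre-processing idea can be sketched roughly as follows. This is an illustration under assumptions, not the implementation in this PR: it treats the .docx as a zip archive, rewrites only `word/document.xml`, and runs it through a list of step functions. The names `pre_process_docx` and `Step` are hypothetical.

```python
import io
import zipfile
from typing import BinaryIO, Callable, Iterable

# A step takes the raw bytes of word/document.xml and returns new bytes.
Step = Callable[[bytes], bytes]


def pre_process_docx(source: BinaryIO, steps: Iterable[Step]) -> io.BytesIO:
    """Return a new in-memory .docx with every step applied to the main document part.

    Hypothetical helper: all other archive members are copied unchanged.
    """
    steps = list(steps)
    output = io.BytesIO()
    with zipfile.ZipFile(source) as zin:
        with zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as zout:
            for info in zin.infolist():
                data = zin.read(info.filename)
                if info.filename == "word/document.xml":
                    # Apply the steps in order; new steps can be appended later.
                    for step in steps:
                        data = step(data)
                zout.writestr(info.filename, data)
    output.seek(0)  # rewind so the caller can read from the start
    return output
```

The returned stream could then be handed to `mammoth.convert_to_html`, which accepts a file-like object. A math step would be one entry in `steps`, and further steps could be added without touching the archive-handling code.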
A test case to validate equations is also included.
Special acknowledgment to xiilei/dwml for the initial work on OMML rendering to LaTeX.