Output replaces HTML Entities with unicode literals #93
Comments
OK, so it seems etree.tostring looks at the encoding to decide whether it can output something as a literal or has to escape it. So calling |
Running
Definitely need an option against that transformation.
If there's a way to tell lxml to stay the heck out of trying to do that trick, that would be great. And it's something we ought to do by default.
Noticed that |
Maybe |
Passing `--encoding ascii` should fix issues peterbe#93, peterbe#100, peterbe#152 and peterbe#157.
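For anyone wondering why the encoding matters here: a serializer can only write a character as a literal if the target encoding can represent it, and otherwise has to fall back to a character reference. Below is a minimal sketch of that behaviour. It uses the standard library's `xml.etree.ElementTree` as a stand-in, on the assumption that lxml's `etree.tostring` behaves similarly (as the comments above suggest); it is not premailer's actual code.

```python
import xml.etree.ElementTree as ET

# Parse a fragment containing a literal copyright sign (U+00A9).
p = ET.fromstring("<p>\u00a9 2014</p>")

# ASCII cannot represent U+00A9, so the serializer emits a numeric
# character reference instead of the literal.
as_ascii = ET.tostring(p, encoding="us-ascii")

# UTF-8 and "unicode" (which returns a str) can represent it, so the
# literal character is written out unchanged.
as_utf8 = ET.tostring(p, encoding="utf-8")
as_text = ET.tostring(p, encoding="unicode")

print(as_ascii)  # b'<p>&#169; 2014</p>'
print(as_text)
```

Note that the ASCII output is a numeric reference (`&#169;`), not the named entity from the original source, so this avoids the literal but does not round-trip the markup exactly.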
Intended as an illustration of a solution to issues peterbe#72 and peterbe#130 (and also peterbe#93 et al.) I assume this actually wants pushing into `transform` somewhere.
Running transform seems to translate HTML entities in the source into unicode literals. For example:
<p>&copy; 2014</p>
becomes
<p>© 2014</p>
This is causing issues for me, and I'm guessing it's just a side effect of the lxml settings and not intentional. My understanding is that "&copy;" has better email client compatibility than "©". (If anything, I'd prefer an option to go the other way: escape any unicode literals in the source.)
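As a stopgap for the "go the other way" idea, the output string could be post-processed to escape every non-ASCII character. This is only a rough sketch: `escape_non_ascii` is a hypothetical helper, not part of premailer or lxml, and it assumes the output is already a `str`. It prefers named entities from the standard library's `html.entities` table and falls back to numeric references.

```python
from html.entities import codepoint2name


def escape_non_ascii(markup: str) -> str:
    """Replace every non-ASCII character with an HTML entity.

    Hypothetical helper for illustration only.
    """
    out = []
    for ch in markup:
        cp = ord(ch)
        if cp < 128:
            # Plain ASCII (including markup characters) passes through untouched.
            out.append(ch)
        elif cp in codepoint2name:
            # Use the named entity when HTML 4 defines one, e.g. &copy;
            out.append(f"&{codepoint2name[cp]};")
        else:
            # Otherwise fall back to a decimal character reference.
            out.append(f"&#{cp};")
    return "".join(out)


print(escape_non_ascii("<p>\u00a9 2014</p>"))  # <p>&copy; 2014</p>
```

Because it only touches characters above the ASCII range, existing tags and already-escaped entities are left alone.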