Output replaces HTML Entities with unicode literals #93

Open
elidickinson opened this issue Nov 21, 2014 · 5 comments

Comments

@elidickinson
Contributor

Running transform seems to translate HTML entities in the source into unicode literals. For example:
<p>&copy; &nbsp;&nbsp; 2014</p>
becomes
<p>©    2014</p>

This is causing issues for me, and I'm guessing it's just a side effect of the lxml settings and not intentional. My understanding is that "&copy;" has better email client compatibility than a literal "©". (If anything, I'd prefer an option to go the other way: escape any Unicode literals in the source.)

@elidickinson
Contributor Author

OK, so it seems etree.tostring looks at the encoding to decide whether it can output something as a literal or has to escape it. So calling .transform(encoding='ascii') gets me:
<p>&#169; &#160;&#160; 2014</p>
Which is probably close enough for me. It'd be cool if there was a way to preserve the named entities though.
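
For anyone who wants the named entities back, here is a minimal post-processing sketch using only the Python standard library. It is not part of premailer or lxml; the helper names `to_numeric_refs` and `to_named_entities` are made up for illustration. The first helper mirrors the numeric references that an ASCII encoding produces above, and the second maps non-ASCII characters to named HTML entities where `html.entities.codepoint2name` knows one, falling back to a numeric reference otherwise. It assumes the input is already-serialized HTML text and does not touch ASCII characters such as `"`.

```python
import re
from html.entities import codepoint2name


def to_numeric_refs(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def to_named_entities(text: str) -> str:
    """Replace non-ASCII characters with named entities where one exists.

    Hypothetical helper: falls back to a numeric reference when the
    standard library has no name for the code point.
    """
    def _replace(match):
        codepoint = ord(match.group(0))
        name = codepoint2name.get(codepoint)
        return f"&{name};" if name else f"&#{codepoint};"

    # Match any single character outside the ASCII range.
    return re.sub(r"[^\x00-\x7f]", _replace, text)


html = "<p>\u00a9 \u00a0\u00a0 2014</p>"
print(to_numeric_refs(html))    # <p>&#169; &#160;&#160; 2014</p>
print(to_named_entities(html))  # <p>&copy; &nbsp;&nbsp; 2014</p>
```

Running something like this over the string returned by `transform` would be a workaround rather than a fix, since it cannot tell which characters were entities in the original source and which were literals.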

elidickinson changed the title from "Output replaces HTML Entities with unciode literals" to "Output replaces HTML Entities with unicode literals" on Nov 21, 2014
@dandv

dandv commented Jul 17, 2015

Running `Copyright &copy; 2015 iDoRecall, Inc.` through premailer.io, I saw `&copy;` converted to just "c", not the Unicode copyright symbol:

Copyright c 2015 iDoRecall, Inc.

Definitely need an option against that transformation.

@peterbe
Owner

peterbe commented Jul 20, 2015

If there's a way to tell lxml to stay the heck out of trying to do that trick, that would be great. And it's something we ought to do by default.

@1951FDG

1951FDG commented Jan 20, 2016

Noticed that `&rsquo;` gets converted as well, as does `&quot;`.

@OrangeDog
Contributor

> a way to tell lxml

Maybe resolve_entities=False?

OrangeDog added a commit to OrangeDog/premailer that referenced this issue Jun 15, 2016
Passing `--encoding ascii` should fix issues peterbe#93, peterbe#100, peterbe#152 and peterbe#157.
OrangeDog added a commit to OrangeDog/premailer that referenced this issue Jun 15, 2016
Intended as an illustration of a solution to issues peterbe#72 and peterbe#130 (and also peterbe#93 et al.)
I assume this actually wants pushing into `transform` somewhere.
peterbe pushed a commit that referenced this issue Jul 11, 2016
* Allow setting encoding from command-line.

Passing `--encoding ascii` should fix issues #93, #100, #152 and #157.

* PEP8 fixes