This repository was archived by the owner on Nov 11, 2018. It is now read-only.

parse5 serialization on text nodes and removal of some htmlEntities conversion #107

Merged 2 commits on Jan 27, 2015

Conversation

9802-old

closes #100


var dummyNode = parser.parse(' ');
dummyNode.childNodes = [dom];
var contents = serializer.serialize(dummyNode);

This isn't using the new parse5 serialization option to not escape the string as I expected. Wouldn't that remove the need for your string replacement below?

@timaeudg

The option is enabled by default. As for the string replacements: for single and double quotes, it is essentially impossible for anyone to know whether they should be converted.

Since the parse5 parser converts escape sequences to the characters they represent, that means that:

`&#39;`, `&apos;`, and `&(r/l)squote;` all map to `'` in text

This means that if we see `'` in text, we have no idea which form it originally was or which it should be, since all 3 forms are valid. (Additionally, what if the user didn't want a fancy single quote?)

There is no spec for whether they must be escaped, so the serializer doesn't convert them. Thus, if we want to escape them, we must do it ourselves. If you are saying that we shouldn't escape them, by all means; I see no problem with that. But we can't read the user's mind so long as the parser is doing what it wants to the document.
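
To make the ambiguity concrete, here is a minimal sketch. It does not use parse5; `decodeQuoteEntities` and `escapeQuotes` are hypothetical helpers standing in for what a parser does when it builds text nodes and for the manual replacement discussed above. The choice of `&#39;` as the output form is an assumption for illustration only.

```javascript
// Hypothetical stand-in for the parser: decodes a few quote-related
// entities into the characters they represent.
function decodeQuoteEntities(html) {
  return html
    .replace(/&#39;|&#x27;|&apos;/g, "'")
    .replace(/&quot;|&#34;/g, '"');
}

// Hypothetical stand-in for the manual replacement step: re-escapes
// quotes after serialization, always using one fixed form.
function escapeQuotes(text) {
  return text.replace(/'/g, '&#39;').replace(/"/g, '&quot;');
}

// Three different source spellings of the same text.
const inputs = ["it&#39;s", "it&apos;s", "it's"];

// After decoding, the original spelling is no longer recoverable.
const decoded = inputs.map(decodeQuoteEntities);

// Re-escaping can only pick one form for all of them.
const reescaped = decoded.map(escapeQuotes);

console.log(decoded);
console.log(reescaped);
```

Once the three inputs have been decoded they are identical strings, so any later escaping step has to apply a single policy to all of them rather than restore what the user wrote.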

@timaeudg timaeudg merged commit a8e31bb into master Jan 27, 2015
@timaeudg timaeudg deleted the parse5-update-changes branch May 19, 2015 10:34
Development

Successfully merging this pull request may close these issues.

Stop decoding HTML entities when serializing HTML