Commit ebe0c40

Improve performance of JSON HTML entity escaping
Running gsub! five times with string arguments seems to be faster than running it once with a regex and a Hash.

When the regex matches (there are characters to escape), this is faster in part because, in the regex-and-Hash version, CRuby allocates a new match object and a new string to use as the key for the lookup in the provided map hash. It's possible that could be optimized upstream, but for now this avoids those allocations.

Surprisingly (at least to me), this is still much faster when no replacement is needed: in my test, ~3x faster on a short ~200-byte string and ~5x faster on a pre-escaped ~600k twitter.json.
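The comparison described above can be tried with a rough sketch like the following. The method names `escape_with_regex` and `escape_with_strings`, the `compare` helper, and the sample inputs are assumptions made for illustration; the constants mirror the ones this commit removes. No timings are claimed here, since results depend on the Ruby version and the input.

```ruby
require "benchmark"

# Mirrors the constants removed by this commit (redefined here for the sketch).
JSON_ESCAPE = { "&" => '\u0026', ">" => '\u003e', "<" => '\u003c',
                "\u2028" => '\u2028', "\u2029" => '\u2029' }.freeze
JSON_ESCAPE_REGEXP = /[\u2028\u2029&><]/u

# Old approach: a single pass with a regex and a Hash of replacements.
def escape_with_regex(str)
  str.gsub(JSON_ESCAPE_REGEXP, JSON_ESCAPE)
end

# New approach: five in-place passes with plain string patterns.
def escape_with_strings(str)
  result = str.dup
  result.gsub!(">", '\u003e')
  result.gsub!("<", '\u003c')
  result.gsub!("&", '\u0026')
  result.gsub!("\u2028", '\u2028')
  result.gsub!("\u2029", '\u2029')
  result
end

# Hypothetical helper: times both approaches on the same input.
def compare(label, input, iterations: 2_000)
  old_time = Benchmark.realtime { iterations.times { escape_with_regex(input) } }
  new_time = Benchmark.realtime { iterations.times { escape_with_strings(input) } }
  puts format("%-16s regex+hash: %.4fs  5x gsub!: %.4fs", label, old_time, new_time)
end

compare("with matches", "<p>Tom & Jerry</p> " * 10)
compare("without matches", "x" * 200)
```

Both methods should return identical strings for any input, so the only difference being measured is the cost of the two strategies.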
1 parent 807bd54 commit ebe0c40

3 files changed: +23 -26 lines changed

actionview/test/template/erb_util_test.rb

Lines changed: 8 additions & 2 deletions
@@ -13,9 +13,15 @@ class ErbUtilTest < ActiveSupport::TestCase
     end
   end
 
-  ERB::Util::JSON_ESCAPE.each do |given, expected|
+  {
+    "&" => '\u0026',
+    ">" => '\u003e',
+    "<" => '\u003c',
+    "\u2028" => '\u2028',
+    "\u2029" => '\u2029'
+  }.each do |given, expected|
     define_method "test_json_escape_#{expected.gsub(/\W/, '')}" do
-      assert_equal ERB::Util::JSON_ESCAPE[given], json_escape(given)
+      assert_equal expected, json_escape(given)
     end
   end
 

activesupport/lib/active_support/core_ext/erb/util.rb

Lines changed: 6 additions & 3 deletions
@@ -38,9 +38,7 @@ module ERBUtilPrivate
 class ERB
   module Util
     HTML_ESCAPE = { "&" => "&amp;", ">" => "&gt;", "<" => "&lt;", '"' => "&quot;", "'" => "&#39;" }
-    JSON_ESCAPE = { "&" => '\u0026', ">" => '\u003e', "<" => '\u003c', "\u2028" => '\u2028', "\u2029" => '\u2029' }
     HTML_ESCAPE_ONCE_REGEXP = /["><']|&(?!([a-zA-Z]+|(#\d+)|(#[xX][\dA-Fa-f]+));)/
-    JSON_ESCAPE_REGEXP = /[\u2028\u2029&><]/u
 
     # Following XML requirements: https://www.w3.org/TR/REC-xml/#NT-Name
     TAG_NAME_START_CODEPOINTS = "@:A-Z_a-z\u{C0}-\u{D6}\u{D8}-\u{F6}\u{F8}-\u{2FF}\u{370}-\u{37D}\u{37F}-\u{1FFF}" \
@@ -124,7 +122,12 @@ def html_escape_once(s)
     # JSON gem, do not provide this kind of protection by default; also some gems
     # might override +to_json+ to bypass Active Support's encoder).
     def json_escape(s)
-      result = s.to_s.gsub(JSON_ESCAPE_REGEXP, JSON_ESCAPE)
+      result = s.to_s.dup
+      result.gsub!(">", '\u003e')
+      result.gsub!("<", '\u003c')
+      result.gsub!("&", '\u0026')
+      result.gsub!("\u2028", '\u2028')
+      result.gsub!("\u2029", '\u2029')
       s.html_safe? ? result.html_safe : result
     end
 
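To see what the rewritten `json_escape` does to a JSON document, here is a minimal standalone sketch. `json_escape_sketch` is a hypothetical stand-in that copies the five `gsub!` calls from the hunk above but omits the `html_safe?` handling, so it runs without Active Support; the payload is made up for illustration.

```ruby
require "json"

# Hypothetical stand-in for ERB::Util.json_escape, without html_safe handling.
def json_escape_sketch(s)
  result = s.to_s.dup
  result.gsub!(">", '\u003e')
  result.gsub!("<", '\u003c')
  result.gsub!("&", '\u0026')
  result.gsub!("\u2028", '\u2028')
  result.gsub!("\u2029", '\u2029')
  result
end

# Illustrative payload containing characters that are risky inside a <script> tag.
payload = {
  "title" => "</script><script>alert(1)</script>",
  "note"  => "a & b",
  "text"  => "line\u2028separator"
}

raw     = JSON.generate(payload)
escaped = json_escape_sketch(raw)

puts escaped
# The escaped text is still valid JSON and decodes to the original data.
puts JSON.parse(escaped) == payload
```

In valid JSON these characters only occur inside string literals, so replacing them with `\uXXXX` escapes changes the text but not the decoded value.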

activesupport/lib/active_support/json/encoding.rb

Lines changed: 9 additions & 21 deletions
@@ -39,33 +39,21 @@ def encode(value)
             value = value.as_json(options.dup)
           end
           json = stringify(jsonify(value))
+
+          # Rails does more escaping than the JSON gem natively does (we
+          # escape \u2028 and \u2029 and optionally >, <, & to work around
+          # certain browser problems).
           if Encoding.escape_html_entities_in_json
-            json.gsub! ESCAPE_REGEX_WITH_HTML_ENTITIES, ESCAPED_CHARS
-          else
-            json.gsub! ESCAPE_REGEX_WITHOUT_HTML_ENTITIES, ESCAPED_CHARS
+            json.gsub!(">", '\u003e')
+            json.gsub!("<", '\u003c')
+            json.gsub!("&", '\u0026')
           end
+          json.gsub!("\u2028", '\u2028')
+          json.gsub!("\u2029", '\u2029')
           json
         end
 
         private
-          # Rails does more escaping than the JSON gem natively does (we
-          # escape \u2028 and \u2029 and optionally >, <, & to work around
-          # certain browser problems).
-          ESCAPED_CHARS = {
-            "\u2028" => '\u2028',
-            "\u2029" => '\u2029',
-            ">" => '\u003e',
-            "<" => '\u003c',
-            "&" => '\u0026',
-          }
-
-          ESCAPE_REGEX_WITH_HTML_ENTITIES = /[\u2028\u2029><&]/u
-          ESCAPE_REGEX_WITHOUT_HTML_ENTITIES = /[\u2028\u2029]/u
-
-          # Mark these as private so we don't leak encoding-specific constructs
-          private_constant :ESCAPED_CHARS, :ESCAPE_REGEX_WITH_HTML_ENTITIES,
-            :ESCAPE_REGEX_WITHOUT_HTML_ENTITIES
-
           # Convert an object into a "JSON-ready" representation composed of
           # primitives like Hash, Array, String, Symbol, Numeric,
           # and +true+/+false+/+nil+.
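The control flow in the new `encode` body can be sketched in isolation as below. `escape_encoded_json` and its `escape_html_entities:` keyword are hypothetical names standing in for the encoder method and the `Encoding.escape_html_entities_in_json` setting; the point is only that U+2028 and U+2029 are always escaped, while `>`, `<`, and `&` are escaped only when the flag is on.

```ruby
# Hypothetical helper mirroring the escaping step of the new encode method.
def escape_encoded_json(json, escape_html_entities:)
  json = json.dup
  if escape_html_entities
    json.gsub!(">", '\u003e')
    json.gsub!("<", '\u003c')
    json.gsub!("&", '\u0026')
  end
  # The line and paragraph separators are escaped regardless of the flag.
  json.gsub!("\u2028", '\u2028')
  json.gsub!("\u2029", '\u2029')
  json
end

input = "{\"a\":\"<b>\u2028\"}"

puts escape_encoded_json(input, escape_html_entities: false)
puts escape_encoded_json(input, escape_html_entities: true)
```

Compared with the removed code, the `else` branch disappears because the two separator replacements now run unconditionally after the optional HTML-entity replacements.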
