Commit 910e5a2

Fix performance issue caused by using repeated `>` characters inside `<xml><!-- --></xml>` (#177)
A `>` is treated as a string delimiter when reading from the source. In certain cases, such as many `>` characters in succession inside a comment, read and match are repeated, which slows down parsing. Therefore, `@source.match` is now called with `term: Private::COMMENT_TERM` so that it reads ahead to the end of the comment in advance.
1 parent 1f1e6e9 commit 910e5a2
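To make the problem described in the commit message concrete, here is a minimal sketch built on a hypothetical `ToySource` class. It is not REXML's actual `IOSource`, and its names and structure are assumptions chosen only to mimic the read-then-retry pattern. It reads input in chunks that end at a delimiter (`>` by default) and retries the comment pattern on the whole buffer after every read. With the default delimiter, a comment containing many `>` characters needs one retry per chunk, and each retry rescans a longer buffer. Passing `term: "-->"`, assumed here to stand in for `Private::COMMENT_TERM`, lets a single read reach the end of the comment.

```ruby
# Hypothetical, simplified model of a delimiter-driven source. This is NOT
# REXML's IOSource; it only mimics the read-then-retry pattern described in
# the commit message.
class ToySource
  attr_reader :match_attempts

  def initialize(input, line_break: ">")
    @rest = input         # input not yet moved into the buffer
    @buffer = +""         # text read so far
    @line_break = line_break
    @match_attempts = 0
  end

  # Move the next chunk, ending at (and including) `term`, into the buffer.
  # Returns false once the input is exhausted.
  def read(term)
    return false if @rest.empty?
    idx = @rest.index(term)
    cut = idx ? idx + term.size : @rest.size
    @buffer << @rest[0, cut]
    @rest = @rest[cut..]
    true
  end

  # Try `pattern` against the whole buffer. After each failure, read one more
  # chunk (up to `term`, or up to the default delimiter) and try again.
  def match(pattern, term: nil)
    loop do
      @match_attempts += 1
      md = pattern.match(@buffer)
      return md if md
      return nil unless read(term || @line_break)
    end
  end
end

COMMENT = /\A<!--(.*?)-->/m
input = "<!-- " + ">" * 1000 + " -->"

slow = ToySource.new(input)
slow.match(COMMENT)
puts slow.match_attempts # 1002: roughly one retry per ">"

fast = ToySource.new(input)
fast.match(COMMENT, term: "-->")
puts fast.match_attempts # 2: the first read already reaches "-->"
```

Counting match attempts instead of timing keeps the comparison deterministic. The real parser's cost per attempt also depends on buffering details that this toy does not model.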

2 files changed (+8 -1 lines)

lib/rexml/parsers/baseparser.rb (+1 -1)

@@ -430,7 +430,7 @@ def pull_event
 #STDERR.puts "SOURCE BUFFER = #{source.buffer}, #{source.buffer.size}"
 raise REXML::ParseException.new("Malformed node", @source) unless md
 if md[0][0] == ?-
-md = @source.match(/--(.*?)-->/um, true)
+md = @source.match(/--(.*?)-->/um, true, term: Private::COMMENT_TERM)
 
 if md.nil? || /--|-\z/.match?(md[1])
 raise REXML::ParseException.new("Malformed comment", @source)
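The one-line change passes a terminator to `@source.match`. As a rough illustration of why a terminator helps, Ruby's standard `StringIO#gets(sep)` reads up to and including a separator string. Whether REXML's `IOSource` reads its input in exactly this way is an assumption here. The sketch only contrasts reading to the next `>` with reading to `-->` on input shaped like the new test case.

```ruby
require "stringio"

xml = "<xml><!-- " + ">" * 5 + " --></xml>"

# Reading with the default ">" separator stops at every ">".
by_gt = StringIO.new(xml)
first_chunk = by_gt.gets(">")   # "<xml>"
second_chunk = by_gt.gets(">")  # "<!-- >" (stops inside the comment)

# Reading up to the comment terminator reaches the end of the comment at once.
by_term = StringIO.new(xml)
by_term.gets(">")                   # consume "<xml>"
comment_chunk = by_term.gets("-->") # "<!-- >>>>> -->"

p first_chunk, second_chunk, comment_chunk
```

With `>` as the separator, the number of reads needed to finish the comment grows with the number of `>` characters inside it. With `-->` as the separator, one read is enough.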

test/parse/test_comment.rb (+7)

@@ -128,5 +128,12 @@ def test_gt_linear_performance
 REXML::Document.new('<!-- ' + ">" * n + ' -->')
 end
 end
+
+def test_gt_linear_performance_in_element
+seq = [10000, 50000, 100000, 150000, 200000]
+assert_linear_performance(seq, rehearsal: 10) do |n|
+REXML::Document.new('<xml><!-- ' + '>' * n + ' --></xml>')
+end
+end
 end
 end
