
Commit 9f1415a

Fix performance issue caused by using repeated > characters inside CDATA [ PAYLOAD ] (#172)
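A `>` is treated as a string delimiter when the source is read. If `>` appears many times in succession inside a CDATA section, each `>` triggers another read followed by another match attempt, which slows parsing down. To avoid this, the CDATA match now passes `term: Private::CDATA_TERM` (`"]]>"`), so the source reads ahead to the end of the CDATA section before matching.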
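The read-and-match repetition described above can be illustrated with a small model. This is only a sketch under stated assumptions: `ToySource`, its `attempts` counter, and its `default_term` parameter are hypothetical names invented for illustration and are not REXML's actual source class. The model assumes the source reads one delimiter-terminated chunk at a time and retries the pattern against the whole buffer after every read.

```ruby
require "stringio"

# Hypothetical, simplified model of a buffered source (NOT REXML's real class).
class ToySource
  attr_reader :attempts

  def initialize(io, default_term: ">")
    @io = io
    @default_term = default_term # assumed default chunk delimiter
    @buffer = +""
    @attempts = 0
  end

  # Try the pattern against the buffer; on failure, read one more chunk
  # (up to and including the terminator) and try again.
  def match(pattern, term: nil)
    loop do
      @attempts += 1
      md = pattern.match(@buffer)
      return md if md

      chunk = @io.gets(term || @default_term)
      return nil if chunk.nil? # input exhausted without a match
      @buffer << chunk
    end
  end
end

CDATA_PATTERN = /\[CDATA\[(.*?)\]\]>/m
payload = "[CDATA[ " + ">" * 1_000 + " ]]>"

# Default delimiter: every ">" ends a chunk, so the pattern is retried
# against a growing buffer once per ">".
slow = ToySource.new(StringIO.new(payload))
slow.match(CDATA_PATTERN)

# Explicit terminator: a single read reaches "]]>" before matching.
fast = ToySource.new(StringIO.new(payload))
fast.match(CDATA_PATTERN, term: "]]>")

puts "default delimiter: #{slow.attempts} match attempts"
puts "term \"]]>\":        #{fast.attempts} match attempts"
```

In this model the number of match attempts with the default delimiter grows with the number of `>` characters, and each attempt rescans a longer buffer, so the total work grows roughly quadratically. With the explicit terminator the attempt count stays constant.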
1 parent c1b64c1 commit 9f1415a


2 files changed, +19 -1 lines changed


lib/rexml/parsers/baseparser.rb

+2 -1
@@ -127,6 +127,7 @@ module Private
         INSTRUCTION_END = /#{NAME}(\s+.*?)?\?>/um
         INSTRUCTION_TERM = "?>"
         COMMENT_TERM = "-->"
+        CDATA_TERM = "]]>"
         TAG_PATTERN = /((?>#{QNAME_STR}))\s*/um
         CLOSE_PATTERN = /(#{QNAME_STR})\s*>/um
         ATTLISTDECL_END = /\s+#{NAME}(?:#{ATTDEF})*\s*>/um
@@ -431,7 +432,7 @@ def pull_event
 
               return [ :comment, md[1] ]
             else
-              md = @source.match(/\[CDATA\[(.*?)\]\]>/um, true)
+              md = @source.match(/\[CDATA\[(.*?)\]\]>/um, true, term: Private::CDATA_TERM)
               return [ :cdata, md[1] ] if md
             end
             raise REXML::ParseException.new( "Declarations can only occur "+

test/parse/test_cdata.rb

+17
@@ -0,0 +1,17 @@
+require "test/unit"
+require "core_assertions"
+
+require "rexml/document"
+
+module REXMLTests
+  class TestParseCData < Test::Unit::TestCase
+    include Test::Unit::CoreAssertions
+
+    def test_gt_linear_performance
+      seq = [10000, 50000, 100000, 150000, 200000]
+      assert_linear_performance(seq, rehearsal: 10) do |n|
+        REXML::Document.new('<description><![CDATA[ ' + ">" * n + ' ]]></description>')
+      end
+    end
+  end
+end
