Commit 7fb5bb8

Lift expensive Regex construction from DateFormat method body.
Constructing the Regex touched in this commit can account for a significant fraction (half or more) of the runtime of the DateFormat method that uses it. To make that DateFormat method more efficient, lift the Regex construction out of the method body.
1 parent a3c2798 commit 7fb5bb8
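
One rough way to see the cost described in the commit message is to time a helper that rebuilds the Regex on every call against one that reuses a prebuilt Regex. The sketch below is only illustrative: `PATTERN`, `HOISTED`, `count_rebuilding`, `count_hoisted`, and `time_calls` are hypothetical names, the letters `ymd` are an assumed stand-in for the real specifier set, and the measured ratio will vary by machine and Julia version.

```julia
# Illustrative pattern with the same shape as the one in this commit.
# The letters "ymd" are an assumption, not the real specifier set.
const PATTERN = "(?<!\\\\)([\\Qymd\\E])\\1*"

# Builds the Regex on every call, as the method body did before this commit.
count_rebuilding(input) = length(collect(eachmatch(Regex(PATTERN), input)))

# Builds the Regex once and reuses it on every call.
const HOISTED = Regex(PATTERN)
count_hoisted(input) = length(collect(eachmatch(HOISTED, input)))

# Time n calls of f, after one warm-up call so JIT compilation is not measured.
function time_calls(f, input, n)
    f(input)
    return @elapsed for _ in 1:n
        f(input)
    end
end

t_rebuild = time_calls(count_rebuilding, "yyyy-mm-dd", 10_000)
t_hoisted = time_calls(count_hoisted, "yyyy-mm-dd", 10_000)
println("rebuild: ", t_rebuild, " s, hoisted: ", t_hoisted, " s")
```

Both helpers return the same match count; only the time spent constructing the Regex differs.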

1 file changed: +31 -2 lines changed

stdlib/Dates/src/io.jl

Lines changed: 31 additions & 2 deletions
@@ -332,6 +332,23 @@ const CONVERSION_TRANSLATIONS = IdDict{Type, Any}(
     Time => (Hour, Minute, Second, Millisecond, Microsecond, Nanosecond, AMPM),
 )
 
+# The `DateFormat(format, locale)` method just below consumes the following Regex.
+# Constructing this Regex is fairly expensive; doing so in the method itself can
+# consume half or better of `DateFormat(format, locale)`'s runtime. So instead we
+# construct and cache it outside the method body. Note, however, that when
+# `keys(CONVERSION_SPECIFIERS)` changes, the cached Regex must be updated accordingly;
+# hence the mutability (Ref-ness) of the cache, the helper method with which to populate
+# the cache, the cache of the hash of `keys(CONVERSION_SPECIFIERS)` (to facilitate checking
+# for changes), and the lock (to maintain consistency of these objects across threads when
+# threads simultaneously modify `CONVERSION_SPECIFIERS` and construct `DateFormat`s).
+function compute_dateformat_regex(conversion_specifiers)
+    letters = String(collect(keys(conversion_specifiers)))
+    return Regex("(?<!\\\\)([\\Q$letters\\E])\\1*")
+end
+const DATEFORMAT_REGEX_LOCK = ReentrantLock()
+const DATEFORMAT_REGEX_HASH = Ref(hash(keys(CONVERSION_SPECIFIERS)))
+const DATEFORMAT_REGEX_CACHE = Ref(compute_dateformat_regex(CONVERSION_SPECIFIERS))
+
 """
     DateFormat(format::AbstractString, locale="english") -> DateFormat
 
@@ -379,8 +396,20 @@ function DateFormat(f::AbstractString, locale::DateLocale=ENGLISH)
     prev = ()
     prev_offset = 1
 
-    letters = String(collect(keys(CONVERSION_SPECIFIERS)))
-    for m in eachmatch(Regex("(?<!\\\\)([\\Q$letters\\E])\\1*"), f)
+    # To understand this block, please see the comments attached to the definitions of
+    # DATEFORMAT_REGEX_LOCK, DATEFORMAT_REGEX_HASH, and DATEFORMAT_REGEX_CACHE.
+    lock(DATEFORMAT_REGEX_LOCK)
+    try
+        dateformat_regex_hash = hash(keys(CONVERSION_SPECIFIERS))
+        if dateformat_regex_hash != DATEFORMAT_REGEX_HASH[]
+            DATEFORMAT_REGEX_HASH[] = dateformat_regex_hash
+            DATEFORMAT_REGEX_CACHE[] = compute_dateformat_regex(CONVERSION_SPECIFIERS)
+        end
+    finally
+        unlock(DATEFORMAT_REGEX_LOCK)
+    end
+
+    for m in eachmatch(DATEFORMAT_REGEX_CACHE[], f)
         tran = replace(f[prev_offset:prevind(f, m.offset)], r"\\(.)" => s"\1")
 
         if !isempty(prev)
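
The caching pattern in this diff (a `Ref` holding the Regex, a `Ref` holding the hash of the keys, and a lock around the staleness check) can be tried in isolation. The following is a minimal sketch under stated assumptions, not the stdlib code: `SPECIFIERS`, `build_regex`, `REGEX_LOCK`, `REGEX_HASH`, `REGEX_CACHE`, and `cached_regex` are hypothetical stand-ins, and a plain `Dict` with three letters replaces `CONVERSION_SPECIFIERS`. Unlike the commit, this sketch returns the Regex from inside the locked region rather than reading the cache after unlocking.

```julia
# Hypothetical stand-in for CONVERSION_SPECIFIERS; the keys are the format letters.
const SPECIFIERS = Dict{Char, Symbol}('y' => :year, 'm' => :month, 'd' => :day)

# Same construction as compute_dateformat_regex in the diff:
# an unescaped format letter followed by repeats of that same letter.
function build_regex(specs)
    letters = String(collect(keys(specs)))
    return Regex("(?<!\\\\)([\\Q$letters\\E])\\1*")
end

const REGEX_LOCK = ReentrantLock()
const REGEX_HASH = Ref(hash(keys(SPECIFIERS)))
const REGEX_CACHE = Ref(build_regex(SPECIFIERS))

# Return the cached Regex, rebuilding it only if the key set has changed.
function cached_regex()
    lock(REGEX_LOCK) do
        h = hash(keys(SPECIFIERS))
        if h != REGEX_HASH[]
            REGEX_HASH[] = h
            REGEX_CACHE[] = build_regex(SPECIFIERS)
        end
        return REGEX_CACHE[]
    end
end

before = cached_regex()
matches = [m.match for m in eachmatch(before, "yyyy-mm-dd")]

# Adding a key changes the hash, so the next call rebuilds the Regex.
SPECIFIERS['H'] = :hour
after = cached_regex()
```

Repeated calls with an unchanged key set return the same Regex object, so the construction cost is paid once per change to the keys rather than once per call.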
