Commit 1009210

Make inference profiling thread safe for julia 1.9+
Starting in 1.9, Julia no longer holds the codegen lock for all of type inference, so more than one Task can be running inference at the same time. This means that we cannot use a single shared scratch space for profiling type inference. This PR moves the scratch space into task-local storage, so that each Task doing inference builds its own inference profile tree and then reports it to the global results buffer at the end.
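The idea described above can be sketched outside the compiler with Base's own `task_local_storage()`. This is only an illustration under stated assumptions: `FINISHED`, `FINISHED_LOCK`, `scratch`, and `profile` are hypothetical names that do not appear in `base/compiler/typeinfer.jl`, and guarding the shared buffer with a lock is an assumption of the sketch, not something the commit message states.

```julia
# Hypothetical stand-ins; none of these names come from typeinfer.jl.
const FINISHED = String[]               # shared results buffer
const FINISHED_LOCK = ReentrantLock()   # assumption: the sketch guards the buffer with a lock

# Each Task lazily gets its own scratch stack, kept in its task-local storage.
scratch() = get!(() -> String[], task_local_storage(), :scratch)::Vector{String}

function profile(label::String, depth::Int)
    stack = scratch()                   # private to the current Task
    push!(stack, label)
    # Recurse depth-first, mimicking nested inference frames.
    depth > 0 && profile(string(label, ".", depth), depth - 1)
    done = pop!(stack)
    if isempty(stack)
        # Only the root of this Task's tree is reported to the shared buffer.
        lock(FINISHED_LOCK) do
            push!(FINISHED, done)
        end
    end
    return done
end

tasks = [Threads.@spawn profile("task$i", 3) for i in 1:4]
foreach(wait, tasks)
```

Because every Task pushes and pops on its own vector, the tasks cannot corrupt each other's stacks; the only shared state is the final buffer.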
1 parent ead079a commit 1009210

1 file changed: +24 -5 lines changed

base/compiler/typeinfer.jl

Lines changed: 24 additions & 5 deletions
@@ -23,7 +23,7 @@ being used for this purpose alone.
 module Timings
 
 using Core.Compiler: -, +, :, >, Vector, length, first, empty!, push!, pop!, @inline,
-    @inbounds, copy, backtrace
+    @inbounds, copy, backtrace, IdDict, Task, Ref, get!
 
 # What we record for any given frame we infer during type inference.
 struct InferenceFrameInfo
@@ -94,14 +94,30 @@ end
 # thread safe.
 const _finished_timings = Timing[]
 
-# We keep a stack of the Timings for each of the MethodInstances currently being timed.
+# We store a profiling stack for *each Task* as a task-local-storage variable, _timings.
+# This is a stack of the Timings for each of the MethodInstances currently being timed.
 # Since type inference currently operates via a depth-first search (during abstract
 # evaluation), this vector operates like a call stack. The last node in _timings is the
 # node currently being inferred, and its parent is directly before it, etc.
 # Each Timing also contains its own vector for all of its children, so that the tree
 # call structure through type inference is recorded. (It's recorded as a tree, not a graph,
 # because we create a new node for duplicates.)
-const _timings = Timing[]
+# You will see this accessed below as `task_local_storage(:_timings)`
+
+# ------- Task Local Storage -------
+# Reimplementation of Task Local Storage, since these functions aren't available yet
+# at this stage of bootstrapping.
+current_task() = ccall(:jl_get_current_task, Ref{Task}, ())
+task_local_storage() = get_task_tls(current_task())
+function get_task_tls(t::Task)
+    if t.storage === nothing
+        t.storage = IdDict()
+    end
+    return (t.storage)::IdDict{Any,Any}
+end
+# -------
+
+tls_timings() = get!(task_local_storage(), :_timings, Vector{Timing}())
 
 # ROOT() is an empty function used as the top-level Timing node to measure all time spent
 # *not* in type inference during a given recording trace. It is used as a "dummy" node.
@@ -145,6 +161,8 @@ end
 end
 
 @inline function enter_new_timer(frame)
+    _timings = tls_timings()
+
     # Very first thing, stop the active timer: get the current time and add in the
     # time since it was last started to its aggregate exclusive time.
     if length(_timings) > 0
@@ -154,8 +172,7 @@ end
     # Start the new timer right before returning
     mi_info = _typeinf_identifier(frame)
     push!(_timings, Timing(mi_info, UInt64(0)))
-    len = length(_timings)
-    new_timer = @inbounds _timings[len]
+    new_timer = @inbounds _timings[end]
 
     # Set the current time _after_ appending the node, to try to exclude the
     # overhead from measurement.
@@ -170,6 +187,8 @@ end
 # assert that indeed we are always returning to a parent after finishing all of its
 # children (that is, asserting that inference proceeds via depth-first-search).
 @inline function exit_current_timer(_expected_frame_)
+    _timings = tls_timings()
+
     # Finish the new timer
     stop_time = _time_ns()
 
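The lookup performed by `tls_timings()` in the diff can be tried in ordinary Julia code with Base's `task_local_storage()`. The sketch below is illustrative only: `stack_eager`, `stack_lazy`, and the `:demo_stack` key are hypothetical names. It shows that repeated calls on one Task return the same vector, that a different Task gets its own, and that the three-argument `get!` constructs its default on every call, while the function-first form constructs it only when the key is missing.

```julia
# Same lookup shape as tls_timings(): the default Int[] is built on every call,
# but only stored the first time the key is missing.
stack_eager() = get!(task_local_storage(), :demo_stack, Int[])

# Function-first form: the default is built only if the key is missing.
stack_lazy() = get!(() -> Int[], task_local_storage(), :demo_stack)

a = stack_eager()
push!(a, 1)
b = stack_eager()    # same Task, so the same vector comes back
c = stack_lazy()     # also the same vector; no new array is allocated

# A different Task has its own storage, so it starts with an empty stack.
other = fetch(@async stack_lazy())
```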