Commit 2bf4750: implement "engine" for managing inference/codegen (#54816)
Continuing from previous PRs toward making CodeInstance the primary means of tracking compilation, this introduces an "engine" that tracks externally whether a particular inference result is in progress and on which thread. At present, it handles unexpected cycles by permitting both threads to work on the result; this is likely close to optimal most of the time, until we gain the ability to do work-stealing of results.

To assist with that, a CodeInstance is now primarily allocated by `jl_engine_reserve`, which also records that it is currently being inferred. This creates a per-(MethodInstance, owner) lock, which can be used with the double-check pattern to see whether inference completed while waiting on it. The `world` value is not part of the key, since it is only computed later, so there is a possibility that a thread waits only to discover that the result was already invalid before it could use it (though this should be unlikely).

Once finished, the process notifies the engine to release the reservation on that identity pair. When doing so, it may also provide the inferred source code, allowing the runtime to start a threadpool to compile that result while the main thread continues with the job of inference.

Includes a fix for #53434, by ensuring that SOURCE_MODE_ABI results in the item going into the global cache. Fixes #53433, as inInference is now computed by the engine and protected by a lock, which also fixes #53680.
1 parent: 323e725

19 files changed: +458 -283 lines
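As a rough sketch of the protocol the commit message describes (this is not the literal code in base/compiler/typeinfer.jl; `lookup_in_cache` and `run_inference` are hypothetical stand-ins for the real cache query and inference entry points):

jl_code_instance_t *typeinf_with_engine(jl_method_instance_t *mi, jl_value_t *owner)
{
    // reserve the (mi, owner) identity pair; blocks if another thread holds it
    jl_code_instance_t *ci = jl_engine_reserve(mi, owner);
    // double-check: inference may have completed while we waited on the lease
    jl_code_instance_t *existing = lookup_in_cache(mi, owner); // hypothetical
    if (existing != NULL) {
        jl_engine_fulfill(ci, NULL); // release the reservation unused
        return existing;
    }
    jl_code_info_t *src = run_inference(ci); // hypothetical: fills the placeholder
    jl_engine_fulfill(ci, src); // release the lease; src may be handed off to codegen
    return ci;
}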

base/compiler/typeinfer.jl

Lines changed: 191 additions & 159 deletions
Large diffs are not rendered by default.

base/compiler/types.jl

Lines changed: 8 additions & 29 deletions
@@ -89,7 +89,7 @@ mutable struct InferenceResult
     effects::Effects # if optimization is finished
     analysis_results::AnalysisResults # AnalysisResults with e.g. result::ArgEscapeCache if optimized, otherwise NULL_ANALYSIS_RESULTS
     is_src_volatile::Bool # `src` has been cached globally as the compressed format already, allowing `src` to be used destructively
-    ci::CodeInstance # CodeInstance if this result has been added to the cache
+    ci::CodeInstance # CodeInstance if this result may be added to the cache
     function InferenceResult(mi::MethodInstance, argtypes::Vector{Any}, overridden_by_const::Union{Nothing,BitVector})
         return new(mi, argtypes, overridden_by_const, nothing, nothing, nothing,
             WorldRange(), Effects(), Effects(), NULL_ANALYSIS_RESULTS, false)
@@ -402,36 +402,15 @@ get_inference_world(interp::NativeInterpreter) = interp.world
 get_inference_cache(interp::NativeInterpreter) = interp.inf_cache
 cache_owner(interp::NativeInterpreter) = nothing
 
-"""
-    already_inferred_quick_test(::AbstractInterpreter, ::MethodInstance)
+engine_reserve(interp::AbstractInterpreter, mi::MethodInstance) = engine_reserve(mi, cache_owner(interp))
+engine_reserve(mi::MethodInstance, @nospecialize owner) = ccall(:jl_engine_reserve, Any, (Any, Any), mi, owner)::CodeInstance
+# engine_fulfill(::AbstractInterpreter, ci::CodeInstance, src::CodeInfo) = ccall(:jl_engine_fulfill, Cvoid, (Any, Any), ci, src) # currently the same as engine_reject, so just use that one
+engine_reject(::AbstractInterpreter, ci::CodeInstance) = ccall(:jl_engine_fulfill, Cvoid, (Any, Ptr{Cvoid}), ci, C_NULL)
 
-For the `NativeInterpreter`, we don't need to do an actual cache query to know if something
-was already inferred. If we reach this point, but the inference flag has been turned off,
-then it's in the cache. This is purely for a performance optimization.
-"""
-already_inferred_quick_test(interp::NativeInterpreter, mi::MethodInstance) = !mi.inInference
-already_inferred_quick_test(interp::AbstractInterpreter, mi::MethodInstance) = false
+function already_inferred_quick_test end
+function lock_mi_inference end
+function unlock_mi_inference end
 
-"""
-    lock_mi_inference(::AbstractInterpreter, mi::MethodInstance)
-
-Hint that `mi` is in inference to help accelerate bootstrapping.
-This is particularly used by `NativeInterpreter` and helps us limit the amount of wasted
-work we might do when inference is working on initially inferring itself by letting us
-detect when inference is already in progress and not running a second copy on it.
-This creates a data-race, but the entry point into this code from C (`jl_type_infer`)
-already includes detection and restriction on recursion, so it is hopefully mostly a
-benign problem, since it should really only happen during the first phase of bootstrapping
-that we encounter this flag.
-"""
-lock_mi_inference(::NativeInterpreter, mi::MethodInstance) = (mi.inInference = true; nothing)
-lock_mi_inference(::AbstractInterpreter, ::MethodInstance) = return
-
-"""
-See `lock_mi_inference`.
-"""
-unlock_mi_inference(::NativeInterpreter, mi::MethodInstance) = (mi.inInference = false; nothing)
-unlock_mi_inference(::AbstractInterpreter, ::MethodInstance) = return
 
 """
     add_remark!(::AbstractInterpreter, sv::InferenceState, remark)
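For cross-reference, the two `ccall`s above target the C entry points added in src/engine.cpp later in this commit (signatures copied from the new file). Passing `C_NULL` for `src` makes `jl_engine_fulfill` act as a plain release, which is why `engine_reject` reuses it:

jl_code_instance_t *jl_engine_reserve(jl_method_instance_t *m, jl_value_t *owner);
void jl_engine_fulfill(jl_code_instance_t *ci, jl_code_info_t *src); // src == NULL releases without fulfilling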

doc/src/devdocs/locks.md

Lines changed: 2 additions & 30 deletions
@@ -71,19 +71,8 @@ The following is a level 5 lock
 
 The following are a level 6 lock, which can only recurse to acquire locks at lower levels:
 
-> * codegen
 > * jl_modules_mutex
 
-The following is an almost root lock (level end-1), meaning only the root look may be held when
-trying to acquire it:
-
-> * typeinf
->
-> > this one is perhaps one of the most tricky ones, since type-inference can be invoked from many
-> > points
-> >
-> > currently the lock is merged with the codegen lock, since they call each other recursively
-
 The following lock synchronizes IO operation. Be aware that doing any I/O (for example,
 printing warning messages or debug information) while holding any other lock listed above
 may result in pernicious and hard-to-find deadlocks. BE VERY CAREFUL!
@@ -149,34 +138,17 @@ Module serializer : toplevel lock
 
 JIT & type-inference : codegen lock
 
-MethodInstance/CodeInstance updates : Method->writelock, codegen lock
+MethodInstance/CodeInstance updates : Method->writelock
 
 > * These are set at construction and immutable:
 >   * specTypes
 >   * sparam_vals
 >   * def
 >   * owner
 
-> * These are set by `jl_type_infer` (while holding codegen lock):
->   * cache
->   * rettype
->   * inferred
->   * valid ages
-
-> * `inInference` flag:
->   * optimization to quickly avoid recurring into `jl_type_infer` while it is already running
->   * actual state (of setting `inferred`, then `fptr`) is protected by codegen lock
-
 > * Function pointers:
->   * these transition once, from `NULL` to a value, while the codegen lock is held
->
-> * Code-generator cache (the contents of `functionObjectsDecls`):
->   * these can transition multiple times, but only while the codegen lock is held
->   * it is valid to use old version of this, or block for new versions of this, so races are benign,
->     as long as the code is careful not to reference other data in the method instance (such as `rettype`)
->     and assume it is coordinated, unless also holding the codegen lock
+>   * these transition once, from `NULL` to a value, which is coordinated internal to the JIT
 >
-LLVMContext : codegen lock
 
 Method : Method->writelock

src/Makefile

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ SRCS := \
 	simplevector runtime_intrinsics precompile jloptions mtarraylist \
 	threading scheduler stackwalk gc gc-debug gc-pages gc-stacks gc-alloc-profiler gc-page-profiler method \
 	jlapi signal-handling safepoint timing subtype rtutils gc-heap-snapshot \
-	crc32c APInt-C processor ircode opaque_closure codegen-stubs coverage runtime_ccall
+	crc32c APInt-C processor ircode opaque_closure codegen-stubs coverage runtime_ccall engine
 
 RT_LLVMLINK :=
 CG_LLVMLINK :=

src/aotcompile.cpp

Lines changed: 1 addition & 1 deletion
@@ -307,7 +307,7 @@ jl_code_instance_t *jl_ci_cache_lookup(const jl_cgparams_t &cgparams, jl_method_
         jl_error("Refusing to automatically run type inference with custom cache lookup.");
     }
     else {
-        codeinst = jl_type_infer(mi, world, 0, SOURCE_MODE_ABI);
+        codeinst = jl_type_infer(mi, world, SOURCE_MODE_ABI);
         /* Even if this codeinst is ordinarily not cacheable, we need to force
          * it into the cache here, since it was explicitly requested and is
          * otherwise not reachable from anywhere in the system image.

src/codegen.cpp

Lines changed: 1 addition & 1 deletion
@@ -9895,7 +9895,7 @@ void jl_compile_workqueue(
         if (policy != CompilationPolicy::Default &&
             jl_atomic_load_relaxed(&codeinst->inferred) == jl_nothing) {
             // Codegen lock is held, so SOURCE_MODE_FORCE_SOURCE_UNCACHED is not required
-            codeinst = jl_type_infer(codeinst->def, jl_atomic_load_relaxed(&codeinst->max_world), 0, SOURCE_MODE_FORCE_SOURCE);
+            codeinst = jl_type_infer(codeinst->def, jl_atomic_load_relaxed(&codeinst->max_world), SOURCE_MODE_FORCE_SOURCE);
         }
         if (codeinst) {
             orc::ThreadSafeModule result_m =

src/engine.cpp

Lines changed: 158 additions & 0 deletions
@@ -0,0 +1,158 @@ (new file; shown as a plain listing)

// This file is a part of Julia. License is MIT: https://julialang.org/license

#include <mutex>
#include <condition_variable>
#include <llvm/ADT/DenseMap.h>
#include <llvm/ADT/DenseSet.h>
#include <llvm/ADT/SmallVector.h>
#include "julia.h"
#include "julia_internal.h"
#include "julia_assert.h"

using namespace llvm;

struct ReservationInfo {
    int16_t tid = 0;
    jl_code_instance_t *ci = nullptr;
};

struct InferKey {
    jl_method_instance_t *mi = nullptr;
    jl_value_t *owner = nullptr;
};

template<> struct llvm::DenseMapInfo<InferKey> {
    using FirstInfo = DenseMapInfo<jl_method_instance_t*>;
    using SecondInfo = DenseMapInfo<jl_value_t*>;

    static inline InferKey getEmptyKey() {
        return InferKey{FirstInfo::getEmptyKey(),
                        SecondInfo::getEmptyKey()};
    }

    static inline InferKey getTombstoneKey() {
        return InferKey{FirstInfo::getTombstoneKey(),
                        SecondInfo::getTombstoneKey()};
    }

    static unsigned getHashValue(const InferKey& PairVal) {
        return detail::combineHashValue(FirstInfo::getHashValue(PairVal.mi),
                                        SecondInfo::getHashValue(PairVal.owner));
    }

    static bool isEqual(const InferKey &LHS, const InferKey &RHS) {
        return LHS.mi == RHS.mi && LHS.owner == RHS.owner;
    }
};

static std::mutex engine_lock;
static std::condition_variable engine_wait;
// map from MethodInstance to threadid that owns it currently for inference
static DenseMap<InferKey, ReservationInfo> Reservations;
// vector of which threads are blocked and which lease they need
static SmallVector<InferKey, 0> Awaiting; // (this could be merged into ptls also)


#ifdef __cplusplus
extern "C" {
#endif

jl_code_instance_t *jl_engine_reserve(jl_method_instance_t *m, jl_value_t *owner)
{
    jl_task_t *ct = jl_current_task;
    ct->ptls->engine_nqueued++; // disables finalizers until inference is finished on this method graph
    jl_code_instance_t *ci = jl_new_codeinst_uninit(m, owner); // allocate a placeholder
    JL_GC_PUSH1(&ci);
    int8_t gc_state = jl_gc_safe_enter(ct->ptls);
    InferKey key = {m, owner};
    std::unique_lock<std::mutex> lock(engine_lock);
    auto tid = jl_atomic_load_relaxed(&ct->tid);
    if ((signed)Awaiting.size() < tid + 1)
        Awaiting.resize(tid + 1);
    while (1) {
        auto record = Reservations.find(key);
        if (record == Reservations.end()) {
            Reservations[key] = ReservationInfo{tid, ci};
            lock.unlock();
            jl_gc_safe_leave(ct->ptls, gc_state); // contains jl_gc_safepoint
            JL_GC_POP();
            return ci;
        }
        // before waiting, need to run deadlock/cycle detection
        // there is a cycle if the thread holding our lease is blocked
        // and waiting for (transitively) any lease that is held by this thread
        auto wait_tid = record->second.tid;
        while (1) {
            if (wait_tid == tid) {
                lock.unlock();
                jl_gc_safe_leave(ct->ptls, gc_state); // contains jl_gc_safepoint
                JL_GC_POP();
                ct->ptls->engine_nqueued--;
                return ci; // break the cycle
            }
            if ((signed)Awaiting.size() <= wait_tid)
                break; // no cycle, since it is running (and this should be unreachable)
            auto key2 = Awaiting[wait_tid];
            if (key2.mi == nullptr)
                break; // no cycle, since it is running
            auto record2 = Reservations.find(key2);
            if (record2 == Reservations.end())
                break; // no cycle, since it is about to resume
            assert(wait_tid != record2->second.tid);
            wait_tid = record2->second.tid;
        }
        Awaiting[tid] = key;
        engine_wait.wait(lock);
        Awaiting[tid] = InferKey{};
    }
}

int jl_engine_hasreserved(jl_method_instance_t *m, jl_value_t *owner)
{
    jl_task_t *ct = jl_current_task;
    InferKey key = {m, owner};
    std::unique_lock<std::mutex> lock(engine_lock);
    auto record = Reservations.find(key);
    return record != Reservations.end() && record->second.tid == jl_atomic_load_relaxed(&ct->tid);
}

STATIC_INLINE int gc_marked(uintptr_t bits) JL_NOTSAFEPOINT
{
    return (bits & GC_MARKED) != 0;
}

void jl_engine_sweep(jl_ptls_t *gc_all_tls_states)
{
    std::unique_lock<std::mutex> lock(engine_lock);
    bool any = false;
    for (auto I = Reservations.begin(); I != Reservations.end(); ++I) {
        jl_code_instance_t *ci = I->second.ci;
        if (!gc_marked(jl_astaggedvalue(ci)->bits.gc)) {
            auto tid = I->second.tid;
            Reservations.erase(I);
            jl_ptls_t ptls2 = gc_all_tls_states[tid];
            ptls2->engine_nqueued--;
            any = true;
        }
    }
    if (any)
        engine_wait.notify_all();
}

void jl_engine_fulfill(jl_code_instance_t *ci, jl_code_info_t *src)
{
    jl_task_t *ct = jl_current_task;
    std::unique_lock<std::mutex> lock(engine_lock);
    auto record = Reservations.find(InferKey{ci->def, ci->owner});
    if (record == Reservations.end() || record->second.ci != ci)
        return;
    assert(jl_atomic_load_relaxed(&ct->tid) == record->second.tid);
    ct->ptls->engine_nqueued--; // re-enables finalizers, but doesn't immediately try to run them
    Reservations.erase(record);
    engine_wait.notify_all();
}


#ifdef __cplusplus
}
#endif
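The cycle check inside jl_engine_reserve walks the wait-for chain one hop at a time instead of materializing a full graph: the key's holder may itself be blocked on another key, and so on, until the chain either leads back to the reserving thread (a cycle, so it keeps its placeholder and proceeds) or reaches a running thread (no cycle, so it sleeps). Below is a standalone restatement of that walk, using toy types rather than the runtime's actual structures (illustrative only, not part of the commit):

#include <map>

struct Key {
    int mi; // stand-in for the (MethodInstance, owner) pair
    bool operator<(const Key &o) const { return mi < o.mi; }
};

static std::map<Key, int> held_by;    // key -> thread id currently holding its lease
static std::map<int, Key> waiting_on; // blocked thread id -> key it is waiting for

// Would thread `tid` blocking on `want` close a wait cycle?
static bool would_deadlock(int tid, Key want)
{
    auto held = held_by.find(want);
    if (held == held_by.end())
        return false; // key is free: no need to wait at all
    int wait_tid = held->second;
    while (true) {
        if (wait_tid == tid)
            return true; // the chain of leases leads back to us: cycle
        auto w = waiting_on.find(wait_tid);
        if (w == waiting_on.end())
            return false; // holder is running, not blocked: no cycle
        auto h = held_by.find(w->second);
        if (h == held_by.end())
            return false; // the key it wants was just released: about to resume
        wait_tid = h->second; // follow the wait-for chain one hop
    }
}

Since every blocked thread waits on exactly one key and each key has exactly one holder, the walk visits each thread at most once and always terminates.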

src/gc-stacks.c

Lines changed: 5 additions & 5 deletions
@@ -55,7 +55,7 @@ static void *malloc_stack(size_t bufsz) JL_NOTSAFEPOINT
 }
 
 
-static void free_stack(void *stkbuf, size_t bufsz)
+static void free_stack(void *stkbuf, size_t bufsz) JL_NOTSAFEPOINT
 {
 #ifdef JL_USE_GUARD_PAGE
     size_t guard_size = LLT_ALIGN(jl_guard_size, jl_page_size);
@@ -111,7 +111,7 @@ static void *malloc_stack(size_t bufsz) JL_NOTSAFEPOINT
 }
 # endif
 
-static void free_stack(void *stkbuf, size_t bufsz)
+static void free_stack(void *stkbuf, size_t bufsz) JL_NOTSAFEPOINT
 {
 #ifdef JL_USE_GUARD_PAGE
     size_t guard_size = LLT_ALIGN(jl_guard_size, jl_page_size);
@@ -124,7 +124,7 @@ static void free_stack(void *stkbuf, size_t bufsz)
 }
 #endif
 
-JL_DLLEXPORT uint32_t jl_get_num_stack_mappings(void)
+JL_DLLEXPORT uint32_t jl_get_num_stack_mappings(void) JL_NOTSAFEPOINT
 {
     return jl_atomic_load_relaxed(&num_stack_mappings);
 }
@@ -159,7 +159,7 @@ static unsigned select_pool(size_t nb) JL_NOTSAFEPOINT
 }
 
 
-static void _jl_free_stack(jl_ptls_t ptls, void *stkbuf, size_t bufsz)
+static void _jl_free_stack(jl_ptls_t ptls, void *stkbuf, size_t bufsz) JL_NOTSAFEPOINT
 {
 #ifdef _COMPILER_ASAN_ENABLED_
     __asan_unpoison_stack_memory((uintptr_t)stkbuf, bufsz);
@@ -238,7 +238,7 @@ JL_DLLEXPORT void *jl_malloc_stack(size_t *bufsz, jl_task_t *owner) JL_NOTSAFEPO
     return stk;
 }
 
-void sweep_stack_pools(void)
+void sweep_stack_pools(void) JL_NOTSAFEPOINT
 {
     // Stack sweeping algorithm:
     // // deallocate stacks if we have too many sitting around unused

src/gc.c

Lines changed: 7 additions & 6 deletions
@@ -444,7 +444,7 @@ JL_DLLEXPORT void jl_gc_run_pending_finalizers(jl_task_t *ct)
     if (ct == NULL)
         ct = jl_current_task;
     jl_ptls_t ptls = ct->ptls;
-    if (!ptls->in_finalizer && ptls->locks.len == 0 && ptls->finalizers_inhibited == 0) {
+    if (!ptls->in_finalizer && ptls->locks.len == 0 && ptls->finalizers_inhibited == 0 && ptls->engine_nqueued == 0) {
         run_finalizers(ct, 0);
     }
 }
@@ -620,7 +620,7 @@ JL_DLLEXPORT void jl_finalize_th(jl_task_t *ct, jl_value_t *o)
 }
 
 // explicitly scheduled objects for the sweepfunc callback
-static void gc_sweep_foreign_objs_in_list(arraylist_t *objs)
+static void gc_sweep_foreign_objs_in_list(arraylist_t *objs) JL_NOTSAFEPOINT
 {
     size_t p = 0;
     for (size_t i = 0; i < objs->len; i++) {
@@ -638,7 +638,7 @@ static void gc_sweep_foreign_objs_in_list(arraylist_t *objs)
     objs->len = p;
 }
 
-static void gc_sweep_foreign_objs(void)
+static void gc_sweep_foreign_objs(void) JL_NOTSAFEPOINT
 {
     assert(gc_n_threads);
     for (int i = 0; i < gc_n_threads; i++) {
@@ -1584,8 +1584,11 @@ STATIC_INLINE void gc_sweep_pool_page(gc_page_profiler_serializer_t *s, jl_gc_pa
 // sweep over all memory that is being used and not in a pool
 static void gc_sweep_other(jl_ptls_t ptls, int sweep_full) JL_NOTSAFEPOINT
 {
+    sweep_stack_pools();
+    gc_sweep_foreign_objs();
     sweep_malloced_memory();
     sweep_big(ptls, sweep_full);
+    jl_engine_sweep(gc_all_tls_states);
 }
 
 static void gc_pool_sync_nfree(jl_gc_pagemeta_t *pg, jl_taggedvalue_t *last) JL_NOTSAFEPOINT
@@ -3662,8 +3665,6 @@ static int _jl_gc_collect(jl_ptls_t ptls, jl_gc_collection_t collection)
 #endif
     current_sweep_full = sweep_full;
     sweep_weak_refs();
-    sweep_stack_pools();
-    gc_sweep_foreign_objs();
     gc_sweep_other(ptls, sweep_full);
     gc_scrub();
     gc_verify_tags();
@@ -3953,7 +3954,7 @@ JL_DLLEXPORT void jl_gc_collect(jl_gc_collection_t collection)
     // Only disable finalizers on current thread
     // Doing this on all threads is racy (it's impossible to check
     // or wait for finalizers on other threads without dead lock).
-    if (!ptls->finalizers_inhibited && ptls->locks.len == 0) {
+    if (!ptls->finalizers_inhibited && ptls->locks.len == 0 && ptls->engine_nqueued == 0) {
        JL_TIMING(GC, GC_Finalizers);
        run_finalizers(ct, 0);
    }
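Taken together, the finalizer gate after this commit can be summarized by the predicate below. This is a consolidation for illustration only: the helper itself is hypothetical and not part of the commit, though the field names match the diff above.

static int may_run_finalizers(jl_ptls_t ptls)
{
    return !ptls->in_finalizer            // not already inside a finalizer
        && ptls->locks.len == 0           // no locks held on this thread
        && ptls->finalizers_inhibited == 0
        && ptls->engine_nqueued == 0;     // new: pending inference reservations defer finalizers
}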
