Skip to content

Prevent TOCTTOU race in finalize_ref #41786

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

jpsamaroo
Copy link
Member

Just because we check islocked is false, doesn't mean that it won't be locked in the next instant when we check again. Manually deletes the WKD entry if the lock can be taken, otherwise defer the finalizer.

Should prevent cases like JuliaParallel/Dagger.jl#243 (comment).

Please let me know if there's a better way to fix this race!

@jpsamaroo jpsamaroo force-pushed the jps/finalize-ref-race branch from 3cc4ce2 to 945048e Compare August 4, 2021 20:45
@krynju
Copy link
Contributor

krynju commented Aug 6, 2021

Tried this branch with the code from JuliaParallel/Dagger.jl#243 (comment)
and this happens https://pastebin.com/hHpHRv0j

@krynju
Copy link
Contributor

krynju commented Aug 6, 2021

Still the same after latest commit

➜  Dagger.jl git:(ae34779) ../julia/julia -t4
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.0-DEV.309 (2021-08-06)
 _/ |\__'_|_|_|\__'_|  |  jps/finalize-ref-race/09b29c2776 (fork: 6883 commits, 1052 days)
|__/                   |

(@v1.8) pkg> activate .
  Activating project at `/tmp/Dagger.jl`

julia> using Dagger, DataFrames
[ Info: Precompiling Dagger [d58978e5-989f-55fb-8d15-ea34adc7bf54]

[ Info: Precompiling DataFrames [a93c6f00-e57d-5684-b7b6-d8193f3e46c0]

julia>

julia> s=100_000;d = DTable(DataFrame((a=rand(8*s), b=rand(8*s), c=rand(8*s), d=rand(8*s))),chunksize=s);

julia> Dagger._temp(d, :c;npartitions=1000);
error in running finalizer: ErrorException("task switch not allowed from inside gc finalizer")
jl_error at /tmp/julia/src/rtutils.c:41
jl_switch at /tmp/julia/src/task.c:481
try_yieldto at ./task.jl:754
wait at ./task.jl:824
wait at ./condition.jl:112
lock at ./lock.jl:100
lock at ./lock.jl:185
lock at ./weakkeydict.jl:90 [inlined]
del_client at /tmp/julia/usr/share/julia/stdlib/v1.8/Distributed/src/remotecall.jl:236 [inlined]
del_client at /tmp/julia/usr/share/julia/stdlib/v1.8/Distributed/src/remotecall.jl:234 [inlined]
del_client at /tmp/julia/usr/share/julia/stdlib/v1.8/Distributed/src/remotecall.jl:232 [inlined]
send_del_client at /tmp/julia/usr/share/julia/stdlib/v1.8/Distributed/src/remotecall.jl:265
finalize_ref at /tmp/julia/usr/share/julia/stdlib/v1.8/Distributed/src/remotecall.jl:102
jl_apply at /tmp/julia/src/julia.h:1768 [inlined]
run_finalizer at /tmp/julia/src/gc.c:278
jl_gc_run_finalizers_in_list at /tmp/julia/src/gc.c:365
run_finalizers at /tmp/julia/src/gc.c:394 [inlined]
run_finalizers at /tmp/julia/src/gc.c:372
jl_gc_collect at /tmp/julia/src/gc.c:3266
maybe_collect at /tmp/julia/src/gc.c:881 [inlined]
jl_gc_pool_alloc at /tmp/julia/src/gc.c:1205
jl_gc_alloc_ at /tmp/julia/src/julia_internal.h:354 [inlined]
jl_gc_alloc at /tmp/julia/src/gc.c:3291
_new_array_ at /tmp/julia/src/array.c:137
jl_array_copy at /tmp/julia/src/array.c:1230
copy at ./array.jl:369
getindex at /home/krynju/.julia/packages/DataFrames/vuMM8/src/dataframe/dataframe.jl:500
#445 at /tmp/Dagger.jl/src/table/operations.jl:108
macro expansion at /tmp/Dagger.jl/src/processor.jl:154 [inlined]
#53 at ./threadingconstructs.jl:178
unknown function (ip: 0x7ffb109c586f)
jl_apply at /tmp/julia/src/julia.h:1768 [inlined]
start_task at /tmp/julia/src/task.c:880
fatal: error thrown and no exception handler available.
ErrorException("attempt to switch to exited task")
jl_error at /tmp/julia/src/rtutils.c:41
jl_switch at /tmp/julia/src/task.c:479
try_yieldto at ./task.jl:754
wait at ./task.jl:824
task_done_hook at ./task.jl:531
jl_apply at /tmp/julia/src/julia.h:1768 [inlined]
jl_finish_task at /tmp/julia/src/task.c:217
start_task at /tmp/julia/src/task.c:891
double free or corruption (!prev)

signal (6): Aborted
in expression starting at REPL[4]:1
gsignal at /usr/lib/libc.so.6 (unknown line)
abort at /usr/lib/libc.so.6 (unknown line)
__libc_message at /usr/lib/libc.so.6 (unknown line)
malloc_printerr at /usr/lib/libc.so.6 (unknown line)
_int_free at /usr/lib/libc.so.6 (unknown line)
_int_realloc at /usr/lib/libc.so.6 (unknown line)
realloc at /usr/lib/libc.so.6 (unknown line)
arraylist_grow at /tmp/julia/src/support/arraylist.c:58 [inlined]
arraylist_push at /tmp/julia/src/support/arraylist.c:69
jl_gc_run_finalizers_in_list at /tmp/julia/src/gc.c:357
run_finalizers at /tmp/julia/src/gc.c:394 [inlined]
run_finalizers at /tmp/julia/src/gc.c:372
jl_gc_run_pending_finalizers at /tmp/julia/src/gc.c:405
jl_mutex_unlock at /tmp/julia/src/locks.h:133 [inlined]
jl_process_events at /tmp/julia/src/jl_uv.c:215
process_events at ./libuv.jl:104 [inlined]
wait at ./task.jl:825
wait at ./condition.jl:112
fetch_buffered at ./channels.jl:365
fetch at ./channels.jl:359
fetch_ref at /tmp/julia/usr/share/julia/stdlib/v1.8/Distributed/src/remotecall.jl:544
wait_ref at /tmp/julia/usr/share/julia/stdlib/v1.8/Distributed/src/remotecall.jl:504 [inlined]
call_on_owner at /tmp/julia/usr/share/julia/stdlib/v1.8/Distributed/src/remotecall.jl:497 [inlined]
wait at /tmp/julia/usr/share/julia/stdlib/v1.8/Distributed/src/remotecall.jl:520 [inlined]
#remotecall_wait#147 at /tmp/julia/usr/share/julia/stdlib/v1.8/Distributed/src/remotecall.jl:429
jl_apply at /tmp/julia/src/julia.h:1768 [inlined]
do_apply at /tmp/julia/src/builtins.c:713
remotecall_wait at /tmp/julia/usr/share/julia/stdlib/v1.8/Distributed/src/remotecall.jl:429
jl_apply at /tmp/julia/src/julia.h:1768 [inlined]
do_apply at /tmp/julia/src/builtins.c:713
#remotecall_wait#151 at /tmp/julia/usr/share/julia/stdlib/v1.8/Distributed/src/remotecall.jl:453
jl_apply at /tmp/julia/src/julia.h:1768 [inlined]
do_apply at /tmp/julia/src/builtins.c:713
remotecall_wait at /tmp/julia/usr/share/julia/stdlib/v1.8/Distributed/src/remotecall.jl:453
jl_apply at /tmp/julia/src/julia.h:1768 [inlined]
do_apply at /tmp/julia/src/builtins.c:713
macro expansion at /home/krynju/.julia/packages/MemPool/RPu2r/src/datastore.jl:101 [inlined]
#15 at ./task.jl:411
unknown function (ip: 0x7ffb10900b1f)
jl_apply at /tmp/julia/src/julia.h:1768 [inlined]
start_task at /tmp/julia/src/task.c:880
Allocations: 72035309 (Pool: 72018731; Big: 16578); GC: 54
[1]    17508 abort (core dumped)  ../julia/julia -t4
➜  Dagger.jl git:(ae34779)

@jpsamaroo
Copy link
Member Author

That's actually a different one in finalize_ref. Let me see if I can fix that too...

@jpsamaroo
Copy link
Member Author

Ok, this one is harder to hack around, so let's try making del_client asynchronous.

@krynju
Copy link
Contributor

krynju commented Aug 6, 2021

error in running finalizer: ErrorException("task switch not allowed from inside gc finalizer")
jl_error at /opt/julia/src/rtutils.c:41
jl_switch at /opt/julia/src/task.c:481
try_yieldto at ./task.jl:754
wait at ./task.jl:824
wait at ./condition.jl:112
lock at ./lock.jl:100
lock at ./lock.jl:185
lock at ./weakkeydict.jl:89 [inlined]
del_client at /opt/julia/usr/share/julia/stdlib/v1.8/Distributed/src/remotecall.jl:231 [inlined]
del_client at /opt/julia/usr/share/julia/stdlib/v1.8/Distributed/src/remotecall.jl:229 [inlined]
del_client at /opt/julia/usr/share/julia/stdlib/v1.8/Distributed/src/remotecall.jl:227 [inlined]
send_del_client at /opt/julia/usr/share/julia/stdlib/v1.8/Distributed/src/remotecall.jl:260
finalize_ref at /opt/julia/usr/share/julia/stdlib/v1.8/Distributed/src/remotecall.jl:97
jl_apply at /opt/julia/src/julia.h:1768 [inlined]
run_finalizer at /opt/julia/src/gc.c:278
jl_gc_run_finalizers_in_list at /opt/julia/src/gc.c:365
run_finalizers at /opt/julia/src/gc.c:394 [inlined]
run_finalizers at /opt/julia/src/gc.c:372
jl_gc_collect at /opt/julia/src/gc.c:3266
maybe_collect at /opt/julia/src/gc.c:881 [inlined]
jl_gc_pool_alloc at /opt/julia/src/gc.c:1205
jl_gc_alloc_ at /opt/julia/src/julia_internal.h:354 [inlined]
jl_gc_alloc at /opt/julia/src/gc.c:3291
_new_array_ at /opt/julia/src/array.c:137
jl_array_copy at /opt/julia/src/array.c:1230
copy at ./array.jl:369
getindex at /home/krynju/.julia/packages/DataFrames/vuMM8/src/dataframe/dataframe.jl:500
#445 at /tmp/Dagger.jl/src/table/operations.jl:108
macro expansion at /tmp/Dagger.jl/src/processor.jl:154 [inlined]
#53 at ./threadingconstructs.jl:178
unknown function (ip: 0x7fb209d74caf)
jl_apply at /opt/julia/src/julia.h:1768 [inlined]
start_task at /opt/julia/src/task.c:880
fatal: error thrown and no exception handler available.
ErrorException("attempt to switch to exited task")
jl_error at /opt/julia/src/rtutils.c:41
jl_switch at /opt/julia/src/task.c:479
try_yieldto at ./task.jl:754
wait at ./task.jl:824
task_done_hook at ./task.jl:531
jl_apply at /opt/julia/src/julia.h:1768 [inlined]
jl_finish_task at /opt/julia/src/task.c:217
start_task at /opt/julia/src/task.c:891

@jpsamaroo
Copy link
Member Author

Superseded by #41846

@jpsamaroo jpsamaroo closed this Aug 10, 2021
@jpsamaroo jpsamaroo deleted the jps/finalize-ref-race branch August 10, 2021 13:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants