Issue was that if a BGC thread handles a mark stack overflow, #74571
but runs into yet another mark stack overflow on another heap, we set a flag on the region and on the containing heap. However, the BGC thread handling the other heap may have already decided to move on, and may thus not see the flag.

The fix is to set the flag on the heap doing the scan rather than on the heap containing the object that caused the mark stack overflow. The thread handling the scanning heap will recheck the flag and rescan if necessary. This necessitates another change because in the concurrent case, we need each BGC thread to enter mark stack overflow scanning if there was a mark stack overflow on its heap. So we need to propagate the per-heap flag to all the heaps.

Fixed another issue for regions where the small_object_segments local variable in background_process_mark_overflow_internal would be set incorrectly in the non-concurrent case. It would be set to FALSE as soon as all the regions for gen 0 were processed.
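The propagation step described above can be sketched in isolation. This is only an illustration under stated assumptions: `Heap` and `propagate_overflow_flag` are hypothetical names, not the runtime's `gc_heap` class or any function in gc.cpp, and the sketch ignores the thread synchronization that the real code would need.

```cpp
#include <vector>

// Hypothetical stand-in for per-heap GC state (assumption, not the
// runtime's gc_heap class). Only the overflow flag is modeled.
struct Heap {
    bool background_overflow_p = false;
};

// Computes the OR of all per-heap overflow flags and writes the result
// back to every heap, so that each heap's thread sees "overflow happened
// somewhere" and enters overflow scanning.
// Returns the combined flag.
bool propagate_overflow_flag(std::vector<Heap>& heaps) {
    bool any_overflow = false;
    for (const Heap& h : heaps) {
        if (h.background_overflow_p) {
            any_overflow = true;
        }
    }
    for (Heap& h : heaps) {
        h.background_overflow_p = any_overflow;
    }
    return any_overflow;
}
```

With four heaps where only one has its flag set, the call returns true and leaves the flag set on all four; with no flags set it returns false and changes nothing.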
Tagging subscribers to this area: @dotnet/gc
#ifdef USE_REGIONS
// in the regions case, compute the OR of all the per-heap flags
if (g_heaps[i]->background_overflow_p)
    all_heaps_background_overflow_p = TRUE;
is there any perf impact of setting this across all heaps?
The perf impact is that some work gets balanced between all Server BGC threads processing overflow (OF). A heap that didn't actually have any OF would quickly filter by its regions' OF flags.
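A minimal sketch of why a heap with no overflowed regions pays little for entering overflow processing: it only has to test a per-region flag and skip. The names `Region`, `overflow_p`, and `rescan_overflowed_regions` are hypothetical and chosen for illustration; the actual region bookkeeping in the runtime differs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical region record (assumption): only the overflow flag matters here.
struct Region {
    bool overflow_p = false;
};

// Visits every region of one heap, skipping those without the overflow flag.
// Flagged regions have the flag cleared and are counted as rescanned; the
// actual marking work is omitted from this sketch.
// Returns the number of regions that needed a rescan.
std::size_t rescan_overflowed_regions(std::vector<Region>& regions) {
    std::size_t rescanned = 0;
    for (Region& r : regions) {
        if (!r.overflow_p) {
            continue;  // cheap filter: nothing overflowed in this region
        }
        r.overflow_p = false;
        ++rescanned;   // placeholder for re-marking objects in the region
    }
    return rescanned;
}
```

For a heap whose regions are all unflagged, the loop does one flag test per region and returns 0.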
/backport to release/7.0
Started backporting to release/7.0: https://github.com/dotnet/runtime/actions/runs/2956839509