SIGSEGV on macOS when calling ssl.SSLContext.load_verify_locations during shutdown #114653
I got it to reliably crash again, and it seems that it's related to an exception being thrown in one thread whilst many other threads are inside Upping the number of open files prevents the initial exception from happening, but whenever an exception is thrown Python will quit with a Here is a crash due to a
Is it possible to reproduce without using third-party libraries?
I can try, but I'm not sure I'll be able to, and I don't have a lot of time to spend digging into this. Looking at the thread traces, none of the frames appear to be in third-party native modules; it's all in CPython or ssl internals. But the issue may be triggered by how boto3 implements connection pooling, which I'm not familiar with. I'll give it a go. If I'm able to reproduce this again, is there any more information (besides a reproducible test case) that would help? I tried enabling faulthandler, but it didn't output anything, which makes me think the error happens after Python has shut down.
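For reference, this is roughly how faulthandler can be enabled from code (the command-line equivalents are `python -X faulthandler` or setting `PYTHONFAULTHANDLER=1`). Writing to a dedicated file avoids losing output when stderr is redirected or captured. One possible explanation for the empty output, not verified here, is that the crash happens in a C-level atexit handler after the interpreter has finalized, when faulthandler's signal handlers may no longer be installed. The temporary log file is just an illustrative choice.

```python
import faulthandler
import tempfile

# Keep the log file open for the lifetime of the process: faulthandler
# writes to its file descriptor directly from the signal handler.
log = tempfile.NamedTemporaryFile(mode="w", suffix=".log", delete=False)

# Dump the tracebacks of all Python threads on SIGSEGV, SIGFPE, SIGABRT,
# SIGBUS and SIGILL, not only the thread that crashed.
faulthandler.enable(file=log, all_threads=True)

print("faulthandler enabled:", faulthandler.is_enabled(), "log:", log.name)
```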
Can this be reproduced without tqdm? Can it be reproduced with a smaller number of threads or tasks? How many tasks had finished at the moment of the crash, how many had started, and how much data had the unfinished tasks read?
I've managed to create a reproduction case:

```python
from multiprocessing.pool import ThreadPool
import ssl

import certifi  # third-party; provides the path to a CA bundle

ctx = ssl.create_default_context()
location = certifi.where()


def test_func(_):
    # Every worker thread loads the same CA bundle into the shared context.
    ctx.load_verify_locations(location)


with ThreadPool(processes=100) as pool:
    for idx, item in enumerate(pool.imap_unordered(test_func, range(1000), chunksize=1)):
        if idx == 100:
            # Raising here leaves other worker threads running as the interpreter exits.
            raise RuntimeError()
```

On macOS, this reliably triggers the crash. If you increase the number of threads to something silly like 10k, it errors immediately.
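The worker threads of `multiprocessing.pool.ThreadPool` are daemon threads, and the interpreter does not wait for daemon threads at shutdown, so they can still be running while the process exits. A minimal sketch to confirm the daemon flag (the helper name `worker_is_daemon` is only for illustration):

```python
import threading
from multiprocessing.pool import ThreadPool


def worker_is_daemon(_):
    # Report whether the pool thread executing this task is a daemon thread.
    return threading.current_thread().daemon


with ThreadPool(processes=4) as pool:
    flags = pool.map(worker_is_daemon, range(8))

print(all(flags))  # prints True
```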
Using
I get the following stack trace when testing with a recent checkout of main and OpenSSL 3.0.12. OpenSSL was built without debug information; I'll have to rebuild it with debug information to learn more.
Stack trace of the crashing thread with OpenSSL debug info, using OpenSSL 3.0.13:
This tries to unlock a NULL lock, and according to the source for line two this is the global engine lock:

CRYPTO_THREAD_unlock(global_engine_lock);

From quickly scanning the OpenSSL code base, this lock is cleared from

This seems to be caused by a thread that is still running while the process is already shutting down. That happens here because the thread pool from #114653 (comment) uses daemon threads, and AFAIK Py_Finalize() won't wait for daemon threads to exit. Indeed, if I replace the 'raise RuntimeError' statement with 'pass', the script works without problems.

If my analysis is correct, I'm not sure we can fix this: there is a race condition between the main thread invoking the atexit(3) handlers and secondary threads still running OpenSSL functions. From what I've found, POSIX doesn't specify any ordering here.

UPDATE: For completeness' sake, here's the stack trace for the main thread, which is running the atexit handlers, in particular OpenSSL's cleanup:
Apparently, a recent commit to OpenSSL added a no-atexit configure option (openssl/openssl@99fb31c). We could turn that option on in our macOS installer once we switch to a release that includes it. That's not ideal, though: this is not a macOS-specific issue, and the change wouldn't affect other installations (e.g. Homebrew, Anaconda).
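Independently of how OpenSSL is built, an application can try to sidestep the race by making sure no pool thread is still running when the interpreter shuts down. The sketch below assumes the daemon-thread race described above is indeed the cause. The `with ThreadPool(...)` block already calls `terminate()` on exit; the step it lacks is `join()` (a documented `Pool` method), which waits for the worker threads to exit. `tls_work`, `run`, and `seen_threads` are hypothetical names; `ssl.create_default_context()` stands in for `load_verify_locations` so the sketch needs no third-party CA bundle.

```python
import ssl
import threading
from multiprocessing.pool import ThreadPool

# Threads that executed at least one task, recorded so we can check
# afterwards that every one of them has exited.
seen_threads = set()
seen_lock = threading.Lock()


def tls_work(_):
    # Hypothetical stand-in for ctx.load_verify_locations(location): creating
    # a default context also calls into OpenSSL (it loads the default CA paths).
    with seen_lock:
        seen_threads.add(threading.current_thread())
    ssl.create_default_context()


def run(fail_at=100, processes=100, tasks=1000):
    pool = ThreadPool(processes=processes)
    try:
        results = pool.imap_unordered(tls_work, range(tasks), chunksize=1)
        for idx, _ in enumerate(results):
            if idx == fail_at:
                raise RuntimeError("simulated failure")
    finally:
        # terminate() discards queued tasks; join() then blocks until every
        # worker thread has finished its current task and exited, so no pool
        # thread is left running OpenSSL code at interpreter shutdown.
        pool.terminate()
        pool.join()
```

Calling `run()` at the top level still ends with the RuntimeError traceback, but by the time the exception propagates, the workers have been joined. The trade-off is that shutdown waits for in-flight tasks to finish.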
Crash report
What happened?
On macOS, calling ssl.SSLContext.load_verify_locations from multiple threads causes a SIGSEGV during process termination when an unhandled exception is raised. I was able to create a reproduction case, and I've attached the five macOS crash reports here.
Reproduction
macOS crash reports:
five.txt
four.txt
one.txt
three.txt
two.txt
CPython versions tested on:
3.11
Operating systems tested on:
macOS
Output from running 'python -VV' on the command line:
Python 3.11.7 (v3.11.7:fa7a6f2303, Dec 4 2023, 15:22:56) [Clang 13.0.0 (clang-1300.0.29.30)]