2.10.1 client deadlocks on Python GIL during logging while nacking and receiving messages

We've observed full Python interpreter lockups. This is not ordinary blocking: the interpreter calling the client halts entirely and can't be unblocked, time out, or raise exceptions, even if the blocking operation is moved to a Python `Thread` and waited on with a timeout. The lockups occur in the presence of:
- The 2.10.1 Python client.
- Python threading (using the pulsar `Client` from a thread).
- Python asyncio/event loop `Future` manipulation.
- Consumers in the act of receiving messages (running the client's internal receive loop).
- Many nacks of the same message.
- Multiple consumers.
- A Python `logger=` argument to `Client`. We must use this; otherwise the logs the client emits to STDOUT fill up our disks.
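
For concreteness, the combination above can be sketched roughly as follows. This is an illustrative sketch, not our actual program and not a confirmed reproducer: `nack_loop`, `run_repro`, the subscription name `repro-sub`, the logger name, and the use of a single shared `Client` are all assumptions made for the example, and the asyncio `Future` handling is left out.

```python
import logging
import threading


def nack_loop(consumer, stop_event, receive_timeout_ms=1000):
    """Receive messages and negatively acknowledge every one until stopped."""
    nacked = 0
    while not stop_event.is_set():
        try:
            msg = consumer.receive(timeout_millis=receive_timeout_ms)
        except Exception:
            # The client raises when receive times out; poll again.
            # (Broad catch for brevity; a real program should be narrower.)
            continue
        consumer.negative_acknowledge(msg)
        nacked += 1
    return nacked


def run_repro(service_url, topic, n_consumers=4, duration_s=300):
    """Hypothetical driver: several Shared consumers, each nacking from its own thread."""
    import pulsar  # imported lazily so nack_loop can be exercised without the client

    # Route client logs through Python logging instead of STDOUT.
    client = pulsar.Client(service_url, logger=logging.getLogger("pulsar-client"))
    stop = threading.Event()
    threads = []
    for _ in range(n_consumers):
        consumer = client.subscribe(
            topic,
            subscription_name="repro-sub",  # assumed name
            consumer_type=pulsar.ConsumerType.Shared,
            negative_ack_redelivery_delay_ms=15000,  # the 15 s nack delay
        )
        t = threading.Thread(target=nack_loop, args=(consumer, stop), daemon=True)
        t.start()
        threads.append(t)
    stop.wait(duration_s)  # nothing sets the event, so this just waits out the run
    stop.set()
    for t in threads:
        t.join(timeout=5)
    client.close()
```

Running `run_repro("pulsar://localhost:6650", "some-topic")` would need a reachable broker and a topic already holding a few hundred messages; both the URL and the topic name are placeholders.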


All of these have to be present to trigger the issue. When multiple Shared consumers repeatedly nack messages with a 15-second delay on a topic holding a few hundred messages (100% of them are nacked over and over), all but one of the consumers lock up within a few minutes; that is, no Python code in that consumer can run. It's not just that the consumer is blocked in a `negative_acknowledge` call: all threads, signal handlers, coroutines, etc. in that interpreter are stuck. This points to a GIL conflict.
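
The thread-plus-timeout guard we tried looks roughly like the sketch below (`call_with_timeout` is a hypothetical helper name, not part of the pulsar client). It works when the blocked call releases the GIL, as `time.sleep` does here. It can't help if a native thread is stuck while holding the GIL, because `Thread.join(timeout)` has to reacquire the GIL before it can return to Python code, which would match the behavior we see.

```python
import threading
import time


def call_with_timeout(fn, timeout_s):
    """Run fn() in a daemon thread; raise TimeoutError if it doesn't finish in time."""
    result = {}

    def _target():
        try:
            result["value"] = fn()
        except BaseException as exc:  # hand any failure back to the caller
            result["error"] = exc

    worker = threading.Thread(target=_target, daemon=True)
    worker.start()
    # join() releases the GIL while waiting, but must reacquire it to return.
    worker.join(timeout_s)
    if worker.is_alive():
        raise TimeoutError(f"call did not finish within {timeout_s}s")
    if "error" in result:
        raise result["error"]
    return result["value"]


# A call that blocks but releases the GIL is interrupted as expected.
try:
    call_with_timeout(lambda: time.sleep(1), 0.1)
except TimeoutError as exc:
    print("timed out:", exc)
```

In our case the equivalent of the `except TimeoutError` branch is never reached: the waiting thread stays stuck along with everything else.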

While this program has many hundreds of threads, the stack traces from the most relevant ones are included here:
[threads.txt](https://github.com/apache/pulsar-client-python/files/11406968/threads.txt)


2.10.1 client deadlocks on Python GIL during logging while nacking and receiving messages #116
