
2.10.1 client deadlocks on Python GIL during logging while nacking and receiving messages #116

Closed
@zbentley

We've observed full Python interpreter lockups with the 2.10.1 client. This is not ordinary blocking: the interpreter calling the client halts and cannot be unblocked, time out, or raise an exception, even when the blocking operation is moved to a Python Thread and waited on with a timeout. The lockups occur in the presence of:

  • The 2.10.1 Python client.
  • Python threading (using the pulsar Client from a thread).
  • Python asyncio/event loop Future manipulation.
  • Consumers in the act of receiving messages (running the client's internal receive loop).
  • Many nacks of the same message.
  • Multiple consumers.
  • Using a Python logger= argument to Client. We must do this; otherwise the logs the client emits to STDOUT fill up our disks.

All of those have to be present to trigger the issue. When multiple Shared consumers repeatedly nack messages with a 15-second delay on a topic with a few hundred messages (100% of them are nacked over and over), all but one of the consumers eventually (within a few minutes) lock up; that is, no Python code in that consumer can run. It's not just that the consumer is blocked in a negative_acknowledge call: all threads, signal handlers, coroutines, etc. in that interpreter are stuck. This says GIL conflict to me.
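To make the "blocked" versus "locked up" distinction concrete, here is a minimal stdlib-only sketch of the thread-plus-timeout pattern mentioned above. It does not use the Pulsar client; `run_with_timeout` and `slow_but_well_behaved` are hypothetical names introduced only for illustration. With a call that releases the GIL while it waits, the timeout fires as expected. In the lockup described here, the equivalent `join` never returns, presumably because the waiting thread can never reacquire the GIL.

```python
import threading
import time


def run_with_timeout(fn, timeout_s):
    """Run fn in a daemon thread and wait at most timeout_s seconds.

    Returns (finished, value); value is None if fn did not finish in time.
    """
    result = {}

    def target():
        result["value"] = fn()

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    # join() can only return once the calling thread is able to run again,
    # which requires reacquiring the GIL after the wait.
    worker.join(timeout_s)
    return (not worker.is_alive()), result.get("value")


def slow_but_well_behaved():
    # time.sleep releases the GIL while waiting, so other threads keep running.
    time.sleep(2.0)
    return "done"


finished, value = run_with_timeout(slow_but_well_behaved, 0.2)
print(finished, value)  # False None
```

If this pattern stops timing out, the problem is unlikely to be the blocking call itself; something is preventing every Python thread from being scheduled.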

While this program has many hundreds of threads, the stacktraces from the most relevant ones are included here:
threads.txt
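For anyone trying to capture comparable stack traces from a hung interpreter, one option is the standard library's `faulthandler` watchdog, which is designed to dump tracebacks even when the process is deadlocked. This is only a sketch under assumptions: it is not necessarily how threads.txt was produced, the 0.2-second timeout and the temporary file are placeholders, and `faulthandler` shows Python frames only, so native C++ frames inside the client would still need a tool such as a debugger.

```python
import faulthandler
import tempfile
import time

# Placeholder destination; a real service would use a persistent log file.
dump_file = tempfile.TemporaryFile(mode="w+")

# Arm the watchdog: unless cancelled within 0.2 seconds, dump the Python
# stack of every thread to dump_file. exit=False keeps the process alive.
faulthandler.dump_traceback_later(0.2, exit=False, file=dump_file)

time.sleep(0.5)  # stand-in for the call that hangs
faulthandler.cancel_dump_traceback_later()

# faulthandler writes straight to the file descriptor, so rewind and read.
dump_file.seek(0)
dump = dump_file.read()
print(dump)
```

Arming the watchdog before each suspect call and cancelling it afterwards means a dump is written only when the call fails to return in time.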
