Semaphore exception when publishing with confirmation #1818
Comments
Related:
Hello, thanks for the report. There are tests that publish in parallel using the same channel, so you must have found an edge case. There's also not enough information to reproduce this issue, so I hope you can help out with that. Can you reproduce this issue easily, or easily enough? Please upgrade to version 7.1.2 and try to reproduce this issue using that version.
Can you describe this further? Why did you get nacks from RabbitMQ for confirmations? Did RabbitMQ log anything at that time? |
Fixes #1818 Start by modifying test to concurrently publish to the same `IChannel` instance.
@patriktiain please see the test I modified in this PR - #1819 Can you confirm that I'm publishing with the same channel using multiple threads ( |
Hi! Thank you for your fast response and contribution! During an 18-hour test with 2.6 million publishes, this exception was thrown 10 times. So I wouldn't call it easy or difficult to reproduce, but I don't think I can do it today. I will check with our testing resources.
I'm sorry for not being clear. I didn't mean a confirmation nack. I meant that the wait for the confirmation timed out. On this run we used a timeout of 5 seconds. So we get this SemaphoreException and a burst of confirmation timeouts. I have looked through some system metrics from the test, and I suspect some other process was using quite a lot of resources during the time we got these exceptions. That might have something to do with it or not, I don't know.
The semaphore exception occurred: 2025-03-27 13:28:20.588 2025-03-27 13:41:57.078 2025-03-27 14:24:50.593 The only RabbitMQ logs around that time were:
Yes, I would say this is similar. The only difference is:
I hope this provides enough information! |
Within RabbitMQ itself, we have seen timeouts of 5 seconds — it happens to be a common default in parts of Erlang/OTP — to be guaranteed to produce false positives under substantial load. So the practical minimum we try to use is 15s, although, of course, there is no right answer since every deployment, host resources and what constitutes "substantial load" will vary. Is the behavior the same with a 15s timeout, just to compare? |
Interesting, I didn't know of any default timeouts in RabbitMQ. Is there any documentation or blog post I can read on this subject? We can't use such a high publish confirmation timeout, as we have time restrictions from the client that created the message. Usually we use 2 seconds for the confirmation timeout. |
I was referring to operation timeouts between the components of RabbitMQ, different cluster members, and so on. There are no timeouts on client publishing operations (publisher confirms). There is a delivery acknowledgement timeout but it is 30 minutes by default or so, and it does not apply to your case. |
Can you tell me how you are setting this confirmation timeout? I want to be sure I'm doing it exactly the same way. Thank you. |
By CancellationToken
Where _publishingAckTimeoutMs is the configurable timeout. In the last test it was 5000 ms. Thank you! |
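For readers following along, a timeout driven by a CancellationToken can be sketched roughly as below. This is not the code from the comment above, which is not shown in the thread. `TryPublishAsync` and `slowPublish` are hypothetical names, and a `Task.Delay` stands in for the publish call so the sketch runs without a broker. It only illustrates two general .NET points: a `CancellationTokenSource` is disposable, and catching `OperationCanceledException` also covers `TaskCanceledException`, which derives from it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ConfirmTimeoutSketch
{
    // Hypothetical helper: runs a publish delegate under a timeout and reports
    // whether it completed before the token was canceled.
    public static async Task<bool> TryPublishAsync(
        Func<CancellationToken, Task> publish, int timeoutMs)
    {
        // Dispose the source so its internal timer is released after each call.
        using var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(timeoutMs));
        try
        {
            await publish(cts.Token);
            return true;
        }
        catch (OperationCanceledException)
        {
            // TaskCanceledException derives from OperationCanceledException,
            // so this handler covers both.
            return false;
        }
    }

    public static async Task Main()
    {
        // Stand-in for a publish that waits about 200 ms for its confirmation.
        Func<CancellationToken, Task> slowPublish = ct => Task.Delay(200, ct);

        Console.WriteLine(await TryPublishAsync(slowPublish, 50));   // short timeout
        Console.WriteLine(await TryPublishAsync(slowPublish, 1000)); // generous timeout
    }
}
```

With the real client, the delegate would wrap the channel's publish call and pass the token through. The return value here only distinguishes "completed" from "canceled"; any other exception still propagates to the caller.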
Great, thank you. You're probably aware of this, but |
Thank you! I didn't know that. I'll fix right away. |
@patriktiain - can you re run your tests using version https://www.myget.org/feed/rabbitmq-dotnet-client/package/nuget/RabbitMQ.Client |
Thank you, I'll try to test it out this week. I tested with version 7.1.2 and got more SemaphoreExceptions than before. Also, which exception should you catch in case of a confirmation timeout? I have caught the TaskCancellationException, but with 7.1.2 I got a lot of:
Is this another type of exception? If yes, is it safest to dispose the channel after such an exception? Thanks again! |
@patriktiain - |
Thank you for clearing that up! |
@patriktiain I've made some comments related to this issue here - #1674 (reply in thread) If you are using a 5 second timeout for confirmations from RabbitMQ, and experiencing those timeouts, your server is probably overloaded. Also, version 3.11 is very, very old and no longer supported. Just FYI. Thanks for testing |
@patriktiain - did you find time to test |
Hi! Sorry, we just went through regression testing, so the resources have been used up by testers. I tried it out today though. The application crashed, unfortunately. I restarted it and it seems to be running now. We will attach some load within the hour. Here is the exception from Event Viewer.
|
@patriktiain your environment is triggering some unusual behavior! The stack trace you provide is from hitting this code: The message If that line is hit, it means that the incoming consumer tag for the Let me know how your testing goes. |
@patriktiain - if it's not too much trouble, please use https://www.myget.org/feed/rabbitmq-dotnet-client/package/nuget/RabbitMQ.Client It has my latest changes from PR #1819. Thanks again. |
Hi! We started load testing yesterday on 7.1.3-alpha1 on the test server, and unfortunately we still see the exception.
I have tried to figure out whether we have something strange set up in our environment. One thing could be that we have about 130 potential threads competing for 2 IChannels. This is not a setup that we think is viable in production, but we wanted to test how high we can push the concurrency on this client. I will try with the new build asap. Thank you for your continued support! |
I got the same error, and I found a log entry from the channel.BasicReturnAsync event, as below. Client version: 7.1.0
|
@tai-yi please share what your code does in @patriktiain - if possible, could we do a Zoom session so I could review your code? I'm sure it would give me insights into trying to reproduce this issue. Since you can't share your code, and I can't reproduce this issue, that is the best option at this time. |
@lukebakken

    private Task HandleBasicReturnAsync(object _, BasicReturnEventArgs args)
    {
        Logger.LogWarning(
            "Message {id} was returned by RabbitMQ, Code:{code} Reason:{reason} RoutingKey {routingKey}",
            args.BasicProperties.MessageId,
            args.ReplyCode,
            args.ReplyText,
            args.RoutingKey);
        return Task.CompletedTask;
    }

We use code like the following:

    public class RabbitMqPublisher // DI Singleton service
    {
        //instance has one MQ connection
        private readonly ObjectPool<PooledChannel> _channels;
        public RabbitMqPublisher(...)
        {
            _channels = new DefaultObjectPoolProvider().Create(new PooledChannelPolicy(this));
        }
        private class PooledChannelPolicy(RabbitMqPublisher publisher) : PooledObjectPolicy<PooledChannel>
        {
            public override PooledChannel Create()
            {
                return new PooledChannel(publisher);
            }
            public override bool Return(PooledChannel obj) => obj.Reusable;
        }
    }
|
I will try disabling channel pooling in the next release (this may take about 2 months). |
@patriktiain - any word on testing the latest release? FYI, I have started a new position that will take my time away from this project for a while, but I will continue to work on this issue when I can. |
@lukebakken, we refactored part of our RabbitMQ integration last week. Most of the changes are to use separate channels for publishing and topology. We will start load testing again this week. I haven't tested the new build yet. Sure, we could have a Zoom meeting next week, if that is alright with you? |
Describe the bug
Found on client version 7.1.1
I'm running in parallel, allowing multiple threads to use the same IChannel instance to publish messages with confirmation.
When we used version 6 of the client, we had a channel pool to make sure that a channel wasn't shared between threads. When we upgraded to version 7, we noticed that the channel might be thread safe (from discussions like this one: #1721), so we have started to test with a more liberal channel pool that allows multiple threads to publish on the same IChannel instance.
After a period of testing we have found multiple semaphore exceptions among a burst of confirmation failures. All exceptions happened within one second.
We are using RabbitMQ server version 3.11.28 and .NET 8.
Reproduction steps
1. Enable confirmation on channel
2. Publish in parallel on channel
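The two steps above might be sketched as follows. This is an illustrative reconstruction and not the reporter's code: the broker address, the `sketch-queue` queue name, the task and message counts, and the 5 second timeout are all assumptions, and the calls are written against the 7.1.x client API as I understand it (`CreateChannelOptions`, `BasicPublishAsync` with a `CancellationToken`). It needs a running RabbitMQ broker, and on its own it is not known to trigger the exception.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using RabbitMQ.Client;

public static class ParallelPublishSketch
{
    public static async Task Main()
    {
        // Assumption: a broker is reachable on localhost with default credentials.
        var factory = new ConnectionFactory { HostName = "localhost" };
        using IConnection connection = await factory.CreateConnectionAsync();

        // Step 1: enable publisher confirmations (with tracking) on the channel.
        var options = new CreateChannelOptions(
            publisherConfirmationsEnabled: true,
            publisherConfirmationTrackingEnabled: true);
        using IChannel channel = await connection.CreateChannelAsync(options);

        // Hypothetical queue used only by this sketch.
        await channel.QueueDeclareAsync(
            queue: "sketch-queue", durable: false, exclusive: false, autoDelete: true);

        byte[] body = Encoding.UTF8.GetBytes("hello");

        // Step 2: several tasks publish concurrently on the same IChannel.
        Task[] publishers = Enumerable.Range(0, 16).Select(_ => Task.Run(async () =>
        {
            for (int i = 0; i < 1000; i++)
            {
                // Per-publish timeout; the thread mentions 5000 ms in one run.
                using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5));
                try
                {
                    await channel.BasicPublishAsync(
                        exchange: "", routingKey: "sketch-queue", mandatory: false,
                        basicProperties: new BasicProperties(), body: body,
                        cancellationToken: cts.Token);
                }
                catch (OperationCanceledException)
                {
                    Console.WriteLine("Timed out waiting for confirmation");
                }
            }
        })).ToArray();

        await Task.WhenAll(publishers);
    }
}
```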
Expected behavior
Additional context
No response