Scalability with high cardinality of subscriptions and published message topics #710


Closed · gwik opened this issue Jul 16, 2018 · 14 comments

@gwik (Contributor) commented Jul 16, 2018

We use NATS primarily as a message routing system with a very high cardinality of published message topics and subscriptions. Most published topics won't match any subscription, and our subscription/unsubscription rate is quite high (8-11K/s inserts and the same number of removes in the sublist) for a total of 200-300k subscriptions. Our message rate, though, is pretty modest compared to what NATS can handle (8K/s).

Over the last few weeks we had a series of production issues that were related to NATS. The symptoms were always the same: after a peak in the subscription rate the server becomes unstable, refuses new clients, and the message rate drops. Slow consumers are detected, clients are kicked and reconnect, sending all their subscriptions back...

On my local machine I had a hard time reproducing it: with the production message and subscription rates the NATS server handled the load, and the subscription insert/remove rate had to be much higher than in production to show the issue. Long story short, I finally found that by publishing to a very high cardinality of topics I was able to reproduce an issue matching the one we had in production and profile the server. Looking at contention profiles, I saw that contention on the sublist lock was now leading the profile, which was not the case previously. The end-to-end publisher-to-subscriber latency starts to grow almost immediately.

Low cardinality:

[screenshot from 2018-07-16 18-31-39]

High cardinality:

[screenshot from 2018-07-16 18-32-32]
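
(For reference, a contention profile like this can be captured from any Go process by enabling mutex and block profiling and scraping the net/http/pprof endpoints. The snippet below is a generic sketch of that setup, not the exact tooling used here.)

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
	"runtime"
)

func main() {
	// Sample roughly 1 in 5 mutex contention events (0, the default, disables it).
	runtime.SetMutexProfileFraction(5)
	// Record blocking events (lock waits, channel waits) as well.
	runtime.SetBlockProfileRate(1)

	// Expose the profiles on a side port, then inspect with, e.g.:
	//   go tool pprof http://localhost:6060/debug/pprof/mutex
	//   go tool pprof http://localhost:6060/debug/pprof/block
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```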

After playing with the sublist I found that removing the sublist cache hash map completely solves my issue. Since the cache is limited to 1024 entries it is useless with a high cardinality of message topics, and grabbing the write lock to maintain the cache creates too much contention for the server to handle the load.

https://github.com/nats-io/gnatsd/blob/master/server/sublist.go#L265-L277
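
To make the problem concrete, here is a simplified sketch of the pattern in the linked code (the types, names, and the linear scan are stand-ins, not the real sublist.go): every cache miss has to take the write lock to store the match result, and with far more distinct subjects than cache slots almost every publish is a miss.

```go
package sublist

import "sync"

type subscription struct{ subject string }

// Simplified sketch: the real sublist walks a trie of subscription levels,
// here we just scan a slice so the locking pattern stands on its own.
type sublist struct {
	sync.RWMutex
	subs  []*subscription
	cache map[string][]*subscription // bounded result cache
}

const slCacheMax = 1024

func newSublist() *sublist {
	return &sublist{cache: make(map[string][]*subscription)}
}

func (s *sublist) Match(subject string) []*subscription {
	s.RLock()
	if r, ok := s.cache[subject]; ok { // fast path: read lock only
		s.RUnlock()
		return r
	}
	s.RUnlock()

	// Slow path: with millions of distinct subjects nearly every publish
	// misses the 1024-entry cache and lands here, so publishers serialize
	// on the write lock just to maintain a cache that never helps them.
	s.Lock()
	defer s.Unlock()
	var r []*subscription
	for _, sub := range s.subs {
		if sub.subject == subject {
			r = append(r, sub)
		}
	}
	s.cache[subject] = r
	if len(s.cache) > slCacheMax { // crude eviction keeps the map bounded
		for k := range s.cache {
			delete(s.cache, k)
			break
		}
	}
	return r
}
```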

With the cache removed the server has no problems and scales to a subscription rate more than 10x higher.
I believe this cache was added in order to increase throughput, but it hurts us in our use case.
We are now running NATS with the cache completely removed; we no longer have issues and CPU usage dropped by about 30%.
I don't want to maintain a fork though, so I'd like to submit a patch adding a command line flag that allows disabling the cache. This would support both the throughput-oriented use case and the high-cardinality-topics / high-subscription-rate use case. Unless you have a better idea?
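
To illustrate the shape of the proposal, a hypothetical sketch is below; neither the flag name nor the option field exists in gnatsd, they are made up for illustration only.

```go
package main

import (
	"flag"
	"fmt"
)

// Hypothetical option struct; gnatsd's real Options type is not shown here.
type Options struct {
	NoSublistCache bool
}

func main() {
	// Hypothetical flag name, for illustration only.
	noCache := flag.Bool("disable_sublist_cache", false,
		"disable the sublist match cache (for high subject cardinality workloads)")
	flag.Parse()

	opts := Options{NoSublistCache: *noCache}
	fmt.Printf("would start the server with NoSublistCache=%v\n", opts.NoSublistCache)
}
```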

@derekcollison (Member) commented:

First off thank you for such a detailed report, much appreciated! Can you tell me a bit about what version you were using and the hardware setup? Size of machine, memory, CPU(s) etc?

Thanks and I will dig in a bit on this.

derekcollison self-assigned this on Jul 16, 2018
@derekcollison (Member) commented:

Could you email me the pprof trees so I can have a hi-res version? [email protected]. Thanks!

@gwik (Contributor, Author) commented Jul 17, 2018

Adding a CPU profile, as it illustrates the cache issue better:

[CPU profile: pprof-cpu-ko]

@derekcollison (Member) commented:

It would still be good to know which version of Go and which version of NATS we are talking about. Hardware specs would also be helpful. Thanks.

@gwik (Contributor, Author) commented Jul 17, 2018

Go 1.10.3.
Tested with both 1.1.0 and 1.2.0 on a Linux machine with an Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz (8 cores).
Production runs in Docker on GCP with a 3 CPU reservation.

@derekcollison (Member) commented:

Thanks again for the info. Do you have a test that can reproduce it? Or could you describe to me (you may have already; if so, apologies) how to reproduce it so I can write a test. Thx.

@derekcollison (Member) commented:

I see the description above, so no need to repeat it; apologies for the request. If you would be willing to share your test program, that would be great.

@derekcollison (Member) commented:

What is a good example of a subject in your system? I'm looking for the number of tokens, total length, etc.

@gwik (Contributor, Author) commented Aug 7, 2018

Sorry, I was off. My example program uses /usr/share/dict/words, so the number of tokens is always one and the subject size is rather small. I'll share the program.
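
Roughly, the workload looks like the following sketch (illustrative only, written against the nats.go client; the actual program is linked in a later comment):

```go
package main

import (
	"bufio"
	"log"
	"os"

	"github.com/nats-io/nats.go"
)

// Illustrative sketch: use each dictionary word as a single-token subject,
// subscribe/unsubscribe at a high rate, and publish to subjects that mostly
// have no matching subscription.
func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	f, err := os.Open("/usr/share/dict/words")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		subject := sc.Text()

		// Churn the sublist: insert then immediately remove a subscription.
		sub, err := nc.SubscribeSync(subject)
		if err != nil {
			log.Fatal(err)
		}
		if err := sub.Unsubscribe(); err != nil {
			log.Fatal(err)
		}

		// Publish to a distinct subject each time; at this cardinality the
		// 1024-entry sublist cache is effectively always a miss.
		if err := nc.Publish(subject, []byte("hello")); err != nil {
			log.Fatal(err)
		}
	}
	if err := sc.Err(); err != nil {
		log.Fatal(err)
	}
}
```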

@derekcollison (Member) commented:

That would be great. Thanks!

@gwik (Contributor, Author) commented Aug 8, 2018

There you go: https://github.com/znly/natsworkout

@derekcollison (Member) commented:

Thanks will take a look.

@derekcollison (Member) commented:

I've been looking at making the cache more efficient and avoiding the contention issues you are seeing. We could just add a flag to disable it, but I'd prefer a better solution. I think I am on the right track.

benchmark                                           old ns/op     new ns/op     delta
Benchmark______________________SublistInsert-20     348           340           -2.30%
Benchmark____________SublistMatchSingleToken-20     27.5          28.4          +3.27%
Benchmark______________SublistMatchTwoTokens-20     28.0          29.4          +5.00%
Benchmark____________SublistMatchThreeTokens-20     27.9          29.8          +6.81%
Benchmark_____________SublistMatchFourTokens-20     28.6          29.7          +3.85%
Benchmark_SublistMatchFourTokensSingleResult-20     28.6          29.9          +4.55%
Benchmark_SublistMatchFourTokensMultiResults-20     28.6          30.3          +5.94%
Benchmark_______SublistMissOnLastTokenOfFive-20     28.6          31.3          +9.44%
Benchmark____________Sublist10XMultipleReads-20     142           73.1          -48.52%
Benchmark___________Sublist100XMultipleReads-20     88.3          45.8          -48.13%
Benchmark__________Sublist1000XMultipleReads-20     71.8          33.2          -53.76%
Benchmark________________SublistMatchLiteral-20     925           915           -1.08%
Benchmark_____SublistMatch10kSubsWithNoCache-20     1567          1961          +25.14%
Benchmark__________SublistRemove1TokenSingle-20     332           361           +8.73%
Benchmark___________SublistRemove1TokenBatch-20     264           282           +6.82%
Benchmark_________SublistRemove2TokensSingle-20     615           588           -4.39%
Benchmark__________SublistRemove2TokensBatch-20     490           496           +1.22%
Benchmark________SublistRemove1TokenQGSingle-20     366           377           +3.01%
Benchmark_________SublistRemove1TokenQGBatch-20     299           320           +7.02%
Benchmark_________SublistRemove1kSingleMulti-20     1353576       1541284       +13.87%
Benchmark__________SublistRemove1kBatchMulti-20     446161        454942        +1.97%
Benchmark__SublistRemove1kSingle2TokensMulti-20     1485844       1323442       -10.93%
Benchmark___SublistRemove1kBatch2TokensMulti-20     559833        568543        +1.56%
Benchmark____SublistCacheContention10M10A10R-20     6779          2327          -65.67%
Benchmark_SublistCacheContention100M100A100R-20     8181          2691          -67.11%
Benchmark____SublistCacheContention1kM1kA1kR-20     5903          2932          -50.33%
Benchmark_SublistCacheContention10kM10kA10kR-20     7822          3105          -60.30%
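
(For context, a parallel cache-contention benchmark along these lines can be sketched as follows, reusing the simplified sublist type from the earlier comment. The real NATS benchmarks also mix in adds and removes; this one only hammers Match with a high cardinality of subjects.)

```go
package sublist

import (
	"strconv"
	"sync/atomic"
	"testing"
)

// Generic sketch of a parallel contention benchmark: many goroutines match
// against far more distinct subjects than the cache can hold, so almost
// every call takes the slow, write-locked path.
func BenchmarkMatchHighCardinality(b *testing.B) {
	s := newSublist()
	var ctr uint64
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			n := atomic.AddUint64(&ctr, 1)
			s.Match("subj." + strconv.FormatUint(n%100000, 10))
		}
	})
}
```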

@derekcollison (Member) commented:

I believe this is addressed with the merge of #726. Feel free to reopen if the issue comes back up with the next release.
