fix: allow pubsub rpc to be processed concurrently #106
LGTM but it reintroduces unbounded concurrency which means a remote user could swamp a process that has slow/expensive message handling - please can you update this PR to use it-parallel (and make the parameter configurable) to let the user tune this behaviour and then we can get this in.
How can we use it-parallel and effectively curb the concurrency?
To allow processing pubsub messages that have steps that are slow but async, process the messages in a queue. Makes the concurrency configurable with a default of 10x messages.
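A minimal sketch of the bounded-concurrency idea described above, written without any dependency so it does not claim to mirror the queue library or the code actually merged in this PR. The name `processWithLimit`, its parameters, and the reading of the default as 10 concurrent messages are assumptions made for illustration.

```typescript
// Hypothetical helper: run `handler` over `items` with at most
// `concurrency` handlers in flight at any time.
async function processWithLimit<T> (
  items: readonly T[],
  handler: (item: T) => Promise<void>,
  concurrency: number = 10 // assumed default, read from the description above
): Promise<void> {
  let nextIndex = 0

  // Each worker takes the next unclaimed item as soon as it finishes
  // the previous one, so a slow item only occupies one slot.
  const worker = async (): Promise<void> => {
    while (nextIndex < items.length) {
      const item = items[nextIndex++]
      try {
        await handler(item)
      } catch (err) {
        // A failing message must not stop the worker or surface as an
        // unhandled rejection.
        console.error('message handler failed', err)
      }
    }
  }

  const workers: Array<Promise<void>> = []
  const workerCount = Math.min(concurrency, items.length)
  for (let i = 0; i < workerCount; i++) {
    workers.push(worker())
  }
  await Promise.all(workers)
}
```

With `concurrency` set to 1 this degrades to fully sequential processing. With a very large value it approaches the unbounded behaviour the review comment warns about.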
@wemeetagain I think you're right, wrapping |
fix: allow pubsub rpc to be processed concurrently
Co-authored-by: Alex Potsides <[email protected]>
this looks good to me!
@achingbrain any other input you would like to give or can I get this shipped?
All good!
thank you all for figuring this out!
Sorry for the late comment but I have some concerns with this:
Also this introduces yet another dependency with its associated NPM security risks, performance costs, etc. I would love to see libp2p one day minimally simple and only with dependencies maintained by the libp2p team + required crypto libs
Here's the implementation: https://github.com/sindresorhus/p-queue/blob/main/source/index.ts
The problem, I think, is that we allow the configurable message validation function to be async (perfectly reasonable on paper, since it's executed in the same flow as signature verification - we're async at that point to verify the signature, why not). Correct me if I'm wrong but as I understand it in Lodestar the message validation function verifies all blocks in the message are available to the application which can take on the order of minutes or more. I think the intention of message validation is just to say "this message is structurally correct", not to execute involved business logic that can then become a performance concern - if that's the requirement, it should happen after validation in a suitable manner managed by the application. I'd suggest:
We can then remove the queue introduced here and the extra complexity it introduces.
@achingbrain Some context on Lodestar's spec requirements: the eth2.0 spec forces us to have async validation functions that can potentially run for a very long time. For example, if you get a block that references a parent that's known but whose associated state is not currently in the memory cache, Lodestar must read blocks from disk and replay the state transition function, which takes between 50-200ms per block. The whole process can be very long and must not be synchronous, since that would freeze the node and prevent it from attesting and doing other mandatory tasks. The eth2 gossip spec requires us to do the above before deciding whether the message is an ACCEPT, IGNORE or REJECT.
This is not how eth2.0 gossipsub works, unfortunately.
From @wemeetagain message
Not really, because if the validation time is infinite, Lodestar's queues will get full and drop incoming messages, preventing OOM. The problem with this queue at the libp2p level is that you are making subjective decisions whose trade-offs differ widely depending on what the app layer is.
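To illustrate the drop-when-full behaviour mentioned above, here is a rough sketch of a bounded job queue. It is not Lodestar's implementation: the class name `DroppingJobQueue`, its constructor parameters, and the `dropped` counter are all hypothetical.

```typescript
// Hypothetical bounded queue: at most `maxConcurrency` jobs run at once,
// at most `maxLength` jobs wait, and anything beyond that is dropped.
class DroppingJobQueue<T> {
  private readonly pending: T[] = []
  private running = 0
  public dropped = 0

  constructor (
    private readonly handler: (job: T) => Promise<void>,
    private readonly maxLength: number,
    private readonly maxConcurrency: number
  ) {}

  // Returns false when the job was dropped because the queue is full.
  push (job: T): boolean {
    if (this.pending.length >= this.maxLength) {
      this.dropped++
      return false
    }
    this.pending.push(job)
    this.drain()
    return true
  }

  // Start waiting jobs while there is a free concurrency slot.
  private drain (): void {
    while (this.running < this.maxConcurrency && this.pending.length > 0) {
      const job = this.pending.shift() as T
      this.running++
      void this.handler(job)
        .catch((err: unknown) => { console.error('job failed', err) })
        .then(() => {
          this.running--
          this.drain()
        })
    }
  }
}
```

Because memory use is capped by `maxLength`, a validation function that never resolves stalls the queue but cannot grow it without bound. The cost is that newer messages are lost, which is exactly the kind of application-specific trade-off being debated here.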
Yeah, it's required for the eth protocol, and imo the async validation function is a really nice feature to have in general. Any sort of crypto in the application validation might be async.
I think ideally, whatever machinery included in libp2p would be configurable enough (and performant enough) to suit most applications. Also related: ChainSafe/js-libp2p-gossipsub#86: this go-libp2p approach signals to gossipsub when messages are throttled for appropriate scoring. But if the validation queue embedded in libp2p isn't workable, could we do something else that is less opinionated? Clearly there needs to be some management of incoming rpc/message/topic validation processing or the node can be easily DoSed or killed. Can we expose that pipeline to applications somehow? Maybe use streams/async iterators, allowing users to wrap or replace parts of the pipeline?
#103 subtly changed the pubsub rpc/message processing pipeline from fully concurrent to fully sequential. This is subtle because it seems the only part of processing that may be asynchronous is message validation. So in many cases, when message validation is not heavy, or when few messages are received, no delay in rpc/message processing is detectable.
This PR restores the original behavior of allowing rpc/messages to be processed concurrently, while taking care that errors are handled and do not propagate to a top-level unhandled exception.
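The difference between sequential and concurrent processing, and the error handling this requires, can be sketched as follows. This is an illustration only, not the code in this PR: `dispatchConcurrently`, `processItem` and `onError` are hypothetical names.

```typescript
// Hypothetical dispatcher: start processing every item without awaiting
// the previous one, and route every failure to `onError`.
function dispatchConcurrently<T> (
  items: readonly T[],
  processItem: (item: T) => Promise<void>,
  onError: (err: unknown) => void
): Array<Promise<void>> {
  return items.map(async (item) => {
    try {
      // Awaiting inside the per-item function does not block the other
      // items, because each item runs in its own async function.
      await processItem(item)
    } catch (err) {
      // Without this catch, a rejected promise would become an
      // unhandled rejection at the top level.
      onError(err)
    }
  })
}
```

A sequential pipeline would instead `await processItem(item)` inside a single loop, so one slow validation delays every message behind it. The concurrent form removes that delay but places no limit on how many items are in flight.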