This repository was archived by the owner on Jun 26, 2023. It is now read-only.

fix: allow pubsub rpc to be processed concurrently #106

Merged 6 commits on Sep 20, 2021

Conversation

wemeetagain
Member

#103 subtly changed the pubsub rpc/message processing pipeline from fully concurrent to fully sequential. The change is easy to miss because the only part of processing that may be asynchronous is message validation. So in many cases, when message validation is not heavy, or when few messages are received, no delay in rpc/message processing is detectable.

This PR restores the original behavior of allowing rpc/messages to be processed concurrently, while taking care that errors are handled and will not propagate as a top-level unhandled exception.
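The idea can be sketched roughly like this (illustrative names, not the PR's actual code): processing is kicked off without being awaited, and a catch handler keeps a failed message from ever surfacing as an unhandled rejection.

```javascript
// Hedged sketch, not the PR's actual code: processRpc and onError are
// illustrative. The returned handler starts processing without awaiting
// it, so the read loop stays responsive, while the .catch() ensures a
// failed message never propagates as a top-level unhandled rejection.
function processConcurrently (processRpc, onError) {
  return (peerId, rpc) => {
    processRpc(peerId, rpc).catch((err) => onError(err, peerId))
  }
}
```

Because the returned handler never rejects, the caller's read loop keeps pulling messages while earlier ones are still being validated.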


@achingbrain achingbrain left a comment


LGTM, but it reintroduces unbounded concurrency, which means a remote user could swamp a process that has slow/expensive message handling. Please can you update this PR to use it-parallel (and make the parameter configurable) to let the user tune this behaviour, and then we can get this in.

@wemeetagain
Member Author

How can we use it-parallel to effectively curb the concurrency?
Just wrapping the stream in parallel doesn't seem like it will do anything, at least while each _processRpc is not awaited?

To allow processing pubsub messages that have steps that are slow but async, process the messages in a queue.

Makes the concurrency configurable, with a default of 10 messages processed concurrently.
@achingbrain
Member

@wemeetagain I think you're right: wrapping _processRpcMessage invocations in it-parallel won't help if _processRpc gets invoked repeatedly. I've opened ChainSafe#1 against the branch that makes up this PR; it uses a queue with configurable concurrency to process incoming messages across multiple RPC messages. Please let me know what you think.
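The shape of that approach can be sketched with a minimal hand-rolled queue (the actual PR uses a proper queue implementation; the names and the default of 10 here are illustrative):

```javascript
// Hedged sketch of a queue with configurable concurrency, not the PR's
// actual code. Tasks from any incoming RPC share the same bounded pool:
// at most `concurrency` tasks run at once, the rest wait their turn.
function createQueue (concurrency = 10) {
  const pending = []
  let active = 0

  const next = () => {
    if (active >= concurrency || pending.length === 0) {
      return
    }
    active++
    const { task, resolve, reject } = pending.shift()
    // Run the task, settle its caller's promise, then pull the next one
    task().then(resolve, reject).finally(() => {
      active--
      next()
    })
  }

  return {
    add (task) {
      return new Promise((resolve, reject) => {
        pending.push({ task, resolve, reject })
        next()
      })
    }
  }
}
```

Every message handler is pushed through `add()`, so slow validations queue up instead of all running at once.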

fix: allow pubsub rpc to be processed concurrently

@vasco-santos vasco-santos left a comment


this looks good to me!
@achingbrain any other input you would like to give or can I get this shipped?

@achingbrain
Member

All good! :shipit:

@vasco-santos
Member

thank you all for figuring this out!

@vasco-santos vasco-santos merged commit 52f96b3 into libp2p:master Sep 20, 2021
@dapplion

dapplion commented Dec 2, 2021

Sorry for the late comment but I have some concerns with this:

  • Is this queue bounded in capacity? If validation time is infinite (a theoretical example), will this queue cause an OOM?
  • In Lodestar we have our own queues, so this is a queue on top of a queue, adding more complexity and making this harder to reason about on our end.
  • Has this queue been benchmarked? What is the performance overhead of this queue?

Also, this introduces yet another dependency, with its associated npm security risks, performance costs, etc. I would love to see libp2p one day be minimally simple, with only dependencies maintained by the libp2p team plus the required crypto libs.

@wemeetagain
Member Author

Here's the implementation: https://github.com/sindresorhus/p-queue/blob/main/source/index.ts

  • It appears the capacity is not bounded.
  • Somewhat agree, but do you think there should be some ability to throttle message processing/validation? The original code, before #103 (fix: make tests more reliable), had no concurrency control at all and just processed every message. (And it would still OOM if validation time were infinite.)
  • I haven't seen a benchmark (and I'm not quite sure what the baseline comparison would be), though there is a benchmark here: https://github.com/sindresorhus/p-queue/blob/main/bench.ts

@achingbrain
Member

achingbrain commented Dec 2, 2021

validate is an async method. It has to be async because, as part of the flow, we verify the signature of the message against the sending key. This is not slow, so historically there's been no attempt to guard against the process getting overwhelmed.

The problem, I think, is that we allow the configurable message validation function to be async (perfectly reasonable on paper, since it's executed in the same flow as signature verification - we're already async at that point to verify the signature, so why not).

Correct me if I'm wrong, but as I understand it, in Lodestar the message validation function verifies that all blocks in the message are available to the application, which can take on the order of minutes or more.

I think the intention of message validation is just to say "this message is structurally correct", not to execute involved business logic that can then become a performance concern. If that's the requirement, it should happen after validation, in a suitable manner managed by the application.

I'd suggest:

  1. Make the configurable validation functions synchronous - this sends a clear signal to the developer that they should be fast
  2. Remove configurable validation altogether - again, any further message pre-processing that is required can be done by the application

We can then remove the queue introduced here and the extra complexity it introduces.

@dapplion

dapplion commented Dec 2, 2021

@achingbrain Some context of Lodestar spec requirements:

The eth2.0 spec forces us to have async validation functions that can potentially run for a very long time. For example, if you get a block that references a parent that's known, but whose associated state is not currently in the memory cache, Lodestar must read blocks from disk and replay the state transition function, which takes between 50-200ms per block. The whole process can be very long and must not be synchronous, since it would freeze the node, preventing it from attesting and performing other mandatory tasks.

The eth2 gossip spec requires us to do the above before deciding if this message is an ACCEPT, IGNORE or REJECT.

I think the intention of message validation is just to say "this message is structurally correct", not to execute involved business logic

This is not how eth2.0 gossipsub works, unfortunately.

@dapplion

dapplion commented Dec 2, 2021

From @wemeetagain message

Somewhat agree, but do you think there should be some ability to throttle message processing/validation? The original code, before #103 (fix: make tests more reliable), had no concurrency control at all and just processed every message. (And it would still OOM if validation time were infinite.)

Not really, because if the validation time is infinite, Lodestar's queues will fill up and drop incoming messages, preventing an OOM.

The problem with this queue at the libp2p level is that it makes subjective decisions with widely different trade-offs depending on what the app layer is.
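The drop-on-full behaviour described here can be sketched like this (illustrative names and defaults, not Lodestar's actual code):

```javascript
// Hedged sketch, not Lodestar's code: a queue that caps how much work can
// pile up. Once maxPending tasks are waiting, add() drops the message and
// returns false, so an arbitrarily slow validator bounds memory use
// instead of growing the queue until the process OOMs.
function createDroppingQueue ({ concurrency = 2, maxPending = 100 } = {}) {
  const pending = []
  let active = 0

  const next = () => {
    if (active >= concurrency || pending.length === 0) {
      return
    }
    active++
    const task = pending.shift()
    // Errors are swallowed here; a real implementation would report them
    task().catch(() => {}).finally(() => {
      active--
      next()
    })
  }

  return {
    add (task) {
      if (pending.length >= maxPending) {
        return false // queue full: drop the message rather than buffer it
      }
      pending.push(task)
      next()
      return true
    }
  }
}
```

The trade-off is exactly the subjective decision being debated: dropping messages is fine for Lodestar's workload, but another application might prefer backpressure or unbounded buffering.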

@wemeetagain
Member Author

The eth2 gossip spec requires us to do the above before deciding if this message is an ACCEPT, IGNORE or REJECT.

Yeah, it's required for the eth protocol, and imo an async validation function is a really nice feature to have in general. Any sort of crypto in the application's validation might be async.

The problem with this queue at the libp2p level is that it makes subjective decisions with widely different trade-offs depending on what the app layer is.

I think, ideally, whatever machinery is included in libp2p would be configurable enough (and performant enough) to suit most applications.
For some context, here's a look at what go-libp2p does (and why this PR doesn't seem so problematic, just incomplete):
https://github.com/libp2p/go-libp2p-pubsub/blob/master/validation.go - validation functions are goroutines that execute concurrently in a queue managed by a pool of workers, with configurable queue capacity, worker count, etc.

Also related: ChainSafe/js-libp2p-gossipsub#86: the go-libp2p approach signals to gossipsub when messages are throttled, for appropriate scoring.

But if a validation queue embedded in libp2p isn't workable, could we do something else that is less opinionated? Clearly there needs to be some management of incoming rpc/message/topic validation processing, or the node can easily be DoSed or killed. Can we expose that pipeline to applications somehow? Maybe use streams/async iterators, allowing users to wrap or replace parts of the pipeline?
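One hedged sketch of that last idea (illustrative only, not an actual libp2p API): express validation as an async-iterator transform over the message stream, so the application can wrap that stage with its own throttling, or replace it entirely.

```javascript
// Hedged sketch of the async-iterator idea, not an actual libp2p API.
// Validation becomes a transform over a stream of messages: messages the
// validator rejects are dropped, the rest flow through. An application
// could compose its own queueing or throttling around this stage.
async function * validateStream (source, validator) {
  for await (const msg of source) {
    try {
      await validator(msg)
      yield msg
    } catch {
      // validator rejected the message: drop it
    }
  }
}
```

Because it is just an async iterable, libp2p would make no queueing decisions at all; the consumer controls the pull rate.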
