feat(gossipsub): Add MessageBatch #607

MarcoPolo · 2025-04-15T18:56:48Z

to support batch publishing messages

Replaces #602.

Batch publishing lets the system know there are multiple related messages to be published so it can prioritize sending different messages before sending copies of messages. For example, with the default API, when you publish two messages A and B, under the hood A gets sent to D=8 peers first, before B gets sent out. With this MessageBatch API we can now send one copy of A and then one copy of B before sending multiple copies.

When a node's bandwidth is constrained relative to the volume of messages it is publishing, this improves dissemination time.
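
To make the ordering difference concrete, here is a minimal sketch in Go. It is not the code in this PR: the `send` type and the `defaultOrder`, `batchOrder`, and `format` helpers are hypothetical names made up for illustration, and the peer set is fixed at three peers instead of D=8 to keep the output short.

```go
package main

import (
	"fmt"
	"strings"
)

// send records one copy of a message going to one peer.
// Hypothetical type for illustration only.
type send struct {
	msg  string
	peer string
}

// defaultOrder mimics publishing messages one at a time: every copy of a
// message is queued before any copy of the next message.
func defaultOrder(msgs, peers []string) []send {
	var out []send
	for _, m := range msgs {
		for _, p := range peers {
			out = append(out, send{m, p})
		}
	}
	return out
}

// batchOrder queues one copy of every message in the batch before queuing
// a second copy of any of them.
func batchOrder(msgs, peers []string) []send {
	var out []send
	for _, p := range peers {
		for _, m := range msgs {
			out = append(out, send{m, p})
		}
	}
	return out
}

// format renders a send sequence as "msg->peer" pairs separated by spaces.
func format(sends []send) string {
	parts := make([]string, len(sends))
	for i, s := range sends {
		parts[i] = s.msg + "->" + s.peer
	}
	return strings.Join(parts, " ")
}

func main() {
	msgs := []string{"A", "B"}
	peers := []string{"p1", "p2", "p3"}
	fmt.Println("default:", format(defaultOrder(msgs, peers)))
	fmt.Println("batch:  ", format(batchOrder(msgs, peers)))
}
```

In the batched order, the first copy of B leaves after a single copy of A instead of after all copies of A, which is the property that matters when the uplink is the bottleneck.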

For more context see this post: https://ethresear.ch/t/improving-das-performance-with-gossipsub-batch-publishing/21713

raulk

There's quite a bit of indirection here.

There's a top-level NewMessageBatch API that accepts a gossipsub router. This seems inverted. Intuitively, I'd expect to initiate a message batch from a router (similar to badger's NewWriteBatch). It seems more idiomatic to me.
The MessageBatch keeps track of pending RPCs, which are added by the GossipSubRouter. To propagate itself through the call stack, it sets an anonymous PublishOption wrapping itself.
There's special casing in several spots in the GossipSubRouter to identify if this is a batch, unwrap the object, and do special things on it like queue the RPCs instead of actually sending them. This results in the paradox that a call to the router's gossipsub Publish doesn't actually publish anything under this mode.
The fact that a Message now embeds a messageBatch is semantically counterintuitive.

All in all, I think this code is (a) hard to maintain, and (b) it exposes a confusing public API. I'm curious what alternative APIs you considered. I'd imagine this dance is a last resort to make it work under the current design/implementation constraints.

Did you consider:

Extending the PubSub, PubSubRouter, GossipSubRouter, etc. type hierarchy with NewBatch and PublishBatch methods?
Reorganizing the reusable code under GossipSubRouter#Publish so the relevant parts can be reused from PublishBatch?

I think this would simplify the whole thing.

For what it's worth, in @cskiraly's EthResearch post, he hinted at a distinction between message batches and queuing/diffusion disciplines. I think that distinction was lost a bit later, but I'd like to recover it here.

In my head:

A message batch is no more than an organizational unit to bundle a set of messages that the router learns about at once (without atomicity guarantees, hence not a Transaction). It does not imply a concrete queuing/diffusion/scheduling discipline.
The user should provide a queuing/diffusion/scheduling discipline when calling PublishBatch, in the form of a function or an interface implementation. This discipline encapsulates the conversion to RPCs and handles their subsequent dispatch. In other words, the code that currently lives under MessageBatch#Publish should be modular in itself.

MarcoPolo · 2025-04-29T18:06:43Z

All in all, I think this code is (a) hard to maintain, and (b) it exposes a confusing public API. I'm curious what alternative APIs you considered. I'd imagine this dance is a last resort to make it work under the current design/implementation constraints.

On (a), you're the reviewer so I'll defer to you. I agree there are subtleties here that may introduce footguns in the future.

On (b), I disagree about this being a confusing public API. To be clear, this is the API users use:

batch, err := NewMessageBatch(pubsub)
// Handle err

for _, msg := range msgs {
	err := batch.Add(ctx, topic, msg)
	// Handle err
}
batch.Publish()

Whether this is NewMessageBatch(pubsub) or pubsub.NewMessageBatch() is a fairly minor point. Happy to change.

Whether this is batch.Publish or pubsub.PublishBatch also seems like a fairly minor point, but I'd prefer the former. It's also consistent with your example of WriteBatch.

Did you consider:

Extending the PubSub, PubSubRouter, GossipSubRouter, etc. type hierarchy with NewBatch and PublishBatch methods?

Reorganizing the reusable code under GossipSubRouter#Publish so the relevant parts can be reused from PublishBatch?

I did, but I feared it would end up as a larger refactor. It may be worth it anyway, as it may generally improve the codebase and remove future footguns. I'll open a new PR along these lines.

For what it's worth, in @cskiraly's EthResearch post, he hinted at a distinction between message batches and queuing/diffusion disciplines. I think that distinction was lost a bit later, but I'd like to recover it here.

In my head:

A message batch is no more than an organizational unit to bundle a set of messages that the router learns about at once (without atomicity guarantees, hence not a Transaction). It does not imply a concrete queuing/diffusion/scheduling discipline.

The user should provide a queuing/diffusion/scheduling discipline when calling PublishBatch, in the form of a function or an interface implementation. This discipline encapsulates the conversion to RPCs and handles their subsequent dispatch. In other words, the code that currently lives under MessageBatch#Publish should be modular in itself.

An earlier draft of this PR allowed users to define the publish strategy. I tested via simulation the rarest message first strategy and the "shuffle" strategy from #602. The rarest first performed better (which intuitively makes sense). To that end, I chose to keep the API simpler by only using the rarest-first strategy. We can always make this configurable, but I think it would be a mistake to prematurely add this extension point now. This is not to say that the current rarest-first implementation is optimal, just that the inputs to the optimal solution may be non-obvious and defining an extension point now when we only have n=1 options is premature.

vyzo · 2025-04-29T18:11:04Z

Please hold your horses on the larger refactor.
This is something that is being actively discussed as part of the v2.0 initiative, and we shouldn't rush on it.

MarcoPolo · 2025-04-29T18:24:28Z

Consider my horses held. I'll explore a small refactor that hopes to remove some indirection here.

raulk · 2025-04-29T18:29:08Z

@MarcoPolo I'd love to see a version of this PR with the refactor. Happy to pair on it if you'd like!

Re: batch.Publish() vs. pubsub.PublishBatch(), I suspect the latter can reduce the "spooky action at a distance" effect. It accomplishes that by encapsulating the RPC planning, scheduling, and dispatch all in a single place vs. spread across the indirection. Makes it easier to follow; and I'm all for reducing complexity ;-) But hard to tell without taking a stab.

Re: queuing/dispatch disciplines, even if we only support the "prioritize rarest" one at the moment, it still makes sense to introduce the abstraction as long as we have some confidence that it can withstand future disciplines going forward. Deliberate API design signals intentionality and makes a difference in shaping how APIs evolve. However, if you feel strongly against this, I can live without it.

I agree with @vyzo that we don't want a major refactor here, but introducing this feature cleanly (in an already complex and organic codebase) is a win.

MarcoPolo · 2025-04-30T23:10:55Z

I made significant changes to the design. Thanks @sukunrt and @raulk for the feedback. I think this approach is clearer. I'd recommend initially reviewing with whitespace changes hidden (?w=1).

Care was taken to ensure batched messages and normal messages go through the same code flows as much as possible.

Some refactors along the way worth highlighting:

Introduce a new validation.ValidateLocal method that does not send a message. This lets us validate a message when adding to a batch without also running the send message logic.
⚠️ Breaking change: validation.validate returns a ValidationError{Reason: RejectValidationIgnoredDuplicate} on duplicate instead of a nil error. The only time you would get this error is if you are publishing two duplicate messages, and it's probably better that you get this error instead of silently doing nothing.

MarcoPolo mentioned this pull request Apr 15, 2025

Allow Batch Publishing For Applications #602

Closed

4 tasks

gitToki mentioned this pull request Apr 23, 2025

feat(gossipsub): Add Message Batch Publishing in Gossipsub libp2p/rust-libp2p#6006

Open

sukunrt self-requested a review April 23, 2025 14:02

MarcoPolo force-pushed the marco/batch-publishing branch from a15aa24 to 8e804a5 Compare April 25, 2025 03:00

feat(gossipsub): Add MessageBatch

4059300

to support batch publishing messages

MarcoPolo force-pushed the marco/batch-publishing branch from 8e804a5 to 4059300 Compare April 25, 2025 03:13

raulk requested changes Apr 29, 2025

MarcoPolo mentioned this pull request Apr 30, 2025

feat(gossipsub): Add MessageBatch [alternative] #608

Closed

MarcoPolo added 2 commits April 30, 2025 10:09

Rework implementation

0cda186

Fix flakiness in TestMessageBatchAsyncAddMsg due to default D values

5f0218c

MarcoPolo marked this pull request as draft April 30, 2025 18:41

MarcoPolo added 5 commits April 30, 2025 14:02

refactor message batch to hold []Messages instead of []rpc

1cc1308

renaming

ff109a6

Bring back PushLocal to avoid a breaking change

d427b08

renaming

8227901

copy edit

7cc3ef8

MarcoPolo force-pushed the marco/batch-publishing branch from c0c96e6 to 7cc3ef8 Compare April 30, 2025 22:59

simplify

72a33a8

MarcoPolo requested a review from raulk April 30, 2025 23:11

MarcoPolo marked this pull request as ready for review May 1, 2025 00:10
