
XCMP and DMP improvements #8860


Open
wants to merge 14 commits into master from message-passing-adjustments

Conversation

@serban300 (Contributor) commented on Jun 16, 2025

Related to #489

This PR changes the parachain receiving logic for XCMP and DMP by adding some offchain processing before forwarding the messages to the parachain `set_validation_data` inherent. This enables us to relax the advancement rule.
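As a rough illustration of the intended flow (all names below are hypothetical except the `set_validation_data` inherent itself): the collator selects, offchain, a size-bounded prefix of the queued inbound messages, and only that subset is fed to the inherent; the rest stay queued for later blocks.

    /// Illustrative only: an inbound (downward or horizontal) message as received from the relay chain.
    struct InboundMessage {
        sent_at: u32,   // relay block number at which the message was sent
        data: Vec<u8>,  // opaque message payload
    }

    /// Hypothetical offchain step: keep a prefix of the queued messages that fits a `max_size`
    /// byte budget, so the inherent only has to ingest that subset in this block.
    fn trim_to_budget(queued: Vec<InboundMessage>, max_size: usize) -> Vec<InboundMessage> {
        let mut selected = Vec::new();
        let mut used = 0usize;
        for msg in queued {
            let next = used.saturating_add(msg.data.len());
            if next > max_size {
                break; // budget exhausted; remaining messages are processed in later blocks
            }
            used = next;
            selected.push(msg);
        }
        selected
    }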

@serban300 serban300 self-assigned this Jun 16, 2025
@serban300 serban300 added the T9-cumulus (This PR/Issue is related to cumulus.) and T18-zombienet_tests (Trigger zombienet CI tests.) labels on Jun 16, 2025
@serban300 serban300 force-pushed the message-passing-adjustments branch 5 times, most recently from b65eafa to ae8fcc5 on June 16, 2025 11:23
@serban300 serban300 changed the title from "[WIP] XCMP and DMP improvements" to "XCMP and DMP improvements" on Jun 16, 2025
@serban300 serban300 force-pushed the message-passing-adjustments branch from ae8fcc5 to c1811ae on June 16, 2025 16:15
Perform some offchain processing for the XCMP and DMP messages before
forwarding them to the parachain `set_validation_data` inherent.
@serban300 serban300 force-pushed the message-passing-adjustments branch from 3ee5165 to f797f1c on June 20, 2025 09:14
@bkchr (Member) left a comment

Generally looking good. Though, I left comments on a lot of things :)

@bkchr (Member) left a comment

Generally looking good. There are still a lot of missing docs for public methods.

@@ -64,7 +66,7 @@ pub mod v0 {
    PartialEq,
    TypeInfo,
 )]
-pub struct ParachainInherentData {
+pub struct RawParachainInherentData {
Member

I don't like the name 🙈

Also while thinking about this, all the new stuff you introduced here does not need to be in this crate. All that can just live in the parachain-system crate. The ParachainInherentData (RawParachainInherentData right now) is the only thing that needs to stay here, because it is the thing that is exchanged between the node and the runtime.

Contributor Author

Ok. Changed the names back:
RawParachainInherentData -> ParachainInherentData
ParachainInherentData -> BasicParachainInherentData

Also moved all the new structures to parachain-system except HashedMessage because it simplifies some interactions with MessageQueueChain.

match new.hrmp_inbound.valid_watermarks.binary_search(&hrmp_watermark.watermark()) {
Ok(pos) => {
// Exact match, so this is OK in all cases.
let _ = new.hrmp_inbound.valid_watermarks.drain(..pos + 1);
Member

The only change we need here is that we drop the + 1 or not?

Member

Aka we don't need your new code? :D

Contributor Author

I think we also need an adjustment for the case:

    Err(pos) => match hrmp_watermark {
        HrmpWatermarkUpdate::Head(_) => {
            // Updates to Head are always OK.
            let _ = new.hrmp_inbound.valid_watermarks.drain(..pos);
        },
    }

Here we should also retain the current watermark:

    // The current watermark will remain valid until updated.
    if new.hrmp_inbound.valid_watermarks.first() != Some(&hrmp_watermark) {
        new.hrmp_inbound.valid_watermarks.insert(0, hrmp_watermark)
    }

With my changes I tried to avoid having multiple branches and to do this in a more generic way. I can change it back if you prefer.

Member

Here we should also retain the current watermark:

Sounds reasonable, but it would just mean that we need to add the insert to this branch? (I would keep the original code)

Also it would be nice to test these corner cases.

Contributor Author

Ok, adjusted the original code.

There are the constraints_disallowed_trunk_watermark and constraints_always_allow_head_watermark tests that cover these cases.

Contributor

The only change we need here is that we drop the + 1 or not?

It looks the same to me.

Why do we need to insert anything if we don't find the Head watermark? valid_watermarks should contain these watermarks coming from the relay chain runtime (for real hrmp messages). The Head watermark is always valid, regardless of whether we have any queued messages at that block. And if at one candidate we reach the Head (current relay chain number) without finding it in the valid_watermarks, this means we don't have any more messages to process. Therefore, the next candidate has no good reason to not advance the hrmp watermark to its new relay parent (if it has some messages, the relay chain will contain this valid hrmp watermark). Am I missing something?

Contributor Author

You're right, good point! I missed this. Removed the additional insert.

Comment on lines 160 to 163
if sent_at < last_processed_msg.sent_at {
last_processed_msg_idx = Some(idx);
break;
}
Member

That should be impossible to hit?

Contributor Author

Yes, in theory. But I think it's better to check.

@acatangiu acatangiu requested a review from alindima June 25, 2025 15:06
mqc_heads.entry(*sender).or_default().extend_with_hashed_msg(msg);

if msg.sent_at == last_processed_msg.sent_at {
last_processed_msg.reverse_idx += 1;
@ggwpez (Member) commented on Jun 25, 2025

We are not going in reverse but that does not matter, right? The meaning of reverse_idx is more like unprocessed_count.

Contributor Author

We are not going in reverse here when counting, but it's actually a reverse index inside the group of messages with the provided sent_at. I added some examples to the InboundMessageId:

/// For the collection
/// `msgs = [{sent_at: 0}, {sent_at: 1}, {sent_at: 1}, {sent_at: 1}, {sent_at: 1}, {sent_at: 3}]`
///
/// `InboundMessageId {sent_at: 1, reverse_idx: 0}` points to `msgs[4]`
/// `InboundMessageId {sent_at: 1, reverse_idx: 3}` points to `msgs[1]`
/// `InboundMessageId {sent_at: 1, reverse_idx: 4}` points to `msgs[0]`
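For illustration, one way to read these examples is the hypothetical `resolve` helper below (not part of the PR; only `InboundMessageId` and its two fields come from the actual code): walk back `reverse_idx` positions from the last message whose `sent_at` does not exceed the requested one.

    /// Fields taken from the PR's `InboundMessageId`; everything else here is illustrative.
    struct InboundMessageId {
        sent_at: u32,
        reverse_idx: usize,
    }

    /// Hypothetical helper: resolve an id to a position in a collection sorted by `sent_at`.
    /// Returns `None` if the id points before the start of the collection.
    fn resolve(msgs: &[(u32, &str)], id: &InboundMessageId) -> Option<usize> {
        // Position of the last message with `sent_at <= id.sent_at`.
        let last = msgs.iter().rposition(|(sent_at, _)| *sent_at <= id.sent_at)?;
        last.checked_sub(id.reverse_idx)
    }

    fn main() {
        let msgs = [(0, "a"), (1, "b"), (1, "c"), (1, "d"), (1, "e"), (3, "f")];
        assert_eq!(resolve(&msgs, &InboundMessageId { sent_at: 1, reverse_idx: 0 }), Some(4));
        assert_eq!(resolve(&msgs, &InboundMessageId { sent_at: 1, reverse_idx: 3 }), Some(1));
        assert_eq!(resolve(&msgs, &InboundMessageId { sent_at: 1, reverse_idx: 4 }), Some(0));
    }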

/// A malicious collator could provide an invalid collection. This function checks that the
/// current collection respects the advancement rule. Specifically, it checks that the current
/// collection contains as many full messages as possible.
pub fn check_advancement_rule(&self, collection_name: &str, max_size: usize) {
Member

Suggested change:
-pub fn check_advancement_rule(&self, collection_name: &str, max_size: usize) {
+pub fn check_all_messages_included(&self, collection_name: &str, max_size: usize) {

? Since it's not really the advancement rule, more like censorship resistance.

Contributor Author

Yes, you're right, it's not really the advancement rule. Renamed it to check_enough_messages_included.

Contributor

Is panicking in this case better than just accepting this block?
Is the reasoning that the collator will not end up getting a block reward?

The most they can do is delay the message processing anyway

Contributor Author

Yes, I think we need to have this for censorship resistance.

for msg in &self.full_messages {
size = size.saturating_add(msg.data().len());
}
let min_size = ((max_size as f64) * 0.75) as usize;
Member

Please add a sanity check to this pallet that we do not shoot ourselves in the foot here.
There is no guarantee here that a parachain won't have a max message size configured of 500KiB or something stupid... actually not sure if we can even check this since it's in the host config.
Maybe we can read the abridged host config? Not sure how complicated that is.

Contributor

let's add a comment for the reasoning behind this 0.75 coefficient

Contributor Author

The abridged host config doesn't contain the max downward message size. This value is only present on the relay chain from what I understand.

Also, a sanity check would only check this on code changes, right? But I think the config can be changed dynamically on-chain? So it wouldn't cover all possible problems.

I think the safest thing would be to check only if there is at least 1 full message in the collection. Or to have min_size = 50% of max size. @ggwpez @bkchr WDYT?

Contributor

As I said here: #8860 (comment), I'm wondering if it is worth panicking for this?

Contributor Author

Yes, I think we should have at least a basic check here for censorship resistance. If we want to be super safe we can just check that self.full_messages.len() > 1. For the moment I'll change the coefficient to 0.5.

add a comment

Done
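As a rough illustration of the check being discussed (the `InboundMessagesCollection` struct and its `hashed_messages` field are hypothetical stand-ins; the method name, `full_messages`, `collection_name` and `max_size` come from the thread, and the 0.5 coefficient is the value settled on above):

    /// Illustrative sketch only, not the PR's exact types: a collection of inbound messages
    /// where only a prefix is included in full and the rest are represented by their hashes.
    struct InboundMessagesCollection {
        full_messages: Vec<Vec<u8>>,    // payloads included in full in this block
        hashed_messages: Vec<[u8; 32]>, // hypothetical: messages left out, tracked only by hash
    }

    impl InboundMessagesCollection {
        /// Censorship-resistance check: if the collator left messages out while using less
        /// than half of the size budget, reject the block. Using half of `max_size` as the
        /// floor means a single pending message of up to `max_size / 2` bytes can never make
        /// an honest collator fail the check.
        fn check_enough_messages_included(&self, collection_name: &str, max_size: usize) {
            let size: usize = self.full_messages.iter().map(|m| m.len()).sum();
            let min_size = max_size / 2;
            if !self.hashed_messages.is_empty() && size < min_size {
                panic!(
                    "{collection_name}: only {size} bytes of full messages included, \
                     expected at least {min_size} while more messages were available"
                );
            }
        }
    }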

@alindima (Contributor) left a comment

Overall LGTM, nice work!

I assume it was tested end-to-end with zombienet?


.collect();

// The current watermark will remain valid until updated.
if let Some(last_watermark) = HrmpWatermarks::<T>::get(&recipient) {
Contributor

same as in the inclusion_emulator, do we really need this?
This is only really used in the inclusion emulator

Contributor Author

Yes, it's needed. If the hrmp watermark remains the same for a couple of blocks, check_modifications will always fail.

I tried without this and either the parachain was stalling or the parachain block time was increasing for a while. Don't remember exactly, but anyway, it's needed.
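For context, the idea being defended here can be sketched as below with plain block numbers; the real code operates on the pallet's storage (`HrmpWatermarks`) and the constraints type, so treat this as illustrative only: keep the previously enforced watermark as a valid option so that a candidate that does not advance it still passes `check_modifications`.

    /// Illustrative only: keep the previously enforced watermark at the front of the sorted
    /// list of valid watermarks, so a candidate that leaves the watermark unchanged is still
    /// accepted by the constraint checks.
    fn retain_current_watermark(valid_watermarks: &mut Vec<u32>, last_watermark: Option<u32>) {
        if let Some(last_watermark) = last_watermark {
            if valid_watermarks.first() != Some(&last_watermark) {
                // The current watermark remains valid until it is updated.
                valid_watermarks.insert(0, last_watermark);
            }
        }
    }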

@serban300 (Contributor Author) left a comment

Overall LGTM, nice work!

I assume it was tested end-to-end with zombienet?

@alindima Thanks for the review! Yes, it was tested end-to-end with zombienet.
