Remove the node from `still_have_not_sent_messages` if it sent an invalid message #99

fjarri · 2025-04-29T07:27:41Z

In draft, want to think a bit if we can only do this check once before branching out into correct/incorrect message processing.

…alid message

fjarri · 2025-04-29T23:36:55Z

Actually, I'm having second thoughts. The problem this PR attempts to fix was a problem in entropy-core where the nodes sent malformed messages (because of different settings used for serialization/deserialization), and since they didn't pass verification, still_have_not_sent_messages in RoundAccumulator was not updated (because anyone can send garbage, we can't just count it towards the node it says it's from). So the round could not finalize because it still expected correct messages from all the nodes.

I think this logic is sound, and we can only update still_have_not_sent_messages after the message passes verification (as it happens on master). entropy-core should handle this by imposing a timeout on the session (or perhaps we need to add timeout support in session::tokio functions?). @dvdplm, what is your opinion?

Also tagging @peg here to discuss the timeouts thing and who needs to be responsible for them.

ameba23 · 2025-04-30T08:40:30Z

Actually, I'm having second thoughts.

Do you mean second thoughts about adding the check in this PR? I would say any extra checks which can let us know strange things are happening are good, even if they can only happen when broken code (outside of this crate) is used.

I don't have strong opinions on whether to put a timeout in manul or entropy-core. We could easily wrap the call to par_run_session in a tokio::time::timeout. But also happy if manul does it for us.

fjarri · 2025-04-30T15:44:11Z

Do you mean second thoughts about adding the check in this PR?

Yes, whether the logic this PR is trying to add is actually consistent with the rest of the error handling.

I would say any extra checks which can let us know strange things are happening are good, even if they can only happen when broken code (outside of this crate) is used.

The check is being performed and the result is logged, the question here is whether it is possible that Session::preprocess_message receives something that's claimed to be from a certain node, but actually isn't. In other words, whether we can assume some authentication is going on at the level above or not. If not, that is, if we can only be sure the message is from a specific node after we verified the signature, then this PR should not go in.

The problem with the timeout is that I still want to return the report when the timeout expires, and I'm not sure you can have an externally set timeout in tokio and do that. In Python's trio you'd catch the TimeoutError and return normally after that.

ameba23 · 2025-04-30T15:55:58Z

If not, that is, if we can only be sure the message is from a specific node after we verified the signature, then this PR should not go in.

I would say that we can be sure that messages originate from the node specified in the from field. Assuming we implemented everything correctly (which i guess is a big if). So i guess this should not be merged. 🤷

The problem with the timeout is that I still want to return the report when the timeout expires, and I'm not sure you can have an externally set timeout in tokio and do that

Then probably it makes sense if the timeout is implemented in manul. Possibly with the desired Duration being passed in somewhere.

fjarri · 2025-04-30T18:58:09Z

I think a more flexible choice would be to take a cancellation token, allowing the user to set up the timeout or forced cancellation if needed. See #100

fjarri · 2025-04-30T19:12:43Z

I would say that we can be sure that messages originate from the node specified in the from field. Assuming we implemented everything correctly (which i guess is a big if).

This is an interesting point actually. manul does not assume the messages were previously authenticated because it already has authentication built in. So technically you don't need to do it.

You may still want to encrypt the messages, but the CGGMP scheme specifically does not require it (everything that needs to be encrypted is already encrypted by the scheme itself).

Remove the node from still_have_not_sent_messages if it sent an inv…

c561594

…alid message

fjarri self-assigned this Apr 30, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Remove the node from `still_have_not_sent_messages` if it sent an invalid message #99

Remove the node from `still_have_not_sent_messages` if it sent an invalid message #99

Uh oh!

fjarri commented Apr 29, 2025

Uh oh!

fjarri commented Apr 29, 2025 •

edited

Loading

Uh oh!

ameba23 commented Apr 30, 2025

Uh oh!

fjarri commented Apr 30, 2025

Uh oh!

ameba23 commented Apr 30, 2025

Uh oh!

fjarri commented Apr 30, 2025

Uh oh!

fjarri commented Apr 30, 2025

Uh oh!

Uh oh!

Remove the node from still_have_not_sent_messages if it sent an invalid message #99

Are you sure you want to change the base?

Remove the node from still_have_not_sent_messages if it sent an invalid message #99

Uh oh!

Conversation

fjarri commented Apr 29, 2025

Uh oh!

fjarri commented Apr 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ameba23 commented Apr 30, 2025

Uh oh!

fjarri commented Apr 30, 2025

Uh oh!

ameba23 commented Apr 30, 2025

Uh oh!

fjarri commented Apr 30, 2025

Uh oh!

fjarri commented Apr 30, 2025

Uh oh!

Uh oh!

Remove the node from `still_have_not_sent_messages` if it sent an invalid message #99

Remove the node from `still_have_not_sent_messages` if it sent an invalid message #99

fjarri commented Apr 29, 2025 •

edited

Loading