Skip to content

Remove the node from still_have_not_sent_messages if it sent an invalid message #99

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

fjarri
Copy link
Member

@fjarri fjarri commented Apr 29, 2025

In draft, want to think a bit if we can only do this check once before branching out into correct/incorrect message processing.

@fjarri
Copy link
Member Author

fjarri commented Apr 29, 2025

Actually, I'm having second thoughts. The problem this PR attempts to fix was a problem in entropy-core where the nodes sent malformed messages (because of different settings used for serialization/deserialization), and since they didn't pass verification, still_have_not_sent_messages in RoundAccumulator was not updated (because anyone can send garbage, we can't just count it towards the node it says it's from). So the round could not finalize because it still expected correct messages from all the nodes.

I think this logic is sound, and we can only update still_have_not_sent_messages after the message passes verification (as it happens on master). entropy-core should handle this by imposing a timeout on the session (or perhaps we need to add timeout support in session::tokio functions?). @dvdplm, what is your opinion?

Also tagging @peg here to discuss the timeouts thing and who needs to be responsible for them.

@ameba23
Copy link

ameba23 commented Apr 30, 2025

Actually, I'm having second thoughts.

Do you mean second thoughts about adding the check in this PR? I would say any extra checks which can let us know strange things are happening are good, even if they can only happen when broken code (outside of this crate) is used.

I don't have strong opinions on whether to put a timeout in manul or entropy-core. We could easily wrap the call to par_run_session in a tokio::time::timeout. But also happy if manul does it for us.

@fjarri
Copy link
Member Author

fjarri commented Apr 30, 2025

Do you mean second thoughts about adding the check in this PR?

Yes, whether the logic this PR is trying to add is actually consistent with the rest of the error handling.

I would say any extra checks which can let us know strange things are happening are good, even if they can only happen when broken code (outside of this crate) is used.

The check is being performed and the result is logged, the question here is whether it is possible that Session::preprocess_message receives something that's claimed to be from a certain node, but actually isn't. In other words, whether we can assume some authentication is going on at the level above or not. If not, that is, if we can only be sure the message is from a specific node after we verified the signature, then this PR should not go in.

The problem with the timeout is that I still want to return the report when the timeout expires, and I'm not sure you can have an externally set timeout in tokio and do that. In Python's trio you'd catch the TimeoutError and return normally after that.

@ameba23
Copy link

ameba23 commented Apr 30, 2025

If not, that is, if we can only be sure the message is from a specific node after we verified the signature, then this PR should not go in.

I would say that we can be sure that messages originate from the node specified in the from field. Assuming we implemented everything correctly (which i guess is a big if). So i guess this should not be merged. 🤷

The problem with the timeout is that I still want to return the report when the timeout expires, and I'm not sure you can have an externally set timeout in tokio and do that

Then probably it makes sense if the timeout is implemented in manul. Possibly with the desired Duration being passed in somewhere.

@fjarri fjarri self-assigned this Apr 30, 2025
@fjarri
Copy link
Member Author

fjarri commented Apr 30, 2025

I think a more flexible choice would be to take a cancellation token, allowing the user to set up the timeout or forced cancellation if needed. See #100

@fjarri
Copy link
Member Author

fjarri commented Apr 30, 2025

I would say that we can be sure that messages originate from the node specified in the from field. Assuming we implemented everything correctly (which i guess is a big if).

This is an interesting point actually. manul does not assume the messages were previously authenticated because it already has authentication built in. So technically you don't need to do it.

You may still want to encrypt the messages, but the CGGMP scheme specifically does not require it (everything that needs to be encrypted is already encrypted by the scheme itself).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants