Allow a previously reset node to rejoin its original cluster (backport #13643) (backport #13667) #13669
If a cluster member's local state is wiped for whatever reason, the node has a hard time rejoining the cluster: when Mnesia is used, the remaining cluster members still consider the node a member and reject its join request.
Proposed Changes
Mnesia: On failure due to 'already a member', ask to leave the cluster first and retry.
Khepri: no-op. Khepri is already less strict, and rabbit_khepri:can_join would accept a join request from a node that is already a member.
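To make the Mnesia flow easier to discuss, here is a minimal sketch of "ask to leave, then retry" with a bounded number of retries. It is not the code in this PR: the module name rejoin_sketch, the function join_with_retry/3, the error atom already_member, and the two caller-supplied funs are all hypothetical stand-ins for the real cluster calls.

```erlang
-module(rejoin_sketch).
-export([join_with_retry/3, demo/0]).

%% Hypothetical sketch, not RabbitMQ code.
%% CanJoin and AskToLeave are caller-supplied funs standing in for the
%% real cluster calls:
%%   CanJoin()    -> ok | {error, already_member} | {error, term()}
%%   AskToLeave() -> ok | {error, term()}
join_with_retry(CanJoin, AskToLeave, RetriesLeft) ->
    case CanJoin() of
        ok ->
            ok;
        {error, already_member} when RetriesLeft > 0 ->
            %% The cluster still lists this node as a member: ask it to
            %% drop the stale membership, then run the join check again.
            case AskToLeave() of
                ok -> join_with_retry(CanJoin, AskToLeave, RetriesLeft - 1);
                {error, _} = Err -> Err
            end;
        {error, _} = Err ->
            %% Any other failure, or no retries left: give up.
            Err
    end.

%% Simulates a cluster that holds a stale membership record, using the
%% process dictionary as the "remote" state.
demo() ->
    put(stale_member, true),
    CanJoin = fun() ->
        case get(stale_member) of
            true -> {error, already_member};
            _ -> ok
        end
    end,
    AskToLeave = fun() -> put(stale_member, false), ok end,
    %% One retry is enough once the stale record has been removed.
    ok = join_with_retry(CanJoin, AskToLeave, 1),
    %% With zero retries the original error is returned unchanged.
    AlwaysMember = fun() -> {error, already_member} end,
    {error, already_member} = join_with_retry(AlwaysMember, AskToLeave, 0),
    ok.
```

Passing the retry budget as an argument keeps the loop from running forever if the leave request succeeds but the members keep reporting the node as present.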
Further Comments
I would like early feedback here: is this naive approach even OK, should there be a limited number of retries, and should the logic live in rabbit_mnesia or in rabbit_db_cluster?
It feels a bit wonky that a function called can_join_cluster would also try to leave a cluster and try again, so perhaps it would be better if rabbit_db_cluster:join initiated the leave and retry request instead?
This is an automatic backport of pull request #13643 done by [Mergify](https://mergify.com).
This is an automatic backport of pull request #13667 done by [Mergify](https://mergify.com).