Allow a previously reset node to rejoin its original cluster (backport #13643) (backport #13667) #13669

Merged
michaelklishin merged 6 commits into v4.0.x from mergify/bp/v4.0.x/pr-13667
Apr 1, 2025

Conversation

mergify[bot]
@mergify mergify bot commented Apr 1, 2025

If a cluster member has its local state wiped for any reason, it has a hard time rejoining the cluster: when Mnesia is used, the remaining members still consider the node a member and reject its join request.

Proposed Changes

Mnesia: On failure due to 'already a member', ask to leave the cluster first and retry.
Khepri: no-op. Khepri is already less strict, and rabbit_khepri:can_join would accept a join request from a node that is already a member.
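
The Mnesia change above can be pictured with a small toy model. This is only a sketch in Python, not the Erlang code in this PR: `ToyCluster`, `AlreadyMember`, `join_with_retry` and the `max_retries` parameter are all hypothetical names made up for illustration, and the in-memory set merely stands in for the membership view held by the remaining nodes.

```python
class AlreadyMember(Exception):
    """Raised when the cluster still lists the joining node as a member."""


class ToyCluster:
    """Hypothetical in-memory stand-in for the remaining nodes' membership view."""

    def __init__(self, members):
        self.members = set(members)

    def check_can_join(self, node):
        # Strict check, modelling the Mnesia behaviour described above:
        # a node that is still listed as a member is rejected.
        if node in self.members:
            raise AlreadyMember(node)

    def forget_member(self, node):
        # Models asking the cluster to drop its stale record of the node.
        self.members.discard(node)

    def add_member(self, node):
        self.members.add(node)


def join_with_retry(cluster, node, max_retries=1):
    """Join `cluster`; on 'already a member', leave first and retry.

    Returns the number of retries that were needed. `max_retries` is an
    assumption of this sketch, bounding how often the leave-and-retry runs.
    """
    attempts = 0
    while True:
        try:
            cluster.check_can_join(node)
        except AlreadyMember:
            if attempts >= max_retries:
                raise
            attempts += 1
            cluster.forget_member(node)
            continue
        cluster.add_member(node)
        return attempts


# rabbit@c was reset locally, but the other nodes still list it as a member.
cluster = ToyCluster({"rabbit@a", "rabbit@b", "rabbit@c"})
retries_used = join_with_retry(cluster, "rabbit@c")
```

In this model the first check fails, the stale membership record is dropped, and the second attempt succeeds. Bounding the loop keeps a persistent rejection from retrying forever.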

Types of Changes

What types of changes does your code introduce to this project?
Put an x in the boxes that apply

  • Bug fix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)
  • Build system and/or CI

Checklist

Put an x in the boxes that apply.
You can also fill these out after creating the PR.
If you're unsure about any of them, don't hesitate to ask on the mailing list.
We're here to help!
This is simply a reminder of what we are going to look for before merging your code.

  • I have read the CONTRIBUTING.md document
  • I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • I have added tests that prove my fix is effective or that my feature works
  • All tests pass locally with my changes
  • If relevant, I have added necessary documentation to https://github.com/rabbitmq/rabbitmq-website
  • If relevant, I have added this change to the first version(s) in release-notes that I expect to introduce it

Further Comments

I would like early feedback on whether this naive approach is even OK, whether the number of retries should be limited, and whether the logic should live in rabbit_mnesia or in rabbit_db_cluster.
It feels a bit wonky that a function called can_join_cluster would also try to leave a cluster and then try again, so perhaps it would be better if rabbit_db_cluster:join initiated the leave and retry instead?


This is an automatic backport of pull request #13643 done by [Mergify](https://mergify.com).
This is an automatic backport of pull request #13667 done by [Mergify](https://mergify.com).

SimonUnge and others added 6 commits April 1, 2025 17:23
…onsider node a member.

Khepri: no-op. Khepri is less strict already, and rabbit_khepri:can_join would accept a join request from a node that is already a member

(cherry picked from commit dd49cbe)
(cherry picked from commit 6d464f9)
(cherry picked from commit 9ba545c)
(cherry picked from commit 6592ebd)
(cherry picked from commit e1f2865)
(cherry picked from commit 73742e4)
(cherry picked from commit cdeabe2)
(cherry picked from commit ae171b5)
(cherry picked from commit 36eb6ca)
(cherry picked from commit d809dff)
(cherry picked from commit e6bc6a4)
(cherry picked from commit b0eaa57)
@michaelklishin michaelklishin added this to the 4.0.8 milestone Apr 1, 2025
@michaelklishin michaelklishin merged commit 8e998c4 into v4.0.x Apr 1, 2025
270 checks passed
@michaelklishin michaelklishin deleted the mergify/bp/v4.0.x/pr-13667 branch April 1, 2025 19:23