Return accepted if a new server is already in cluster #634
We found a corner case in which a new member rejects join_cluster_req while it is being added. Here is what happened:
T1. The leader invited the new member to join the cluster for the first time. The follower received the request, saved its state, and called reconfigure to apply the cluster config. However, the leader never received the response (it timed out).
leader's logs
follower's logs
T2. We retried the add-member operation. The leader sent join_cluster_req again, but the follower thought it was already in the cluster and returned a response with accept=false. The leader received accept=false and concluded that the follower had rejected the request, so we were trapped in an endless retry.
follower:
leader:
Since the follower had also saved is_catching_up=true, it skipped voting in handle_election_timeout. As a result, it never had a chance to realize that it is not in the cluster at all.
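The idea in the title, acknowledging a repeated join request instead of rejecting it, can be illustrated with a minimal sketch. This is not the project's actual code: `JoinClusterReq`, `JoinClusterResp`, `FollowerState`, `in_cluster`, and `handle_join_cluster_req` are hypothetical, simplified stand-ins, and the membership check is assumed to be a plain lookup in the locally applied config.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical, simplified stand-ins for illustration only.
struct JoinClusterReq {
    int32_t leader_id;
};

struct JoinClusterResp {
    bool accepted;
};

struct FollowerState {
    int32_t id = 0;
    bool is_catching_up = false;
    std::set<int32_t> cluster_members;  // server ids in the locally applied config

    bool in_cluster() const { return cluster_members.count(id) > 0; }
};

// Idempotent join handling: a retried request that reaches a server which
// already applied the config is acknowledged rather than rejected.
JoinClusterResp handle_join_cluster_req(FollowerState& self, const JoinClusterReq& req) {
    // Sanity check for this sketch: a server never invites itself.
    assert(req.leader_id != self.id);

    if (self.in_cluster()) {
        // T2 case: the earlier response was lost, so the leader retried.
        // Answering accepted=true lets the leader proceed instead of looping.
        return JoinClusterResp{true};
    }

    // T1 case: first invitation. Save state and apply the cluster config.
    self.is_catching_up = true;
    self.cluster_members.insert(req.leader_id);
    self.cluster_members.insert(self.id);
    return JoinClusterResp{true};
}
```

With this shape, the leader gets the same answer whether or not its first request's response was lost, so the retry in T2 ends instead of repeating.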