Fix queue corruption in memberlist's TransmitLimitedQueue #324
What I ran into
While running an integration test (Go 1.21+) I hit a 100% reproducible
"timeout waiting for update broadcast" whenever several nodes call SetTags concurrently. The test panics with: node 1: timeout waiting for update broadcast
Root cause analysis
SetTags turns the tag update into a NamedBroadcast held inside a TransmitLimitedQueue (TLQ). Sequence:
1. A broadcast (item1, id = 1) is taken out → sent → deleted and re-inserted into the queue.
2. While the queue is empty between the delete and the re-insert, idGen is reset to 0. The re-inserted item still keeps its old id = 1.
3. […] a peer. A new TLQ entry (item2) is created and gets the same id = 1 (idGen restarted).
4. ReplaceOrInsert treats item1 and item2 as the same key, silently overwriting the in-flight broadcast without calling Finished().
5. The goroutine waiting in SetTags is never unblocked → timeout.
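The id collision can be reproduced in isolation with a small toy model. This is only a sketch under stated assumptions: toyQueue, item, and simulate are hypothetical names invented for illustration, the queue is a plain map keyed by id, and memberlist's real TransmitLimitedQueue orders its entries by more than the id. It models nothing beyond the id bookkeeping described above.

```go
package main

import "fmt"

// item is a simplified stand-in for a queued broadcast (hypothetical type).
type item struct {
	id   int64
	name string
}

// toyQueue models only the id bookkeeping. It is NOT memberlist's
// TransmitLimitedQueue, whose ordering key has more fields than the id.
type toyQueue struct {
	items        map[int64]*item
	idGen        int64
	resetOnEmpty bool // true reproduces the behaviour before the fix
}

// insert mimics a replace-or-insert: an entry with an equal key is
// silently replaced, and the displaced entry (if any) is returned.
func (q *toyQueue) insert(it *item) *item {
	old := q.items[it.id]
	q.items[it.id] = it
	if old == it {
		return nil
	}
	return old
}

// add hands out the next id, queues a new item, and reports any entry
// that the new item displaced.
func (q *toyQueue) add(name string) (added, displaced *item) {
	q.idGen++
	added = &item{id: q.idGen, name: name}
	return added, q.insert(added)
}

// remove deletes an item and, in the pre-fix variant, restarts the id
// generator once the queue is empty.
func (q *toyQueue) remove(it *item) {
	delete(q.items, it.id)
	if q.resetOnEmpty && len(q.items) == 0 {
		q.idGen = 0
	}
}

// simulate replays the sequence and reports whether item1 was displaced
// by item2.
func simulate(resetOnEmpty bool) bool {
	q := &toyQueue{items: map[int64]*item{}, resetOnEmpty: resetOnEmpty}
	item1, _ := q.add("item1") // gets id = 1
	q.remove(item1)            // taken out for sending; queue is now empty
	q.insert(item1)            // re-queued for retransmit, id is still 1
	_, displaced := q.add("item2")
	return displaced == item1
}

func main() {
	fmt.Println("reset on empty, item1 overwritten:", simulate(true))
	fmt.Println("no reset, item1 overwritten:", simulate(false))
}
```

With the reset enabled, item2 is handed id = 1 again and displaces item1. Without the reset, item2 gets id = 2 and both entries coexist.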
The fix
Simply remove the line that resets idGen when the queue becomes empty.
All updated unit & integration tests pass.