[fix][broker] Remove consumer from cache if closed before creation#24638
[fix][broker] Remove consumer from cache if closed before creation#24638nodece wants to merge 2 commits intoapache:masterfrom
Conversation
| response = getResponse(); | ||
| assertEquals(response.getClass(), CommandError.class); | ||
| assertEquals(((CommandError) response).getRequestId(), 3); | ||
| // We should not receive response for 1st consumer, since it was cancelled by the close |
There was a problem hiding this comment.
It's better to respond to the client even if the subscribe request was cancelled by a close command, otherwise the client will be left hanging.
This is a TODO.
lhotari
left a comment
There was a problem hiding this comment.
This happens because the consumer remains in the connection's cache even after it is closed before the subscription is fully established. As a result, any new subscription attempt with the same consumer ID fails with
A failed consumer with id is already present on the connection.
I think this PR makes sense to me, but I'm just wondering why would a client perform a new subscription attempt with the same consumer ID if it already closed the consumer? Explaining the scenario in the PR description would be useful context.
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java
Show resolved
Hide resolved
@lhotari Yes, that's correct. I guess that the issue occurs when:
The producer has same bugfix: https://github.com/apache/pulsar/blob/v3.0.12/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L2073 |
|
I personally think we don't need to remove the Consumer Future unless it's done. |
This is an async(improvement) issue and is not related to this PR. When the consumer sends a close request, the broker will complete the future. We must remove the future at this point; Otherwise, the broker cache retains the failed consumer, and subsequent subscribe requests will hit the error: Please see #24638 (comment). |
@nodece I think that we need @poorbarcode to explain why #22283 was needed since that seems to intentionally change the behavior and let the failed consumer future remain in the map. |
|
@nodece @poorbarcode Instead of leaving the failed future in the map to prevent race conditions, perhaps |
Once a subscription (including close) has completed, the consumer can reuse the same consumer ID (final value) to send a new subscription request. The main goal here seems to be preventing a consumer from sending multiple concurrent subscription requests. However, the old code also blocks reuse of the same consumer ID even after the subscription has done, which looks more like a bug since at that point the consumer should be allowed to resume with the same ID. I also noticed that we already handle similar concurrency cases in the ledger module, so we could apply the same approach here for |
Signed-off-by: Zixuan Liu <nodeces@gmail.com>
c726ab7 to
df530aa
Compare
|
A non-durable cursor bugfix: #24643 |
|
/pulsarbot rerun-failure-checks |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #24638 +/- ##
============================================
- Coverage 74.28% 74.28% -0.01%
+ Complexity 33164 32784 -380
============================================
Files 1882 1882
Lines 146855 146858 +3
Branches 16867 16867
============================================
- Hits 109094 109089 -5
+ Misses 29088 29087 -1
- Partials 8673 8682 +9
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
Could you write a test for this case? |
@poorbarcode We only need to verify the logic in The existing test |
pulsar-broker/src/test/java/org/apache/pulsar/broker/service/ServerCnxTest.java
Show resolved
Hide resolved
Signed-off-by: Zixuan Liu <nodeces@gmail.com>
|
/pulsarbot rerun-failure-checks |
|
Thanks for the review and discussion. After re-evaluating the logic, I realized that the current behavior is correct: The consumer should only be removed from the cache once the subscription completes successfully. Removing it earlier could lead to inconsistent states. I'll close this PR. |
Motivation
When a consumer sends a subscribe command and then immediately sends a close command while the subscription is still in progress, the broker logs messages like:
This happens because the consumer remains in the connection's cache even after it is closed before the subscription is fully established. As a result, any new subscription attempt with the same consumer ID fails with
A failed consumer with id is already present on the connection.This regression was introduced by #22283.
Modifications
handleCloseConsumer.Documentation
docdoc-requireddoc-not-neededdoc-completeMatching PR in forked repository
PR in forked repository: