Description
Search before reporting
- I searched in the issues and found nothing similar.
Read release policy
- I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.
User environment
- broker version: 4.0.8
- broker os: Linux pulsar-broker-1a-0 6.12.40-64.114.amzn2023.aarch64 #1 SMP Tue Aug 26 05:25:54 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux
- java: openjdk version "17.0.12" 2024-07-16
- client: golang
- client version: 0.17.0
- client os: same as broker
- client java version: NaN
Issue Description
One of our scenarios is checking a user's payment after a variable delay, anywhere from 10s to 1h.
My observation is that when individuallyDeletedMessages becomes quite large (100,000+, with managedLedgerMaxUnackedRangesToPersist also set to 100,000), message dispatching starts to behave strangely: dispatch is very slow and most messages are not dispatched at all.
Checking the internal stats, I see something like this:
"numberOfEntriesSinceFirstNotAckedMessage": 751170,
"totalNonContiguousDeletedMessagesRange": 105911,
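To make the comparison against the configured limit explicit, here is a minimal Go sketch that reads the two cursor fields quoted above and checks whether the number of non-contiguous ack ranges exceeds managedLedgerMaxUnackedRangesToPersist. The type `cursorStats` and the function `exceedsPersistLimit` are hypothetical names for illustration, and the JSON is reduced to the two fields shown in this report rather than the full internal-stats output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cursorStats holds only the two cursor fields quoted in this report
// (hypothetical helper type, not part of any Pulsar client).
type cursorStats struct {
	EntriesSinceFirstNotAcked int64 `json:"numberOfEntriesSinceFirstNotAckedMessage"`
	NonContiguousRanges       int64 `json:"totalNonContiguousDeletedMessagesRange"`
}

// exceedsPersistLimit parses the stats fragment and reports whether the cursor
// tracks more ack ranges than maxRangesToPersist
// (the value of managedLedgerMaxUnackedRangesToPersist).
func exceedsPersistLimit(raw []byte, maxRangesToPersist int64) (bool, cursorStats, error) {
	var s cursorStats
	if err := json.Unmarshal(raw, &s); err != nil {
		return false, s, err
	}
	return s.NonContiguousRanges > maxRangesToPersist, s, nil
}

func main() {
	// Values copied from the internal stats above.
	raw := []byte(`{
		"numberOfEntriesSinceFirstNotAckedMessage": 751170,
		"totalNonContiguousDeletedMessagesRange": 105911
	}`)

	over, s, err := exceedsPersistLimit(raw, 100000)
	if err != nil {
		panic(err)
	}
	fmt.Printf("ranges=%d entriesSinceFirstNotAcked=%d overLimit=%v\n",
		s.NonContiguousRanges, s.EntriesSinceFirstNotAcked, over)
}
```

With the numbers from this report, the range count (105,911) is above the configured limit (100,000).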
There are no other error messages on either the client or the server side.
I see there's a similar issue, #23200, but we're using the Shared subscription type.
Error messages
The only suspicious messages I found are from the client side, which tries to reconnect to the broker:
INFO[0960] Connecting to broker remote_addr="pulsar://pulsar-broker.pulsar1.svc.cluster.local:6650"
INFO[0960] TCP connection established local_addr="10.120.147.140:56018" remote_addr="pulsar://pulsar-broker.pulsar1.svc.cluster.local:6650"
INFO[0960] Connection is ready local_addr="10.120.147.140:56018" remote_addr="pulsar://pulsar-broker.pulsar1.svc.cluster.local:6650"
The server also performed load shedding.
Since it is very costly to turn on DEBUG-level logging, I haven't had the chance to capture debug-level messages.
Reproducing the issue
I've written two programs that reproduce the issue:
- A producer that delivers messages with a variable delay (from 10s to 1h).
- A consumer that receives the messages.
Wait for the messages to accumulate to the expected number; the consumer then hangs and receives very few messages.
Additional information
It might be related to the managedLedgerMaxUnackedRangesToPersist setting, but for our usage pattern it is not possible to keep increasing this setting indefinitely, because the number of messages keeps growing.
I've also noticed that when individuallyDeletedMessages is quite large, every consumer reconnect to the broker causes a CPU usage peak on both the broker and ZooKeeper. I assume this is because Pulsar is trying to compute which messages should actually be dispatched.
Is there a way to optimize or tune this? Or is this not the correct way to use Pulsar?
Are you willing to submit a PR?
- I'm willing to submit a PR!