[Bug] big individuallyDeletedMessages causes message dispatching hangs

### Search before reporting

- [x] I searched in the [issues](https://github.com/apache/pulsar/issues) and found nothing similar.


### Read release policy

- [x] I understand that [unsupported versions](https://pulsar.apache.org/contribute/release-policy/#supported-versions) don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.


### User environment

- broker version: 4.0.8
- broker os: Linux pulsar-broker-1a-0 6.12.40-64.114.amzn2023.aarch64 #1 SMP Tue Aug 26 05:25:54 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux
- java: openjdk version "17.0.12" 2024-07-16
- client: golang
- client version: 0.17.0
- client os: same as broker
- client java version: N/A

### Issue Description

One of our scenarios is checking a user's payment after a variable delay, ranging from 10s to 1h.
My observation is that when individuallyDeletedMessages becomes large (100,000+; `managedLedgerMaxUnackedRangesToPersist` is also set to 100,000), message dispatching behaves strangely: it is very slow and most messages are not dispatched.
Checking the internal stats, I see the following:
```
      "numberOfEntriesSinceFirstNotAckedMessage": 751170,
      "totalNonContiguousDeletedMessagesRange": 105911,
```

There are no other error messages on either the client or the server side.

I see there is a similar issue, https://github.com/apache/pulsar/issues/23200, but we are using the Shared subscription type.
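For anyone triaging this, the two counters from the internal stats can be checked against the configured limit programmatically. The following is only a minimal sketch: it assumes the two fields have been extracted by hand into a flat JSON object (the real internal-stats response nests them inside a cursor entry), and `cursorStats` and `exceedsPersistLimit` are hypothetical names used for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cursorStats holds only the two counters quoted in this issue.
// Assumption: the JSON passed in is a flat object containing these fields.
type cursorStats struct {
	EntriesSinceFirstNotAcked int64 `json:"numberOfEntriesSinceFirstNotAckedMessage"`
	NonContiguousRanges       int64 `json:"totalNonContiguousDeletedMessagesRange"`
}

// exceedsPersistLimit reports whether the number of non-contiguous deleted
// ranges is above the given limit (here, the value configured for
// managedLedgerMaxUnackedRangesToPersist).
func exceedsPersistLimit(raw []byte, limit int64) (bool, error) {
	var s cursorStats
	if err := json.Unmarshal(raw, &s); err != nil {
		return false, err
	}
	return s.NonContiguousRanges > limit, nil
}

func main() {
	// Values copied from the internal stats shown in this issue.
	raw := []byte(`{
		"numberOfEntriesSinceFirstNotAckedMessage": 751170,
		"totalNonContiguousDeletedMessagesRange": 105911
	}`)

	over, err := exceedsPersistLimit(raw, 100000)
	if err != nil {
		fmt.Println("failed to parse stats:", err)
		return
	}
	fmt.Println("range count above limit:", over)
}
```

With the numbers reported here, the range count (105,911) is above the configured limit (100,000).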

### Error messages

```text
The only suspicious messages I found are the following.
The client side tries to reconnect to the broker:

INFO[0960] Connecting to broker                          remote_addr="pulsar://pulsar-broker.pulsar1.svc.cluster.local:6650"
INFO[0960] TCP connection established                    local_addr="10.120.147.140:56018" remote_addr="pulsar://pulsar-broker.pulsar1.svc.cluster.local:6650"
INFO[0960] Connection is ready                           local_addr="10.120.147.140:56018" remote_addr="pulsar://pulsar-broker.pulsar1.svc.cluster.local:6650"


And load shedding was performed on the server.

Since it is very costly to keep DEBUG-level logging turned on, I didn't have the chance to capture debug-level messages.
```

### Reproducing the issue

I've written two programs that reproduce the issue:
A producer that delivers messages with a variable delay (from 10s to 1h).
A consumer that receives the messages.

Wait for the messages to accumulate to the expected number; the consumer then hangs and receives very few messages.

### Additional information

This may be related to the `managedLedgerMaxUnackedRangesToPersist` setting, but for our use case it is not possible to keep increasing it indefinitely, because the number of messages keeps growing.
I've also noticed that when `individuallyDeletedMessages` is large, every consumer reconnect to the broker causes a CPU usage peak on both the broker and ZooKeeper. I assume this is because Pulsar is computing which messages should actually be dispatched.
Is there a way to optimize or tune this? Or is this not the correct way to use Pulsar?

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!

[Bug] big individuallyDeletedMessages causes message dispatching hangs #25028