
Add a metric for geo replication for tracking replicated subscriptions snapshot timeouts #21793

@lhotari

Description

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

In geo-replication, snapshot creation for replicated subscriptions (PIP-33) can time out.
When this happens, the code only emits a debug-level log message:

log.debug("[{}] Snapshot creation timed out for {}", topic.getName(), entry.getKey());
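A minimal sketch of where a timeout counter could be incremented, assuming a periodic sweep over pending snapshots keyed by snapshot ID. The class SnapshotTimeoutTracker, its methods, and the explicit nowMillis clock parameter are hypothetical and do not mirror Pulsar's actual replicated subscriptions code. The increment sits at the point where only the debug message is logged today.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch, not Pulsar's implementation: tracks pending snapshots
// by ID with their start time and counts the ones that exceed the timeout.
class SnapshotTimeoutTracker {
    private final Map<String, Long> pendingSnapshotStartMillis = new ConcurrentHashMap<>();
    private final LongAdder snapshotTimeouts = new LongAdder();
    private final long timeoutMillis;

    SnapshotTimeoutTracker(long timeoutSeconds) {
        this.timeoutMillis = timeoutSeconds * 1000L;
    }

    void snapshotStarted(String snapshotId, long nowMillis) {
        pendingSnapshotStartMillis.put(snapshotId, nowMillis);
    }

    void snapshotCompleted(String snapshotId) {
        pendingSnapshotStartMillis.remove(snapshotId);
    }

    // Called periodically: drops snapshots pending longer than the timeout
    // and counts each one instead of only logging it at debug level.
    void timeoutSnapshots(long nowMillis) {
        pendingSnapshotStartMillis.entrySet().removeIf(entry -> {
            boolean timedOut = nowMillis - entry.getValue() > timeoutMillis;
            if (timedOut) {
                snapshotTimeouts.increment();
            }
            return timedOut;
        });
    }

    long snapshotTimeouts() {
        return snapshotTimeouts.sum();
    }
}
```

Counting at the removal point means each timed-out snapshot is counted exactly once, since it is no longer pending on the next sweep.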

When a snapshot times out, the subscription state isn't reflected on the remote side, so a backlog can build up there.
There is currently no metric for detecting this situation.

Solution

Add a new counter metric, pulsar_replicated_subscriptions_snapshot_timeouts, which resets only when the broker restarts.
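A minimal sketch of the proposed counter, assuming a single broker-wide in-memory value rendered in the Prometheus text exposition format. The class name, help text, and methods are hypothetical; an actual implementation would more likely register the counter through the broker's existing metrics infrastructure. Because the value lives only in memory, it starts at 0 when the broker starts and is never reset otherwise.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical holder for the proposed counter. In-memory only, so it
// starts from 0 on broker start and only ever increases after that.
final class ReplicatedSubscriptionsSnapshotMetrics {
    static final String NAME = "pulsar_replicated_subscriptions_snapshot_timeouts";

    private final LongAdder timeouts = new LongAdder();

    // Call once for every snapshot whose creation timed out.
    void recordTimeout() {
        timeouts.increment();
    }

    long timeouts() {
        return timeouts.sum();
    }

    // Renders the counter in the Prometheus text exposition format.
    String toPrometheusText() {
        return "# HELP " + NAME + " Number of replicated subscription snapshots that timed out\n"
                + "# TYPE " + NAME + " counter\n"
                + NAME + " " + timeouts.sum() + "\n";
    }
}
```

A LongAdder keeps increments cheap when many topics record timeouts concurrently; reads via sum() only happen at scrape time.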

Alternatives

No response

Anything else?

Increasing the timeout threshold from replicatedSubscriptionsSnapshotTimeoutSeconds=30 to replicatedSubscriptionsSnapshotTimeoutSeconds=60 could help resolve the situation. The new metric would help detect when such an increase is necessary.
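Because a counter that resets only on broker restart grows for the broker's whole lifetime, the useful signal for tuning replicatedSubscriptionsSnapshotTimeoutSeconds is how much it increased over a time window. A minimal sketch, assuming two scraped samples of the counter; the class SnapshotTimeoutAlert and its "any increase" rule are hypothetical. A drop in value is treated as a broker restart, similar to how Prometheus' increase() handles counter resets.

```java
// Hypothetical helper for interpreting two samples of the timeout counter.
final class SnapshotTimeoutAlert {
    private SnapshotTimeoutAlert() {}

    // Timeouts that happened between two samples. If the value dropped,
    // the broker restarted and the counter began again from 0, so the
    // current value is the increase since the restart.
    static long increase(long previous, long current) {
        return current >= previous ? current - previous : current;
    }

    // Hypothetical rule: any timeout in the window suggests considering a
    // larger replicatedSubscriptionsSnapshotTimeoutSeconds.
    static boolean shouldConsiderRaisingTimeout(long previous, long current) {
        return increase(previous, current) > 0;
    }
}
```

In a Prometheus setup, the same check could be written as increase(pulsar_replicated_subscriptions_snapshot_timeouts[1h]) > 0.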

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Metadata


Assignees: No one assigned
Labels: type/enhancement (The enhancements for the existing features or docs, e.g. reduce memory usage of the delayed messages)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
