Contributor

@alexr17 alexr17 commented Jan 21, 2026

Describe the issue this Pull Request addresses

The Kafka delay metric is not reported when the delay count drops to zero.

Summary and Changelog

Always report the Kafka delay count metric, even when the count is 0.

Impact

The Kafka delay metric is now reported consistently.

Risk Level

Low

Documentation Update

None

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions bot added the size:S PR with lines of changes in (10, 100] label Jan 22, 2026
Collaborator

@lokeshj1703 lokeshj1703 left a comment

@alexr17 Thanks for working on this! I have minor comments.

@alexr17 alexr17 force-pushed the kafka-delay-metric-fix branch from 2f46494 to c997148 on January 30, 2026 01:36
@alexr17 alexr17 force-pushed the kafka-delay-metric-fix branch from ec4cbbc to 66d7d2b on January 30, 2026 01:49
Contributor Author

alexr17 commented Feb 2, 2026

@hudi-bot run azure

Use range-based assertion instead of exact value because Kafka's
message distribution across partitions is non-deterministic when
messages don't have explicit partition keys. The test now validates:
- Delay count is positive (lag exists)
- Delay count is reasonable (≤ total messages)
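
The range check described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the test code from the PR: the class `DelayCountRangeCheck`, the helper `assertDelayCountInRange`, and the sample values are hypothetical names chosen for the example.

```java
class DelayCountRangeCheck {

  // Hypothetical helper: passes only when 0 < delayCount <= totalMessages.
  static void assertDelayCountInRange(long delayCount, long totalMessages) {
    if (delayCount <= 0) {
      // A positive value shows that lag exists.
      throw new AssertionError("Expected a positive delay count, got " + delayCount);
    }
    if (delayCount > totalMessages) {
      // The lag cannot exceed the number of messages produced.
      throw new AssertionError(
          "Delay count " + delayCount + " exceeds total messages " + totalMessages);
    }
  }

  public static void main(String[] args) {
    long totalMessages = 1000L; // illustrative value only
    assertDelayCountInRange(1L, totalMessages);
    assertDelayCountInRange(1000L, totalMessages);
    System.out.println("range checks passed");
  }
}
```

An exact-value assertion would depend on how the producer happens to spread keyless messages across partitions, whereas the two bounds above hold for any distribution.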
Contributor Author

alexr17 commented Feb 4, 2026

@hudi-bot run azure

Contributor

@yihua yihua left a comment

LGTM

Comment on lines +321 to +323
// Always emit the Kafka delay count metric (even when 0)
metrics.updateStreamerSourceDelayCount(METRIC_NAME_KAFKA_DELAY_COUNT, kafkaDelayCount);

Contributor

nit: should the else branch below also be covered, for EARLIEST and GROUP, where the delay count should be calculated as well?

Another nit: if the checkpoint string does not follow the expected format (see checkTopicCheckpoint(lastCheckpointStr)), the calculation can fail after this change.
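
One possible way to address the second nit is to validate the checkpoint before computing the delay, so the metric can still be emitted unconditionally. The sketch below is only an illustration under stated assumptions and is not Hudi's implementation: the class `KafkaDelayCountSketch`, its methods, and the checkpoint layout `topic,partition:offset,...` are all assumptions made for the example, and falling back to 0 for a malformed checkpoint is one design choice among several.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

class KafkaDelayCountSketch {

  // ASSUMED layout, for illustration only: "topic,partition:offset[,partition:offset...]".
  private static final Pattern ASSUMED_CHECKPOINT_FORMAT =
      Pattern.compile("[^,]+(,\\d+:\\d+)+");

  // Hypothetical stand-in for a format check like checkTopicCheckpoint(lastCheckpointStr).
  static boolean isWellFormed(Optional<String> checkpoint) {
    return checkpoint.isPresent()
        && ASSUMED_CHECKPOINT_FORMAT.matcher(checkpoint.get()).matches();
  }

  // Parses partition -> committed offset from a checkpoint that passed isWellFormed.
  static Map<Integer, Long> parseOffsets(String checkpoint) {
    Map<Integer, Long> offsets = new HashMap<>();
    String[] parts = checkpoint.split(",");
    for (int i = 1; i < parts.length; i++) { // parts[0] is the topic name
      String[] partitionAndOffset = parts[i].split(":");
      offsets.put(Integer.parseInt(partitionAndOffset[0]),
          Long.parseLong(partitionAndOffset[1]));
    }
    return offsets;
  }

  // Sums the per-partition lag. Returns 0 for an absent or malformed checkpoint
  // (an assumption of this sketch) so the caller can always emit a value.
  static long delayCount(Optional<String> checkpoint, Map<Integer, Long> latestOffsets) {
    if (!isWellFormed(checkpoint)) {
      return 0L;
    }
    Map<Integer, Long> committed = parseOffsets(checkpoint.get());
    long delay = 0L;
    for (Map.Entry<Integer, Long> latest : latestOffsets.entrySet()) {
      // Partitions missing from the checkpoint are treated as offset 0 here.
      long from = committed.getOrDefault(latest.getKey(), 0L);
      delay += Math.max(0L, latest.getValue() - from);
    }
    return delay;
  }
}
```

With a guard like this, the caller could pass the result straight to the metrics update without branching on whether the count is zero. Whether a malformed checkpoint should map to 0 or to a skipped emission is a question for the PR authors.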

@apache apache deleted a comment from hudi-bot Feb 10, 2026
@hudi-bot
Collaborator

CI report:

Bot commands

@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

5 participants