Skip to content

[Task]: Kafka IO: Remove custom latency timer in ReadFromKafkaDoFn in favor Kafka's native metrics #37704

@junaiddshaukat

Description

@junaiddshaukat

What needs to happen?

In ReadFromKafkaDoFn.java, a Guava Stopwatch is currently used to measure the latency of consumer.poll() and report it to the RpcLatency metric.

As noted in an existing TODO comment in the codebase, this timer uses System.nanoTime(). When a consumer has prefetches waiting to be returned immediately, the overhead of System.nanoTime() can contribute more latency than it actually measures (see nanotrusting-nanotime).

To fix this, we should:

  1. Remove the pollTimer (Stopwatch) from the while loop in ReadFromKafkaDoFn.
  2. Stop manually reporting to updateSuccessfulRpcMetrics in this hot path, and instead rely on Kafka's native fetch-latency-avg JMX metric which users can already monitor.
  3. Replace the remainingTimeout calculation with a lower-overhead System.currentTimeMillis() check to ensure we still respect the consumerPollingTimeout without the nanoTime penalty.

Issue Priority

Priority: 2 (default / most normal work should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner

Metadata

Metadata

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions