
Allow configuring custom histogram buckets for HTTP latency metrics #3612

@serodriguez68


Problem

Misk's built-in HTTP latency histograms (histo_http_request_latency_ms, histo_client_http_request_latency_ms, etc.) use defaultBuckets, which are tuned for sub-second latencies: fine-grained up to about 1 second, then coarser 500ms increments above that.

For applications with LLM-powered endpoints where typical latencies are 1-60 seconds, this bucket distribution results in:

  • Poor resolution in the range that matters (most observations fall into a few large buckets)
  • Inaccurate percentile estimates (p50, p95, p99), because quantiles are interpolated within those few large buckets
  • Limited observability into latency distribution
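To make the percentile problem concrete, here is a small Python sketch that computes cumulative bucket counts and estimates a quantile by linear interpolation inside the bucket holding the target rank, which is the approach PromQL's histogram_quantile takes (when the rank lands in +Inf, it returns the highest finite bound). The helper names are illustrative, and HYPOTHETICAL_SUB_SECOND_BOUNDS is an assumption that only mimics a sub-second-oriented layout; it is not Misk's actual defaultBuckets. Datadog's own percentile computation may differ.

```python
# Illustrative sketch: how bucket layout affects histogram-based percentile estimates.
# HYPOTHETICAL_SUB_SECOND_BOUNDS is NOT Misk's real defaultBuckets; it only mimics a
# layout that is dense below 1 s and stops at a few seconds.
HYPOTHETICAL_SUB_SECOND_BOUNDS = [1, 2, 5, 10, 25, 50, 100, 250, 500, 1000, 1500, 2000, 2500, 3000]
# Buckets proposed in this issue for LLM-style latencies (milliseconds).
PROPOSED_BOUNDS = [100, 500, 1000, 2000, 5000, 10000, 20000, 30000, 45000, 60000, 90000, 120000]


def cumulative_counts(observations, bounds):
    """Cumulative bucket counts (count of observations <= bound), plus a trailing +Inf bucket."""
    counts = [sum(1 for x in observations if x <= b) for b in bounds]
    counts.append(len(observations))  # the +Inf bucket counts everything
    return counts


def histogram_quantile(q, bounds, counts):
    """Estimate the q-quantile similarly to PromQL's histogram_quantile:
    find the bucket that holds the target rank and interpolate linearly inside it."""
    if not 0 < q < 1:
        raise ValueError("q must be in (0, 1)")
    if counts[-1] == 0:
        raise ValueError("histogram has no observations")
    rank = q * counts[-1]
    lower, prev = 0, 0
    for upper, count in zip(bounds, counts):  # zip skips the trailing +Inf count
        if count >= rank:
            # Assumes observations are spread evenly inside the bucket.
            return lower + (upper - lower) * (rank - prev) / (count - prev)
        lower, prev = upper, count
    # Rank falls in the +Inf bucket: the highest finite bound is the best available estimate.
    return bounds[-1]


# Synthetic latencies: one request at each whole second from 1 s to 60 s.
OBSERVATIONS_MS = [1000 * i for i in range(1, 61)]

for name, bounds in [("sub-second", HYPOTHETICAL_SUB_SECOND_BOUNDS), ("proposed", PROPOSED_BOUNDS)]:
    counts = cumulative_counts(OBSERVATIONS_MS, bounds)
    print(name, "p50 ~", histogram_quantile(0.5, bounds, counts), "ms")
```

With one synthetic request per second from 1 s to 60 s, the sub-second-style layout reports p50 as its top bound (3000 ms) because almost every observation lands in +Inf, while the proposed buckets recover a p50 of 30000 ms.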

Proposed Solution

Add a histogram_bucket_overrides field to PrometheusConfig that allows apps to specify custom buckets for specific metrics by name:

prometheus:
  histogram_bucket_overrides:
    histo_http_request_latency_ms: [100, 500, 1000, 2000, 5000, 10000, 20000, 30000, 45000, 60000, 90000, 120000]
    histo_client_http_request_latency_ms: [100, 500, 1000, 2000, 5000, 10000, 20000, 30000, 45000, 60000, 90000, 120000]
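A minimal sketch of validating these overrides once the YAML has been parsed into a map of metric name to bucket list. validate_overrides is a hypothetical helper, not part of Misk; it assumes each list must be non-empty and strictly increasing, which is what Prometheus-style histograms expect of bucket upper bounds.

```python
from typing import Mapping, Sequence


def validate_overrides(overrides: Mapping[str, Sequence[float]]) -> dict:
    """Hypothetical check for histogram_bucket_overrides: every bucket list must be
    non-empty and strictly increasing. Returns the lists normalized to floats."""
    validated = {}
    for metric, buckets in overrides.items():
        bounds = [float(b) for b in buckets]
        if not bounds:
            raise ValueError(f"{metric}: bucket list must not be empty")
        # Compare each bound with its successor; equal or descending bounds are rejected.
        if any(a >= b for a, b in zip(bounds, bounds[1:])):
            raise ValueError(f"{metric}: buckets must be strictly increasing")
        validated[metric] = bounds
    return validated
```

Failing fast at config load keeps a malformed list from surfacing later as a histogram construction error.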

Scope

Update 4 metrics to support bucket overrides:

  • histo_http_request_latency_ms (inbound HTTP)
  • histo_http_request_exclusive_latency_ms (exclusive processing time)
  • histo_client_http_request_latency_ms (outbound HTTP)
  • mcp_tool_handler_latency (MCP tool calls)
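One way the four in-scope metrics could pick their buckets, including the backward-compatible fallback, is sketched below. SUPPORTED_METRICS, resolve_buckets and unknown_override_keys are hypothetical names. Reporting unknown keys is an assumed safeguard against typos in metric names, not something this issue specifies.

```python
from typing import Mapping, Optional, Sequence

# The four metrics in scope for this change.
SUPPORTED_METRICS = frozenset({
    "histo_http_request_latency_ms",
    "histo_http_request_exclusive_latency_ms",
    "histo_client_http_request_latency_ms",
    "mcp_tool_handler_latency",
})


def resolve_buckets(
    metric: str,
    overrides: Optional[Mapping[str, Sequence[float]]],
    default_buckets: Sequence[float],
) -> list:
    """Use the override for `metric` when present; otherwise keep the defaults.
    A missing (None) or empty overrides map therefore preserves today's behavior."""
    return list((overrides or {}).get(metric, default_buckets))


def unknown_override_keys(overrides: Optional[Mapping[str, Sequence[float]]]) -> list:
    """Override keys that do not name a supported metric (likely typos)."""
    return sorted(set(overrides or {}) - SUPPORTED_METRICS)
```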

Backward Compatibility

  • Fully backward compatible
  • An empty or missing histogram_bucket_overrides preserves the current defaultBuckets behavior
  • No breaking changes to public APIs

Notes on Cross-App Aggregation

Different bucket boundaries across apps sharing a Datadog account should be safe when:

  1. Using DDSketch distributions (Datadog handles merging), OR
  2. Querying the metric filtered by a tag under which the bucketing is always uniform (e.g. filtering by the service tag).

We've verified our organization uses DDSketch distributions, so this change is safe for our use case.
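The reason mixed boundaries need care can be shown with Prometheus-style cumulative buckets, where the count at le=x means "observations at most x". Such counts add up across apps only at bounds that every app reports; at any other bound, one app's count is unknown. merge_cumulative is a hypothetical helper, and the counts below are made-up inputs for illustration, not measurements.

```python
import math


def merge_cumulative(*histograms):
    """Sum cumulative histograms ({le_bound: count}) on the bounds all of them share.
    Bounds reported by only some apps are dropped, since the others' counts are unknown."""
    shared = set(histograms[0])
    for h in histograms[1:]:
        shared &= set(h)
    return {le: sum(h[le] for h in histograms) for le in sorted(shared)}


# Made-up cumulative counts for two apps with different bucket layouts.
app_a = {1000.0: 5, 2000.0: 8, 3000.0: 9, math.inf: 10}
app_b = {1000.0: 1, 5000.0: 4, 60000.0: 9, math.inf: 9}

print(merge_cumulative(app_a, app_b))
```

Only le=1000 and +Inf survive the merge, so a combined view keeps almost none of either app's resolution. DDSketch distributions avoid this because Datadog merges the sketches itself, and tag-scoped queries avoid it by never mixing layouts.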

Next steps

I'm happy to implement this change. I opened this issue to get alignment with the maintainers; once I get a 🟢, I will make the change.
