
feat: Add inference metrics#5320

Open
gyliu513 wants to merge 1 commit into llamastack:main from gyliu513:inference

Conversation

Contributor

@gyliu513 gyliu513 commented Mar 26, 2026

What does this PR do?

Fixed #5321

Summary

  • Add three new OpenTelemetry inference metrics to track LLM serving performance:
    • `llama_stack.inference.duration_seconds`: end-to-end inference latency (streaming and non-streaming)
    • `llama_stack.inference.time_to_first_token_seconds`: time to first content token (streaming only)
    • `llama_stack.inference.tokens_per_second`: output token throughput (completion_tokens / duration)
  • All metrics carry model, provider, stream, and status attributes
  • Add Grafana dashboard and kind cluster deployment script
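The three measurements can be sketched in plain Python to make their definitions concrete. This is an illustrative stand-in, not code from this PR: the `measure_stream` helper is hypothetical, and each chunk is treated as one output token for simplicity.

```python
import time


def measure_stream(chunks):
    """Measure duration, time-to-first-token, and tokens/sec over an
    iterable of streamed output chunks (hypothetical helper; one chunk
    is counted as one token here for illustration)."""
    start = time.monotonic()
    ttft = None
    tokens = 0
    for _chunk in chunks:
        if ttft is None:
            # First content token arrived: record time to first token.
            ttft = time.monotonic() - start
        tokens += 1
    duration = time.monotonic() - start
    return {
        "duration_seconds": duration,
        "time_to_first_token_seconds": ttft,  # None for empty streams
        "tokens_per_second": tokens / duration if duration > 0 else 0.0,
    }
```

In the actual PR these values would be recorded on the corresponding OpenTelemetry histograms with `model`, `provider`, `stream`, and `status` attributes attached.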

Query Examples

```shell
# P50 Tokens Per Second
curl -s --get 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=histogram_quantile(0.50, sum by (le) (rate(llama_stack_llama_stack_inference_tokens_per_second_bucket[5m])))'

# P95 Inference Duration
curl -s --get 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=histogram_quantile(0.95, sum by (le) (rate(llama_stack_llama_stack_inference_duration_seconds_bucket[5m])))'

# P95 Time to First Token
curl -s --get 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=histogram_quantile(0.95, sum by (le) (rate(llama_stack_llama_stack_inference_time_to_first_token_seconds_bucket{stream="true"}[5m])))'
```
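The same instant queries can be issued programmatically. A minimal sketch using only the Python standard library, assuming the Prometheus server from the curl examples at `localhost:9090` (the helper names here are hypothetical):

```python
import json
import urllib.parse
import urllib.request

PROM_BASE = "http://localhost:9090"  # Prometheus address from the curl examples


def build_query_url(promql: str, base: str = PROM_BASE) -> str:
    """URL-encode a PromQL expression for the instant-query HTTP API,
    mirroring curl's --data-urlencode behavior."""
    return base + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def prom_query(promql: str, base: str = PROM_BASE):
    """Run an instant query and return the result vector.
    Requires a reachable Prometheus server."""
    with urllib.request.urlopen(build_query_url(promql, base)) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body}")
    return body["data"]["result"]
```

For example, `prom_query('histogram_quantile(0.95, sum by (le) (rate(llama_stack_llama_stack_inference_duration_seconds_bucket[5m])))')` would return the P95 inference-duration vector.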

@meta-cla meta-cla bot added the CLA Signed label Mar 26, 2026
@gyliu513 gyliu513 marked this pull request as draft March 26, 2026 15:46
@gyliu513 gyliu513 marked this pull request as ready for review March 26, 2026 19:38


Development

Successfully merging this pull request may close these issues.

Implement more verbose inference metrics
