feat(attribution): request-to-token attribution map with correlation propagation #62
Description
Summary
- Add request-to-token attribution telemetry so model spend can be traced to request/session/endpoint/workflow/stage.
- Required correlation fields: request_id, session_id, operation_id, correlation_id, endpoint_name, workflow_name, stage_name, provider, model_name, deployment_name, user_id/actor_id.
- Define a required per-call telemetry event shape (detailed in the PRD below).
Acceptance criteria
- 100% of LLM calls emit token telemetry with request_id + operation_id.
- 100% include workflow + stage.
- Provide request-completion rollup totals (total_tokens, llm_calls).
- Support KQL joins of requests ↔ token events by operation_Id/request_id.
Why
Shared resources, fan-out, and retries make billing non-attributable; incidents need defensible causation.
Reference
pvc-costops-analytics PRD 04_request_to_token_attribution_prd.md
Request-to-Token Attribution Map — PRD
Module: Token telemetry and attribution
Marketing name: Request-to-Token Attribution
Priority: P1 — High. Enables defensible incident attribution and cost accountability across shared AI resources.
Status: Draft — depends on per-call token telemetry and correlation propagation.
TL;DR
Implement request/session/workflow/stage correlation and per-call token telemetry so model spend can be attributed back to the request and code path that caused it. This is the bridge between cost dashboards and technical responsibility.
This PRD is tracked for implementation in (ticket to be created there).
Problem statement
Billing systems aggregate model spend by resource, model, meter, and time window. That is insufficient for incident response and accountability when:
- a model resource is shared across multiple apps
- a single endpoint fans out into multiple model calls
- background tasks detach from the originating request
- streaming/retry logic multiplies calls
Without request-to-token attribution, teams argue from narrative rather than evidence.
Goals and non-goals
Goals
- Attribute token consumption to: resource → deployment → endpoint → workflow → stage → session → request.
- Ensure correlation identifiers propagate end-to-end (HTTP request → orchestrator → LLM call → telemetry).
- Provide an event shape that supports KQL joining and aggregation.
- Provide "confidence levels" for attribution (high/medium/low) based on telemetry quality.
Non-goals (Phase 1)
- Storing prompt content in telemetry by default.
- Building a UI for attribution (log/metrics first; UI later).
Required correlation fields
Minimum required correlation metadata for any request that can trigger an LLM call:
- request_id
- session_id (if applicable)
- operation_id (App Insights)
- correlation_id (cross-service propagation)
- user_id / actor_id (where appropriate)
Required telemetry event shape
For every model call, capture a record like:
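A minimal sketch of such a record, assuming one JSON event per model call; field names follow the required correlation fields above, `token_source` reflects the measured-vs-estimated distinction noted in the risks table, and all values are hypothetical placeholders:

```json
{
  "event_name": "llm_call",
  "timestamp": "2025-06-01T12:00:00Z",
  "request_id": "req-123",
  "session_id": "sess-456",
  "operation_id": "op-789",
  "correlation_id": "corr-abc",
  "endpoint_name": "/api/chat",
  "workflow_name": "chat_completion",
  "stage_name": "draft",
  "provider": "azure_openai",
  "model_name": "gpt-4o",
  "deployment_name": "gpt4o-prod",
  "user_id": "user-42",
  "prompt_tokens": 1200,
  "completion_tokens": 300,
  "total_tokens": 1500,
  "token_source": "measured"
}
```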
Functional requirements
1) Correlation propagation
- Ensure identifiers survive through the pipeline:
- HTTP request id → controller → service → orchestrator → LLM client → telemetry
- When work detaches (background tasks), explicitly record:
- parent_request_id and parent_operation_id
2) Per-request aggregation
- Emit a per-request summary at request completion:
- total_tokens and llm_calls
3) Queryability
- KQL can join requests to token events by operation_Id and/or request_id.
- Provide rollups by:
- endpoint, workflow, stage, model, deployment, user_id
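The join and rollup above can be sketched in KQL; the table and event names are assumptions (Application Insights `requests` and `customEvents`, with an `llm_call` event carrying the PRD's fields in `customDimensions`), not a confirmed schema:

```kql
// Sketch: assumes per-call token events land in customEvents as "llm_call"
// with the PRD's correlation fields in customDimensions.
requests
| join kind=inner (
    customEvents
    | where name == "llm_call"
    | extend total_tokens = toint(customDimensions.total_tokens)
  ) on operation_Id
| summarize total_tokens = sum(total_tokens), llm_calls = count()
    by endpoint = name, operation_Id
```

The same pattern extends to the other rollup dimensions by projecting `workflow_name`, `stage_name`, `model_name`, `deployment_name`, or `user_id` out of `customDimensions` and adding them to the `by` clause.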
4) Attribution confidence labels
- High confidence: per-call token telemetry contains request_id + operation_id + workflow/stage.
- Medium confidence: request telemetry + workflow reconstruction aligns with billing totals.
- Low confidence: only aggregate billing and code inference available.
Success metrics
| Metric | Target |
|---|---|
| % LLM calls with request_id + operation_id | 100% |
| % LLM calls with workflow + stage | 100% |
| Attribution time during incident | < 30 minutes |
| Shared resource ambiguity | Reduced to explicit "unknown source" cases only |
Risks and mitigations
| Risk | Impact | Mitigation |
|---|---|---|
| Missing ids in detached execution | Ambiguous attribution | Require parent ids on detach; fail safe |
| Privacy concerns | Compliance risk | No prompt content by default; limit identifiers |
| Token measurement availability varies | Misleading counts | Log measured vs estimated internally |