
feat(attribution): request-to-token attribution map with correlation propagation #62

@JustAGhosT

Description


Summary

  • Add request-to-token attribution telemetry so model spend can be traced to request/session/endpoint/workflow/stage.
  • Required correlation fields: request_id, session_id, operation_id, correlation_id, endpoint_name, workflow_name, stage_name, provider, model_name, deployment_name, user_id/actor_id.

Required event shape: see "Required telemetry event shape" in the PRD below.

Acceptance criteria

  • 100% of LLM calls emit token telemetry with request_id + operation_id.
  • 100% include workflow + stage.
  • Provide request-completion rollup totals (total_tokens, llm_calls).
  • Support KQL joins between requests and token events by operation_Id/request_id.

Why

Shared resources, fan-out, and retries make billing non-attributable; incidents need defensible causation.

Reference

pvc-costops-analytics PRD 04_request_to_token_attribution_prd.md


Request-to-Token Attribution Map — PRD

Module: Token telemetry and attribution
Marketing name: Request-to-Token Attribution
Priority: P1 — High. Enables defensible incident attribution and cost accountability across shared AI resources.
Status: Draft — depends on per-call token telemetry and correlation propagation.


TL;DR

Implement request/session/workflow/stage correlation and per-call token telemetry so model spend can be attributed back to the request and code path that caused it. This is the bridge between cost dashboards and technical responsibility.

This PRD is tracked for implementation in a downstream repository (ticket to be created there).


Problem statement

Billing systems aggregate model spend by resource, model, meter, and time window. That is insufficient for incident response and accountability when:

  • a model resource is shared across multiple apps
  • a single endpoint fans out into multiple model calls
  • background tasks detach from the originating request
  • streaming/retry logic multiplies calls

Without request-to-token attribution, teams argue from narrative rather than evidence.


Goals and non-goals

Goals

  • Attribute token consumption to: resource → deployment → endpoint → workflow → stage → session → request.
  • Ensure correlation identifiers propagate end-to-end (HTTP request → orchestrator → LLM call → telemetry).
  • Provide an event shape that supports KQL joining and aggregation.
  • Provide "confidence levels" for attribution (high/medium/low) based on telemetry quality.

Non-goals (Phase 1)

  • Storing prompt content in telemetry by default.
  • Building a UI for attribution (log/metrics first; UI later).

Required correlation fields

Minimum required correlation metadata for any request that can trigger an LLM call:

  • request_id
  • session_id (if applicable)
  • operation_id (App Insights)
  • correlation_id (cross-service propagation)
  • endpoint_name, workflow_name, stage_name
  • provider, model_name, deployment_name
  • user_id / actor_id (where appropriate)

Required telemetry event shape

For every model call, capture a record like:
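A hedged sketch of such a record, expressed as a Python dict. The correlation field names come from this PRD's required-fields list; the token-count fields and all example values are illustrative assumptions, not a fixed schema:

```python
# Illustrative per-call token telemetry record. Correlation keys follow the
# PRD's required fields; token-count keys and values are assumed examples.
token_event = {
    "request_id": "req-8f3a",          # originating HTTP request
    "session_id": "sess-12",           # if applicable
    "operation_id": "op-77cd",         # App Insights operation id
    "correlation_id": "corr-41b0",     # cross-service propagation
    "endpoint_name": "/api/summarize",
    "workflow_name": "summarize_document",
    "stage_name": "draft",
    "provider": "azure_openai",
    "model_name": "gpt-4o",
    "deployment_name": "gpt4o-prod",
    "user_id": "u-1001",               # or actor_id, where appropriate
    "prompt_tokens": 812,              # assumed count fields
    "completion_tokens": 164,
    "total_tokens": 976,
}
```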


Functional requirements

1) Correlation propagation

  • Ensure identifiers survive through the pipeline:
    • HTTP request id → controller → service → orchestrator → LLM client → telemetry
  • When work detaches (background tasks), explicitly record:
    • the parent request_id and parent operation_id
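One way to keep identifiers alive through the pipeline is ambient context. This is a minimal sketch using Python's `contextvars`; the function and field names (`begin_request`, `detach_context`, `parent_request_id`) are assumptions, not an existing API:

```python
# Sketch: propagate correlation ids from the HTTP layer down to the LLM
# client, and capture explicit parent ids before work detaches.
import contextvars

request_id_var = contextvars.ContextVar("request_id", default=None)
operation_id_var = contextvars.ContextVar("operation_id", default=None)

def begin_request(request_id, operation_id):
    """Bind correlation ids at the edge (controller/middleware)."""
    request_id_var.set(request_id)
    operation_id_var.set(operation_id)

def detach_context():
    """Snapshot parent ids before handing work to a background task."""
    return {
        "parent_request_id": request_id_var.get(),
        "parent_operation_id": operation_id_var.get(),
    }

def emit_llm_call_event(total_tokens, parent=None):
    """Build a token event carrying the ambient (or parent) ids."""
    event = {
        "request_id": request_id_var.get(),
        "operation_id": operation_id_var.get(),
        "total_tokens": total_tokens,
    }
    if parent:
        event.update(parent)  # explicit parent ids on detach
    return event
```

A detached task would call `detach_context()` before scheduling, then pass the snapshot to `emit_llm_call_event`, so attribution never silently drops to "unknown source".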

2) Per-request aggregation

  • Emit a per-request summary at request completion:
    • total_tokens, llm_calls, and related breakdowns (for example prompt and completion token counts)
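The per-request summary can be sketched as a simple fold over the per-call events. The summary keys `total_tokens` and `llm_calls` come from the acceptance criteria; the function name is illustrative:

```python
# Sketch: aggregate per-call token events into per-request summaries,
# emitted at request completion.
from collections import defaultdict

def rollup_by_request(token_events):
    summaries = defaultdict(lambda: {"total_tokens": 0, "llm_calls": 0})
    for event in token_events:
        summary = summaries[event["request_id"]]
        summary["total_tokens"] += event["total_tokens"]
        summary["llm_calls"] += 1
    return dict(summaries)
```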

3) Queryability

  • KQL can join requests to token events by operation_Id and/or request_id.
  • Provide rollups by:
    • endpoint, workflow, stage, model, deployment, user_id
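The join the KQL requirement implies can be modeled in plain Python over lists of dicts: join requests to token events on operation_id, then roll up token totals by endpoint/workflow/stage. Field names follow the PRD's event shape; the function itself is a sketch, not a production query:

```python
# Sketch: requests <-> token-events join keyed on operation_id, with a
# rollup by (endpoint, workflow, stage). Unmatched events are skipped,
# which is where explicit "unknown source" handling would plug in.
def join_and_rollup(requests, token_events):
    req_by_op = {r["operation_id"]: r for r in requests}
    totals = {}
    for event in token_events:
        req = req_by_op.get(event["operation_id"])
        if req is None:
            continue  # no matching request: surface as unknown source
        key = (req["endpoint_name"], event["workflow_name"], event["stage_name"])
        totals[key] = totals.get(key, 0) + event["total_tokens"]
    return totals
```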

4) Attribution confidence labels

  • High confidence: per-call token telemetry contains request_id + operation_id + workflow/stage.
  • Medium confidence: request telemetry + workflow reconstruction aligns with billing totals.
  • Low confidence: only aggregate billing and code inference available.
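The three labels above can be sketched as a classifier over a single event's telemetry quality. Note this is a simplification: the PRD's medium tier also involves reconciling reconstructed workflows against billing totals, which a per-event check cannot capture:

```python
# Sketch: assign an attribution-confidence label from the correlation
# fields present on a token event (simplified per-event approximation).
def attribution_confidence(event):
    has_ids = bool(event.get("request_id")) and bool(event.get("operation_id"))
    has_flow = bool(event.get("workflow_name")) and bool(event.get("stage_name"))
    if has_ids and has_flow:
        return "high"    # full per-call correlation
    if has_ids or has_flow:
        return "medium"  # partial telemetry; reconstruction needed
    return "low"         # only aggregate billing + code inference
```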

Success metrics

  • % of LLM calls with request_id + operation_id: 100%
  • % of LLM calls with workflow + stage: 100%
  • Attribution time during an incident: < 30 minutes
  • Shared-resource ambiguity: reduced to explicit "unknown source" cases only

Risks and mitigations

  • Missing ids in detached execution (impact: ambiguous attribution). Mitigation: require parent ids on detach; fail safe.
  • Privacy concerns (impact: compliance risk). Mitigation: no prompt content by default; limit identifiers.
  • Token measurement availability varies (impact: misleading counts). Mitigation: log measured vs. estimated internally.
