From 8cb4585a81f2765868e1a36bade42cf3f2e4679b Mon Sep 17 00:00:00 2001
From: JustAGhosT
Date: Sun, 15 Mar 2026 03:36:35 +0200
Subject: [PATCH 1/2] docs: update Phase 3 status and acceptance criteria

- Document Phase 3 as future enhancement with multiple implementation options
- Update acceptance criteria with current status
- Update action items with completed/pending items
---
 docs/planning/request_to_token_attribution.md | 47 +++++++++++++++----
 1 file changed, 37 insertions(+), 10 deletions(-)

diff --git a/docs/planning/request_to_token_attribution.md b/docs/planning/request_to_token_attribution.md
index 14645bd..ad77603 100644
--- a/docs/planning/request_to_token_attribution.md
+++ b/docs/planning/request_to_token_attribution.md
@@ -99,10 +99,28 @@ For clients that can only send headers, a future enhancement would add middlewar
 
 This requires a custom LiteLLM wrapper or sidecar (not yet implemented).
 
-### Phase 3: Per-Request Rollup
+### Phase 3: Per-Request Rollup (Future Enhancement)
 
-- Track tokens per request_id in memory or Redis
-- Emit summary event when request completes
+**Status: Not Started**
+
+To provide request-completion rollup totals (total_tokens, llm_calls), we need to aggregate token counts per request_id. This requires:
+
+1. **Option A: Custom LiteLLM Image**
+   - Build a custom LiteLLM image with a callback that tracks token counts per request_id
+   - Emit a summary event when request completes
+   - Most control, but requires image build/deploy pipeline
+
+2. **Option B: OTEL Collector Aggregation**
+   - Configure an OTEL collector to aggregate spans by request_id
+   - Emit rollup events from the collector
+   - Leverages existing OTEL infrastructure
+
+3. **Option C: Downstream Aggregation**
+   - Have pvc-costops-analytics aggregate OTEL spans by request_id
+   - No changes to gateway required
+   - Relies on span duration for "request complete" detection
+
+**Recommendation:** Start with Option C (downstream aggregation) as it requires no changes to the gateway. If latency is an issue, consider Option B.
 
 ## What We Need from Other Repos
 
@@ -193,10 +211,12 @@ _Note: Method B requires additional LiteLLM configuration or middleware._
 
 ## Acceptance Criteria
 
-- 100% of LLM calls emit token telemetry with request_id + operation_id
-- 100% include workflow + stage
-- Provide request-completion rollup totals (total_tokens, llm_calls)
-- Support KQL joins requests↔token events by operation_Id/request_id
+| Criterion                                    | Status     | Notes                                     |
+| -------------------------------------------- | ---------- | ----------------------------------------- |
+| 100% of LLM calls emit token telemetry       | ✅ Done    | Via OTEL callback                         |
+| 100% include workflow + stage                | ⚠️ Partial | Requires upstream to pass metadata        |
+| Support KQL joins by operation_Id/request_id | ✅ Done    | OTEL spans include metadata               |
+| Request-completion rollup totals             | 🔜 Future  | Requires Phase 3 (downstream aggregation) |
 
 ## Dependencies
 
@@ -206,9 +226,16 @@ _Note: Method B requires additional LiteLLM configuration or middleware._
 
 ## Action Items
 
-1. ai-gateway: Build custom LiteLLM image with token telemetry callback
-2. cognitive-mesh: Ensure correlation headers are passed to gateway
-3. pvc-costops-analytics: Prepare KQL queries for new event shape
+### Completed
+
+1. ✅ ai-gateway: Add OTEL callback for token telemetry (Phase 1)
+2. ✅ ai-gateway: Document correlation ID requirements (Phase 2)
+
+### Pending
+
+3. cognitive-mesh: Pass correlation IDs in request metadata
+4. pvc-costops-analytics: Create KQL queries for OTEL span joins
+5. pvc-costops-analytics: Implement request rollup aggregation (Phase 3)
 
 ---

From 78e1cec9abeeea499018b11e81303f41a9b11309 Mon Sep 17 00:00:00 2001
From: JustAGhosT
Date: Sun, 15 Mar 2026 03:52:49 +0200
Subject: [PATCH 2/2] docs: add OTEL endpoint requirement, PII note, and infra
 ownership recommendation

- Add note about OTLP collector endpoint requirement for Phase 1
- Add privacy note about user_id/actor_id (consider hashing/pseudonymizing)
- Update Phase 3 recommendation to prefer Option C (downstream aggregation)
- Update Terraform integration notes per team recommendation: ai-gateway keeps
  owning its Terraform
---
 docs/planning/request_to_token_attribution.md | 32 ++++++++-----------
 1 file changed, 14 insertions(+), 18 deletions(-)

diff --git a/docs/planning/request_to_token_attribution.md b/docs/planning/request_to_token_attribution.md
index ad77603..34a282c 100644
--- a/docs/planning/request_to_token_attribution.md
+++ b/docs/planning/request_to_token_attribution.md
@@ -47,6 +47,8 @@ Instead of a custom callback (which requires a custom LiteLLM image), we're usin
 - `otel_exporter_endpoint` - OTLP collector URL
 - `otel_service_name` - custom service name
 
+> **Note:** Phase 1 requires an OTLP collector endpoint to be configured. This can be a dedicated collector app, or you can send directly to a backend that supports OTLP (e.g., Application Insights, Grafana Tempo).
+
 **How It Works:**
 
 LiteLLM's OTEL callback automatically emits spans with:
@@ -80,11 +82,13 @@ Pass correlation IDs in the request body `metadata` field:
     "workflow": "manual_orchestration",
     "stage": "writer",
     "endpoint": "/api/manual-orchestration/sessions/start",
-    "user_id": "user_abc"
+    "user_id": "user_abc" // Consider hashing/pseudonymizing for privacy
   }
 }
 ```
 
+> **Note:** `user_id` / `actor_id` can become PII. Consider hashing or using pseudonymous identifiers.
+
 LiteLLM automatically passes `metadata` through to OTEL spans, making these fields available in traces.
 
 **Method B: Via HTTP Headers (Future Enhancement)**
@@ -99,28 +103,20 @@ For clients that can only send headers, a future enhancement would add middlewar
 
 This requires a custom LiteLLM wrapper or sidecar (not yet implemented).
 
-### Phase 3: Per-Request Rollup (Future Enhancement)
+### Phase 3: Per-Request Rollup
 
 **Status: Not Started**
 
-To provide request-completion rollup totals (total_tokens, llm_calls), we need to aggregate token counts per request_id. This requires:
+Provide request-completion rollup totals (total_tokens, llm_calls) by aggregating token counts per request_id.
 
-1. **Option A: Custom LiteLLM Image**
-   - Build a custom LiteLLM image with a callback that tracks token counts per request_id
-   - Emit a summary event when request completes
-   - Most control, but requires image build/deploy pipeline
+**Recommendation: Option C (Downstream Aggregation)**
 
-2. **Option B: OTEL Collector Aggregation**
-   - Configure an OTEL collector to aggregate spans by request_id
-   - Emit rollup events from the collector
-   - Leverages existing OTEL infrastructure
+Start with downstream aggregation in pvc-costops-analytics, the cheapest and fastest approach. Roll up tokens by request_id/operation_id from OTEL spans without changing the gateway.
 
-3. **Option C: Downstream Aggregation**
-   - Have pvc-costops-analytics aggregate OTEL spans by request_id
-   - No changes to gateway required
-   - Relies on span duration for "request complete" detection
+**When to consider alternatives:**
 
-**Recommendation:** Start with Option C (downstream aggregation) as it requires no changes to the gateway. If latency is an issue, consider Option B.
+- **Option B (Collector aggregation)**: Only if you need near-real-time rollups emitted as first-class events/metrics
+- **Option A (Custom LiteLLM image)**: Only if LiteLLM's built-in OTEL data is incomplete or you need strict "request complete" semantics that can't be reliably inferred downstream
 
 ## What We Need from Other Repos
 
@@ -250,9 +246,9 @@ _Note: Method B requires additional LiteLLM configuration or middleware._
 
 ### Terraform Integration Notes
 
-When integrating with shared-infra (Mystira):
+**Recommendation:** ai-gateway keeps owning its Terraform in its repo. Mystira workspace treats ai-gateway as an external product that consumes shared-infra via an explicit "shared resource contract".
 
-1. **Module location**: Add product under `infra/terraform/products/ai-gateway/`
+1. **Module location**: ai-gateway owns its Terraform in this repo
 2. **Shared resources to consume**:
    - Log Analytics workspace from shared outputs
    - Key Vault for secrets (use managed identity to read)
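
The Option C rollup recommended in the second patch can be sketched as a small aggregation pass over exported OTEL span records. This is a minimal sketch only: the flat-dict shape and the field names `request_id`, `prompt_tokens`, and `completion_tokens` are assumptions for illustration, not LiteLLM's actual span attribute names, and a real implementation in pvc-costops-analytics would run against the telemetry store (e.g., via KQL) rather than in-process.

```python
from collections import defaultdict

def rollup_by_request(spans):
    """Aggregate per-LLM-call token counts into per-request totals.

    Each span is a flat dict; the field names here are illustrative
    placeholders, not LiteLLM's real span attribute names.
    """
    totals = defaultdict(lambda: {"llm_calls": 0, "total_tokens": 0})
    for span in spans:
        rid = span.get("request_id")
        if rid is None:
            continue  # skip spans missing correlation metadata
        entry = totals[rid]
        entry["llm_calls"] += 1
        entry["total_tokens"] += span.get("prompt_tokens", 0) + span.get("completion_tokens", 0)
    return dict(totals)

# Three LLM calls across two requests
spans = [
    {"request_id": "req-1", "prompt_tokens": 120, "completion_tokens": 30},
    {"request_id": "req-1", "prompt_tokens": 200, "completion_tokens": 80},
    {"request_id": "req-2", "prompt_tokens": 50, "completion_tokens": 10},
]
print(rollup_by_request(spans))
# → {'req-1': {'llm_calls': 2, 'total_tokens': 430}, 'req-2': {'llm_calls': 1, 'total_tokens': 60}}
```

Note the trade-off the patch calls out: batch aggregation like this has no reliable "request complete" signal, so the job must either run on a delay or infer completion from span timing.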