From 8cb4585a81f2765868e1a36bade42cf3f2e4679b Mon Sep 17 00:00:00 2001
From: JustAGhosT
Date: Sun, 15 Mar 2026 03:36:35 +0200
Subject: [PATCH 1/2] docs: update Phase 3 status and acceptance criteria

- Document Phase 3 as future enhancement with multiple implementation options
- Update acceptance criteria with current status
- Update action items with completed/pending items
---
 docs/planning/request_to_token_attribution.md | 47 +++++++++++++++----
 1 file changed, 37 insertions(+), 10 deletions(-)

diff --git a/docs/planning/request_to_token_attribution.md b/docs/planning/request_to_token_attribution.md
index 14645bd..ad77603 100644
--- a/docs/planning/request_to_token_attribution.md
+++ b/docs/planning/request_to_token_attribution.md
@@ -99,10 +99,28 @@ For clients that can only send headers, a future enhancement would add middlewar
 
 This requires a custom LiteLLM wrapper or sidecar (not yet implemented).
 
-### Phase 3: Per-Request Rollup
+### Phase 3: Per-Request Rollup (Future Enhancement)
 
-- Track tokens per request_id in memory or Redis
-- Emit summary event when request completes
+**Status: Not Started**
+
+To provide request-completion rollup totals (total_tokens, llm_calls), we need to aggregate token counts per request_id. This requires:
+
+1. **Option A: Custom LiteLLM Image**
+   - Build a custom LiteLLM image with a callback that tracks token counts per request_id
+   - Emit a summary event when request completes
+   - Most control, but requires image build/deploy pipeline
+
+2. **Option B: OTEL Collector Aggregation**
+   - Configure an OTEL collector to aggregate spans by request_id
+   - Emit rollup events from the collector
+   - Leverages existing OTEL infrastructure
+
+3. **Option C: Downstream Aggregation**
+   - Have pvc-costops-analytics aggregate OTEL spans by request_id
+   - No changes to gateway required
+   - Relies on span duration for "request complete" detection
+
+**Recommendation:** Start with Option C (downstream aggregation) as it requires no changes to the gateway. If latency is an issue, consider Option B.
 
 ## What We Need from Other Repos
 
@@ -193,10 +211,12 @@ _Note: Method B requires additional LiteLLM configuration or middleware._
 
 ## Acceptance Criteria
 
-- 100% of LLM calls emit token telemetry with request_id + operation_id
-- 100% include workflow + stage
-- Provide request-completion rollup totals (total_tokens, llm_calls)
-- Support KQL joins requests↔token events by operation_Id/request_id
+| Criterion                                    | Status     | Notes                                     |
+| -------------------------------------------- | ---------- | ----------------------------------------- |
+| 100% of LLM calls emit token telemetry       | ✅ Done    | Via OTEL callback                         |
+| 100% include workflow + stage                | ⚠️ Partial | Requires upstream to pass metadata        |
+| Support KQL joins by operation_Id/request_id | ✅ Done    | OTEL spans include metadata               |
+| Request-completion rollup totals             | 🔜 Future  | Requires Phase 3 (downstream aggregation) |
 
 ## Dependencies
 
@@ -206,9 +226,16 @@ _Note: Method B requires additional LiteLLM configuration or middleware._
 
 ## Action Items
 
-1. ai-gateway: Build custom LiteLLM image with token telemetry callback
-2. cognitive-mesh: Ensure correlation headers are passed to gateway
-3. pvc-costops-analytics: Prepare KQL queries for new event shape
+### Completed
+
+1. ✅ ai-gateway: Add OTEL callback for token telemetry (Phase 1)
+2. ✅ ai-gateway: Document correlation ID requirements (Phase 2)
+
+### Pending
+
+3. cognitive-mesh: Pass correlation IDs in request metadata
+4. pvc-costops-analytics: Create KQL queries for OTEL span joins
+5. pvc-costops-analytics: Implement request rollup aggregation (Phase 3)
 
 ---

From 78e1cec9abeeea499018b11e81303f41a9b11309 Mon Sep 17 00:00:00 2001
From: JustAGhosT
Date: Sun, 15 Mar 2026 03:52:49 +0200
Subject: [PATCH 2/2] docs: add OTEL endpoint requirement, PII note, and infra
 ownership recommendation

- Add note about OTLP collector endpoint requirement for Phase 1
- Add privacy note about user_id/actor_id (consider hashing/pseudonymizing)
- Update Phase 3 recommendation to prefer Option C (downstream aggregation)
- Update Terraform integration notes per team recommendation: ai-gateway keeps
  owning its Terraform
---
 docs/planning/request_to_token_attribution.md | 32 ++++++++-----------
 1 file changed, 14 insertions(+), 18 deletions(-)

diff --git a/docs/planning/request_to_token_attribution.md b/docs/planning/request_to_token_attribution.md
index ad77603..34a282c 100644
--- a/docs/planning/request_to_token_attribution.md
+++ b/docs/planning/request_to_token_attribution.md
@@ -47,6 +47,8 @@ Instead of a custom callback (which requires a custom LiteLLM image), we're usin
 - `otel_exporter_endpoint` - OTLP collector URL
 - `otel_service_name` - custom service name
 
+> **Note:** Phase 1 requires an OTLP collector endpoint to be configured. This can be a dedicated collector app, or you can send directly to a backend that supports OTLP (e.g., Application Insights, Grafana Tempo).
+
 **How It Works:**
 
 LiteLLM's OTEL callback automatically emits spans with:
@@ -80,11 +82,13 @@ Pass correlation IDs in the request body `metadata` field:
     "workflow": "manual_orchestration",
     "stage": "writer",
     "endpoint": "/api/manual-orchestration/sessions/start",
-    "user_id": "user_abc"
+    "user_id": "user_abc" // Consider hashing/pseudonymizing for privacy
   }
 }
 ```
 
+> **Note:** `user_id` / `actor_id` can become PII. Consider hashing or using pseudonymous identifiers.
+
 LiteLLM automatically passes `metadata` through to OTEL spans, making these fields available in traces.
 
 **Method B: Via HTTP Headers (Future Enhancement)**
@@ -99,28 +103,20 @@ For clients that can only send headers, a future enhancement would add middlewar
 
 This requires a custom LiteLLM wrapper or sidecar (not yet implemented).
 
-### Phase 3: Per-Request Rollup (Future Enhancement)
+### Phase 3: Per-Request Rollup
 
 **Status: Not Started**
 
-To provide request-completion rollup totals (total_tokens, llm_calls), we need to aggregate token counts per request_id. This requires:
+Provide request-completion rollup totals (total_tokens, llm_calls) by aggregating token counts per request_id.
 
-1. **Option A: Custom LiteLLM Image**
-   - Build a custom LiteLLM image with a callback that tracks token counts per request_id
-   - Emit a summary event when request completes
-   - Most control, but requires image build/deploy pipeline
+**Recommendation: Option C (Downstream Aggregation)**
 
-2. **Option B: OTEL Collector Aggregation**
-   - Configure an OTEL collector to aggregate spans by request_id
-   - Emit rollup events from the collector
-   - Leverages existing OTEL infrastructure
+Start with downstream aggregation in pvc-costops-analytics, the cheapest and fastest approach. Roll up tokens by request_id/operation_id from OTEL spans without changing the gateway.
 
-3. **Option C: Downstream Aggregation**
-   - Have pvc-costops-analytics aggregate OTEL spans by request_id
-   - No changes to gateway required
-   - Relies on span duration for "request complete" detection
+**When to consider alternatives:**
 
-**Recommendation:** Start with Option C (downstream aggregation) as it requires no changes to the gateway. If latency is an issue, consider Option B.
+- **Option B (Collector aggregation)**: Only if you need near-real-time rollups emitted as first-class events/metrics
+- **Option A (Custom LiteLLM image)**: Only if LiteLLM's built-in OTEL data is incomplete or you need strict "request complete" semantics that can't be reliably inferred downstream
 
 ## What We Need from Other Repos
 
@@ -250,9 +246,9 @@ _Note: Method B requires additional LiteLLM configuration or middleware._
 
 ### Terraform Integration Notes
 
-When integrating with shared-infra (Mystira):
+**Recommendation:** ai-gateway keeps owning its Terraform in its repo. Mystira workspace treats ai-gateway as an external product that consumes shared-infra via an explicit "shared resource contract".
 
-1. **Module location**: Add product under `infra/terraform/products/ai-gateway/`
+1. **Module location**: ai-gateway owns its Terraform in this repo
 2. **Shared resources to consume**:
    - Log Analytics workspace from shared outputs
    - Key Vault for secrets (use managed identity to read)
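
The Option C rollup recommended in the second patch can be sketched as a small aggregation pass over exported OTEL span records. This is a minimal sketch only: the flat-dict shape and the field names `request_id`, `prompt_tokens`, and `completion_tokens` are assumptions for illustration, not LiteLLM's actual span attribute names, and a real implementation in pvc-costops-analytics would run against the telemetry store (e.g., via KQL) rather than in-process.

```python
from collections import defaultdict

def rollup_by_request(spans):
    """Aggregate per-LLM-call token counts into per-request totals.

    Each span is a flat dict; the field names here are illustrative
    placeholders, not LiteLLM's real span attribute names.
    """
    totals = defaultdict(lambda: {"llm_calls": 0, "total_tokens": 0})
    for span in spans:
        rid = span.get("request_id")
        if rid is None:
            continue  # skip spans missing correlation metadata
        entry = totals[rid]
        entry["llm_calls"] += 1
        entry["total_tokens"] += span.get("prompt_tokens", 0) + span.get("completion_tokens", 0)
    return dict(totals)

# Three LLM calls across two requests
spans = [
    {"request_id": "req-1", "prompt_tokens": 120, "completion_tokens": 30},
    {"request_id": "req-1", "prompt_tokens": 200, "completion_tokens": 80},
    {"request_id": "req-2", "prompt_tokens": 50, "completion_tokens": 10},
]
print(rollup_by_request(spans))
# → {'req-1': {'llm_calls': 2, 'total_tokens': 430}, 'req-2': {'llm_calls': 1, 'total_tokens': 60}}
```

Note the trade-off the patch calls out: batch aggregation like this has no reliable "request complete" signal, so the job must either run on a delay or infer completion from span timing.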