33 commits
df8b739  Add HITL node and LangGraph flow integration (Nov 21, 2025)
ad6b5d4  feat(langsmith): add opt-in langsmith_monitor, instrument generator &… (lillian0624, Nov 22, 2025)
235be79  feat(langsmith): add opt-in langsmith_monitor, instrument generator &… (lillian0624, Nov 22, 2025)
28034b7  feat(langsmith): instrument segmenter, safety_gate, analytics; add op… (lillian0624, Nov 22, 2025)
7da34aa  test(langsmith): add tests for opt-in monitor and segmenter instrumen… (lillian0624, Nov 22, 2025)
f9bdc3f  deterministic pseudonymization (lillian0624, Nov 23, 2025)
3bed10b  Update backend/run_swagger_light.py (lillian0624, Nov 23, 2025)
eb32d83  Merge branch 'develop' into 21-integrate-langsmith-or-log-to-monitor-… (lillian0624, Nov 23, 2025)
900c79d  chore: fix absolute imports, make langchain/FAISS optional, and ignor… (lillian0624, Nov 23, 2025)
56fc5b1  Update backend/README_SWAGGER.md (lillian0624, Nov 23, 2025)
318ce23  Update backend/README_SWAGGER.md (lillian0624, Nov 23, 2025)
eeece9f  Update backend/app/routers/orchestrator.py (lillian0624, Nov 23, 2025)
3ee43e3  Update backend/app/graph/orchestrator.py (lillian0624, Nov 23, 2025)
0adb36e  Update backend/agents/safety_gate.py (lillian0624, Nov 23, 2025)
712901e  Update backend/services/langsmith_monitor.py (lillian0624, Nov 23, 2025)
ef713fb  fix(tests): ensure repo root in sys.path; orchestrator: invoke LangGr… (lillian0624, Nov 23, 2025)
7c7e0e9  Merge pull request #68 from EchoVoice-AI/21-integrate-langsmith-or-lo… (lillian0624, Nov 23, 2025)
65a3c68  feat(assignment): add A/B assignment agent, node adapter, and wire in… (lillian0624, Nov 23, 2025)
472f06e  Merge pull request #78 from EchoVoice-AI/assignment-agent (lillian0624, Nov 23, 2025)
7ae5a35  Add media endpoints and HITL review store with tests (Nov 23, 2025)
12b956f  Merge pull request #84 from EchoVoice-AI/feature/media-endpoints (selvicim45, Nov 23, 2025)
9eb770a  Add HITL audit logging and audit log tests (Nov 24, 2025)
e15f993  Merge pull request #85 from EchoVoice-AI/feature/media-endpoints (selvicim45, Nov 24, 2025)
69b125e  WIP: local changes (SushmaGandham, Nov 24, 2025)
510de4f  Issue #11: Add debug endpoint to run pipeline and return email previe… (SushmaGandham, Nov 24, 2025)
cdf0a16  Merge branch 'develop' into 11-add-debugdeliveries-endpoint (NoelOsiro, Nov 24, 2025)
8e80826  Merge pull request #87 from EchoVoice-AI/11-add-debugdeliveries-endpoint (NoelOsiro, Nov 24, 2025)
641dfbe  Issue #10: Add POST /debug/run endpoint for full pipeline debugging (SushmaGandham, Nov 25, 2025)
daeed2d  Implement Azure STT/TTS with fallback and update env template (Nov 25, 2025)
c0e9619  Merge pull request #89 from EchoVoice-AI/feature/media-endpoints (selvicim45, Nov 25, 2025)
0982d93  Add HITL review router to expose review data and accept human decisions (Nov 25, 2025)
bd8b747  Merge pull request #88 from EchoVoice-AI/10-add-debugrun-endpoint (NoelOsiro, Nov 26, 2025)
3f10093  Merge pull request #90 from EchoVoice-AI/feature/media-endpoints (NoelOsiro, Nov 26, 2025)
6 changes: 6 additions & 0 deletions .env.template
@@ -22,6 +22,12 @@ AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT =
# NEW: chat deployment name for generator
AZURE_OPENAI_CHAT_DEPLOYMENT =

# Azure Speech and Translator
AZURE_SPEECH_KEY=your-speech-key
AZURE_SPEECH_REGION=westus
AZURE_SPEECH_TTS_VOICE=en-US-JennyNeural
AZURE_TRANSLATOR_KEY=your-translator-key

# Optional: non-Azure OpenAI fallback
OPENAI_API_KEY =
OPENAI_MODEL_NAME =
1 change: 1 addition & 0 deletions .gitignore
@@ -39,3 +39,4 @@ env/
# OS
.DS_Store
Thumbs.db
backend/.langsmith_local_runs/
165 changes: 165 additions & 0 deletions backend/README.md
@@ -0,0 +1,165 @@
# Backend - Debug Endpoints

This document describes the debug endpoints for development and testing of the personalization pipeline.

**Dev-only:** These routers are mounted only when the `ECHO_DEBUG` environment variable is set to `1`, `true`, or `yes`.

---

## 1. GET /debug/deliveries - Email Previews for UI

**Purpose:** Run the orchestrator for a small set of mock customers and return minimal email preview objects (subject and body) for UI preview.

### Query parameters

- `mock` (optional, boolean): When `true`, returns precomputed previews without running the pipeline.

### Response shape

The endpoint returns JSON with the top-level key `previews`, an array of preview objects.
Each preview contains:

- `user_id` (string)
- `email` (string)
- `subject` (string | null)
- `body` (string | null)
- `variant_id` (string | null)
- `blocked` (boolean) — true when no safe variant is available
- `error` (string | null) — set when pipeline execution fails for that user

Example (mock response):

```json
{
"previews": [
{
"user_id": "U001",
"email": "emma@example.com",
"subject": "Hi Emma, quick note about running shoes",
"body": "Hi Emma,\n\nWe thought you might like this: …\n\n— Team",
"variant_id": "A",
"blocked": false,
"error": null
},
{
"user_id": "U002",
"email": "liam@example.com",
"subject": "Liam, more on the Acme plan",
"body": "Hello Liam,\n\nDetails: …\nLearn more on our site.",
"variant_id": "B",
"blocked": false,
"error": null
}
]
}
```

Example (live response with a pipeline error for a user):

```json
{
"previews": [
{
"user_id": "U001",
"email": "emma@example.com",
"subject": "S A",
"body": "B A",
"variant_id": "A",
"blocked": false,
"error": null
},
{
"user_id": "U002",
"email": "liam@example.com",
"subject": null,
"body": null,
"variant_id": null,
"blocked": false,
"error": "pipeline failed"
}
]
}
```

## How to use locally

1. Enable the debug router and start the server (PowerShell):

```powershell
$env:ECHO_DEBUG = '1'
E:/EchoAI/EchoVoice-AI/venv/Scripts/python.exe -m uvicorn backend.app.main:app --reload
```

2. Test GET /debug/deliveries (mock):

```powershell
curl "http://127.0.0.1:8000/debug/deliveries?mock=true"
```

3. Test GET /debug/deliveries (run pipeline):

```powershell
curl "http://127.0.0.1:8000/debug/deliveries"
```

4. Test POST /debug/run (full pipeline debug):

```powershell
$body = @{customer = @{id = "U001"; name = "Emma"; email = "emma@example.com"}} | ConvertTo-Json
Invoke-RestMethod -Method POST -Uri "http://127.0.0.1:8000/debug/run" -ContentType "application/json" -Body $body
```
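
The previews payload returned in step 2 and 3 can be post-processed client-side before rendering. Below is a minimal, illustrative Python sketch (not code from the repo) that filters a `/debug/deliveries` response down to entries that can actually be shown in a UI, skipping blocked or errored previews; the sample data mirrors the example responses above.

```python
def renderable_previews(payload: dict) -> list[dict]:
    """Return only previews that have a subject and carry no error or block flag."""
    return [
        p for p in payload.get("previews", [])
        if not p.get("blocked") and p.get("error") is None and p.get("subject")
    ]


if __name__ == "__main__":
    # Sample payload shaped like the live response example above.
    sample = {
        "previews": [
            {"user_id": "U001", "email": "emma@example.com", "subject": "S A",
             "body": "B A", "variant_id": "A", "blocked": False, "error": None},
            {"user_id": "U002", "email": "liam@example.com", "subject": None,
             "body": None, "variant_id": None, "blocked": False,
             "error": "pipeline failed"},
        ]
    }
    print([p["user_id"] for p in renderable_previews(sample)])  # ['U001']
```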

---

## 2. POST /debug/run - Full Pipeline Debug

**Purpose:** Run the full orchestrator pipeline for a single customer and return the complete MessageState (all intermediate results) for debugging.

### Request body

```json
{
"customer": {
"id": "U001",
"name": "Emma",
"email": "emma@example.com",
"last_event": "viewed_product",
"properties": {
"segment": "high_value"
}
}
}
```

### Response

Returns the full orchestrator result including all pipeline stages:

```json
{
"segment": {"category": "high_value", "confidence": 0.95},
"citations": ["Knowledge article #123", "Brand guideline v2.1"],
"variants": [
{"id": "V1", "subject": "Hi Emma...", "body": "Dear Emma..."},
{"id": "V2", "subject": "Emma, don't miss...", "body": "Hello Emma..."}
],
"safety": {
"safe": [{"id": "V1", "subject": "Hi Emma...", "body": "Dear Emma..."}],
"blocked": [{"id": "V2", "reason": "policy_violation"}]
},
"analysis": {"winner": {"variant_id": "V1", "score": 0.87}},
"delivery": {"status": "sent", "message_id": "msg_abc123"}
}
```

Note that the exact keys and values above are illustrative; the real shapes depend on the orchestrator's `MessageState` at the time you run it.

---

## Notes & recommendations

- These endpoints are for development and debugging only. Disable in production by not setting `ECHO_DEBUG`.
- **GET /debug/deliveries**: Use `mock=true` for fast UI iteration. Use without `mock` to test actual pipeline.
- **POST /debug/run**: Inspect complete pipeline execution including all intermediate stages.
- The `body_text` field in `/debug/deliveries` is a compatibility alias for `body`.
- Consider using `ECHO_DEBUG_CACHE_TTL` for caching to speed up UI development.

213 changes: 213 additions & 0 deletions backend/README_LANGSMITH.md
@@ -0,0 +1,213 @@
LangSmith monitoring (opt-in)
=================================

This project includes a lightweight, opt-in LangSmith instrumentation wrapper at `backend/services/langsmith_monitor.py`.

Purpose
-------
- Provide safe, non-blocking telemetry hooks for agents (generator, retriever, etc.).
- No-op by default so local dev and CI are unaffected.
- When enabled, the wrapper either forwards to the LangSmith SDK (if installed and configured) or writes local JSON run files under `backend/.langsmith_local_runs/`.

How to enable
-------------
1. Set the environment variable `LANGSMITH_ENABLED=1` or `LANGSMITH_API_KEY=<your_key>`.
2. (Optional) Install the LangSmith SDK in your Python environment: `pip install langsmith`.

Behavior
--------
- If `LANGSMITH_ENABLED` is not present, the wrapper functions (`start_run`, `log_event`, `finish_run`) are no-ops.
- If enabled but the SDK is not installed, the wrapper writes JSON files to `backend/.langsmith_local_runs/` for inspection.
- Instrumented agents: `backend/agents/generator.py` and `backend/agents/retriever.py` call the wrapper at start/finish/error points.

Next steps
----------
1. Review the small changes in `backend/services/langsmith_monitor.py` and the agent instrumentation.
2. Run a smoke test locally (no secrets required):

```bash
# from repo root
backend/.venv/bin/python -c "import sys; sys.path.insert(0,'backend'); from services.langsmith_monitor import LANGSMITH_ENABLED; print('LANGSMITH_ENABLED=', LANGSMITH_ENABLED)"
```

3. To fully integrate with LangSmith UI, set `LANGSMITH_API_KEY` and install the SDK. We can then update `langsmith_monitor.py` to use the SDK client directly.

4. Coordinate with the team on run naming, metadata shape, and whether to prefer a central tracer vs per-agent instrumentation.


Naming & metadata conventions (recommended)
-----------------------------------------
This project recommends the following minimal conventions for recorded runs and events so telemetry is consistent and safe across agents.

- Run name pattern
- Format: `<agent>.<operation>[:<brief-context>]`
- Examples:
- segmenter.segment_user
- retriever.retrieve_citations:payment_plans
- generator.generate_variants:default_personalization
- safety_gate.safety_check_and_filter
- analytics.evaluate_variants

- Required top-level metadata fields
- run_id: UUID (generated by agent/wrapper)
- run_name: string (matches the Run name pattern)
- agent: string (agent short name, e.g., "segmenter")
- start_time / end_time: ISO 8601 UTC timestamps
- status: "running" | "success" | "error"
- version: code version or commit SHA (optional but recommended)
- tags: list[str] (optional short tags, e.g., ["dev","experiment-42"])

- Input/PII policy (allowlist + pseudonymization)
- Always avoid recording raw PII (email, full name, SSN, phone, address).
- Record a pseudonymized identifier instead:
- customer_id_hash: deterministic HMAC/SHA256 of the internal id, using a team secret (do not commit the secret).
- Safe inputs: last_event, allowlisted properties (explicitly list safe keys in code), cohort labels.
- For any potentially sensitive text, store only a redacted snippet or omit it.

- Outputs to record
- Short structured outputs (e.g., segment label, number of citations, variant count).
- Metrics: latency_ms, token_usage, counts.
- For full text outputs (LLM responses) prefer storing an artifact reference or a redacted snippet — avoid inline PII.

- Event naming
- Use consistent event names: "input_received", "llm_call", "citations_fetched", "variants_generated", "safety_result", "evaluation_done", "error".
- Each event should include a timestamp and a small payload with non-PII fields.
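
The conventions above lend themselves to an automated check. The following is a small illustrative validator (hypothetical helper names; the allowlist and PII sets are examples that a real implementation would maintain in code) that flags missing required fields, run names that don't follow the `<agent>.<operation>` pattern, and raw PII or non-allowlisted input keys.

```python
REQUIRED_FIELDS = {"run_id", "run_name", "agent", "start_time", "status"}
SAFE_INPUT_KEYS = {"customer_id_hash", "last_event", "properties"}  # example allowlist
PII_KEYS = {"email", "name", "ssn", "phone", "address"}


def validate_run(run: dict) -> list[str]:
    """Return a list of convention violations; an empty list means the run conforms."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - run.keys())]
    agent = run.get("agent", "")
    if agent and not run.get("run_name", "").startswith(f"{agent}."):
        problems.append("run_name must start with '<agent>.'")
    for key in run.get("inputs", {}):
        if key in PII_KEYS:
            problems.append(f"raw PII recorded: {key}")
        elif key not in SAFE_INPUT_KEYS:
            problems.append(f"input key not allowlisted: {key}")
    return problems
```

A check like this could run in tests to assert that every recorded run follows the agreed schema before the team wires in the real SDK.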

Example run (segmenter)
-----------------------
Given a customer record (from `data/customers.json`):

```json
{
"id": "cust_002",
"name": "Bob",
"email": "bob@example.com",
"last_event": "payment_plans",
"properties": {
"form_started": "yes",
"scheduled": "no",
"attended": "no"
}
}
```

Store only safe fields and a pseudonymized id. Example recorded run:

```json
{
"run_id": "c7f6f3d7-1d2b-4a45-9f09-1e2b3c4d5e6f",
"run_name": "segmenter.segment_user",
"agent": "segmenter",
"start_time": "2025-11-23T08:12:06.123Z",
"end_time": "2025-11-23T08:12:06.234Z",
"status": "success",
"version": "main@b81971d",
"tags": ["dev"],
"inputs": {
"customer_id_hash": "sha256:c2f9...ab12",
"last_event": "payment_plans",
"properties": {"form_started":"yes","scheduled":"no","attended":"no"}
},
"outputs": {"segment":"payment_plans:StartedFormOrFlow","intent_level":"medium","reasons_count":3},
"events": [
{"time":"2025-11-23T08:12:06.130Z","name":"segment_computed","payload":{"segment":"payment_plans:StartedFormOrFlow","intent_level":"medium"}}
]
}
```

Hashing guidance
----------------
- Use a deterministic keyed hash (HMAC-SHA256) with a team secret to produce pseudonymous IDs that are usable for joins but not reversible. Example: `hmac_sha256(team_secret, customer_id)`.
- Store the team secret in a secure secret store / env var and do not commit it.

Env var: LANGSMITH_HMAC_SECRET
--------------------------------
- Purpose: supply a secret used to compute deterministic HMAC-SHA256 pseudonymous IDs for any internal identifiers (e.g., customer ids). When present, the monitor will compute `customer_id_hash` as an HMAC-SHA256 of the raw id using this secret.
- How to set (example):

```bash
export LANGSMITH_HMAC_SECRET="your-team-secret-very-long-and-random"
```

- Example output recorded in run metadata (truncated for readability):

```json
"inputs": {
"customer_id_hash": "sha256:3a1f5b8c9d4e2f7a1b2c3d4e5f67890abcdef1234567890abcdef1234567890",
"last_event": "payment_plans"
}
```

Notes:
- If `LANGSMITH_HMAC_SECRET` is not set the wrapper falls back to a plain SHA256 digest of the id. This still avoids storing raw PII but is less secure for deterministic joins across systems. Prefer setting the HMAC secret in a secure store.
- Keep the secret out of version control and CI logs. Use your environment/secret manager (GitHub Secrets, AWS Parameter Store, Azure Key Vault, etc.).
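
The hashing scheme and fallback described above can be sketched as follows (a hypothetical helper for illustration, not the wrapper's actual code): HMAC-SHA256 when `LANGSMITH_HMAC_SECRET` is set, a plain SHA256 digest otherwise.

```python
import hashlib
import hmac
import os


def pseudonymize(customer_id: str) -> str:
    """Deterministic pseudonymous id: HMAC-SHA256 keyed with the team secret,
    falling back to a plain SHA256 digest when no secret is configured."""
    secret = os.getenv("LANGSMITH_HMAC_SECRET")
    if secret:
        digest = hmac.new(secret.encode(), customer_id.encode(), hashlib.sha256).hexdigest()
    else:
        digest = hashlib.sha256(customer_id.encode()).hexdigest()
    return f"sha256:{digest}"
```

The same input always maps to the same output (given the same secret), which is what makes the hash usable for joins across runs while keeping the raw id out of telemetry.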

Team checklist to finalize
-------------------------
- [ ] Approve run_name pattern and list of agents to instrument
- [ ] Approve required metadata fields and allowlisted input keys
- [ ] Decide HMAC/secret location for deterministic hashing
- [ ] Decide retention policy for run artifacts and full-text captures
- [ ] Plan LangSmith SDK wiring once conventions are finalized

---


Quick reference
---------------
- The wrapper is `backend/services/langsmith_monitor.py`. It is disabled and a no-op by default, so instrumentation does not affect runtime.
- To record runs locally as JSON files (no SDK required):

```bash
export LANGSMITH_ENABLED=1
# verify the flag (from repo root):
cd backend
./venv/bin/python -c "from services import langsmith_monitor; print(langsmith_monitor.LANGSMITH_ENABLED)"
# run files are written to backend/.langsmith_local_runs/
```

- To forward runs to LangSmith, install the `langsmith` package and set an API key (preferred over `LANGSMITH_ENABLED` for team use):

```bash
export LANGSMITH_API_KEY=sk_...your_key...
```

The wrapper is intentionally minimal to avoid introducing runtime behavior changes. Once enabled, it records run start, events, and finish status; the team can later extend it to call the official LangSmith SDK or to normalize team-specific metadata.