33 commits
df8b739  Add HITL node and LangGraph flow integration (Nov 21, 2025)
ad6b5d4  feat(langsmith): add opt-in langsmith_monitor, instrument generator &… (lillian0624, Nov 22, 2025)
235be79  feat(langsmith): add opt-in langsmith_monitor, instrument generator &… (lillian0624, Nov 22, 2025)
28034b7  feat(langsmith): instrument segmenter, safety_gate, analytics; add op… (lillian0624, Nov 22, 2025)
7da34aa  test(langsmith): add tests for opt-in monitor and segmenter instrumen… (lillian0624, Nov 22, 2025)
f9bdc3f  deterministic pseudonymization (lillian0624, Nov 23, 2025)
3bed10b  Update backend/run_swagger_light.py (lillian0624, Nov 23, 2025)
eb32d83  Merge branch 'develop' into 21-integrate-langsmith-or-log-to-monitor-… (lillian0624, Nov 23, 2025)
900c79d  chore: fix absolute imports, make langchain/FAISS optional, and ignor… (lillian0624, Nov 23, 2025)
56fc5b1  Update backend/README_SWAGGER.md (lillian0624, Nov 23, 2025)
318ce23  Update backend/README_SWAGGER.md (lillian0624, Nov 23, 2025)
eeece9f  Update backend/app/routers/orchestrator.py (lillian0624, Nov 23, 2025)
3ee43e3  Update backend/app/graph/orchestrator.py (lillian0624, Nov 23, 2025)
0adb36e  Update backend/agents/safety_gate.py (lillian0624, Nov 23, 2025)
712901e  Update backend/services/langsmith_monitor.py (lillian0624, Nov 23, 2025)
ef713fb  fix(tests): ensure repo root in sys.path; orchestrator: invoke LangGr… (lillian0624, Nov 23, 2025)
7c7e0e9  Merge pull request #68 from EchoVoice-AI/21-integrate-langsmith-or-lo… (lillian0624, Nov 23, 2025)
65a3c68  feat(assignment): add A/B assignment agent, node adapter, and wire in… (lillian0624, Nov 23, 2025)
472f06e  Merge pull request #78 from EchoVoice-AI/assignment-agent (lillian0624, Nov 23, 2025)
7ae5a35  Add media endpoints and HITL review store with tests (Nov 23, 2025)
12b956f  Merge pull request #84 from EchoVoice-AI/feature/media-endpoints (selvicim45, Nov 23, 2025)
9eb770a  Add HITL audit logging and audit log tests (Nov 24, 2025)
e15f993  Merge pull request #85 from EchoVoice-AI/feature/media-endpoints (selvicim45, Nov 24, 2025)
69b125e  WIP: local changes (SushmaGandham, Nov 24, 2025)
510de4f  Issue #11: Add debug endpoint to run pipeline and return email previe… (SushmaGandham, Nov 24, 2025)
cdf0a16  Merge branch 'develop' into 11-add-debugdeliveries-endpoint (NoelOsiro, Nov 24, 2025)
8e80826  Merge pull request #87 from EchoVoice-AI/11-add-debugdeliveries-endpoint (NoelOsiro, Nov 24, 2025)
641dfbe  Issue #10: Add POST /debug/run endpoint for full pipeline debugging (SushmaGandham, Nov 25, 2025)
daeed2d  Implement Azure STT/TTS with fallback and update env template (Nov 25, 2025)
c0e9619  Merge pull request #89 from EchoVoice-AI/feature/media-endpoints (selvicim45, Nov 25, 2025)
0982d93  Add HITL review router to expose review data and accept human decisions (Nov 25, 2025)
bd8b747  Merge pull request #88 from EchoVoice-AI/10-add-debugrun-endpoint (NoelOsiro, Nov 26, 2025)
3f10093  Merge pull request #90 from EchoVoice-AI/feature/media-endpoints (NoelOsiro, Nov 26, 2025)
6 changes: 6 additions & 0 deletions .env.template
@@ -22,6 +22,12 @@ AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT =
# NEW: chat deployment name for generator
AZURE_OPENAI_CHAT_DEPLOYMENT =

# Azure Speech and Translator
AZURE_SPEECH_KEY=your-speech-key
AZURE_SPEECH_REGION=westus
AZURE_SPEECH_TTS_VOICE=en-US-JennyNeural
AZURE_TRANSLATOR_KEY=your-translator-key

# Optional: non-Azure OpenAI fallback
OPENAI_API_KEY =
OPENAI_MODEL_NAME =
1 change: 1 addition & 0 deletions .gitignore
@@ -39,3 +39,4 @@ env/
# OS
.DS_Store
Thumbs.db
backend/.langsmith_local_runs/
165 changes: 165 additions & 0 deletions backend/README.md
@@ -0,0 +1,165 @@
# Backend - Debug Endpoints

This document describes the debug endpoints for development and testing of the personalization pipeline.

**Dev-only:** These routers are mounted only when the `ECHO_DEBUG` environment variable is set to `1`, `true`, or `yes`.

---

## 1. GET /debug/deliveries - Email Previews for UI

**Purpose:** Run the orchestrator for a small set of mock customers and return minimal email preview objects (subject and body) for UI preview.

### Query parameters

- `mock` (optional, boolean): When `true`, returns precomputed previews without running the pipeline.

### Response shape

The endpoint returns JSON with the top-level key `previews`, an array of preview objects.
Each preview contains:

- `user_id` (string)
- `email` (string)
- `subject` (string | null)
- `body` (string | null)
- `variant_id` (string | null)
- `blocked` (boolean) — true when no safe variant is available
- `error` (string | null) — set when pipeline execution fails for that user

Example (mock response):

```json
{
"previews": [
{
"user_id": "U001",
"email": "emma@example.com",
"subject": "Hi Emma, quick note about running shoes",
"body": "Hi Emma,\n\nWe thought you might like this: …\n\n— Team",
"variant_id": "A",
"blocked": false,
"error": null
},
{
"user_id": "U002",
"email": "liam@example.com",
"subject": "Liam, more on the Acme plan",
"body": "Hello Liam,\n\nDetails: …\nLearn more on our site.",
"variant_id": "B",
"blocked": false,
"error": null
}
]
}
```

Example (live response with a pipeline error for a user):

```json
{
"previews": [
{
"user_id": "U001",
"email": "emma@example.com",
"subject": "S A",
"body": "B A",
"variant_id": "A",
"blocked": false,
"error": null
},
{
"user_id": "U002",
"email": "liam@example.com",
"subject": null,
"body": null,
"variant_id": null,
"blocked": false,
"error": "pipeline failed"
}
]
}
```

## How to use locally

1. Enable the debug router and start the server (PowerShell):

```powershell
$env:ECHO_DEBUG = '1'
E:/EchoAI/EchoVoice-AI/venv/Scripts/python.exe -m uvicorn backend.app.main:app --reload
```

2. Test GET /debug/deliveries (mock):

```powershell
curl "http://127.0.0.1:8000/debug/deliveries?mock=true"
```

3. Test GET /debug/deliveries (run pipeline):

```powershell
curl "http://127.0.0.1:8000/debug/deliveries"
```

4. Test POST /debug/run (full pipeline debug):

```powershell
$body = @{customer = @{id = "U001"; name = "Emma"; email = "emma@example.com"}} | ConvertTo-Json
Invoke-RestMethod -Method POST -Uri "http://127.0.0.1:8000/debug/run" -ContentType "application/json" -Body $body
```
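
The previews payload returned in step 2 and 3 can be post-processed client-side before rendering. Below is a minimal, illustrative Python sketch (not code from the repo) that filters a `/debug/deliveries` response down to entries that can actually be shown in a UI, skipping blocked or errored previews; the sample data mirrors the example responses above.

```python
def renderable_previews(payload: dict) -> list[dict]:
    """Return only previews that have a subject and carry no error or block flag."""
    return [
        p for p in payload.get("previews", [])
        if not p.get("blocked") and p.get("error") is None and p.get("subject")
    ]


if __name__ == "__main__":
    # Sample payload shaped like the live response example above.
    sample = {
        "previews": [
            {"user_id": "U001", "email": "emma@example.com", "subject": "S A",
             "body": "B A", "variant_id": "A", "blocked": False, "error": None},
            {"user_id": "U002", "email": "liam@example.com", "subject": None,
             "body": None, "variant_id": None, "blocked": False,
             "error": "pipeline failed"},
        ]
    }
    print([p["user_id"] for p in renderable_previews(sample)])  # ['U001']
```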

---

## 2. POST /debug/run - Full Pipeline Debug

**Purpose:** Run the full orchestrator pipeline for a single customer and return the complete MessageState (all intermediate results) for debugging.

### Request body

```json
{
"customer": {
"id": "U001",
"name": "Emma",
"email": "emma@example.com",
"last_event": "viewed_product",
"properties": {
"segment": "high_value"
}
}
}
```

### Response

Returns the full orchestrator result including all pipeline stages:

```json
{
"segment": {"category": "high_value", "confidence": 0.95},
"citations": ["Knowledge article #123", "Brand guideline v2.1"],
"variants": [
{"id": "V1", "subject": "Hi Emma...", "body": "Dear Emma..."},
{"id": "V2", "subject": "Emma, don't miss...", "body": "Hello Emma..."}
],
"safety": {
"safe": [{"id": "V1", "subject": "Hi Emma...", "body": "Dear Emma..."}],
"blocked": [{"id": "V2", "reason": "policy_violation"}]
},
"analysis": {"winner": {"variant_id": "V1", "score": 0.87}},
"delivery": {"status": "sent", "message_id": "msg_abc123"}
}
```

Note that the exact keys and values above are illustrative; the real shapes depend on the orchestrator's `MessageState` at the time you run it.

---

## Notes & recommendations

- These endpoints are for development and debugging only. Disable in production by not setting `ECHO_DEBUG`.
- **GET /debug/deliveries**: Use `mock=true` for fast UI iteration. Use without `mock` to test actual pipeline.
- **POST /debug/run**: Inspect complete pipeline execution including all intermediate stages.
- The `body_text` field in `/debug/deliveries` is a compatibility alias for `body`.
- Consider using `ECHO_DEBUG_CACHE_TTL` for caching to speed up UI development.

213 changes: 213 additions & 0 deletions backend/README_LANGSMITH.md
@@ -0,0 +1,213 @@
LangSmith monitoring (opt-in)
=================================

This project includes a lightweight, opt-in LangSmith instrumentation wrapper at `backend/services/langsmith_monitor.py`.

Purpose
-------
- Provide safe, non-blocking telemetry hooks for agents (generator, retriever, etc.).
- No-op by default so local dev and CI are unaffected.
- When enabled, the wrapper either forwards to the LangSmith SDK (if installed and configured) or writes local JSON run files under `backend/.langsmith_local_runs/`.

How to enable
-------------
1. Set the environment variable `LANGSMITH_ENABLED=1` or `LANGSMITH_API_KEY=<your_key>`.
2. (Optional) Install the LangSmith SDK in your Python environment: `pip install langsmith`.

Behavior
--------
- If `LANGSMITH_ENABLED` is not present, the wrapper functions (`start_run`, `log_event`, `finish_run`) are no-ops.
- If enabled but the SDK is not installed, the wrapper writes JSON files to `backend/.langsmith_local_runs/` for inspection.
- Instrumented agents: `backend/agents/generator.py` and `backend/agents/retriever.py` call the wrapper at start/finish/error points.

Next steps
----------
1. Review the small changes in `backend/services/langsmith_monitor.py` and the agent instrumentation.
2. Run a smoke test locally (no secrets required):

```bash
# from repo root
backend/.venv/bin/python -c "import sys; sys.path.insert(0,'backend'); from services.langsmith_monitor import LANGSMITH_ENABLED; print('LANGSMITH_ENABLED=', LANGSMITH_ENABLED)"
```

3. To fully integrate with LangSmith UI, set `LANGSMITH_API_KEY` and install the SDK. We can then update `langsmith_monitor.py` to use the SDK client directly.

4. Coordinate with the team on run naming, metadata shape, and whether to prefer a central tracer vs per-agent instrumentation.


Naming & metadata conventions (recommended)
-----------------------------------------
This project recommends the following minimal conventions for recorded runs and events so telemetry is consistent and safe across agents.

- Run name pattern
- Format: `<agent>.<operation>[:<brief-context>]`
- Examples:
- segmenter.segment_user
- retriever.retrieve_citations:payment_plans
- generator.generate_variants:default_personalization
- safety_gate.safety_check_and_filter
- analytics.evaluate_variants

- Required top-level metadata fields
- run_id: UUID (generated by agent/wrapper)
- run_name: string (matches the Run name pattern)
- agent: string (agent short name, e.g., "segmenter")
- start_time / end_time: ISO 8601 UTC timestamps
- status: "running" | "success" | "error"
- version: code version or commit SHA (optional but recommended)
- tags: list[str] (optional short tags, e.g., ["dev","experiment-42"])

- Input/PII policy (allowlist + pseudonymization)
- Always avoid recording raw PII (email, full name, SSN, phone, address).
- Record a pseudonymized identifier instead:
- customer_id_hash: deterministic HMAC/SHA256 of the internal id, using a team secret (do not commit the secret).
- Safe inputs: last_event, allowlisted properties (explicitly list safe keys in code), cohort labels.
- For any potentially sensitive text, store only a redacted snippet or omit it.

- Outputs to record
- Short structured outputs (e.g., segment label, number of citations, variant count).
- Metrics: latency_ms, token_usage, counts.
- For full text outputs (LLM responses) prefer storing an artifact reference or a redacted snippet — avoid inline PII.

- Event naming
- Use consistent event names: "input_received", "llm_call", "citations_fetched", "variants_generated", "safety_result", "evaluation_done", "error".
- Each event should include a timestamp and a small payload with non-PII fields.
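
The conventions above lend themselves to an automated check. The following is a small illustrative validator (hypothetical helper names; the allowlist and PII sets are examples that a real implementation would maintain in code) that flags missing required fields, run names that don't follow the `<agent>.<operation>` pattern, and raw PII or non-allowlisted input keys.

```python
REQUIRED_FIELDS = {"run_id", "run_name", "agent", "start_time", "status"}
SAFE_INPUT_KEYS = {"customer_id_hash", "last_event", "properties"}  # example allowlist
PII_KEYS = {"email", "name", "ssn", "phone", "address"}


def validate_run(run: dict) -> list[str]:
    """Return a list of convention violations; an empty list means the run conforms."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - run.keys())]
    agent = run.get("agent", "")
    if agent and not run.get("run_name", "").startswith(f"{agent}."):
        problems.append("run_name must start with '<agent>.'")
    for key in run.get("inputs", {}):
        if key in PII_KEYS:
            problems.append(f"raw PII recorded: {key}")
        elif key not in SAFE_INPUT_KEYS:
            problems.append(f"input key not allowlisted: {key}")
    return problems
```

A check like this could run in tests to assert that every recorded run follows the agreed schema before the team wires in the real SDK.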

Example run (segmenter)
-----------------------
Given a customer record (from `data/customers.json`):

```json
{
"id": "cust_002",
"name": "Bob",
"email": "bob@example.com",
"last_event": "payment_plans",
"properties": {
"form_started": "yes",
"scheduled": "no",
"attended": "no"
}
}
```

Store only safe fields and a pseudonymized id. Example recorded run:

```json
{
"run_id": "c7f6f3d7-1d2b-4a45-9f09-1e2b3c4d5e6f",
"run_name": "segmenter.segment_user",
"agent": "segmenter",
"start_time": "2025-11-23T08:12:06.123Z",
"end_time": "2025-11-23T08:12:06.234Z",
"status": "success",
"version": "main@b81971d",
"tags": ["dev"],
"inputs": {
"customer_id_hash": "sha256:c2f9...ab12",
"last_event": "payment_plans",
"properties": {"form_started":"yes","scheduled":"no","attended":"no"}
},
"outputs": {"segment":"payment_plans:StartedFormOrFlow","intent_level":"medium","reasons_count":3},
"events": [
{"time":"2025-11-23T08:12:06.130Z","name":"segment_computed","payload":{"segment":"payment_plans:StartedFormOrFlow","intent_level":"medium"}}
]
}
```

Hashing guidance
----------------
- Use a deterministic keyed hash (HMAC-SHA256) with a team secret to produce pseudonymous IDs that are usable for joins but not reversible. Example: `hmac_sha256(team_secret, customer_id)`.
- Store the team secret in a secure secret store / env var and do not commit it.

Env var: LANGSMITH_HMAC_SECRET
--------------------------------
- Purpose: supply a secret used to compute deterministic HMAC-SHA256 pseudonymous IDs for any internal identifiers (e.g., customer ids). When present, the monitor will compute `customer_id_hash` as an HMAC-SHA256 of the raw id using this secret.
- How to set (example):

```bash
export LANGSMITH_HMAC_SECRET="your-team-secret-very-long-and-random"
```

- Example output recorded in run metadata (truncated for readability):

```json
"inputs": {
"customer_id_hash": "sha256:3a1f5b8c9d4e2f7a1b2c3d4e5f67890abcdef1234567890abcdef1234567890",
"last_event": "payment_plans"
}
```

Notes:
- If `LANGSMITH_HMAC_SECRET` is not set the wrapper falls back to a plain SHA256 digest of the id. This still avoids storing raw PII but is less secure for deterministic joins across systems. Prefer setting the HMAC secret in a secure store.
- Keep the secret out of version control and CI logs. Use your environment/secret manager (GitHub Secrets, AWS Parameter Store, Azure Key Vault, etc.).
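
The hashing scheme and fallback described above can be sketched as follows (a hypothetical helper for illustration, not the wrapper's actual code): HMAC-SHA256 when `LANGSMITH_HMAC_SECRET` is set, a plain SHA256 digest otherwise.

```python
import hashlib
import hmac
import os


def pseudonymize(customer_id: str) -> str:
    """Deterministic pseudonymous id: HMAC-SHA256 keyed with the team secret,
    falling back to a plain SHA256 digest when no secret is configured."""
    secret = os.getenv("LANGSMITH_HMAC_SECRET")
    if secret:
        digest = hmac.new(secret.encode(), customer_id.encode(), hashlib.sha256).hexdigest()
    else:
        digest = hashlib.sha256(customer_id.encode()).hexdigest()
    return f"sha256:{digest}"
```

The same input always maps to the same output (given the same secret), which is what makes the hash usable for joins across runs while keeping the raw id out of telemetry.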

Team checklist to finalize
-------------------------
- [ ] Approve run_name pattern and list of agents to instrument
- [ ] Approve required metadata fields and allowlisted input keys
- [ ] Decide HMAC/secret location for deterministic hashing
- [ ] Decide retention policy for run artifacts and full-text captures
- [ ] Plan LangSmith SDK wiring once conventions are finalized

---


Quick reference
---------------
- The wrapper is `backend/services/langsmith_monitor.py`. It is disabled and a no-op by default, so instrumentation does not affect runtime.
- To record runs locally as JSON files (no SDK required):

```bash
export LANGSMITH_ENABLED=1
# verify the flag (from repo root):
cd backend
./venv/bin/python -c "from services import langsmith_monitor; print(langsmith_monitor.LANGSMITH_ENABLED)"
# run files are written to backend/.langsmith_local_runs/
```

- To forward runs to LangSmith, install the `langsmith` package and set an API key (preferred over `LANGSMITH_ENABLED` for team use):

```bash
export LANGSMITH_API_KEY=sk_...your_key...
```

The wrapper is intentionally minimal to avoid introducing runtime behavior changes. Once enabled, it records run start, events, and finish status; the team can later extend it to call the official LangSmith SDK or to normalize team-specific metadata.