1,015 changes: 1,015 additions & 0 deletions hackathon 2/INTEGRATION_GUIDE.md

Large diffs are not rendered by default.

173 changes: 173 additions & 0 deletions hackathon 2/docs/plans/2026-01-16-extraction-design.md
# Freight Agent - Step 1+2: Extraction Design

**Date:** 2026-01-16
**Status:** Ready to implement

---

## Overview

Build a GPT-powered extraction step that reads freight quote request emails and outputs structured data.

**Approach:** Use OpenAI GPT to parse emails into a structured schema. Keep the extracted data raw (no normalization); fuzzy matching happens in later steps.

---

## Input

Raw email JSON from `hackathon_data/emails/`:

```json
{
  "from": "sarah.chen@globalimports.com",
  "to": "quotes@freightco.com",
  "subject": "Quote Request: Shanghai to Rotterdam",
  "body": "Hi,\n\nWe need a quote for:\n\nOrigin: Shanghai\nDestination: Rotterdam\nContainer: 2 x 40ft\nCommodity: Electronics\n\nPlease send your best rate.\n\nThanks,\nSarah"
}
```

---

## Output Schema

```python
{
    "sender_email": str,          # From email "from" field - needed for SOP lookup

    "shipments": [                # Array - emails can have multiple routes (e.g., email_06)
        {
            "mode": "sea" | "air" | null,   # Inferred from context

            # Location (raw - no normalization yet)
            "origin_raw": str | null,       # "HCMC (Saigon)", "ningbo", etc.
            "destination_raw": str | null,  # "Tokyo Narita", "felixstowe", etc.

            # Sea freight specific
            "container_size_ft": 20 | 40 | null,
            "quantity": int | null,         # Number of containers

            # Air freight specific
            "actual_weight_kg": float | null,
            "volume_cbm": float | null,

            # Optional
            "commodity": str | null
        }
    ],

    "missing_fields": list[str],  # ["origin city", "container size", "mode"]
    "needs_clarification": bool   # True if we can't quote without more info
}
```

---

## Mode Detection Logic

GPT should infer mode from these signals:

| Signal | Mode |
|--------|------|
| "container", "20ft", "40ft", "FCL" | Sea |
| "kg", "weight", "CBM", "volume" | Air |
| "ocean", "sea freight" | Sea |
| "air", "air freight", "cargo" | Air |
| Airport codes (SFO, FRA, NRT) | Air |
| Port names only | Sea |

If unclear → set `mode: null` and add to `missing_fields`.

---

## Multi-Route Handling

Email 06 example has multiple routes in one request:
```
Rates from Busan, South Korea to:
1. Hamburg - 2 x 40ft
2. Rotterdam - 1 x 20ft
```

GPT must return multiple shipment objects in the `shipments` array.
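
Under that rule, the expected extraction for email 06 would look roughly like this hand-written illustration of the schema (the sender address is a placeholder, since the real one is not shown above):

```python
# Hypothetical expected output for email 06: raw locations, one shipment
# object per route.
email_06_expected = {
    "sender_email": "sender@example.com",  # placeholder; real sender not shown here
    "shipments": [
        {"mode": "sea", "origin_raw": "Busan, South Korea",
         "destination_raw": "Hamburg", "container_size_ft": 40, "quantity": 2,
         "actual_weight_kg": None, "volume_cbm": None, "commodity": None},
        {"mode": "sea", "origin_raw": "Busan, South Korea",
         "destination_raw": "Rotterdam", "container_size_ft": 20, "quantity": 1,
         "actual_weight_kg": None, "volume_cbm": None, "commodity": None},
    ],
    "missing_fields": [],
    "needs_clarification": False,
}
```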

---

## Missing Information Detection

If any of these are missing, add to `missing_fields`:

**Sea freight requires:**
- origin (specific city/port, not just "China")
- destination (specific city/port)
- container_size_ft (20 or 40)
- quantity

**Air freight requires:**
- origin
- destination
- actual_weight_kg
- volume_cbm

**Email 03 example** ("ship from China to Poland"):
```python
{
    "sender_email": "anna.kowalski@eurotrade.pl",
    "shipments": [{
        "mode": null,
        "origin_raw": "China",        # Too vague!
        "destination_raw": "Poland",  # Too vague!
        ...
    }],
    "missing_fields": ["origin city", "destination city", "mode", "container size", "quantity"],
    "needs_clarification": true
}
```
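
The required-field rules above are deterministic, so they can be sketched without GPT. Field names here assume the extraction schema from earlier; the "too vague" judgment itself (e.g. "China" as an origin) is still left to GPT:

```python
SEA_REQUIRED = ["origin_raw", "destination_raw", "container_size_ft", "quantity"]
AIR_REQUIRED = ["origin_raw", "destination_raw", "actual_weight_kg", "volume_cbm"]

def missing_fields_for(shipment: dict) -> list[str]:
    """Return the required fields that are absent for this shipment's mode."""
    mode = shipment.get("mode")
    if mode == "sea":
        required = SEA_REQUIRED
    elif mode == "air":
        required = AIR_REQUIRED
    else:
        # Mode unknown: only the mode and location fields can be checked.
        required = ["mode", "origin_raw", "destination_raw"]
    return [f for f in required if shipment.get(f) is None]
```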

---

## Implementation Plan

1. **models.py** - Define dataclasses: `Email`, `Shipment`, `ExtractionResult`
2. **extraction.py** - GPT extraction function:
- Load email JSON
- Build prompt with schema
- Call OpenAI API with structured output
- Parse response into dataclasses
3. **test_extraction.py** - Test against all 10 emails, compare to expected outputs

---

## GPT Prompt Structure

```
System: You are a freight quote extraction assistant. Extract shipping request details from emails.

User: Extract shipment details from this email:
From: {sender}
Subject: {subject}
Body: {body}

Return JSON matching this schema: {schema}

Rules:
- Extract ALL routes if multiple are mentioned
- Keep location names exactly as written (no normalization)
- Infer mode from context (container=sea, kg/CBM=air)
- Set needs_clarification=true if origin/destination are too vague (just country names)
```
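
The prompt above can be assembled deterministically. A sketch of a hypothetical `build_messages` helper (the actual OpenAI API call and the schema string are out of scope here):

```python
SYSTEM_PROMPT = (
    "You are a freight quote extraction assistant. "
    "Extract shipping request details from emails."
)

def build_messages(sender: str, subject: str, body: str, schema: str) -> list[dict]:
    """Assemble the chat messages for the extraction call."""
    user = (
        f"Extract shipment details from this email:\n"
        f"From: {sender}\nSubject: {subject}\nBody: {body}\n\n"
        f"Return JSON matching this schema: {schema}\n\n"
        "Rules:\n"
        "- Extract ALL routes if multiple are mentioned\n"
        "- Keep location names exactly as written (no normalization)\n"
        "- Infer mode from context (container=sea, kg/CBM=air)\n"
        "- Set needs_clarification=true if origin/destination are too vague"
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user},
    ]
```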

---

## Success Criteria

- [ ] Correctly extracts all 10 hackathon emails
- [ ] Multi-route email (06) returns multiple shipments
- [ ] Incomplete email (03) sets `needs_clarification: true`
- [ ] Fuzzy locations kept raw: "HCMC (Saigon)" not normalized yet
- [ ] Mode correctly inferred for all emails

---

## Next Step

After extraction is built and tested, move to **Step 3: Customer Identification** (SOP lookup by sender email).
207 changes: 207 additions & 0 deletions hackathon 2/freight_agent/docs/enrichment_v2_design.md
# Enrichment v2: Batched + Tool Calling Design

## Overview

Refactored enrichment that:
1. Batches all Qontext queries (REST API, no GPT cost)
2. Single GPT call to parse ALL context
3. Uses tool calling for deterministic validation
4. GPT handles fuzzy matching (names, locations)

## Architecture

```
ExtractionResult
        │
        ▼
┌────────────────────────────────────────────────────────────┐
│ QONTEXT QUERIES (REST API - no GPT)                        │
│                                                            │
│ 1. Query: "Customer with domain @{domain}?"                │
│ 2. Query: "Rules for {customer}?"                          │
│ 3. Query: "Surcharges for {destination}?" (for each dest)  │
│                                                            │
│ All responses collected as strings                         │
└────────────────────────────────────────────────────────────┘
        │
        ▼
┌────────────────────────────────────────────────────────────┐
│ GPT CALL #2 (with tool calling)                            │
│                                                            │
│ Input:                                                     │
│   - All Qontext responses (combined)                       │
│   - Shipment details (from extraction)                     │
│                                                            │
│ GPT Tasks:                                                 │
│   1. Parse customer name from context                      │
│   2. Parse SOP rules into structured format                │
│   3. Parse surcharges per destination                      │
│   4. Normalize locations (HCMC = Saigon = Ho Chi Minh)     │
│   5. Call validate_shipment tool for each shipment         │
│                                                            │
│ Output:                                                    │
│   - customer_name                                          │
│   - customer_sop (structured)                              │
│   - enriched_shipments (with surcharges)                   │
│   - validation_errors                                      │
│   - validation_warnings                                    │
│   - is_valid                                               │
└────────────────────────────────────────────────────────────┘
        │
        ▼
EnrichedAndValidatedRequest
```

## Tool Definition

```python
VALIDATION_TOOL = {
    "type": "function",
    "function": {
        "name": "validate_shipment",
        "description": "Check if a shipment passes customer SOP restrictions. Call this for EACH shipment after parsing the SOP rules.",
        "parameters": {
            "type": "object",
            "properties": {
                "shipment_index": {
                    "type": "integer",
                    "description": "Index of the shipment (0-based)"
                },
                "shipment_mode": {
                    "type": "string",
                    "enum": ["sea", "air"],
                    "description": "The shipping mode requested"
                },
                "normalized_origin": {
                    "type": "string",
                    "description": "Origin normalized to standard name (e.g., 'HCMC' not 'Saigon', 'Ho Chi Minh City')"
                },
                "mode_restriction": {
                    "type": ["string", "null"],
                    "description": "Customer's mode restriction from SOP, or null if none"
                },
                "origin_restriction": {
                    "type": ["string", "null"],
                    "description": "Customer's origin restriction from SOP (normalized), or null if none"
                },
                "customer_name": {
                    "type": "string",
                    "description": "Customer name for error messages"
                }
            },
            "required": ["shipment_index", "shipment_mode", "normalized_origin", "mode_restriction", "origin_restriction", "customer_name"]
        }
    }
}
```

## Tool Implementation

```python
def validate_shipment(
    shipment_index: int,
    shipment_mode: str,
    normalized_origin: str,
    mode_restriction: str | None,
    origin_restriction: str | None,
    customer_name: str
) -> dict:
    """
    Deterministic validation - no fuzzy logic, just exact checks.
    GPT already normalized the values before calling.
    """
    errors = []

    # Check mode restriction
    if mode_restriction and shipment_mode != mode_restriction:
        errors.append({
            "error_type": "mode_restriction",
            "message": f"Per your account agreement, {customer_name} is set up for {mode_restriction} freight only.",
            "suggestion": f"Would you like a {mode_restriction} freight quote instead?"
        })

    # Check origin restriction
    if origin_restriction and normalized_origin.upper() != origin_restriction.upper():
        errors.append({
            "error_type": "origin_restriction",
            "message": f"Per your account agreement, {customer_name} shipments must originate from {origin_restriction}.",
            "suggestion": f"Would you like a quote from {origin_restriction} instead?"
        })

    return {
        "shipment_index": shipment_index,
        "is_valid": len(errors) == 0,
        "errors": errors
    }
```
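
On the calling side, each `validate_shipment` tool call in the model's response has to be routed to the Python implementation. A simplified sketch of that dispatch loop; the `call` shape (`name` plus a JSON `arguments` string) is a flattened stand-in for the real SDK's tool-call objects:

```python
import json

def run_validation_tools(tool_calls, registry) -> list[dict]:
    """Route each tool call to its Python implementation and collect results."""
    results = []
    for call in tool_calls:
        fn = registry[call["name"]]           # e.g. {"validate_shipment": validate_shipment}
        args = json.loads(call["arguments"])  # tool arguments arrive as a JSON string
        results.append(fn(**args))
    return results
```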

## GPT System Prompt

```
You are parsing freight customer data from a knowledge graph and validating shipments.

TASKS:
1. Parse the customer name from the context
2. Parse the SOP rules (discounts, margins, restrictions, output requirements)
3. Parse any surcharges that apply to destinations
4. For each shipment, normalize the origin location to a standard name:
   - "Ho Chi Minh City", "Saigon", "SGN" → "HCMC"
   - "Shanghai", "Pudong" → "Shanghai"
   - etc.
5. Call the validate_shipment tool for EACH shipment to check restrictions

IMPORTANT:
- Normalize locations BEFORE calling the validation tool
- The tool does exact string matching, so normalization is critical
- Call the tool once per shipment
```

## Output Schema

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EnrichedAndValidatedRequest:
    """Combined enrichment + validation result."""
    sender_email: str
    customer_name: str
    customer_sop: CustomerSOP
    shipments: tuple[EnrichedShipment, ...]

    # Validation results
    is_valid: bool
    validation_errors: tuple[ValidationError, ...] = ()
    validation_warnings: tuple[ValidationWarning, ...] = ()

    # Carried forward
    missing_fields: tuple[str, ...] = ()
    needs_clarification: bool = False
```

## Benefits

| Aspect | Before (3+ calls) | After (1 call + tools) |
|--------|-------------------|------------------------|
| GPT calls | 3+ | 1 |
| Location matching | Hardcoded | GPT (flexible) |
| Validation logic | GPT (might err) | Tool (deterministic) |
| Error messages | GPT (might vary) | Tool (consistent) |

## Flow Summary

```
Extraction (GPT #1)
    │
    ▼
Qontext queries (REST, free)
    │
    ▼
Enrichment + Validation (GPT #2 with tools)
    ├─► GPT parses context
    ├─► GPT normalizes locations
    ├─► GPT calls validate_shipment tool (per shipment)
    └─► GPT compiles final result
    │
    ▼
EnrichedAndValidatedRequest
```