A production-ready customer message triage system using rules-first, AI-assisted decision making.
AI Provider: Google Gemini 1.5 Flash
Authentication: API Key (Bearer tokens)
Rate Limiting: Multi-tier pricing with usage tracking
# 1. Get Gemini API key from https://makersuite.google.com/app/apikey
export GOOGLE_API_KEY="your_key_here"
# 2. Install and run
pip install -r requirements.txt
uvicorn main:app --reload
# 3. Use demo API key for testing
curl -X POST http://localhost:8000/v1/decision \
-H "Authorization: Bearer sk_test_demo_pro_key_123456789012345678" \
-H "Content-Type: application/json" \
-d '{"message": "I will sue you", "user_plan": "enterprise"}'
# 4. Check usage
curl http://localhost:8000/v1/usage \
-H "Authorization: Bearer sk_test_demo_pro_key_123456789012345678"See PAID_API_GUIDE.md for authentication & billing.
See GEMINI_SETUP.md for AI setup.
| Tier | Price/Month | Requests/Min | Requests/Day | Requests/Month |
|---|---|---|---|---|
| Free | $0 | 10 | 100 | 1,000 |
| Starter | $29 | 60 | 5,000 | 100,000 |
| Professional | $99 | 300 | 50,000 | 1,000,000 |
| Enterprise | $499 | 1,000 | 500,000 | 10,000,000 |
Get pricing details: GET /v1/pricing
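The tier table above maps naturally to a lookup structure server-side. A minimal sketch (the names `TIER_LIMITS` and `get_limits` are illustrative, not the actual contents of config.py):

```python
# Hypothetical in-memory representation of the pricing tiers above.
TIER_LIMITS = {
    "free":         {"price_usd": 0,   "per_min": 10,    "per_day": 100,     "per_month": 1_000},
    "starter":      {"price_usd": 29,  "per_min": 60,    "per_day": 5_000,   "per_month": 100_000},
    "professional": {"price_usd": 99,  "per_min": 300,   "per_day": 50_000,  "per_month": 1_000_000},
    "enterprise":   {"price_usd": 499, "per_min": 1_000, "per_day": 500_000, "per_month": 10_000_000},
}

def get_limits(tier: str) -> dict:
    """Return the rate limits for a tier, defaulting to free for unknown tiers."""
    return TIER_LIMITS.get(tier, TIER_LIMITS["free"])
```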
Request → Rules Engine (critical) → AI Layer (advisory) → Validation (safety) → Response
- Rules First: Critical cases (legal threats, spam) handled deterministically
- AI Advisory: Nuanced analysis for edge cases and churn prediction
- Safe Degradation: AI failure → intelligent fallback, never crash
- Confidence Scoring: Every decision includes reliability metric
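The flow above can be sketched as a single function. The helpers `ask_ai` and `validate` are stand-ins for the real logic in ai_decision.py and the schema check; keyword lists and confidence values are illustrative:

```python
def ask_ai(message: str, user_plan: str) -> dict:
    # Stand-in for the Gemini call in ai_decision.py; may raise on outage/timeout.
    return {"decision": "standard_response", "confidence": 0.6}

def validate(result: dict) -> dict:
    # Stand-in for the schema check; reject malformed AI output.
    if "decision" not in result:
        raise ValueError("malformed AI response")
    return result

def decide(message: str, user_plan: str) -> dict:
    # 1. Rules first: critical cases are handled deterministically.
    words = message.lower().split()
    if any(kw in words for kw in ("lawsuit", "lawyer", "attorney", "sue")):
        return {"decision": "immediate_escalation", "confidence": 0.95}
    if len(message.strip()) < 10:
        return {"decision": "ignore", "confidence": 0.8}
    # 2. AI advisory + 3. validation, with 4. safe degradation: never crash.
    try:
        return validate(ask_ai(message, user_plan))
    except Exception:
        return {"decision": "standard_response", "confidence": 0.4}
```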
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

# Development mode
uvicorn main:app --reload
# Production mode
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4

- Local: http://localhost:8000
- Docs: http://localhost:8000/docs (interactive Swagger UI)
- Health: http://localhost:8000/health
Request Body:
{
"message": "I'm extremely disappointed and considering legal action",
"user_plan": "enterprise",
"channel": "email",
"history": ["previous complaint", "unresolved issue"]
}

Response:
{
"decision": "immediate_escalation",
"priority": "critical",
"churn_risk": 0.95,
"confidence": 0.92,
"recommended_action": "Escalate to legal team immediately. Customer threatening legal action."
}

1. Legal Threat (Rule-Based Escalation)
curl -X POST http://localhost:8000/v1/decision \
-H "Content-Type: application/json" \
-d '{
"message": "This is unacceptable. I will contact my lawyer if not resolved.",
"user_plan": "pro",
"channel": "email"
}'

2. Enterprise Customer Issue
curl -X POST http://localhost:8000/v1/decision \
-H "Content-Type: application/json" \
-d '{
"message": "We are considering switching to a competitor. Service has been unreliable.",
"user_plan": "enterprise",
"channel": "email",
"history": ["complaint 1", "complaint 2", "complaint 3"]
}'

3. Standard Question
curl -X POST http://localhost:8000/v1/decision \
-H "Content-Type: application/json" \
-d '{
"message": "How do I reset my password?",
"user_plan": "free",
"channel": "chat"
}'

4. Spam/Noise
curl -X POST http://localhost:8000/v1/decision \
-H "Content-Type: application/json" \
-d '{
"message": "Click here now!",
"user_plan": "free",
"channel": "social"
}'

- ignore: Spam, noise, very low-signal messages
- standard_response: Normal customer inquiries
- priority_response: Paying customers with issues, multiple complaints
- immediate_escalation: Legal threats, critical issues
- low: Routine, no urgency
- medium: Standard customer service queue
- high: Paying customer issues, churn risk
- critical: Legal/compliance, severe customer issues
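These decision and priority categories can be pinned down as enums so invalid values fail fast (a sketch; the actual schemas live in models.py and may use Pydantic instead):

```python
from enum import Enum

class Decision(str, Enum):
    IGNORE = "ignore"
    STANDARD_RESPONSE = "standard_response"
    PRIORITY_RESPONSE = "priority_response"
    IMMEDIATE_ESCALATION = "immediate_escalation"

class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"

# Constructing from a raw string validates it against the allowed set;
# an unknown value raises ValueError instead of slipping through.
```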
- Legal Keywords: lawsuit, lawyer, attorney, sue → Immediate escalation
- Spam Detection: Very short messages, promotional language → Ignore
- Length Check: < 10 characters → Ignore (likely noise)
- Enterprise + Negative: Enterprise customer with threat keywords → Priority minimum
- History Patterns: 5+ previous interactions → Confidence boost
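A condensed sketch of these rules (keyword sets and the negative-signal words are illustrative; the real implementation is in rules.py). Matching on whole words avoids false positives like "sue" inside "issue":

```python
import re
from typing import Optional

LEGAL_KEYWORDS = {"lawsuit", "lawyer", "attorney", "sue"}

def apply_rules(message: str, user_plan: str, history: list) -> Optional[str]:
    """Return a forced decision, or None to defer to the AI layer."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    if words & LEGAL_KEYWORDS:
        return "immediate_escalation"       # legal keywords
    if len(message.strip()) < 10:
        return "ignore"                     # too short: likely noise
    if user_plan == "enterprise" and words & {"unacceptable", "unreliable", "competitor"}:
        return "priority_response"          # enterprise + negative signal
    return None                             # defer to AI
```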
Combines multiple signals:
- ✅ Rule matches (+0.15 to +0.3)
- ✅ Message quality (+0.1 for detailed, -0.1 for vague)
- ✅ Question marks (+0.05)
- ✅ History context (+0.03 per interaction, max +0.15)
- ✅ Enterprise plan (+0.1)
- ✅ AI alignment with churn risk (+0.05 to +0.1)
- ❌ AI failure (-0.2)
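Combining the signals above might look like the following sketch (the 0.5 baseline and flat +0.3 rule weight are assumptions; confidence.py holds the real weights):

```python
def score_confidence(rule_matched: bool, detailed: bool, has_question: bool,
                     history_len: int, is_enterprise: bool, ai_failed: bool) -> float:
    """Combine decision signals into a 0-1 confidence score (illustrative weights)."""
    score = 0.5                                  # assumed neutral baseline
    if rule_matched:
        score += 0.3                             # strong rule match
    score += 0.1 if detailed else -0.1           # message quality
    if has_question:
        score += 0.05
    score += min(0.03 * history_len, 0.15)       # history context, capped
    if is_enterprise:
        score += 0.1
    if ai_failed:
        score -= 0.2
    return max(0.0, min(1.0, score))             # clamp to [0, 1]
```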
If confidence < 0.4:
- ignore → standard_response
- standard_response → priority_response
- Message flagged for manual review
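That safety net (promoting an uncertain decision one step toward more attention and flagging it) might be implemented like this sketch; the function name is illustrative:

```python
LOW_CONFIDENCE_THRESHOLD = 0.4
# One-step promotion table: uncertain decisions get more attention, never less.
PROMOTE = {
    "ignore": "standard_response",
    "standard_response": "priority_response",
}

def apply_safety_net(decision: str, confidence: float):
    """Return (final_decision, needs_manual_review)."""
    if confidence < LOW_CONFIDENCE_THRESHOLD:
        return PROMOTE.get(decision, decision), True
    return decision, False
```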
1. Get API Key
   - Go to Google AI Studio
   - Create a new API key
   - Copy the key
2. Set Environment Variable
   # Linux/Mac
   export GOOGLE_API_KEY="your_api_key_here"
   # Windows (PowerShell)
   $env:GOOGLE_API_KEY="your_api_key_here"
   # Or add to .env file
   echo "GOOGLE_API_KEY=your_api_key_here" > .env
3. Install Dependencies
   pip install -r requirements.txt
Why Flash?
- ⚡ Fast: ~200ms response time
- 💰 Cheap: $0.35 per 1M tokens (input), $1.05 per 1M tokens (output)
- 🎯 Accurate: Good enough for decision logic
- 📊 Long Context: 1M token window (not needed here, but available)
Alternative: Gemini 1.5 Pro
If you need higher accuracy, change in ai_decision.py:
model = genai.GenerativeModel(model_name='gemini-1.5-pro')

- More accurate but slower and 3x more expensive
- Use for complex edge cases or if Flash quality isn't sufficient
Edit config.py to tune Gemini behavior:
AI_TEMPERATURE = 0.2 # 0.0-1.0 (lower = more deterministic)
AI_MAX_TOKENS = 500 # Response length limit
AI_TIMEOUT_SECONDS = 10   # API timeout

The API will gracefully fall back to rule-based decisions if:
- GOOGLE_API_KEY not set
- Gemini API is down
- Rate limits hit
- Any other error
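One common failure mode is the model returning something other than clean JSON (e.g. wrapped in a markdown fence). A defensive parser that feeds the fallback path by returning None might look like this sketch (the function name is illustrative, not the actual ai_decision.py API):

```python
import json
from typing import Optional

def parse_ai_json(raw: str) -> Optional[dict]:
    """Parse the model's reply; return None so the caller falls back to rules."""
    # Strip an optional ```json ... ``` fence before parsing.
    cleaned = raw.strip().removeprefix("```json").removesuffix("```").strip()
    try:
        result = json.loads(cleaned)
    except json.JSONDecodeError:
        return None
    # A valid reply must be a JSON object, not a bare list/number/string.
    return result if isinstance(result, dict) else None
```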
Test fallback behavior:
# Don't set API key
unset GOOGLE_API_KEY
# Run API - will use fallback logic
uvicorn main:app --reload

- AI Fails: Falls back to rule-based + conservative escalation
- Schema Invalid: AI returns bad JSON → Fallback decision
- Timeout: 10 second limit → Fallback
- Unexpected Error: Emergency escalation + manual review flag
- All errors logged
- All failures trigger safe fallback
- Low confidence scores flag uncertain decisions
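The 10-second limit can be enforced generically with a hard-timeout wrapper, one way to implement it sketched below (note the worker thread itself is not killed on timeout; the caller just stops waiting for its result):

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

def call_with_timeout(fn, timeout_s: float = 10.0, fallback=None):
    """Run fn(); return its result, or fallback if it exceeds timeout_s."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fn)
        try:
            return future.result(timeout=timeout_s)
        except FutureTimeout:
            return fallback  # safe degradation: never block the request
```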
# Terminal 1: Start server
uvicorn main:app --reload
# Terminal 2: Test endpoints
bash test_api.sh  # Create this with curl commands above

- ✅ Legal keyword → Should escalate immediately
- ✅ Spam message → Should ignore
- ✅ Enterprise + negative → Should prioritize
- ✅ Short message → Should ignore
- ✅ Normal question → Standard response
- ❌ Invalid JSON → Should return 422
- ❌ Empty message → Should return 422
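A possible test_api.sh covering the checklist above (assumes the server is already running locally and that the demo key from the quick-start is accepted; expected status codes are assumptions based on this README):

```shell
#!/usr/bin/env bash
# Hypothetical test_api.sh: exercise the smoke-test checklist against a local server.
set -euo pipefail
BASE="http://localhost:8000"
AUTH='Authorization: Bearer sk_test_demo_pro_key_123456789012345678'

post() {  # POST a JSON body to /v1/decision, print only the HTTP status code
  curl -s -o /dev/null -w '%{http_code}' -X POST "$BASE/v1/decision" \
    -H "$AUTH" -H 'Content-Type: application/json' -d "$1"
}

[ "$(post '{"message": "I will sue you", "user_plan": "pro"}')" = 200 ]    # legal keyword
[ "$(post '{"message": "Click here now!", "user_plan": "free"}')" = 200 ]  # spam
[ "$(post 'not json')" = 422 ]                                             # invalid JSON
[ "$(post '{"message": "", "user_plan": "free"}')" = 422 ]                 # empty message
echo "all checks passed"
```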
Edit config.py to tune behavior:
# Add more legal keywords
LEGAL_KEYWORDS.add("injunction")
# Adjust confidence thresholds
LOW_CONFIDENCE_THRESHOLD = 0.5 # More conservative
# Change AI temperature
AI_TEMPERATURE = 0.1  # Even more deterministic

- Replace mock AI with real API client
- Add API key management (env vars, secrets)
- Set up logging/monitoring (Datadog, Sentry)
- Add rate limiting (per user/IP)
- Configure CORS if needed
- Add authentication/authorization
- Set up CI/CD pipeline
- Load test with realistic traffic
- Monitor AI costs and latency
- Set up alerting for AI failures
decision_api/
├── main.py # FastAPI app + endpoint
├── models.py # Pydantic schemas (request/response)
├── rules.py # Rule engine logic
├── ai_decision.py # AI integration layer
├── confidence.py # Confidence scoring
├── config.py # Constants and settings
├── requirements.txt # Python dependencies
└── README.md # This file
- Legal compliance: Can't afford to miss legal threats
- Cost efficiency: Filtering spam before AI saves $$
- Predictability: Rules are debuggable, AI is not
- Decisions need consistency
- We're not doing creative writing
- 0.2-0.3 gives good balance of reasoning + determinism
- Honest about uncertainty
- Enables human-in-the-loop for edge cases
- Improves over time (feedback loop)
- AI is advisory, not critical path
- Network/API failures happen
- Better to escalate than miss important messages
MIT
Questions? Issues? Feedback?
- Check the /docs endpoint for interactive API docs
- Review logs for debugging
- Adjust config.py for tuning