This document explains the system design, data flow, technical decisions, trade-offs, and scaling strategy for StreamMind AI.
StreamMind AI follows a modular monorepo architecture with clear package boundaries. Each package has a single responsibility and communicates through well-defined TypeScript interfaces.
┌─────────────────────────────────────────────────────────────┐
│ PRESENTATION LAYER │
│ │
│ Chrome Extension (Manifest V3) │
│ React + TypeScript + Vite + Tailwind CSS + shadcn/ui │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌───────────────────┐ │
│ │ SetupView │ │ SearchView │ │ Movie Cards │ │
│ │ (BYOK Flow) │ │ (Query UI) │ │ (Result Display) │ │
│ └─────────────┘ └──────┬──────┘ └───────────────────┘ │
│ │ │
│ ┌───────────────────────┴──────────────────────────────┐ │
│ │ API Client (fetch) — sends API key per-request │ │
│ └───────────────────────┬──────────────────────────────┘ │
└──────────────────────────┼──────────────────────────────────┘
│ HTTPS
┌──────────────────────────┼──────────────────────────────────┐
│ APPLICATION LAYER │
│ │
│ Fastify API Server │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Route Handlers (validate request, call services) │ │
│ │ ├── POST /api/recommend │ │
│ │ └── GET /health │ │
│ └──────────────────────┬───────────────────────────────┘ │
│ │ │
│ ┌──────────────────────┴───────────────────────────────┐ │
│ │ Recommendation Service (business logic coordinator) │ │
│ │ 1. Sanitize input │ │
│ │ 2. Fetch catalog → TMDB │ │
│ │ 3. Orchestrate LLM → OpenAI / Anthropic / Google │ │
│ │ 4. Validate output (Zod + catalog cross-check) │ │
│ │ 5. Enrich results with full movie details │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
┌──────────────────────────┼──────────────────────────────────┐
│ DOMAIN LAYER (Packages) │
│ │
│ ┌─────────────────┐ ┌──────────────────────────────────┐ │
│ │ catalog-core │ │ llm-adapter │ │
│ │ │ │ │ │
│ │ CatalogProvider │ │ LLMProvider (interface) │ │
│ │ TMDBService │ │ LLMOrchestrator │ │
│ │ MovieNormalizer │ │ PromptBuilder │ │
│ │ │ │ ResponseValidator │ │
│ │ │ │ OpenAIAdapter │ │
│ └────────┬────────┘ └──────────────┬───────────────────┘ │
│ │ │ │
│ ┌────────┴──────────────────────────┴───────────────────┐ │
│ │ shared-types │ │
│ │ Zod schemas + TypeScript types + Config schemas │ │
│ └───────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
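The shared-types package at the bottom of the diagram is the contract every other package depends on. A minimal sketch of what those shared definitions might look like — the names here are illustrative, not the actual package exports:

```typescript
// Hypothetical shapes for the shared-types package — illustrative only.
export interface Movie {
  id: number;            // TMDB movie ID
  title: string;
  overview: string;
  genres: string[];
  posterPath: string | null;
  voteAverage: number;   // 0-10 TMDB rating
}

// The lightweight reference handed to the LLM: just enough to pick from
// the catalog, small enough to keep the prompt cheap.
export type MovieReference = Pick<Movie, "id" | "title">;

export function toReference(movie: Movie): MovieReference {
  return { id: movie.id, title: movie.title };
}
```

Because both catalog-core and llm-adapter import from shared-types, the two sides of the pipeline can never drift out of sync on what a `Movie` is.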
1. User types: "Mind-bending sci-fi like Inception"
2. Extension → POST /api/recommend
{
query: "Mind-bending sci-fi like Inception",
apiKey: "sk-...", // Passed per-request, NEVER stored
provider: "openai",
model: "gpt-4o-mini",
maxResults: 5
}
3. Backend → InputSanitizer
- Trim whitespace
- Check length limits
- Detect prompt injection patterns
- Remove control characters
4. Backend → CatalogProvider (TMDB)
- Fetch top 100 popular movies
- Normalize to internal Movie format
- Extract MovieReference[] (id + title)
5. Backend → LLMOrchestrator
a. PromptBuilder creates system + user prompts
- System prompt constrains to JSON + catalog only
- User prompt includes query + full catalog list
b. OpenAIAdapter sends to OpenAI API
c. ResponseValidator:
- Parse JSON from LLM output
- Validate against Zod schema
- Cross-check movie IDs against catalog
- Detect hallucinations
d. If invalid → retry once → fail gracefully
6. Backend → Enrichment
- Fetch full movie details for recommended IDs
- Include poster paths, genres, ratings
7. Response → Extension
{
success: true,
data: {
recommendations: [...],
enrichedMovies: [...],
provider: "openai",
model: "gpt-4o-mini",
processingTimeMs: 2340,
tokensUsed: 1847
}
}
8. Extension renders MovieCards with animated entrance
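Step 3 above (InputSanitizer) reduces to a pure function. A minimal sketch — the length limit and injection patterns below are illustrative examples, not the rules StreamMind actually ships:

```typescript
const MAX_QUERY_LENGTH = 500; // assumed limit, not the real config value

// Illustrative prompt-injection heuristics — a real deny-list would be broader.
const INJECTION_PATTERNS = [
  /ignore (all |previous |prior )*instructions/i,
  /you are now/i,
  /system prompt/i,
];

type SanitizeResult = { ok: true; query: string } | { ok: false; reason: string };

export function sanitizeQuery(raw: string): SanitizeResult {
  // 1. Trim whitespace and strip control characters.
  const query = raw.trim().replace(/[\u0000-\u001f\u007f]/g, "");
  // 2. Enforce length limits.
  if (query.length === 0) return { ok: false, reason: "empty query" };
  if (query.length > MAX_QUERY_LENGTH) return { ok: false, reason: "query too long" };
  // 3. Reject obvious injection attempts before the query reaches the prompt.
  if (INJECTION_PATTERNS.some((p) => p.test(query))) {
    return { ok: false, reason: "possible prompt injection" };
  }
  return { ok: true, query };
}
```

Returning a discriminated union instead of throwing keeps the route handler's happy path and rejection path explicit.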
Decision: Use a pnpm + Turborepo monorepo with strict package boundaries.
Why:
- Each package can be tested and built independently.
- Clear dependency graph: shared-types feeds catalog-core and llm-adapter, which feed recommendation-core, which feeds the api.
- Easy to refactor into separate services later (SaaS migration path).
- Single repo for development velocity.
Trade-off: Slightly more complex initial setup vs. long-term maintainability.
Decision: Users provide their own LLM API key. No server-side key storage.
Why:
- Zero infrastructure cost for AI inference.
- User controls their spending.
- No key management liability.
- Open-source friendly — anyone can run it.
Trade-off: Requires user to have an API key. Mitigated by clear setup UX.
Decision: Define an LLMProvider interface. All providers implement it.
Why:
- Adding new providers = adding one adapter class.
- Orchestrator logic is provider-agnostic.
- Easy to test with mock providers.
- Follows SOLID's Dependency Inversion Principle: the orchestrator depends on the abstraction, not on any concrete provider.
Trade-off: Slightly more abstraction overhead vs. direct API calls.
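Under this decision, the provider abstraction might look like the sketch below. Method and type names are illustrative, not the actual package API:

```typescript
// A hypothetical shape for the LLMProvider interface described above.
export interface LLMRequest {
  systemPrompt: string;
  userPrompt: string;
  model: string;
  apiKey: string; // forwarded per-request, never retained by the adapter
}

export interface LLMProvider {
  readonly name: string;
  // Returns the raw model output (expected to be JSON text).
  complete(request: LLMRequest): Promise<string>;
}

// A mock provider makes the orchestrator trivially testable without network calls.
export class MockProvider implements LLMProvider {
  readonly name = "mock";
  constructor(private readonly cannedResponse: string) {}
  async complete(_request: LLMRequest): Promise<string> {
    return this.cannedResponse;
  }
}
```

Adding Anthropic or Google support is then one new class implementing `complete`; the orchestrator never changes.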
Decision: Every LLM response is validated with Zod AND cross-checked against the catalog.
Why:
- LLMs hallucinate. Period.
- A movie recommendation that doesn't exist is worse than no recommendation.
- This is the #1 differentiator from "just call ChatGPT" projects.
- Demonstrates responsible AI engineering.
Trade-off: Extra processing time + potential retry. Worth it for correctness.
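The catalog cross-check — the hallucination guard — reduces to set membership. A minimal sketch with illustrative names (Zod handles the schema half in the real package; this shows only the cross-check):

```typescript
interface Recommendation {
  movieId: number;
  reason: string;
}

export interface CrossCheckResult {
  valid: Recommendation[];
  hallucinated: Recommendation[]; // IDs the LLM invented — dropped, never shown
}

export function crossCheckCatalog(
  recommendations: Recommendation[],
  catalogIds: ReadonlySet<number>,
): CrossCheckResult {
  const valid: Recommendation[] = [];
  const hallucinated: Recommendation[] = [];
  for (const rec of recommendations) {
    // Every recommended ID must exist in the catalog the LLM was shown.
    (catalogIds.has(rec.movieId) ? valid : hallucinated).push(rec);
  }
  return { valid, hallucinated };
}
```

If `hallucinated` is non-empty the orchestrator can retry once with the invalid IDs called out, then fail gracefully rather than serve an invented title.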
Decision: Chrome Extension (Manifest V3) as the primary interface.
Why:
- Meets streaming users where they are (in the browser).
- Manifest V3 is the current Chrome extension standard.
- Service workers provide lifecycle management.
- chrome.storage.local keeps API keys sandboxed per extension, inaccessible to web pages.
Trade-off: Platform-specific (Chrome initially). Future: Firefox, web app.
| Threat | Mitigation |
|---|---|
| API key theft | Keys stored in chrome.storage.local (sandboxed per extension, inaccessible to web pages). Forwarded per-request over HTTPS, never stored server-side. |
| Prompt injection | InputSanitizer detects injection patterns. Combined with output validation. |
| LLM hallucination | ResponseValidator cross-checks every movie ID against TMDB catalog. |
| API abuse | Rate limiting (configurable per-route). |
| XSS via LLM output | All LLM output is parsed as JSON, never rendered as HTML. |
| Man-in-the-middle | HTTPS enforced for all external API calls. |
1. User enters key in extension popup
2. Key saved to chrome.storage.local (sandboxed per extension; isolated from web pages, though not encrypted at rest)
3. On each request: key read → sent to backend → forwarded to LLM → discarded
4. Backend NEVER stores, logs, or caches the key
5. User can revoke at any time via the extension settings
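The backend side of step 3 can be sketched as pure functions: the key exists only as a request-scoped value that becomes an outbound Authorization header, and anything that might reach a logger sees only a redacted form. Function names are illustrative; the Bearer scheme matches OpenAI's actual auth:

```typescript
// The API key lives only in the request scope: it is turned into an
// outbound header and never assigned to anything longer-lived.
export function buildProviderHeaders(apiKey: string): Record<string, string> {
  return {
    Authorization: `Bearer ${apiKey}`, // OpenAI-style bearer auth
    "Content-Type": "application/json",
  };
}

// Redact before anything touches a logger — only a short prefix survives.
export function redactKey(apiKey: string): string {
  return apiKey.length > 7 ? `${apiKey.slice(0, 3)}…` : "…";
}
```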
Phase 1 — Current:
Extension → Fastify API → TMDB + OpenAI

Phase 2 — Add a cache layer:
Extension → Fastify API → Redis Cache → TMDB
                                      → OpenAI

Phase 3 — SaaS migration:
Extension/Web → API Gateway → Recommendation Service
                            → User Service
                            → Billing Service
                            → Vector DB (embeddings)
                            → Redis (cache + rate limit)
                            → PostgreSQL (user data)
The monorepo structure enables this migration path: each package can become an independent deployable service.
| Component | Technology | Why |
|---|---|---|
| Monorepo | pnpm + Turborepo | Fast, efficient, great workspace support |
| Backend | Fastify | Consistently faster than Express in benchmarks, built-in schema validation, TypeScript-first |
| Types | Zod | Runtime validation + TypeScript inference in one |
| Extension | React + Vite + Manifest V3 | Modern DX, fast builds, current Chrome standard |
| Styling | Tailwind + shadcn pattern | Utility-first, composable, dark mode native |
| LLM | OpenAI (initial) | Best JSON mode support, most reliable structured output |
| Catalog | TMDB | Free tier, comprehensive data, well-documented API |
- Vector Database (Pinecone/Weaviate): For semantic movie matching beyond keyword search.
- User Taste Graph: Track preferences over time for personalized recommendations.
- Hybrid Engine: Combine collaborative filtering + LLM reasoning.
- Multi-Region: Deploy API closer to users with edge functions.
- A/B Testing: Compare LLM provider quality for recommendations.
Last updated: February 2026