
Architecture — StreamMind AI

This document explains the system design, data flow, technical decisions, trade-offs, and scaling strategy for StreamMind AI.


1. High-Level Architecture

StreamMind AI follows a modular monorepo architecture with clear package boundaries. Each package has a single responsibility and communicates through well-defined TypeScript interfaces.

┌─────────────────────────────────────────────────────────────┐
│                     PRESENTATION LAYER                      │
│                                                             │
│  Chrome Extension (Manifest V3)                             │
│  React + TypeScript + Vite + Tailwind CSS + shadcn/ui       │
│                                                             │
│  ┌─────────────┐  ┌─────────────┐  ┌───────────────────┐    │
│  │ SetupView   │  │ SearchView  │  │ Movie Cards       │    │
│  │ (BYOK Flow) │  │ (Query UI)  │  │ (Result Display)  │    │
│  └─────────────┘  └──────┬──────┘  └───────────────────┘    │
│                          │                                  │
│  ┌───────────────────────┴──────────────────────────────┐   │
│  │  API Client (fetch) — sends API key per-request      │   │
│  └───────────────────────┬──────────────────────────────┘   │
└──────────────────────────┼──────────────────────────────────┘
                           │ HTTPS
┌──────────────────────────┼──────────────────────────────────┐
│                     APPLICATION LAYER                       │
│                                                             │
│  Fastify API Server                                         │
│                                                             │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Route Handlers (validate request, call services)    │   │
│  │  ├── POST /api/recommend                             │   │
│  │  └── GET  /health                                    │   │
│  └──────────────────────┬───────────────────────────────┘   │
│                         │                                   │
│  ┌──────────────────────┴───────────────────────────────┐   │
│  │  Recommendation Service (business logic coordinator) │   │
│  │  1. Sanitize input                                   │   │
│  │  2. Fetch catalog → TMDB                             │   │
│  │  3. Orchestrate LLM → OpenAI / Anthropic / Google    │   │
│  │  4. Validate output (Zod + catalog cross-check)      │   │
│  │  5. Enrich results with full movie details           │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                           │
┌──────────────────────────┼──────────────────────────────────┐
│                     DOMAIN LAYER (Packages)                 │
│                                                             │
│  ┌─────────────────┐  ┌──────────────────────────────────┐  │
│  │ catalog-core    │  │ llm-adapter                      │  │
│  │                 │  │                                  │  │
│  │ CatalogProvider │  │ LLMProvider (interface)          │  │
│  │ TMDBService     │  │ LLMOrchestrator                  │  │
│  │ MovieNormalizer │  │ PromptBuilder                    │  │
│  │                 │  │ ResponseValidator                │  │
│  │                 │  │ OpenAIAdapter                    │  │
│  └────────┬────────┘  └──────────────┬───────────────────┘  │
│           │                          │                      │
│  ┌────────┴──────────────────────────┴───────────────────┐  │
│  │ shared-types                                          │  │
│  │ Zod schemas + TypeScript types + Config schemas       │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

2. Data Flow

Recommendation Request Flow

1. User types: "Mind-bending sci-fi like Inception"

2. Extension → POST /api/recommend
   {
     query: "Mind-bending sci-fi like Inception",
     apiKey: "sk-...",          // Passed per-request, NEVER stored
     provider: "openai",
     model: "gpt-4o-mini",
     maxResults: 5
   }

3. Backend → InputSanitizer
   - Trim whitespace
   - Check length limits
   - Detect prompt injection patterns
   - Remove control characters
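The sanitization steps above can be sketched as a pure function. This is a minimal illustration, not the actual InputSanitizer: the function name, length limit, and injection-pattern list are assumptions.

```typescript
// Illustrative sanitizer sketch; names, limits, and patterns are assumptions.
const MAX_QUERY_LENGTH = 500; // hypothetical limit

// A few example injection patterns; a real list would be broader.
const INJECTION_PATTERNS = [
  /ignore (all )?previous instructions/i,
  /you are now/i,
  /system prompt/i,
];

type SanitizeResult =
  | { ok: true; query: string }
  | { ok: false; reason: string };

function sanitizeQuery(raw: string): SanitizeResult {
  // 1. Trim whitespace
  const trimmed = raw.trim();

  // 2. Check length limits
  if (trimmed.length === 0) return { ok: false, reason: "empty query" };
  if (trimmed.length > MAX_QUERY_LENGTH) return { ok: false, reason: "query too long" };

  // 3. Detect prompt-injection patterns (reject rather than silently rewrite)
  if (INJECTION_PATTERNS.some((p) => p.test(trimmed))) {
    return { ok: false, reason: "possible prompt injection" };
  }

  // 4. Remove ASCII control characters
  const clean = trimmed.replace(/[\u0000-\u001f\u007f]/g, "");

  return { ok: true, query: clean };
}
```

Rejecting suspicious queries outright (instead of trying to "clean" them) keeps the behavior predictable; the output validation in step 5 is the second line of defense.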

4. Backend → CatalogProvider (TMDB)
   - Fetch top 100 popular movies
   - Normalize to internal Movie format
   - Extract MovieReference[] (id + title)

5. Backend → LLMOrchestrator
   a. PromptBuilder creates system + user prompts
      - System prompt constrains to JSON + catalog only
      - User prompt includes query + full catalog list
   b. OpenAIAdapter sends to OpenAI API
   c. ResponseValidator:
      - Parse JSON from LLM output
      - Validate against Zod schema
      - Cross-check movie IDs against catalog
      - Detect hallucinations
   d. If invalid → retry once → fail gracefully
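The catalog cross-check in step 5c can be sketched independently of the Zod layer. The type and function names below are illustrative assumptions, not the real ResponseValidator API; Zod would first validate the JSON shape, and this pass then filters out IDs the LLM invented.

```typescript
// Hypothetical types mirroring the flow above; not the real shared-types definitions.
interface MovieReference { id: number; title: string; }
interface Recommendation { movieId: number; reason: string; }

// Cross-check every recommended ID against the catalog fetched in step 4.
// Any ID not in the catalog is a hallucination: dropped and reported.
function crossCheck(
  recs: Recommendation[],
  catalog: MovieReference[],
): { valid: Recommendation[]; hallucinated: number[] } {
  const known = new Set(catalog.map((m) => m.id));
  const valid: Recommendation[] = [];
  const hallucinated: number[] = [];
  for (const rec of recs) {
    if (known.has(rec.movieId)) valid.push(rec);
    else hallucinated.push(rec.movieId);
  }
  return { valid, hallucinated };
}
```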

6. Backend → Enrichment
   - Fetch full movie details for recommended IDs
   - Include poster paths, genres, ratings

7. Response → Extension
   {
     success: true,
     data: {
       recommendations: [...],
       enrichedMovies: [...],
       provider: "openai",
       model: "gpt-4o-mini",
       processingTimeMs: 2340,
       tokensUsed: 1847
     }
   }

8. Extension renders MovieCards with animated entrance

3. Key Design Decisions

3.1 Monorepo with Package Boundaries

Decision: Use a pnpm + Turborepo monorepo with strict package boundaries.

Why:

  • Each package can be tested and built independently.
  • Clear dependency graph (shared-types → catalog-core, llm-adapter → recommendation-core → api).
  • Easy to refactor into separate services later (SaaS migration path).
  • Single repo for development velocity.

Trade-off: Slightly more complex initial setup vs. long-term maintainability.

3.2 BYOK (Bring Your Own Key)

Decision: Users provide their own LLM API key. No server-side key storage.

Why:

  • Zero infrastructure cost for AI inference.
  • User controls their spending.
  • No key management liability.
  • Open-source friendly — anyone can run it.

Trade-off: Requires user to have an API key. Mitigated by clear setup UX.

3.3 LLM Provider Abstraction (Dependency Inversion)

Decision: Define an LLMProvider interface. All providers implement it.

Why:

  • Adding new providers = adding one adapter class.
  • Orchestrator logic is provider-agnostic.
  • Easy to test with mock providers.
  • Follows SOLID's Dependency Inversion Principle.

Trade-off: Slightly more abstraction overhead vs. direct API calls.
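The abstraction can be sketched as below. Method and field names are illustrative assumptions, not the real llm-adapter API; the point is that the orchestrator depends only on the interface, so a mock provider is enough to test it.

```typescript
// Sketch of the provider abstraction; names are assumptions, not the real API.
interface LLMRequest { systemPrompt: string; userPrompt: string; model: string; apiKey: string; }
interface LLMResponse { text: string; tokensUsed: number; }

// The interface every adapter (OpenAI, Anthropic, Google, ...) implements.
interface LLMProvider {
  readonly name: string;
  complete(req: LLMRequest): Promise<LLMResponse>;
}

// A mock provider makes orchestrator logic trivially testable.
class MockProvider implements LLMProvider {
  readonly name = "mock";
  constructor(private canned: string) {}
  async complete(_req: LLMRequest): Promise<LLMResponse> {
    return { text: this.canned, tokensUsed: 0 };
  }
}

// Orchestrator-style code depends only on LLMProvider (Dependency Inversion):
// swapping OpenAI for Anthropic never touches this function.
async function runCompletion(provider: LLMProvider, query: string): Promise<string> {
  const res = await provider.complete({
    systemPrompt: "Respond with JSON only.",
    userPrompt: query,
    model: "any",
    apiKey: "unused-by-mock",
  });
  return res.text;
}
```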

3.4 Strict Output Validation

Decision: Every LLM response is validated with Zod AND cross-checked against the catalog.

Why:

  • LLMs hallucinate. Period.
  • A movie recommendation that doesn't exist is worse than no recommendation.
  • This is the #1 differentiator from "just call ChatGPT" projects.
  • Demonstrates responsible AI engineering.

Trade-off: Extra processing time + potential retry. Worth it for correctness.
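The validate → retry once → fail gracefully policy from the data flow can be expressed as a small generic wrapper. A sketch under assumed names, not the real service code:

```typescript
// Generic "try, retry once, fail gracefully" wrapper; illustrative names.
type RetryResult<T> = { ok: true; value: T } | { ok: false; error: string };

async function withOneRetry<T>(
  attempt: () => Promise<T>,
  validate: (value: T) => boolean,
): Promise<RetryResult<T>> {
  for (let tries = 0; tries < 2; tries++) {
    try {
      const value = await attempt();
      if (validate(value)) return { ok: true, value };
      // Invalid output: loop once more, then give up.
    } catch {
      // Provider error: also counts as a failed attempt.
    }
  }
  // Graceful failure: a structured error the route handler can map to a response,
  // never an unhandled exception.
  return { ok: false, error: "LLM returned invalid output after retry" };
}
```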

3.5 Extension-First Architecture

Decision: Chrome Extension (Manifest V3) as the primary interface.

Why:

  • Meets streaming users where they are (in the browser).
  • Manifest V3 is the current Chrome extension standard.
  • Service workers provide lifecycle management.
  • chrome.storage.local keeps API keys sandboxed per extension.

Trade-off: Platform-specific (Chrome initially). Future: Firefox, web app.


4. Security Architecture

Threat Model

| Threat | Mitigation |
| --- | --- |
| API key theft | Keys live in chrome.storage.local, isolated per extension. Never stored server-side: forwarded per-request and discarded. |
| Prompt injection | InputSanitizer detects injection patterns, combined with strict output validation. |
| LLM hallucination | ResponseValidator cross-checks every movie ID against the TMDB catalog. |
| API abuse | Rate limiting (configurable per route). |
| XSS via LLM output | All LLM output is parsed as JSON, never rendered as HTML. |
| Man-in-the-middle | HTTPS enforced for all external API calls. |

API Key Lifecycle

1. User enters key in extension popup
2. Key saved to chrome.storage.local (isolated per extension by Chrome)
3. On each request: key read → sent to backend → forwarded to LLM → discarded
4. Backend NEVER stores, logs, or caches the key
5. User can revoke at any time via the extension settings

5. Scaling Strategy

Current (v0.1) — Single Backend

Extension → Fastify API → TMDB + OpenAI

v0.2 — Add Caching

Extension → Fastify API → Redis Cache → TMDB
                        → OpenAI

v1.0 — SaaS Architecture

Extension/Web → API Gateway → Recommendation Service
                            → User Service
                            → Billing Service
                            → Vector DB (embeddings)
                            → Redis (cache + rate limit)
                            → PostgreSQL (user data)

The monorepo structure enables this migration path: each package can become an independent deployable service.


6. Technology Choices

| Component | Technology | Why |
| --- | --- | --- |
| Monorepo | pnpm + Turborepo | Fast, efficient, great workspace support |
| Backend | Fastify | Roughly 2× Express's throughput in Fastify's benchmarks; built-in schema validation; TypeScript-first |
| Types | Zod | Runtime validation + TypeScript inference in one |
| Extension | React + Vite + Manifest V3 | Modern DX, fast builds, current Chrome standard |
| Styling | Tailwind + shadcn pattern | Utility-first, composable, dark-mode native |
| LLM | OpenAI (initial) | Best JSON mode support, most reliable structured output |
| Catalog | TMDB | Free tier, comprehensive data, well-documented API |

7. Future Considerations

  • Vector Database (Pinecone/Weaviate): For semantic movie matching beyond keyword search.
  • User Taste Graph: Track preferences over time for personalized recommendations.
  • Hybrid Engine: Combine collaborative filtering + LLM reasoning.
  • Multi-Region: Deploy API closer to users with edge functions.
  • A/B Testing: Compare LLM provider quality for recommendations.

Last updated: February 2026