Architecture

Overview

Reddit AI Curator is an information retrieval system that combines Boolean search queries with Large Language Model (LLM) analysis to find high-quality Reddit discussions.

System Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                     Reddit AI Curator                                   │
├─────────────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────────────┐           │
│  │   CLI       │   │   Web       │   │   V2 API            │           │
│  │   Interface │   │   Interface │   │   (JWT Auth)        │           │
│  └──────┬──────┘   └──────┬──────┘   └──────────┬──────────┘           │
│         │                 │                      │                      │
│         └────────────────┼──────────────────────┘                      │
│                          ▼                                              │
│              ┌─────────────────────┐                                    │
│              │   DI Container      │                                    │
│              │   (app/core/)       │                                    │
│              └──────────┬──────────┘                                    │
│                         │                                               │
│         ┌───────────────┼───────────────┐                               │
│         ▼               ▼               ▼                               │
│  ┌────────────┐  ┌────────────┐  ┌─────────────────┐                   │
│  │  LLM       │  │   Search   │  │   Tag Learning  │                   │
│  │  Providers │  │   Engine   │  │   System        │                   │
│  │(Mistral/   │  │            │  │                 │                   │
│  │ Gemini/    │  │            │  │                 │                   │
│  │  Mock)     │  │            │  │                 │                   │
│  └────────────┘  └─────┬──────┘  └─────────────────┘                   │
│                        │                                                │
│                        ▼                                                │
│         ┌───────────────────────────────┐                               │
│         │      Intent Services          │                               │
│         │ ┌──────────┐   ┌────────────┐ │                               │
│         │ │Clarifier │   │ Intent     │ │                               │
│         │ │          │   │ Matcher    │ │                               │
│         │ └──────────┘   └────────────┘ │                               │
│         └──────────────┬────────────────┘                               │
│                        │                                                │
│         ┌──────────────┼──────────────┐                                │
│         ▼              ▼              ▼                                │
│  ┌────────────┐ ┌────────────┐ ┌─────────────┐                         │
│  │  Reddit    │ │  Query     │ │  AI Score   │                         │
│  │  API       │ │ Tournament │ │  Analyzer   │                         │
│  │  (PRAW)    │ │            │ │             │                         │
│  └────────────┘ └────────────┘ └─────────────┘                         │
└─────────────────────────────────────────────────────────────────────────┘

Dependency Injection Container

Overview

The DI Container (app/core/container.py) manages all service dependencies, providing:

  • Service registration and resolution
  • Singleton lifecycle management
  • Easy testing with MockLLMProvider
  • Thread-safe access for Flask

Container Structure

app/core/
├── container.py           # Main DI container implementation
└── service_registration.py # Service registration functions

Service Types

| Service | Interface | Description |
| --- | --- | --- |
| llm_provider | LLMProvider | LLM interface (Mistral, Gemini, or Mock) |
| reddit_engine | RedditSearchEngine | Reddit API client via PRAW |
| search_engine | SearchEngine | Main search orchestration |

Usage

from app.core.container import container

# Get services (auto-initialized on first access)
llm = container.llm_provider
search_engine = container.search_engine

# Use in tests
container.register_mock_llm_provider()
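
The internals of container.py are not reproduced in this document. As a rough illustration of lazy, thread-safe singleton resolution, a minimal sketch might look like the following. The class name `ContainerSketch`, the `register`/`resolve` methods, and `FakeLLM` are all hypothetical and are not the project's actual API.

```python
import threading
from typing import Any, Callable, Dict


class ContainerSketch:
    """Hypothetical container: builds each service on first use, then caches it."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], Any]] = {}
        self._instances: Dict[str, Any] = {}
        # Re-entrant lock so a factory may resolve other services while it runs.
        self._lock = threading.RLock()

    def register(self, name: str, factory: Callable[[], Any]) -> None:
        """Register or replace a factory and drop any cached instance."""
        with self._lock:
            self._factories[name] = factory
            self._instances.pop(name, None)

    def resolve(self, name: str) -> Any:
        """Return the singleton for `name`, creating it on first access."""
        with self._lock:
            if name not in self._instances:
                self._instances[name] = self._factories[name]()
            return self._instances[name]


class FakeLLM:
    """Stand-in provider used only for this illustration."""

    def generate(self, prompt: str) -> str:
        return "mock response"


demo_container = ContainerSketch()
demo_container.register("llm_provider", FakeLLM)

first = demo_container.resolve("llm_provider")
second = demo_container.resolve("llm_provider")
```

Holding the lock for the whole of `resolve` keeps two Flask worker threads from building the same service twice. Re-registering a factory clears the cached instance, which is one way a test could swap in a mock.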

JWT Authentication

Overview

The V2 API uses JWT (JSON Web Tokens) for authentication:

┌────────────────────────────────────────┐
│           JWT Flow                     │
├────────────────────────────────────────┤
│  1. Client POST /api/v2/auth/token     │
│     with username/password             │
│                                        │
│  2. Server validates credentials       │
│     and returns JWT token              │
│                                        │
│  3. Client includes token in header:   │
│     Authorization: Bearer <token>      │
│                                        │
│  4. Server validates token on each     │
│     protected request                  │
└────────────────────────────────────────┘

Token Configuration

| Variable | Description | Default |
| --- | --- | --- |
| JWT_SECRET_KEY | Secret for signing tokens | Required |
| JWT_ALGORITHM | Signing algorithm | HS256 |
| JWT_EXPIRATION_HOURS | Token validity | 24 |
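
To make the defaults above concrete, here is a standard-library sketch of what HS256 signing and verification involve. The project itself relies on PyJWT for this; the function names `issue_token` and `verify_token`, and the choice of `sub`, `iat`, and `exp` claims, are assumptions made for illustration only.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(username: str, secret: str, hours: int = 24,
                now: Optional[float] = None) -> str:
    """Build header.payload.signature with an expiry `hours` from now."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": username, "iat": issued, "exp": issued + hours * 3600}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def verify_token(token: str, secret: str, now: Optional[float] = None) -> dict:
    """Return the claims if the signature and expiry check out, else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    payload = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if current >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

With PyJWT, the same two steps are normally a call to `jwt.encode(payload, key, algorithm="HS256")` and `jwt.decode(token, key, algorithms=["HS256"])`. The explicit `now` parameter exists here only so the expiry logic can be exercised deterministically.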

Protected Endpoints

All /api/v2/* endpoints require JWT authentication except:

  • /api/v2/auth/token - Token generation
  • /api/v2/health - Health check
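
The exemption rule above can be expressed as a small path check plus a parser for the Authorization header. This is a framework-independent sketch; `PUBLIC_PATHS`, `requires_jwt`, and `extract_bearer_token` are hypothetical names, and the real routes may enforce the rule differently (for example with a Flask decorator).

```python
from typing import Optional

# The two endpoints listed above as exempt from authentication.
PUBLIC_PATHS = {"/api/v2/auth/token", "/api/v2/health"}


def requires_jwt(path: str) -> bool:
    """True for /api/v2/* paths that are not explicitly public."""
    normalized = path.rstrip("/") or "/"
    return normalized.startswith("/api/v2/") and normalized not in PUBLIC_PATHS


def extract_bearer_token(authorization: Optional[str]) -> Optional[str]:
    """Pull the token out of an 'Authorization: Bearer <token>' header value."""
    if not authorization:
        return None
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()
```

A request handler would typically reject the request with 401 when `requires_jwt(path)` is true and `extract_bearer_token` returns `None` or the token fails validation.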

Component Details

Core Application (app.py)

  • Entry Point: Handles both CLI and web server modes
  • Search Engine: Implements multi-query tournament and smart search cascade
  • Subreddit Discovery: Finds relevant subreddits based on keywords

V2 API Routes (app/routes_v2.py)

  • JWT Authentication: Token generation and validation
  • Search Endpoint: /api/v2/search - Main search API
  • Intent Search: /api/v2/search/intent/* - Interactive intent-based search
  • Query Generation: /api/v2/llm/generate-queries - LLM query variants
  • Post Scoring: /api/v2/llm/score - AI-powered post scoring

Intent Services (app/services/)

  • intent_clarifier.py: Manages AI-user dialogue and session state
  • intent_matcher.py: Implements 5-stage scoring algorithm
  • semantic_query_generator.py: Generates Boolean queries from structured intent
  • search_intent.py: Data models for intent, criteria, and preferences
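
To illustrate what generating a Boolean query from structured intent can mean, the sketch below turns required, alternative, and excluded terms into a single query string. The `IntentSketch` dataclass and `build_boolean_query` function are hypothetical and do not reflect the actual models in search_intent.py; the sketch also assumes the search backend accepts AND, OR, and NOT operators.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IntentSketch:
    """Hypothetical structured intent: terms grouped by how they constrain results."""
    required: List[str] = field(default_factory=list)      # every term must match
    alternatives: List[str] = field(default_factory=list)  # at least one must match
    excluded: List[str] = field(default_factory=list)      # none may match


def _quote(term: str) -> str:
    """Wrap multi-word phrases in quotes so they are matched as a unit."""
    return f'"{term}"' if " " in term else term


def build_boolean_query(intent: IntentSketch) -> str:
    """Join required terms with AND, alternatives with OR, and append NOT clauses."""
    clauses = [_quote(term) for term in intent.required]
    if intent.alternatives:
        clauses.append("(" + " OR ".join(_quote(t) for t in intent.alternatives) + ")")
    query = " AND ".join(clauses)
    for term in intent.excluded:
        query += f" NOT {_quote(term)}"
    return query.strip()


example = build_boolean_query(
    IntentSketch(
        required=["mechanical keyboard"],
        alternatives=["review", "recommendation"],
        excluded=["giveaway"],
    )
)
# example == '"mechanical keyboard" AND (review OR recommendation) NOT giveaway'
```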

DI Container (app/core/)

  • container.py: Main service container with lazy initialization
  • service_registration.py: Service registration and mock provider setup

LLM Providers (app/services/)

  • llm_base.py: Abstract base class for LLM providers
  • llm_mistral.py: Mistral AI implementation
  • llm_gemini.py: Google Gemini implementation
  • mock_llm_provider.py: Mock provider for testing (zero API calls)
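
The provider layout above follows the usual abstract-base-class pattern: one interface, several interchangeable implementations, and a mock that never touches the network. The sketch below shows the shape of that pattern. The method names `generate_queries` and `score_post` are guesses based on the V2 endpoints, not the actual interface in llm_base.py, and the mock's scoring rule is invented for the example.

```python
from abc import ABC, abstractmethod
from typing import List


class LLMProviderSketch(ABC):
    """Hypothetical interface that every provider would implement."""

    @abstractmethod
    def generate_queries(self, description: str) -> List[str]:
        """Return query variants for a search description."""

    @abstractmethod
    def score_post(self, description: str, post_text: str) -> float:
        """Return a relevance score between 0.0 and 1.0."""


class MockProviderSketch(LLMProviderSketch):
    """Deterministic stand-in that makes no API calls."""

    STYLES = ("broad", "specific", "narrative", "jargon")

    def generate_queries(self, description: str) -> List[str]:
        # One variant per style named in the Data Flow section.
        return [f"{description} ({style})" for style in self.STYLES]

    def score_post(self, description: str, post_text: str) -> float:
        # Fraction of description words that appear in the post (substring match).
        words = set(description.lower().split())
        if not words:
            return 0.0
        text = post_text.lower()
        return sum(1 for word in words if word in text) / len(words)


mock = MockProviderSketch()
variants = mock.generate_queries("python tips")
```

Because callers depend only on the abstract interface, tests can pass the mock wherever a Mistral or Gemini provider would normally go.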

Tag Learning System (tag_learning.py)

  • Extracts semantic tags from high-scoring results
  • Manages favorites for AI training
  • Auto-blacklist management for fresh content

Report Generator (report_generator.py)

  • Generates standalone HTML reports
  • Formats search results with rich metadata

Configuration (config/)

  • Centralized JSON data storage for:
    • Favorites
    • Learning database
    • Query history
    • Blacklist
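
Since these stores are plain JSON files, reading and writing them safely is mostly a matter of handling a missing file and avoiding half-written output. The following is a minimal sketch under those assumptions; `load_json`, `save_json`, and any file name passed to them (such as `favorites.json`) are hypothetical, not the project's actual helpers or file names.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def load_json(path: Path, default: Any) -> Any:
    """Read a JSON file, returning `default` if it does not exist yet."""
    try:
        with path.open("r", encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return default


def save_json(path: Path, data: Any) -> None:
    """Write JSON to a temp file in the same directory, then swap it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2, ensure_ascii=False)
        # os.replace swaps the file in one step, so readers never see partial data.
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)
        raise
```

Writing to a temporary file first means an interrupted save leaves the previous version of the store intact.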

Frontend (static/, templates/, frontend-new/)

  • Web dashboard for interactive searches
  • Result browsing and management
  • Favorites management

Data Flow

  1. The user provides a search description or keywords
  2. The LLM generates query variations (Broad, Specific, Narrative, Jargon)
  3. The query tournament evaluates each variation on a sample of results
  4. The smart cascade searches with the best query, falling back as needed
  5. The AI scores and ranks the results
  6. Tags are extracted and the learning system is updated
  7. Results are presented via the CLI or web interface
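
Steps 3 and 4 can be sketched as two small functions: one ranks the query variations by how well a small sample of their results scores, and the other tries the ranked queries in order until one returns enough posts. This is only an illustration of the idea. The function names, the mean-score ranking rule, the `min_results` threshold, and the fake index are all assumptions, not the project's actual algorithm.

```python
from typing import Callable, Dict, List, Sequence

Post = Dict[str, object]
SearchFn = Callable[[str, int], List[Post]]  # (query, limit) -> posts
ScoreFn = Callable[[Post], float]            # post -> relevance score


def run_tournament(queries: Sequence[str], search: SearchFn, score: ScoreFn,
                   sample_size: int = 5) -> List[str]:
    """Rank queries by the mean score of a small result sample, best first."""
    def sample_quality(query: str) -> float:
        sample = search(query, sample_size)
        return sum(score(post) for post in sample) / len(sample) if sample else 0.0

    return sorted(queries, key=sample_quality, reverse=True)


def cascade_search(ranked_queries: Sequence[str], search: SearchFn,
                   limit: int = 25, min_results: int = 3) -> List[Post]:
    """Try queries best-first and stop at the first one with enough results."""
    best: List[Post] = []
    for query in ranked_queries:
        results = search(query, limit)
        if len(results) >= min_results:
            return results
        if len(results) > len(best):
            best = results  # remember the largest result set as a last resort
    return best


# Fake search backend so the sketch runs without any API calls.
FAKE_INDEX: Dict[str, List[Post]] = {
    "broad": [{"title": "a", "quality": 0.2}, {"title": "b", "quality": 0.4}],
    "specific": [{"title": "c", "quality": 0.9}],
    "jargon": [],
}


def fake_search(query: str, limit: int) -> List[Post]:
    return FAKE_INDEX.get(query, [])[:limit]


def fake_score(post: Post) -> float:
    return float(post["quality"])


ranked = run_tournament(["broad", "specific", "jargon"], fake_search, fake_score)
results = cascade_search(ranked, fake_search, min_results=2)
```

In this run the "specific" query wins the tournament but returns only one post, so the cascade falls back to "broad", which meets the two-result threshold.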

Technology Stack

| Layer | Technology |
| --- | --- |
| Language | Python 3.12+ |
| Web Framework | Flask |
| Reddit API | PRAW |
| LLM | Mistral AI / Google Gemini |
| Authentication | PyJWT |
| Dependency Injection | Custom container (no external DI library) |
| Frontend | HTML/JS (Flask templates + frontend-new) |
| Configuration | python-dotenv, JSON |

File Structure

reddit/
├── app.py                      # Main application (CLI + Web)
├── app/
│   ├── __init__.py             # Flask app factory
│   ├── core/                   # Core architecture
│   │   ├── container.py        # DI container
│   │   └── service_registration.py  # Service registration
│   ├── routes.py               # Legacy routes (v1)
│   ├── routes_v2.py            # V2 API (JWT authenticated)
│   ├── routes_auth.py          # Authentication routes
│   ├── schemas.py              # Request/response schemas
│   ├── services/               # Business logic
│   │   ├── __init__.py
│   │   ├── llm_base.py         # LLM provider interface
│   │   ├── llm_mistral.py      # Mistral implementation
│   │   ├── llm_gemini.py       # Gemini implementation
│   │   ├── mock_llm_provider.py # Mock for testing
│   │   └── search_engine.py    # Search orchestration
│   └── models.py               # SQLAlchemy models
├── tag_learning.py             # AI learning system
├── report_generator.py         # HTML report generation
├── config/                     # JSON configuration files
├── static/                     # Flask static assets
├── templates/                  # Flask templates
├── frontend-new/               # Alternative frontend
├── tests/                      # Test suite
│   └── integration/
│       └── test_search_flow.py # Zero-API integration tests
├── results/                    # Output directory
└── .env                        # Environment variables