Architecture

Overview

Reddit AI Curator is an information retrieval system that combines Boolean search queries with Large Language Model (LLM) analysis to find high-quality Reddit discussions.

System Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                     Reddit AI Curator                                   │
├─────────────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────────────┐           │
│  │   CLI       │   │   Web       │   │   V2 API            │           │
│  │   Interface │   │   Interface │   │   (JWT Auth)        │           │
│  └──────┬──────┘   └──────┬──────┘   └──────────┬──────────┘           │
│         │                 │                      │                      │
│         └────────────────┼──────────────────────┘                      │
│                          ▼                                              │
│              ┌─────────────────────┐                                    │
│              │   DI Container      │                                    │
│              │   (app/core/)       │                                    │
│              └──────────┬──────────┘                                    │
│                         │                                               │
│         ┌───────────────┼───────────────┐                               │
│         ▼               ▼               ▼                               │
│  ┌────────────┐  ┌────────────┐  ┌─────────────────┐                   │
│  │  LLM       │  │   Search   │  │   Tag Learning  │                   │
│  │  Providers │  │   Engine   │  │   System        │                   │
│  │(Mistral/   │  │            │  │                 │                   │
│  │ Gemini/    │  │            │  │                 │                   │
│  │  Mock)     │  │            │  │                 │                   │
│  └────────────┘  └─────┬──────┘  └─────────────────┘                   │
│                        │                                                │
│                        ▼                                                │
│         ┌───────────────────────────────┐                               │
│         │      Intent Services          │                               │
│         │ ┌──────────┐   ┌────────────┐ │                               │
│         │ │Clarifier │   │ Intent     │ │                               │
│         │ │          │   │ Matcher    │ │                               │
│         │ └──────────┘   └────────────┘ │                               │
│         └──────────────┬────────────────┘                               │
│                        │                                                │
│         ┌──────────────┼──────────────┐                                │
│         ▼              ▼              ▼                                │
│  ┌────────────┐ ┌────────────┐ ┌─────────────┐                         │
│  │  Reddit    │ │  Query     │ │  AI Score   │                         │
│  │  API       │ │ Tournament │ │  Analyzer   │                         │
│  │  (PRAW)    │ │            │ │             │                         │
│  └────────────┘ └────────────┘ └─────────────┘                         │
└─────────────────────────────────────────────────────────────────────────┘

Dependency Injection Container

Overview

The DI Container (app/core/container.py) manages all service dependencies, providing:

- Service registration and resolution
- Singleton lifecycle management
- Easy testing with MockLLMProvider
- Thread-safe access for Flask

Container Structure

app/core/
├── container.py           # Main DI container implementation
└── service_registration.py # Service registration functions

Service Types

Service	Interface	Description
`llm_provider`	`LLMProvider`	LLM interface (Mistral, Gemini, or Mock)
`reddit_engine`	`RedditSearchEngine`	Reddit API client via PRAW
`search_engine`	`SearchEngine`	Main search orchestration

Usage

from app.core.container import container

# Get services (auto-initialized on first access)
llm = container.llm_provider
search_engine = container.search_engine

# Use in tests
container.register_mock_llm_provider()
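
To illustrate how lazy, thread-safe singleton resolution of this kind can work, here is a minimal sketch. It is not the code in app/core/container.py: the class name `SketchContainer` and the `register` and `resolve` methods are hypothetical, and the dictionaries stand in for real service objects.

```python
import threading
from typing import Any, Callable, Dict


class SketchContainer:
    """Hypothetical lazy singleton container (not the project's actual class)."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], Any]] = {}
        self._instances: Dict[str, Any] = {}
        # Re-entrant lock so a factory may resolve other services while it runs.
        self._lock = threading.RLock()

    def register(self, name: str, factory: Callable[[], Any]) -> None:
        """Register (or replace) a factory and drop any cached instance."""
        with self._lock:
            self._factories[name] = factory
            self._instances.pop(name, None)

    def resolve(self, name: str) -> Any:
        """Build the service on first access, then return the cached instance."""
        with self._lock:
            if name not in self._instances:
                self._instances[name] = self._factories[name]()
            return self._instances[name]


sketch = SketchContainer()
sketch.register("llm_provider", lambda: {"provider": "mistral"})
first = sketch.resolve("llm_provider")
second = sketch.resolve("llm_provider")  # cached: same object as `first`

# In a test, swapping in a mock is just another registration.
sketch.register("llm_provider", lambda: {"provider": "mock"})
mocked = sketch.resolve("llm_provider")
```

Holding the lock around both the lookup and the factory call keeps two Flask request threads from building the same service twice, at the cost of serializing first-time initialization.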

JWT Authentication

Overview

The V2 API uses JWT (JSON Web Tokens) for authentication:

┌────────────────────────────────────────┐
│           JWT Flow                     │
├────────────────────────────────────────┤
│  1. Client POST /api/v2/auth/token     │
│     with username/password             │
│                                        │
│  2. Server validates credentials       │
│     and returns JWT token              │
│                                        │
│  3. Client includes token in header:   │
│     Authorization: Bearer <token>      │
│                                        │
│  4. Server validates token on each     │
│     protected request                  │
└────────────────────────────────────────┘
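
The signing and validation steps in this flow can be sketched with the standard library alone. The project itself uses PyJWT, so the code below is a swapped-in illustration of what HS256 signing amounts to, not the project's implementation. The function names `issue_token` and `verify_header` and the `sub` claim are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    """HMAC-SHA256 over 'header.payload', which is what HS256 means."""
    return hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(username: str, secret: str, ttl_seconds: int = 24 * 3600) -> str:
    """Step 2: sign a token carrying the subject and an expiry time."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {"sub": username, "exp": int(time.time()) + ttl_seconds}
    payload = _b64url(json.dumps(claims).encode())
    signature = _b64url(_sign(f"{header}.{payload}", secret))
    return f"{header}.{payload}.{signature}"


def verify_header(authorization: str, secret: str) -> dict:
    """Steps 3-4: parse 'Bearer <token>' and validate signature and expiry."""
    scheme, _, token = authorization.partition(" ")
    if scheme != "Bearer" or token.count(".") != 2:
        raise ValueError("malformed Authorization header")
    header, payload, signature = token.split(".")
    if json.loads(_b64url_decode(header)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _sign(f"{header}.{payload}", secret)
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload))
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims


token = issue_token("alice", "demo-secret")
claims = verify_header(f"Bearer {token}", "demo-secret")
```

With PyJWT, the two functions would typically reduce to `jwt.encode(claims, secret, algorithm="HS256")` and `jwt.decode(token, secret, algorithms=["HS256"])`, which also checks the `exp` claim.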

Token Configuration

Variable	Description	Default
`JWT_SECRET_KEY`	Secret for signing tokens	Required
`JWT_ALGORITHM`	Signing algorithm	HS256
`JWT_EXPIRATION_HOURS`	Token validity	24
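
One way these variables could be read is sketched below: fail fast when the required secret is missing and apply the defaults from the table otherwise. The `JwtSettings` dataclass and `load_jwt_settings` function are hypothetical names, not taken from the project.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class JwtSettings:
    """Hypothetical holder for the three JWT settings."""
    secret_key: str
    algorithm: str
    expiration_hours: int


def load_jwt_settings(env: Mapping[str, str] = os.environ) -> JwtSettings:
    """Read JWT settings from an environment mapping, applying the defaults."""
    secret = env.get("JWT_SECRET_KEY")
    if not secret:
        # No default for the secret: refuse to start rather than sign with a guess.
        raise RuntimeError("JWT_SECRET_KEY is required")
    return JwtSettings(
        secret_key=secret,
        algorithm=env.get("JWT_ALGORITHM", "HS256"),
        expiration_hours=int(env.get("JWT_EXPIRATION_HOURS", "24")),
    )


settings = load_jwt_settings({"JWT_SECRET_KEY": "demo-secret"})
```

Passing the mapping as a parameter lets tests supply a plain dictionary instead of mutating the process environment.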

Protected Endpoints

All /api/v2/* endpoints require JWT authentication except:

- /api/v2/auth/token - Token generation
- /api/v2/health - Health check
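
The exemption rule can be expressed as a small predicate. This is a sketch under the assumption that paths are matched exactly apart from a trailing slash; `PUBLIC_V2_PATHS` and `requires_jwt` are hypothetical names.

```python
# The two endpoints listed above as exempt from authentication.
PUBLIC_V2_PATHS = frozenset({"/api/v2/auth/token", "/api/v2/health"})


def requires_jwt(path: str) -> bool:
    """Return True when the V2 rule demands a valid JWT for this path."""
    normalized = path.rstrip("/") or "/"
    if normalized in PUBLIC_V2_PATHS:
        return False
    # Match the /api/v2 prefix on a path-segment boundary, so /api/v20 is not caught.
    return normalized == "/api/v2" or normalized.startswith("/api/v2/")
```

A `False` result for a path outside /api/v2 only means this rule does not apply to it; such routes may have their own access checks.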

Component Details

Core Application (`app.py`)

- Entry Point: Handles both CLI and web server modes
- Search Engine: Implements multi-query tournament and smart search cascade
- Subreddit Discovery: Finds relevant subreddits based on keywords

V2 API Routes (`app/routes_v2.py`)

- JWT Authentication: Token generation and validation
- Search Endpoint: /api/v2/search - Main search API
- Intent Search: /api/v2/search/intent/* - Interactive intent-based search
- Query Generation: /api/v2/llm/generate-queries - LLM query variants
- Post Scoring: /api/v2/llm/score - AI-powered post scoring

Intent Services (`app/services/`)

- intent_clarifier.py: Manages AI-user dialogue and session state
- intent_matcher.py: Implements 5-stage scoring algorithm
- semantic_query_generator.py: Generates Boolean queries from structured intent
- search_intent.py: Data models for intent, criteria, and preferences

DI Container (`app/core/`)

- container.py: Main service container with lazy initialization
- service_registration.py: Service registration and mock provider setup

LLM Providers (`app/services/`)

- llm_base.py: Abstract base class for LLM providers
- llm_mistral.py: Mistral AI implementation
- llm_gemini.py: Google Gemini implementation
- mock_llm_provider.py: Mock provider for testing (zero API calls)
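
The relationship between the abstract base class and the mock provider can be sketched as follows. The method names `generate_queries` and `score_post`, and both class names, are assumptions made for illustration; the actual interface in llm_base.py may differ.

```python
from abc import ABC, abstractmethod
from typing import List


class SketchLLMProvider(ABC):
    """Hypothetical provider interface; the real one lives in llm_base.py."""

    @abstractmethod
    def generate_queries(self, description: str) -> List[str]:
        """Turn a search description into query variants."""

    @abstractmethod
    def score_post(self, description: str, post_text: str) -> float:
        """Rate how well a post matches the description, from 0.0 to 1.0."""


class SketchMockProvider(SketchLLMProvider):
    """Deterministic stand-in that never touches the network."""

    def generate_queries(self, description: str) -> List[str]:
        words = description.lower().split()
        return [" OR ".join(words), " AND ".join(words)]

    def score_post(self, description: str, post_text: str) -> float:
        # Fraction of description words that appear in the post.
        wanted = set(description.lower().split())
        found = wanted & set(post_text.lower().split())
        return len(found) / len(wanted) if wanted else 0.0


provider = SketchMockProvider()
queries = provider.generate_queries("rust async")
score = provider.score_post("rust async", "Async runtimes in Rust compared")
```

Because every provider implements the same interface, code that depends on it can run against the mock in tests without issuing any API calls.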

Tag Learning System (`tag_learning.py`)

- Extracts semantic tags from high-scoring results
- Manages favorites for AI training
- Manages the auto-blacklist to keep content fresh

Report Generator (`report_generator.py`)

- Generates standalone HTML reports
- Formats search results with rich metadata

Configuration (`config/`)

Centralized JSON data storage for:
- Favorites
- Learning database
- Query history
- Blacklist

Frontend (`static/`, `templates/`, `frontend-new/`)

- Web dashboard for interactive searches
- Result browsing and management
- Favorites management

Data Flow

1. User provides search description or keywords
2. LLM generates query variations (Broad, Specific, Narrative, Jargon)
3. Query tournament evaluates variations on a sample
4. Smart cascade searches with the best query, falling back as needed
5. Results are scored and ranked by AI
6. Tags are extracted and the learning system is updated
7. Results are presented via CLI or web interface
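
The tournament and cascade steps of this flow can be sketched as below. The selection rule (mean score of a small sample) and the fallback rule (move to the next-best query until a minimum number of results is reached) are assumptions, as are all function names; the project's actual logic may differ. The search and scoring functions are passed in, so the example runs against an in-memory index rather than Reddit or an LLM.

```python
from typing import Callable, Dict, List, Sequence

Post = Dict[str, object]
SearchFn = Callable[[str, int], List[Post]]  # (query, limit) -> posts
ScoreFn = Callable[[Post], float]            # post -> relevance score


def run_tournament(queries: Sequence[str], search: SearchFn, score: ScoreFn,
                   sample_size: int = 5) -> List[str]:
    """Rank query variants by the mean score of a small sample of results."""
    def mean_sample_score(query: str) -> float:
        sample = search(query, sample_size)
        return sum(score(p) for p in sample) / len(sample) if sample else 0.0
    return sorted(queries, key=mean_sample_score, reverse=True)


def cascade_search(ranked: Sequence[str], search: SearchFn, score: ScoreFn,
                   limit: int = 25, min_results: int = 3) -> List[Post]:
    """Try queries best-first, falling back until enough results come back."""
    best: List[Post] = []
    for query in ranked:
        results = search(query, limit)
        if len(results) > len(best):
            best = results  # keep the largest result set seen so far
        if len(best) >= min_results:
            break
    return sorted(best, key=score, reverse=True)


# In-memory stand-ins for the Reddit search and the AI scorer.
FAKE_INDEX: Dict[str, List[Post]] = {
    "broad": [{"title": "a", "score": 0.2}, {"title": "b", "score": 0.4},
              {"title": "c", "score": 0.3}],
    "specific": [{"title": "d", "score": 0.9}],
    "jargon": [],
}


def fake_search(query: str, limit: int) -> List[Post]:
    return FAKE_INDEX.get(query, [])[:limit]


def fake_score(post: Post) -> float:
    return float(post["score"])


ranked = run_tournament(["broad", "specific", "jargon"], fake_search, fake_score)
results = cascade_search(ranked, fake_search, fake_score)
```

Here the "specific" query wins the tournament but returns a single post, so the cascade falls back to "broad", which meets the minimum result count.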

Technology Stack

Layer	Technology
Language	Python 3.12+
Web Framework	Flask
Reddit API	PRAW
LLM	Mistral AI / Google Gemini
Authentication	PyJWT
Dependency Injection	Custom container (no external DI library)
Frontend	HTML/JS (Flask templates + frontend-new)
Configuration	python-dotenv, JSON

File Structure

reddit/
├── app.py                      # Main application (CLI + Web)
├── app/
│   ├── __init__.py             # Flask app factory
│   ├── core/                   # Core architecture
│   │   ├── container.py        # DI container
│   │   └── service_registration.py  # Service registration
│   ├── routes.py               # Legacy routes (v1)
│   ├── routes_v2.py            # V2 API (JWT authenticated)
│   ├── routes_auth.py          # Authentication routes
│   ├── schemas.py              # Request/response schemas
│   ├── services/               # Business logic
│   │   ├── __init__.py
│   │   ├── llm_base.py         # LLM provider interface
│   │   ├── llm_mistral.py      # Mistral implementation
│   │   ├── llm_gemini.py       # Gemini implementation
│   │   ├── mock_llm_provider.py # Mock for testing
│   │   └── search_engine.py    # Search orchestration
│   └── models.py               # SQLAlchemy models
├── tag_learning.py             # AI learning system
├── report_generator.py         # HTML report generation
├── config/                     # JSON configuration files
├── static/                     # Flask static assets
├── templates/                  # Flask templates
├── frontend-new/               # Alternative frontend
├── tests/                      # Test suite
│   └── integration/
│       └── test_search_flow.py # Zero-API integration tests
├── results/                    # Output directory
└── .env                        # Environment variables
