Reddit AI Curator is an advanced, AI-powered information retrieval system that combines professional Boolean search logic with Large Language Model (LLM) analysis to find high-quality Reddit discussions.
┌─────────────────────────────────────────────────────────────────────────┐
│                            Reddit AI Curator                            │
├─────────────────────────────────────────────────────────────────────────┤
│       ┌─────────────┐  ┌─────────────┐    ┌─────────────────────┐       │
│       │     CLI     │  │     Web     │    │       V2 API        │       │
│       │  Interface  │  │  Interface  │    │     (JWT Auth)      │       │
│       └──────┬──────┘  └──────┬──────┘    └──────────┬──────────┘       │
│              │                │                      │                  │
│              └────────────────┼──────────────────────┘                  │
│                               ▼                                         │
│                    ┌─────────────────────┐                              │
│                    │    DI Container     │                              │
│                    │     (app/core/)     │                              │
│                    └──────────┬──────────┘                              │
│                               │                                         │
│               ┌───────────────┼───────────────┐                         │
│               ▼               ▼               ▼                         │
│         ┌────────────┐  ┌────────────┐  ┌─────────────────┐             │
│         │    LLM     │  │   Search   │  │  Tag Learning   │             │
│         │ Providers  │  │   Engine   │  │     System      │             │
│         │(Mistral/   │  │            │  │                 │             │
│         │ Gemini/    │  │            │  │                 │             │
│         │ Mock)      │  │            │  │                 │             │
│         └────────────┘  └─────┬──────┘  └─────────────────┘             │
│                               │                                         │
│                               ▼                                         │
│                ┌───────────────────────────────┐                        │
│                │        Intent Services        │                        │
│                │  ┌──────────┐  ┌────────────┐ │                        │
│                │  │Clarifier │  │   Intent   │ │                        │
│                │  │          │  │  Matcher   │ │                        │
│                │  └──────────┘  └────────────┘ │                        │
│                └──────────────┬────────────────┘                        │
│                               │                                         │
│                ┌──────────────┼──────────────┐                          │
│                ▼              ▼              ▼                          │
│          ┌────────────┐ ┌────────────┐ ┌─────────────┐                  │
│          │   Reddit   │ │   Query    │ │  AI Score   │                  │
│          │    API     │ │ Tournament │ │  Analyzer   │                  │
│          │   (PRAW)   │ │            │ │             │                  │
│          └────────────┘ └────────────┘ └─────────────┘                  │
└─────────────────────────────────────────────────────────────────────────┘
The DI Container (app/core/container.py) manages all service dependencies, providing:
- Service registration and resolution
- Singleton lifecycle management
- Easy testing with MockLLMProvider
- Thread-safe access for Flask
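The behaviours listed above (registration, lazy singleton resolution, thread-safe access) can be sketched in a few lines. This is a minimal illustration, not the contents of `app/core/container.py`: the class name `Container` and the methods `register` and `resolve` are assumptions made for the example.

```python
import threading
from typing import Any, Callable, Dict


class Container:
    """Illustrative lazy, thread-safe singleton container (hypothetical API)."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], Any]] = {}
        self._instances: Dict[str, Any] = {}
        # Re-entrant lock: a factory may resolve other services while it runs.
        self._lock = threading.RLock()

    def register(self, name: str, factory: Callable[[], Any]) -> None:
        """Register (or replace) a factory and drop any cached instance."""
        with self._lock:
            self._factories[name] = factory
            self._instances.pop(name, None)

    def resolve(self, name: str) -> Any:
        """Return the singleton for `name`, building it on first access."""
        with self._lock:
            if name not in self._instances:
                self._instances[name] = self._factories[name]()
            return self._instances[name]

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails, so
        # `container.llm_provider` falls through to the registry.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self.resolve(name)
        except KeyError:
            raise AttributeError(name) from None
```

Replacing a factory through `register` also clears the cached instance, which is one way a test could swap in a mock provider before the first request is served.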
app/core/
├── container.py # Main DI container implementation
└── service_registration.py # Service registration functions
| Service | Interface | Description |
|---|---|---|
| `llm_provider` | `LLMProvider` | LLM interface (Mistral, Gemini, or Mock) |
| `reddit_engine` | `RedditSearchEngine` | Reddit API client via PRAW |
| `search_engine` | `SearchEngine` | Main search orchestration |
from app.core.container import container
# Get services (auto-initialized on first access)
llm = container.llm_provider
search_engine = container.search_engine
# Use in tests
container.register_mock_llm_provider()

The V2 API uses JWT (JSON Web Tokens) for authentication:
┌────────────────────────────────────────┐
│                JWT Flow                │
├────────────────────────────────────────┤
│ 1. Client POST /api/v2/auth/token      │
│    with username/password              │
│                                        │
│ 2. Server validates credentials        │
│    and returns JWT token               │
│                                        │
│ 3. Client includes token in header:    │
│    Authorization: Bearer <token>       │
│                                        │
│ 4. Server validates token on each      │
│    protected request                   │
└────────────────────────────────────────┘
| Variable | Description | Default |
|---|---|---|
| `JWT_SECRET_KEY` | Secret for signing tokens | Required |
| `JWT_ALGORITHM` | Signing algorithm | `HS256` |
| `JWT_EXPIRATION_HOURS` | Token validity | 24 |
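To make the role of these three variables concrete, here is a standard-library sketch of HS256 token signing and verification. The project lists PyJWT for this job; the code below deliberately does not use it and only shows the mechanism. The function names (`load_jwt_settings`, `create_token`, `verify_token`) and the claim layout (`sub`, `iat`, `exp`) are assumptions for illustration.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Mapping, Optional


def load_jwt_settings(env: Mapping[str, str]) -> dict:
    """Read the JWT variables, applying the documented defaults."""
    secret = env.get("JWT_SECRET_KEY")
    if not secret:
        raise RuntimeError("JWT_SECRET_KEY is required")
    return {
        "secret": secret,
        "algorithm": env.get("JWT_ALGORITHM", "HS256"),
        "expiration_hours": int(env.get("JWT_EXPIRATION_HOURS", "24")),
    }


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> str:
    digest = hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256)
    return _b64url_encode(digest.digest())


def create_token(username: str, settings: dict, now: Optional[float] = None) -> str:
    """Build header.payload.signature for the given user (HS256 only)."""
    if settings["algorithm"] != "HS256":
        raise ValueError("this sketch only implements HS256")
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {
        "sub": username,
        "iat": issued_at,
        "exp": issued_at + settings["expiration_hours"] * 3600,
    }
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_sign(signing_input, settings['secret'])}"


def verify_token(token: str, settings: dict, now: Optional[float] = None) -> dict:
    """Return the claims if the signature matches and the token has not expired."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    expected = _sign(f"{parts[0]}.{parts[1]}", settings["secret"])
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected.encode("ascii"), parts[2].encode("utf-8")):
        raise ValueError("invalid signature")
    payload = json.loads(_b64url_decode(parts[1]))
    if (time.time() if now is None else now) >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

In practice a maintained library such as PyJWT is the safer choice, since it also covers algorithm negotiation and claim validation that this sketch leaves out.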
All /api/v2/* endpoints require JWT authentication except:
- `/api/v2/auth/token` - Token generation
- `/api/v2/health` - Health check
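The rule above (everything under `/api/v2/` is protected except the two public paths) could be expressed as a small path check plus a header parser. This is a sketch under assumptions: `requires_jwt` and `extract_bearer_token` are hypothetical helper names, and the actual routes may enforce the rule differently, for example with a decorator.

```python
from typing import Optional

# The two endpoints documented as public.
PUBLIC_V2_PATHS = frozenset({"/api/v2/auth/token", "/api/v2/health"})


def requires_jwt(path: str) -> bool:
    """True when `path` is a V2 API path that is not on the public list."""
    normalized = path.rstrip("/") or "/"
    if normalized in PUBLIC_V2_PATHS:
        return False
    return normalized == "/api/v2" or normalized.startswith("/api/v2/")


def extract_bearer_token(authorization_header: Optional[str]) -> Optional[str]:
    """Return the token from 'Authorization: Bearer <token>', or None."""
    if not authorization_header:
        return None
    scheme, _, token = authorization_header.strip().partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token
```

The check only answers the question for V2 paths; it says nothing about how the legacy routes are handled.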
- Entry Point: Handles both CLI and web server modes
- Search Engine: Implements multi-query tournament and smart search cascade
- Subreddit Discovery: Finds relevant subreddits based on keywords
- JWT Authentication: Token generation and validation
- Search Endpoint: `/api/v2/search` - Main search API
- Intent Search: `/api/v2/search/intent/*` - Interactive intent-based search
- Query Generation: `/api/v2/llm/generate-queries` - LLM query variants
- Post Scoring: `/api/v2/llm/score` - AI-powered post scoring
- intent_clarifier.py: Manages AI-user dialogue and session state
- intent_matcher.py: Implements 5-stage scoring algorithm
- semantic_query_generator.py: Generates Boolean queries from structured intent
- search_intent.py: Data models for intent, criteria, and preferences
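As an illustration of turning structured intent into a Boolean query, the sketch below combines required terms, alternatives, and exclusions into one string. The `SearchIntent` fields shown here (`all_of`, `any_of`, `none_of`) are hypothetical and are not taken from `search_intent.py`; the output assumes the target search accepts `AND`, `OR`, `NOT`, parentheses, and quoted phrases.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SearchIntent:
    """Hypothetical, simplified intent model for the example."""
    all_of: List[str] = field(default_factory=list)   # every term must appear
    any_of: List[str] = field(default_factory=list)   # at least one must appear
    none_of: List[str] = field(default_factory=list)  # none may appear


def _term(text: str) -> str:
    """Quote multi-word phrases so they are matched as a unit."""
    cleaned = text.strip().replace('"', "")
    return f'"{cleaned}"' if " " in cleaned else cleaned


def build_boolean_query(intent: SearchIntent) -> str:
    """Join the three term groups into a single Boolean query string."""
    parts = [_term(t) for t in intent.all_of if t.strip()]
    alternatives = [_term(t) for t in intent.any_of if t.strip()]
    if len(alternatives) == 1:
        parts.append(alternatives[0])
    elif alternatives:
        parts.append("(" + " OR ".join(alternatives) + ")")
    query = " AND ".join(parts)
    for term in intent.none_of:
        if term.strip():
            query += f" NOT {_term(term)}"
    return query.strip()
```

For example, an intent with `all_of=["standing desk", "review"]`, `any_of=["budget", "cheap"]` and `none_of=["sponsored"]` yields `"standing desk" AND review AND (budget OR cheap) NOT sponsored`.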
- container.py: Main service container with lazy initialization
- service_registration.py: Service registration and mock provider setup
- llm_base.py: Abstract base class for LLM providers
- llm_mistral.py: Mistral AI implementation
- llm_gemini.py: Google Gemini implementation
- mock_llm_provider.py: Mock provider for testing (zero API calls)
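The provider layering described here (one abstract base, several implementations, a mock that makes no API calls) might look roughly like the following. The method names `generate_queries` and `score_post`, the 0 to 10 score scale, and the mock's word-overlap scoring are assumptions for illustration; the real interface in `llm_base.py` may differ.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class LLMProvider(ABC):
    """Hypothetical provider interface shared by all backends."""

    @abstractmethod
    def generate_queries(self, description: str) -> Dict[str, str]:
        """Return one query per style for the given search description."""

    @abstractmethod
    def score_post(self, title: str, body: str, description: str) -> float:
        """Return a relevance score for a post (assumed scale: 0 to 10)."""


class MockLLMProvider(LLMProvider):
    """Deterministic stand-in that never touches the network."""

    STYLES = ("broad", "specific", "narrative", "jargon")

    def __init__(self) -> None:
        self.calls: List[str] = []  # lets tests assert which methods ran

    def generate_queries(self, description: str) -> Dict[str, str]:
        self.calls.append("generate_queries")
        return {style: f"{description} [{style}]" for style in self.STYLES}

    def score_post(self, title: str, body: str, description: str) -> float:
        self.calls.append("score_post")
        words = set(description.lower().split())
        if not words:
            return 0.0
        text = f"{title} {body}".lower()
        hits = sum(1 for word in words if word in text)
        return round(10 * hits / len(words), 2)
```

Because the mock is deterministic, tests built on it can assert exact scores and call counts without API keys.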
- Extracts semantic tags from high-scoring results
- Manages favorites for AI training
- Auto-blacklist management for fresh content
- Generates standalone HTML reports
- Formats search results with rich metadata
- Centralized JSON data storage for:
  - Favorites
  - Learning database
  - Query history
  - Blacklist
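A JSON-backed store of this kind is often wrapped in a small helper that loads with a default and writes atomically, so an interrupted save cannot leave a half-written file. The sketch below shows that pattern; the class name `JsonStore` and any file paths used with it are hypothetical, not the project's actual storage layer.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


class JsonStore:
    """Illustrative single-file JSON store with atomic writes."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)

    def load(self, default: Any) -> Any:
        """Return the stored data, or `default` if the file does not exist yet."""
        try:
            with self.path.open("r", encoding="utf-8") as handle:
                return json.load(handle)
        except FileNotFoundError:
            return default

    def save(self, data: Any) -> None:
        """Write to a temp file in the same directory, then rename over the target."""
        self.path.parent.mkdir(parents=True, exist_ok=True)
        fd, tmp_name = tempfile.mkstemp(dir=self.path.parent, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as handle:
                json.dump(data, handle, indent=2, ensure_ascii=False)
            os.replace(tmp_name, self.path)  # atomic on the same filesystem
        except BaseException:
            if os.path.exists(tmp_name):
                os.unlink(tmp_name)
            raise
```

One store instance per data set (favorites, learning database, query history, blacklist) keeps each file independent of the others.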
- Web dashboard for interactive searches
- Result browsing and management
- Favorites management
1. User provides a search description or keywords
2. LLM generates query variations (Broad, Specific, Narrative, Jargon)
3. Query tournament evaluates the variations on a sample
4. Smart cascade searches with the best query, falling back as needed
5. Results are scored and ranked by AI
6. Tags are extracted and the learning system is updated
7. Results are presented via CLI or web interface
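The tournament and cascade steps above can be sketched with two small functions: one ranks the query variations by the mean score of a small sample, the other searches in that order until enough good posts are collected. Everything here is an assumption for illustration, including the function names, the `search(query, limit)` and `score(post)` callables, the thresholds, and the `ai_score` field; the real engine may rank and fall back differently.

```python
from typing import Callable, Dict, List

Post = dict
SearchFn = Callable[[str, int], List[Post]]
ScoreFn = Callable[[Post], float]


def run_tournament(
    queries: Dict[str, str], search: SearchFn, score: ScoreFn, sample_size: int = 5
) -> List[str]:
    """Rank query names by the mean score of a small sample, best first."""
    ranked = []
    for name, query in queries.items():
        sample = search(query, sample_size)
        mean = sum(score(post) for post in sample) / len(sample) if sample else 0.0
        ranked.append((mean, name))
    # Stable sort: ties keep the order in which the queries were supplied.
    ranked.sort(key=lambda item: item[0], reverse=True)
    return [name for _, name in ranked]


def cascade_search(
    queries: Dict[str, str],
    order: List[str],
    search: SearchFn,
    score: ScoreFn,
    wanted: int = 10,
    min_score: float = 6.0,
    limit: int = 25,
) -> List[Post]:
    """Search with the best query first; fall back until `wanted` posts pass."""
    kept: Dict[str, Post] = {}
    for name in order:
        for post in search(queries[name], limit):
            post_score = score(post)
            if post_score >= min_score and post["id"] not in kept:
                kept[post["id"]] = {**post, "ai_score": post_score}
        if len(kept) >= wanted:
            break  # enough good results, no need for further fallbacks
    ranked = sorted(kept.values(), key=lambda p: p["ai_score"], reverse=True)
    return ranked[:wanted]
```

Passing the search and scoring functions in as callables keeps the control flow testable with in-memory fakes, in the same spirit as the mock LLM provider.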
| Layer | Technology |
|---|---|
| Language | Python 3.12+ |
| Web Framework | Flask |
| Reddit API | PRAW |
| LLM | Mistral AI / Google Gemini |
| Authentication | PyJWT |
| Dependency Injection | Custom container (no external DI library) |
| Frontend | HTML/JS (Flask templates + frontend-new) |
| Configuration | python-dotenv, JSON |
reddit/
├── app.py                           # Main application (CLI + Web)
├── app/
│   ├── __init__.py                  # Flask app factory
│   ├── core/                        # Core architecture
│   │   ├── container.py             # DI container
│   │   └── service_registration.py  # Service registration
│   ├── routes.py                    # Legacy routes (v1)
│   ├── routes_v2.py                 # V2 API (JWT authenticated)
│   ├── routes_auth.py               # Authentication routes
│   ├── schemas.py                   # Request/response schemas
│   ├── services/                    # Business logic
│   │   ├── __init__.py
│   │   ├── llm_base.py              # LLM provider interface
│   │   ├── llm_mistral.py           # Mistral implementation
│   │   ├── llm_gemini.py            # Gemini implementation
│   │   ├── mock_llm_provider.py     # Mock for testing
│   │   └── search_engine.py         # Search orchestration
│   └── models.py                    # SQLAlchemy models
├── tag_learning.py                  # AI learning system
├── report_generator.py              # HTML report generation
├── config/                          # JSON configuration files
├── static/                          # Flask static assets
├── templates/                       # Flask templates
├── frontend-new/                    # Alternative frontend
├── tests/                           # Test suite
│   └── integration/
│       └── test_search_flow.py      # Zero-API integration tests
├── results/                         # Output directory
└── .env                             # Environment variables