Tested on Ubuntu, macOS, and Windows with Python 3.11, 3.12, and 3.13.
An educational application that demonstrates how to implement a chat interface with Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG). Interested in a RAG workshop for your team? Contact info@alteredcraft.com. See past workshop deliveries for examples.
This project uses Flask for the backend, OpenRouter for LLM access (supporting models like GPT-4, Claude 3, Llama 3, etc.), and vanilla JavaScript for a clean, streaming chat interface. The app can start without an API key configured, displaying helpful setup instructions in the UI.
- Python 3.11+
- uv (for package management)
- An OpenRouter API Key
- Clone the repository and install dependencies:

```
git clone https://github.com/yourusername/chat-rag-explorer.git
cd chat-rag-explorer
uv sync
uv run pytest
```
- Set up the environment variables (if the app is running, first stop it with Ctrl+C):

```
cp .env.example .env
```

Edit `.env` and add your API key: `OPENROUTER_API_KEY=sk-or-v1-your-key-here`
See Logging Configuration for optional logging settings.
- Run the application:

```
uv run main.py
```

Port in use? The app auto-finds an available port (8000-8004).
- Explore: open your browser to http://127.0.0.1:8000.
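The port fallback mentioned above (try 8000 through 8004, bind to the first free one) can be sketched as follows. This is an illustration of the idea; the actual logic in `main.py` may differ.

```python
# Sketch of the auto-port-selection behavior: probe ports 8000-8004 and
# return the first one we can bind to. Illustrative only.
import socket

def find_open_port(start: int = 8000, end: int = 8004) -> int:
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            try:
                sock.bind(("127.0.0.1", port))
                return port  # bind succeeded, so the port is free
            except OSError:
                continue  # port busy, try the next one
    raise RuntimeError("Ports 8000-8004 are all in use")
```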
- Inspect Request Details: click "view details" on any message to see exactly what the LLM received, including model, parameters, token counts, timing, and retrieved RAG documents with their source metadata and similarity scores.
- Real-time Streaming: Server-Sent Events (SSE) stream LLM responses token by token
- Model Selection: dynamic model picker with OpenRouter models, filtered to RAG-recommended models via `.models_list`
- Conversation History: multi-turn conversation support with context retention
- Metrics Sidebar: real-time session metrics including token usage
- Markdown Support: secure rendering using Marked.js and DOMPurify (works offline)
- Clean UI: responsive interface built with vanilla HTML/CSS/JS
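The SSE streaming mentioned above boils down to simple wire framing: each token becomes one `data:` event terminated by a blank line, with a sentinel marking completion. The sketch below illustrates the protocol only; the app's actual event payloads and sentinel value are not shown here and may differ.

```python
# Minimal sketch of Server-Sent Events framing for token-by-token streaming.
# The "[DONE]" sentinel is an illustrative convention, not necessarily the
# app's exact wire format.
def sse_event(data: str) -> str:
    """Format one SSE frame: data line(s) plus a blank-line terminator."""
    lines = [f"data: {chunk}" for chunk in (data.splitlines() or [""])]
    return "\n".join(lines) + "\n\n"

def stream_tokens(tokens):
    """Yield one SSE frame per token, then a completion sentinel,
    as a Flask streaming view might."""
    for token in tokens:
        yield sse_event(token)
    yield sse_event("[DONE]")
```

In Flask, a generator like `stream_tokens` would be wrapped in a `Response` with `mimetype="text/event-stream"` so the browser's `EventSource` (or a `fetch` reader) receives tokens as they arrive.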
The app filters available OpenRouter models to those listed in .models_list. This file contains models that perform well in RAG scenarios. To customize the available models, edit .models_list:
```
# One model ID per line, comments start with #
openai/gpt-4.1-mini
anthropic/claude-sonnet-4
google/gemini-2.0-flash-001
```
Delete `.models_list` to show all OpenRouter models.
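A parser for this file format (one model ID per line, `#` comments, blank lines ignored, missing file means "no filtering") could look like the sketch below. The function name and exact behavior are illustrative, not the app's actual code.

```python
# Sketch of parsing .models_list: one model ID per line, '#' starts a
# comment, blank lines are skipped. Returns None when the file is absent,
# meaning all OpenRouter models should be shown.
from pathlib import Path

def load_model_filter(path: str = ".models_list"):
    p = Path(path)
    if not p.exists():
        return None  # no file -> no filtering
    models = []
    for line in p.read_text().splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line:
            models.append(line)
    return models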
RAG integration allows the chat to retrieve relevant documents from ChromaDB and inject them as context for the LLM. See docs/RAG.md for detailed documentation.
Quick Start:
- Go to Settings > RAG Settings
- Configure your ChromaDB connection (local, server, or cloud)
- Test connection and select a collection
- Enable RAG toggle in chat sidebar
Sample Data Included: a pre-built ChromaDB with 195 chunks from "The Morn Chronicles" (a Star Trek DS9 fan fiction) is automatically copied to `data/chroma_db/` on first startup. Use path `data/chroma_db` in RAG Settings.
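The injection step described above, where retrieved chunks become LLM context, can be sketched as a small formatting function. The field names (`text`, `source`, `score`) and the `[Doc N]` layout are hypothetical here; the real format used by the app is documented in docs/RAG.md.

```python
# Hypothetical sketch of formatting retrieved chunks (e.g. from a ChromaDB
# query) into context text for the LLM. Field names and layout are
# illustrative, not the app's actual format.
def build_rag_context(chunks: list) -> str:
    parts = []
    for i, chunk in enumerate(chunks, 1):
        parts.append(
            f"[Doc {i}] (source: {chunk['source']}, score: {chunk['score']:.2f})\n"
            f"{chunk['text']}"
        )
    return "\n\n".join(parts)
```

The resulting string would typically be prepended to the system prompt or inserted as an extra message before the user's question.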
To ingest your own documents for RAG retrieval, see the utils/README.md for CLI tools:
- split.py - Split large markdown files into chapters by heading pattern
- ingest.py - Two-phase workflow: preview chunks → inspect → ingest to ChromaDB
The ingest tool writes human-readable chunk previews to data/chunks/ so you can tune chunking parameters before committing to the vector database.
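The preview phase can be pictured as writing each chunk to its own numbered text file for manual inspection. The sketch below shows the idea only; the file naming and layout used by the real `utils/ingest.py` may differ.

```python
# Sketch of the chunk-preview phase: dump each chunk to a numbered file under
# data/chunks/ so chunking parameters can be tuned before ingestion.
# File naming is illustrative.
from pathlib import Path

def preview_chunks(chunks: list, out_dir: str = "data/chunks") -> list:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(chunks):
        p = out / f"chunk_{i:04d}.txt"
        p.write_text(chunk)  # human-readable preview of one chunk
        paths.append(p)
    return paths
```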
The sections below provide deeper insight into the application's architecture, testing, logging system, and development roadmap.
```
chat-rag-explorer/
├── chat_rag_explorer/          # Main package
│   ├── static/                 # CSS, JS, and local libraries
│   ├── templates/              # HTML templates
│   ├── __init__.py             # App factory
│   ├── logging.py              # Centralized logging configuration
│   ├── routes.py               # Web endpoints
│   ├── services.py             # LLM integration logic
│   ├── rag_config_service.py   # ChromaDB connection management
│   ├── prompt_service.py       # System prompt CRUD operations
│   └── chat_history_service.py # Conversation logging to JSONL
├── utils/                      # CLI utilities for content preparation
│   ├── README.md               # Utility documentation
│   ├── split.py                # Split large markdown files into chapters
│   └── ingest.py               # Ingest markdown into ChromaDB
├── data/
│   ├── corpus/                 # Source markdown documents
│   ├── chunks/                 # Chunk previews for inspection (gitignored)
│   ├── chroma_db/              # Working ChromaDB databases (gitignored, auto-created)
│   └── chroma_db_sample/       # Pristine sample DB (copied to chroma_db/ on startup)
├── prompts/                    # System prompt templates (markdown)
├── logs/                       # Application logs (gitignored)
├── tests/                      # Test suite
├── config.py                   # Configuration settings (environment variable mapping)
├── main.py                     # Application entry point
├── pyproject.toml              # Dependencies and project metadata (uv)
├── .env.example                # Template for environment variables (.env)
├── .env                        # Secrets and local overrides (gitignored)
└── .models_list                # RAG-recommended models filter (see Model Selection)
```
- Modular Architecture: Flask Blueprints and Application Factory pattern
- Centralized Logging: Request ID correlation and configurable log levels
- Modern Python Tooling: uses `uv` for fast dependency management
The application features a comprehensive logging system for debugging and monitoring.
Set these environment variables in your .env file:
| Variable | Default | Description |
|---|---|---|
| `LOG_LEVEL_APP` | `DEBUG` | Log level for application code |
| `LOG_LEVEL_DEPS` | `INFO` | Log level for dependencies (Flask, httpx, etc.) |
| `LOG_TO_STDOUT` | `true` | Output logs to console |
| `LOG_TO_FILE` | `true` | Write logs to file |
| `LOG_FILE_PATH` | `logs/app.log` | Path to log file |
| `CHAT_HISTORY_ENABLED` | `false` | Enable chat interaction logging |
| `CHAT_HISTORY_PATH` | `logs/chat-history.jsonl` | Path to chat history file |
Startup Banner: On application start, logs configuration summary with masked API key:
```
============================================================
RAG Lab - Starting up
============================================================
Configuration:
  - OpenRouter Base URL: https://openrouter.ai/api/v1
  - OpenRouter API Key: sk-or-v1...6a0d
  - Default Model: openai/gpt-3.5-turbo
============================================================
```
Request Correlation: All API requests include a unique request ID for tracing:
```
[a1b2c3d4] POST /api/chat - Model: openai/gpt-4, Messages: 3, Content length: 150 chars
[a1b2c3d4] Starting chat stream - Model: openai/gpt-4
[a1b2c3d4] Token usage - Prompt: 45, Completion: 120, Total: 165
[a1b2c3d4] POST /api/chat - Stream completed (1.523s, 42 chunks)
```
Performance Metrics: Timing information for requests, including time-to-first-chunk (TTFC) for streams.
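One common way to achieve this kind of request-ID correlation is a `logging.Filter` that stamps a short ID onto every record. The sketch below shows that pattern; the project's actual mechanism in `chat_rag_explorer/logging.py` may differ.

```python
# Sketch of request-ID correlation via a logging.Filter: every record gets a
# request_id attribute, so all lines from one request can be grepped together.
# Illustrative pattern only, not the app's exact implementation.
import logging
import uuid

class RequestIdFilter(logging.Filter):
    def __init__(self, request_id=None):
        super().__init__()
        self.request_id = request_id or uuid.uuid4().hex[:8]

    def filter(self, record):
        record.request_id = self.request_id  # make %(request_id)s available
        return True

# Usage: attach the filter and reference %(request_id)s in the format string.
logger = logging.getLogger("demo")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("[%(request_id)s] %(message)s"))
logger.addHandler(handler)
logger.addFilter(RequestIdFilter("a1b2c3d4"))
logger.warning("POST /api/chat - Stream completed")
```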
The browser console includes structured logs with session tracking:
```
[2025-12-26T15:30:00.000Z] [sess_abc123] INFO: Chat request initiated {model: "openai/gpt-4", messageLength: 50}
[2025-12-26T15:30:01.500Z] [sess_abc123] DEBUG: Time to first chunk {ttfc_ms: "823.45"}
[2025-12-26T15:30:02.000Z] [sess_abc123] INFO: Chat response completed {chunks: 42, totalTime_ms: "1523.00"}
```
Open browser DevTools (F12) -> Console to view frontend logs.
The project uses pytest with randomized test ordering to catch hidden state dependencies.
```
uv run pytest                 # Run all tests (randomized order)
uv run pytest -v              # Verbose output
uv run pytest -x              # Stop on first failure
uv run pytest --cov           # Run with coverage report
uv run pytest -k "test_name"  # Run specific test by name
```

Use nox to run tests across Python 3.11, 3.12, and 3.13:

```
nox                # Run on all Python versions
nox -s tests-3.12  # Run on specific version
nox -- -x          # Pass args to pytest
```

- Unit tests live in `tests/unit/` and must not make network calls
- External dependencies (ChromaDB, OpenRouter) are mocked
- Use the `tmp_path` fixture for any file operations
- Tests run in random order to catch hidden state dependencies
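A unit test following these conventions might look like the sketch below. `save_history` and its JSONL format are hypothetical stand-ins for the app's `chat_history_service`, shown only to illustrate `tmp_path`-based file testing.

```python
# Hypothetical unit test in the style described above: no network calls,
# tmp_path for file I/O. save_history is an illustrative stand-in, not the
# app's real chat_history_service API.
import json

def save_history(path, entry: dict) -> None:
    """Append one chat interaction as a JSON line (sketch of JSONL logging)."""
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")

def test_save_history(tmp_path):
    log = tmp_path / "chat-history.jsonl"
    save_history(log, {"role": "user", "content": "hi"})
    save_history(log, {"role": "assistant", "content": "hello"})
    lines = log.read_text().splitlines()
    assert len(lines) == 2
    assert json.loads(lines[0])["role"] == "user"
```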
This project uses Release Please for automated versioning and changelog generation.
How it works:
- All commits to `main` must use Conventional Commits format
- Release Please automatically creates/updates a Release PR with version bumps and changelog
- Merge the Release PR when ready to cut a release
- A GitHub Release and git tag are created automatically
Commit format:
| Prefix | Description | Version Bump |
|---|---|---|
| `feat:` | New feature | Minor (0.1.0 → 0.2.0) |
| `fix:` | Bug fix | Patch (0.1.0 → 0.1.1) |
| `feat!:` or `fix!:` | Breaking change | Major (0.1.0 → 1.0.0) |
| `docs:` | Documentation | Patch |
| `chore:` | Maintenance | No Release PR triggered; the change ships with the next release |
Examples:
```
git commit -m "feat: add dark mode toggle"
git commit -m "fix: correct token count in sidebar"
git commit -m "feat!: redesign REST API endpoints"
```

This project is open source and available under the MIT License.

