AI-powered Q&A for your organization's documents
Retriever is an AI-powered question-answering system that helps users find information in your organization's policy and procedure documents. Upload your documents, and Retriever uses RAG (Retrieval-Augmented Generation) to provide accurate, sourced answers.
Retriever can be adapted for any organization with documentation that users need to search.
- Natural Language Q&A — Ask questions in plain English and get accurate answers with source citations
- Multi-Document Support — Index multiple markdown and text documents
- Source Citations — Every answer includes clickable citations to the original documents
- Conversation History — Continue conversations with context from previous questions
- Hybrid Search — Combines semantic understanding with keyword matching for better retrieval
- Content Safety — Built-in moderation and hallucination detection
- User Authentication — Secure login system with JWT tokens
- Semantic Caching — Faster responses for similar questions
- Rate Limiting — Prevent abuse with configurable request limits
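The hybrid search feature above merges a semantic ranking with a keyword ranking. This README does not say how Retriever combines the two, so the sketch below uses reciprocal rank fusion, one common technique, purely as an illustration. The function name and the chunk IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk IDs into a single ranking.

    Each chunk earns 1 / (k + rank) for every list it appears in,
    where rank starts at 1. Higher total score means a better position.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by chunk ID so output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical result lists, best match first.
semantic_hits = ["chunk-3", "chunk-1", "chunk-7"]  # from vector similarity
keyword_hits = ["chunk-1", "chunk-9", "chunk-3"]   # from keyword matching
fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
# chunk-1 comes out first because it sits near the top of both lists.
```

Chunks that appear in both lists accumulate score from each, which is why results that agree across the two search modes rise to the top.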
- Python 3.13+
- uv (recommended) or pip
- API keys for:
- OpenRouter (for LLM access)
- OpenAI (for embeddings and moderation — free tier available)
# Clone the repository
git clone https://github.com/your-org/retriever.git
cd retriever
# Install dependencies
uv sync --extra dev
# Copy environment template
cp .env.example .env
Edit .env with your API keys:
# Required
OPENROUTER_API_KEY=your-openrouter-key
OPENAI_API_KEY=your-openai-key
JWT_SECRET_KEY=generate-a-random-secret-key
# Optional (defaults work for local development)
LLM_MODEL=anthropic/claude-sonnet-4
DEBUG=true
Place your markdown (.md) or text (.txt) documents in the documents/ directory:
documents/
├── employee-handbook.md
├── safety-procedures.md
└── faq.txt
# Start the development server
uv run uvicorn src.main:app --reload --port 8000
Visit http://localhost:8000 to start asking questions.
- Login: Create an account or log in at /login
- Ask Questions: Type your question in the chat interface
- View Sources: Click citation cards to see the original document text
- Continue Conversations: Ask follow-up questions with context preserved
Retriever exposes a REST API for programmatic access:
# Ask a question
curl -X POST http://localhost:8000/api/v1/rag/ask \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_TOKEN" \
-d '{"question": "What is the check-in procedure?"}'
API documentation is available at /docs (OpenAPI/Swagger).
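The same request can be made from Python. The following is a minimal sketch using only the standard library. The endpoint path, headers, and request body come from the curl example above; the function names are hypothetical, and the response is returned as parsed JSON without assuming anything about its fields, since the schema is not described here (see /docs on a running instance).

```python
import json
import urllib.request


def build_ask_request(base_url: str, token: str, question: str) -> urllib.request.Request:
    """Build the POST request for /api/v1/rag/ask without sending it."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/rag/ask",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def ask(base_url: str, token: str, question: str, timeout: float = 30.0) -> dict:
    """Send the question and return the decoded JSON response body."""
    request = build_ask_request(base_url, token, question)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the development server running, `ask("http://localhost:8000", "YOUR_TOKEN", "What is the check-in procedure?")` would send the same request as the curl command.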
| Variable | Description | Default |
|---|---|---|
| OPENROUTER_API_KEY | API key for LLM provider | Required |
| OPENAI_API_KEY | API key for embeddings/moderation | Required |
| JWT_SECRET_KEY | Secret for JWT token signing | Required |
| LLM_MODEL | Primary LLM model | anthropic/claude-sonnet-4 |
| LLM_FALLBACK_MODEL | Fallback model | anthropic/claude-haiku |
| RAG_CHUNK_SIZE | Document chunk size (chars) | 1500 |
| RAG_TOP_K | Number of chunks to retrieve | 5 |
| RATE_LIMIT_REQUESTS | Requests per window | 10 |
| CACHE_ENABLED | Enable semantic caching | true |
| AUTH_ENABLED | Require authentication | true |
See .env.example for the complete list of configuration options.
For best results:
- Use markdown format with clear headings (#, ##, ###)
- Keep sections focused on single topics
- Use descriptive headings that match how users ask questions
- Include relevant keywords naturally in the text
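To see why clear headings and focused sections help, consider a heading-aware chunker. The sketch below is an assumption about how such a splitter might work, not a description of Retriever's own chunker; `chunk_markdown` is a hypothetical name, and the 1500-character cap mirrors the RAG_CHUNK_SIZE default.

```python
import re

# Matches #, ##, and ### headings at the start of a line.
# Caveat: a "# comment" line inside a code block would also match.
HEADING = re.compile(r"^#{1,3} .+$", re.MULTILINE)


def chunk_markdown(text: str, max_chars: int = 1500) -> list[str]:
    """Split text at markdown headings, then cap each section at max_chars."""
    starts = [match.start() for match in HEADING.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text that precedes the first heading
    ends = starts[1:] + [len(text)]
    chunks: list[str] = []
    for start, end in zip(starts, ends):
        section = text[start:end].strip()
        if not section:
            continue
        # Long sections are cut into fixed-size pieces.
        for offset in range(0, len(section), max_chars):
            chunks.append(section[offset:offset + max_chars])
    return chunks


doc = "# Check-in\nArrive by 9.\n\n## Late arrivals\nCall the desk.\n"
chunks = chunk_markdown(doc)
```

Under this scheme a short, single-topic section becomes one chunk that carries its own heading, so a question phrased like the heading has a good chance of retrieving it.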
Retriever can be deployed to any platform that supports Python applications.
Prerequisites:
- Docker and docker-compose (or a compatible container tool) installed
- .env file configured with your API keys
Build and run:
# Build the production image
docker build -t retriever:latest .
# Run with docker-compose (recommended)
docker-compose up -d
# Check logs
docker-compose logs -f retriever
# Check health
curl http://localhost:8000/health
Alternative: Run with docker directly
docker run -d \
--name retriever \
-p 8000:8000 \
--env-file .env \
-v retriever-data:/app/data \
-v retriever-documents:/app/documents \
retriever:latest
Create a user:
The database is inside the container, so you need to execute the script within the running container:
# Using docker-compose (recommended)
docker-compose exec retriever uv run python scripts/create_user.py
# Or using docker directly
docker exec -it retriever uv run python scripts/create_user.py
Volume Management:
# List volumes
docker volume ls
# Backup data
docker run --rm \
-v retriever-data:/data \
-v $(pwd):/backup \
alpine tar czf /backup/retriever-data-backup.tar.gz /data
# Restore data
docker run --rm \
-v retriever-data:/data \
-v $(pwd):/backup \
alpine tar xzf /backup/retriever-data-backup.tar.gz -C /
# Stop containers (preserves volumes)
docker-compose down
# Stop and DELETE volumes (CAUTION: destroys all data)
docker-compose down -v
Troubleshooting:
| Issue | Solution |
|---|---|
| Port 8000 already in use | Change port: docker run -p 8001:8000 ... |
| Health check failing | Check logs: docker-compose logs retriever |
| Cannot write to /app/data | Verify container runs as appuser (uid 1000) |
| Missing environment variables | Ensure .env file exists with all required keys |
| Old code running after changes | Rebuild image: docker-compose build --no-cache |
Environment Variables:
See .env.example for the complete list. Required:
- OPENROUTER_API_KEY: OpenRouter API key
- OPENAI_API_KEY: OpenAI API key
- JWT_SECRET_KEY: Generate with openssl rand -base64 32
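If openssl is not available, a comparable secret can be produced with Python's standard library. This sketch mirrors the 32 random bytes, base64-encoded, of the openssl command above; the function name is hypothetical.

```python
import base64
import secrets


def generate_jwt_secret(num_bytes: int = 32) -> str:
    """Return num_bytes of cryptographically secure randomness, base64-encoded."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


if __name__ == "__main__":
    # Prints a line that can be pasted into .env.
    print(f"JWT_SECRET_KEY={generate_jwt_secret()}")
```

The `secrets` module draws from the operating system's secure random source, which makes it suitable for signing keys; the `random` module is not.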
What gets persisted:
- retriever-data volume → SQLite database + Chroma vector store
- retriever-documents volume → Uploaded policy documents
- Connect your repository
- Set environment variables in the dashboard
- Deploy
- Set DEBUG=false
- Use a strong JWT_SECRET_KEY (32+ characters, random)
- Configure rate limiting appropriately for your traffic
- Set up monitoring (Sentry DSN in SENTRY_DSN)
- Use persistent storage for the data/ directory (volumes in Docker, mounted storage on cloud platforms)
- Test the Docker image locally before cloud deployment
- Enable HTTPS in production (handled by Cloud Run, Railway, Render)
Retriever uses a modular monolith architecture with clean separation of concerns:
┌─────────────────────────────────────────────────────────────┐
│ DOCUMENT PIPELINE │
│ [Markdown/Text] → [Chunker] → [Embeddings] → [Vector DB] │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ QUERY FLOW │
│ [Question] → [Hybrid Search] → [Rerank] → [LLM] → [Answer] │
└─────────────────────────────────────────────────────────────┘
Tech Stack:
- Backend: Python 3.13+, FastAPI, Pydantic
- LLM: Claude via OpenRouter
- Vector DB: Chroma (embedded)
- Frontend: Jinja2 + HTMX + Tailwind CSS
- Database: SQLite
# Run tests
uv run pytest
# Run with coverage
uv run pytest --cov=src --cov-report=term-missing
# Linting and formatting
uv run ruff check src/ tests/ --fix
uv run ruff format src/ tests/
# Type checking
uv run mypy src/ --strict
- Architecture Overview
- Development Standards
- Implementation Roadmap
- Deployment Guide
- Adding Documents
MIT
