A comprehensive educational resource teaching data-driven approaches to building Retrieval-Augmented Generation (RAG) systems that improve over time. Learn from real case studies with concrete metrics showing how RAG systems move from 60% to 85%+ accuracy through systematic measurement and iteration.
Transform RAG from a technical implementation into a continuously improving product through:
- Data-driven evaluation: Establish metrics before building features
- Systematic improvement: Turn evaluation insights into measurable gains
- User feedback loops: Design systems that learn from real usage
- Specialized retrieval: Build purpose-built retrievers for different content types
- Intelligent routing: Orchestrate multiple specialized components
- Production deployment: Maintain improvement velocity at scale
Legal Tech Company: 63% → 87% accuracy over 3 months through systematic error analysis, better chunking, and validation patterns. Generated 50,000+ citation examples for continuous training.
Construction Blueprint Search: 27% → 85% recall in 4 days by using vision models for spatial descriptions. Further improved to 92% for counting queries through bounding box detection.
Feedback Collection: 10 → 40 daily submissions (4x improvement) through better UX copy and interactive elements, enabling faster improvement cycles.
The core philosophy centers on the "RAG Flywheel" - a continuous improvement cycle that emphasizes:
- Measure: Establish benchmarks and evaluation metrics
- Analyze: Understand failure modes and user patterns
- Improve: Apply targeted optimizations
- Iterate: Continuous refinement based on real-world usage
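The four stages above can be sketched as a simple loop. This is an illustrative toy, not code from the course: the `measure` and `improve` functions stand in for real evaluation harnesses and real optimizations (better chunking, re-ranking, prompt changes), and an improvement is kept only if it raises the metric.

```python
# Minimal sketch of the RAG flywheel: measure the system, try an
# improvement, keep it only if the metric goes up, and repeat.
# All functions here are hypothetical placeholders.

def measure(system, eval_set):
    """Score the system against a fixed evaluation set of (query, expected) pairs."""
    correct = sum(1 for query, expected in eval_set if system(query) == expected)
    return correct / len(eval_set)

def run_flywheel(system, eval_set, improvements):
    """Apply candidate improvements, keeping only those that raise the metric."""
    history = [measure(system, eval_set)]
    for improve in improvements:
        candidate = improve(system)            # e.g. better chunking, re-ranking
        if measure(candidate, eval_set) > history[-1]:
            system = candidate                 # keep the change
        history.append(measure(system, eval_set))
    return system, history
```

The key design point is that every change is gated on the same fixed evaluation set, so regressions are caught before they ship.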
```
.
├── docs/          # Complete workshop series (Chapters 0-7)
│   ├── workshops/     # Progressive learning path from evaluation to production
│   ├── talks/         # Industry expert presentations with case studies
│   ├── office-hours/  # Q&A summaries addressing real implementation challenges
│   └── misc/          # Additional learning resources
├── latest/        # Reference implementations and case study code
│   ├── case_study/    # Comprehensive WildChat project demonstrating concepts
│   ├── week0-6/       # Code examples aligned with workshop chapters
│   └── examples/      # Standalone demonstrations
├── data/          # Real datasets from case studies and talks
└── mkdocs.yml     # Documentation configuration
```
The workshops follow a systematic progression from evaluation to production:
Mindset shift from technical project to product. See how the legal tech company went from 63% to 87% accuracy by treating RAG as a recommendation engine with continuous feedback loops.
Build evaluation frameworks before you have users. Learn from the blueprint search case: 27% → 85% recall in 4 days through synthetic data and task-specific vision model prompting.
Turn evaluation insights into measurable improvements. Fine-tuning embeddings delivers 6-10% gains. Learn when to use re-rankers vs custom embeddings based on your data distribution.
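The retrieve-then-rerank pattern discussed here can be sketched as a two-stage pipeline. Both scoring functions below are toy stand-ins of my own: in practice stage 1 would be an embedding (bi-encoder) search and stage 2 a cross-encoder re-ranker such as those in sentence-transformers.

```python
# Sketch of a two-stage retrieve-then-rerank pipeline with toy scorers.

def cheap_score(query, doc):
    """Stage 1: fast, approximate relevance (toy word-overlap score)."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def expensive_score(query, doc):
    """Stage 2: slower, more accurate relevance (toy stand-in for a cross-encoder)."""
    return cheap_score(query, doc) + (0.5 if query.lower() in doc.lower() else 0.0)

def retrieve_then_rerank(query, corpus, k_retrieve=10, k_final=3):
    # Retrieve a broad candidate set cheaply, then re-rank only those candidates.
    candidates = sorted(corpus, key=lambda d: cheap_score(query, d), reverse=True)[:k_retrieve]
    return sorted(candidates, key=lambda d: expensive_score(query, d), reverse=True)[:k_final]
```

The design trade-off this illustrates: the expensive scorer only ever sees `k_retrieve` documents, so re-ranking adds accuracy without paying its cost over the whole corpus.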
3.1 - Feedback Collection: Zapier increased feedback from 10 to 40 submissions/day through better UX copy
3.2 - Perceived Performance: 11% perception improvement equals 40% reduction in perceived wait time
3.3 - Quality of Life: Citations, validation, chain-of-thought delivering 18% accuracy improvements
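One of the validation patterns mentioned above - checking citations - can be sketched as verifying that every quoted span actually appears in the retrieved source documents. The data shapes here are illustrative, not taken from the course code:

```python
# Citation validation sketch: flag any citation whose quoted text cannot
# be found verbatim (after whitespace/case normalization) in the sources.

def validate_citations(answer_citations, sources):
    """Return the citations that are NOT grounded in any source document."""
    def normalize(text):
        return " ".join(text.lower().split())
    normalized_sources = [normalize(s) for s in sources]
    return [
        citation for citation in answer_citations
        if not any(normalize(citation) in source for source in normalized_sources)
    ]
```

Flagged citations can then be dropped, regenerated, or surfaced to the user as unverified.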
4.1 - Finding Patterns: Construction company discovered 8% of queries (scheduling) drove 35% of churn
4.2 - Prioritization: Use 2x2 frameworks to choose what to build next based on volume and impact
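A 2x2 framework like the one above can be sketched as bucketing each query segment by volume and impact. The thresholds, labels, and segment data below are illustrative assumptions, not the course's actual numbers:

```python
# 2x2 prioritization sketch: place each query segment on volume and
# impact axes and bucket it into a quadrant. All values are illustrative.

def quadrant(volume, impact, vol_threshold=0.1, impact_threshold=0.5):
    if volume >= vol_threshold and impact >= impact_threshold:
        return "build now"       # high volume, high impact
    if volume >= vol_threshold:
        return "incremental"     # high volume, low impact
    if impact >= impact_threshold:
        return "strategic bet"   # low volume, high impact (e.g. churn drivers)
    return "deprioritize"

segments = {
    "scheduling": (0.08, 0.9),      # 8% of queries, but drives heavy churn
    "document lookup": (0.55, 0.3),
}
priorities = {name: quadrant(v, i) for name, (v, i) in segments.items()}
```

Note how a low-volume segment like scheduling still surfaces as a strategic bet once impact (churn) is on the second axis.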
5.1 - Foundations: Why one-size-fits-all fails. Different queries need different approaches
5.2 - Implementation: Documents, images, tables, SQL - each needs specialized handling
6.1 - Query Routing: Construction company: 65% → 78% through proper routing (95% × 82% = 78%)
6.2 - Tool Interfaces: Clean APIs enable parallel development. 40 examples/tool = 95% routing accuracy
6.3 - Performance Measurement: Two-level metrics separate routing failures from retrieval failures
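The two-level idea above - and the 95% × 82% = 78% arithmetic from 6.1 - can be sketched directly. The log record shape is an assumption of mine:

```python
# Two-level measurement sketch: separate "did we route to the right tool?"
# from "given correct routing, did retrieval succeed?". End-to-end success
# is the product of the two, which is why 95% routing x 82% retrieval = 78%.

def two_level_metrics(logs):
    routed_ok = [record for record in logs if record["routed_correctly"]]
    routing_acc = len(routed_ok) / len(logs)
    # Retrieval accuracy is conditioned on correct routing, so a routing
    # failure is never blamed on the retriever.
    retrieval_acc = sum(r["retrieved_correctly"] for r in routed_ok) / len(routed_ok)
    return routing_acc, retrieval_acc, routing_acc * retrieval_acc
```

Splitting the metric this way tells you which component to fix: low routing accuracy calls for better tool descriptions and examples, low conditional retrieval accuracy calls for a better retriever.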
Maintain improvement velocity at scale. Construction company: 78% → 84% success while scaling 5x query volume and reducing unit costs from $0.09 to $0.04 per query.
- Part 1: Understanding different content types
- Part 2: Implementation strategies
- Topics:
    - Working with documents, images, tables, and structured data
    - Metadata filtering and Text-to-SQL integration
    - PDF parsing and multimodal embeddings
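To make the Text-to-SQL path concrete, here is a minimal sketch of a structured-data retriever executing generated SQL against SQLite. The question-to-SQL step is a hardcoded stub standing in for an LLM call, and the schema and data are invented for illustration:

```python
import sqlite3

# Text-to-SQL retrieval sketch for structured data.

def text_to_sql(question):
    """Hypothetical stand-in for an LLM that translates questions to SQL."""
    if "overdue" in question.lower():
        return "SELECT name FROM invoices WHERE paid = 0 AND due_date < '2024-01-01'"
    raise ValueError("unsupported question in this sketch")

def run_structured_query(conn, question):
    sql = text_to_sql(question)          # generate SQL, then execute it
    return [row[0] for row in conn.execute(sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (name TEXT, paid INTEGER, due_date TEXT)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [("Acme", 0, "2023-11-01"), ("Globex", 1, "2023-12-01"), ("Initech", 0, "2024-06-01")],
)
overdue = run_structured_query(conn, "Which invoices are overdue?")
```

The point of the pattern: aggregate and filter questions like this are answered exactly by the database, where embedding search over row text would be unreliable.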
The workshops use industry-standard tools for production RAG systems:
- LLM APIs: OpenAI, Anthropic, Cohere
- Vector Databases: LanceDB, ChromaDB, Turbopuffer
- Frameworks: Sentence-transformers, BERTopic, Transformers, Instructor
- Evaluation: Synthetic data generation, precision/recall metrics, A/B testing
- Monitoring: Logfire, production observability patterns
- Processing: Pandas, SQLModel, Docling for PDF parsing
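The precision/recall metrics listed above reduce to a few lines of Python. A minimal sketch, assuming retrieved results are an ordered list and relevance judgments are a set of document IDs:

```python
# recall@k and precision@k over a single query's ranked retrieval results.

def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant documents that appear in the top k results."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant) if relevant else 0.0

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top k results that are relevant."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / k
```

Averaging these over a (possibly synthetic) query set gives the benchmark numbers quoted throughout the case studies.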
The /docs directory contains comprehensive workshop materials built with MkDocs:
- Workshop Chapters (0-7): Complete learning path from evaluation to production
- Office Hours: Q&A summaries addressing real implementation challenges
- Industry Talks: Expert presentations on RAG anti-patterns, embedding performance, production monitoring
- Case Studies: Detailed examples with specific metrics and timelines
- Product mindset: RAG as evolving product, not static implementation
- Data-driven improvement: Metrics and feedback guide development
- Systematic approach: Structured improvement processes over ad-hoc tweaking
- User-centered design: Focus on user value, not just technical capabilities
- Continuous learning: Systems that improve with every interaction
Build and view documentation:

```
mkdocs serve   # Local development with live reload
mkdocs build   # Static site generation
```

See the `latest/` directory for the most current course content.
The cohort_1/ and cohort_2/ directories contain materials from previous course iterations and are kept for reference only. All new development and course work should be done in latest/.
- Python 3.11 (required - the project uses specific features from this version)
- `uv` package manager (recommended) or `pip`
- Clone the repository
- Navigate to the `latest/` directory: `cd latest/`
- Install dependencies:

  ```
  # Using uv (recommended)
  uv sync

  # Or using pip
  pip install -e .
  ```

- Start with `week0/` for the most up-to-date content
- Follow the notebooks in sequential order within each week
- Reference the corresponding book chapters in `/docs` for deeper understanding
Before committing changes, run:
```
# Format and fix code issues
uv run ruff check --fix --unsafe-fixes .
uv run ruff format .
```

This course emphasizes:
- Systematic Improvement: Data-driven approaches over guesswork
- Product Thinking: Building RAG systems that solve real problems
- Practical Application: Real-world datasets and examples
- Evaluation-First: Measure before and after every change
- Continuous Learning: The field evolves rapidly; the flywheel helps you adapt
- Industry talk transcripts in `/data/`
- Office hours recording summaries in `/docs/office_hours/`
- Advanced notebooks in `/latest/extra_kura/` for clustering and classification topics
- Complete case study implementation in `/latest/case_study/`
This is educational material for the "Systematically Improving RAG Applications" course.