Building RAG Applications Workshop

This repository contains the materials for the Building RAG (Retrieval-Augmented Generation) Applications Workshop. The workshop starts with a naive RAG implementation and then moves to advanced techniques, so you can see where the simple pipeline falls short and how to build more effective RAG systems.

This repository is used for the O'Reilly training course: Building Reliable RAG Applications: From PoC to Production.

Prerequisites

API keys (OpenAI, Cohere)
Qdrant database (Cloud or Docker)

Quick Start

1. Install uv

This project uses uv for dependency management. uv installs the required Python version and creates the virtual environment for you.

# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

2. Clone and Install

git clone https://github.com/Sarangk90/building-rag-app-workshop.git
cd building-rag-app-workshop

# Install Python 3.11 and all dependencies
uv sync

3. Complete Workshop Setup

📖 Follow the complete setup guide: SETUP.md

The setup guide covers:

Qdrant database setup (Cloud or Docker options)
Environment variable configuration
Data ingestion process
Troubleshooting common issues
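
As a quick sanity check on the environment variable step, a small script along these lines can report which keys are still unset. This is a minimal sketch: the variable names `OPENAI_API_KEY` and `COHERE_API_KEY` and the helper `missing_env_vars` are assumptions for illustration, so check SETUP.md for the names the notebooks actually read.

```python
import os

# Assumed variable names; SETUP.md is the authority on the real ones.
REQUIRED_VARS = ("OPENAI_API_KEY", "COHERE_API_KEY")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```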

⚠️ You must complete the setup before running any notebooks!

4. Start Jupyter

uv run jupyter lab

Workshop Notebooks

After completing the setup, run the notebooks in this order:

1. Naive RAG

naive-rag/01-naive-rag.ipynb - Basic RAG implementation
naive-rag/02-naive-rag-challenges.ipynb - RAG limitations and evaluation

2. Advanced RAG

advanced-rag/01-advanced-rag-rerank.ipynb - Advanced RAG with reranking

3. SciFact Dataset (Optional)

advanced-rag/scifact/01-data-indexing.ipynb - Data indexing techniques
advanced-rag/scifact/02-advanced-rag.ipynb - Advanced techniques

Note: Each notebook automatically detects your setup (Cloud vs Docker) and connects appropriately.
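
The detection logic itself lives in the notebooks. One plausible way to do it is sketched below, assuming the setup is distinguished by two environment variables. The names `QDRANT_URL` and `QDRANT_API_KEY` and the helper `qdrant_connection_kwargs` are hypothetical, not taken from the repository.

```python
import os


def qdrant_connection_kwargs(env=None):
    """Pick Cloud or Docker connection settings from environment variables.

    Assumption: a Cloud setup provides both a URL and an API key, while a
    Docker setup provides neither (or only a local URL).
    """
    env = os.environ if env is None else env
    url = env.get("QDRANT_URL", "").strip()
    api_key = env.get("QDRANT_API_KEY", "").strip()
    if url and api_key:
        # Cloud: hosted cluster URL plus API key.
        return {"mode": "cloud", "url": url, "api_key": api_key}
    # Docker: local instance on Qdrant's default HTTP port.
    return {"mode": "docker", "url": url or "http://localhost:6333"}
```

The returned `url` (and `api_key`, when present) can then be passed to `qdrant_client.QdrantClient`.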

Workshop Content

Naive RAG

Basic RAG implementation using OpenAI embeddings
Vector storage with Qdrant
Simple retrieval and generation pipeline
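
To make the shape of the naive pipeline concrete, here is a toy sketch that runs without any API keys. It is not the notebook code: a bag-of-words counter stands in for OpenAI embeddings, an in-memory list stands in for Qdrant, and the generation step stops at building the prompt. All function names are illustrative.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query, contexts):
    """Assemble the prompt that would be sent to the completion model."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context_block}\n\nQuestion: {query}"
    )


chunks = [
    "Overfitting happens when a model memorizes training data.",
    "BERT is a language model based on the transformer encoder.",
    "Deep learning uses neural networks with many layers.",
]
query = "overfitting in training"
top = retrieve(query, chunks, k=1)
prompt = build_prompt(query, top)
```

In the real pipeline, `embed` would call the embeddings API, `retrieve` would query Qdrant, and `prompt` would be sent to the completion model.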

Advanced RAG

Hybrid search combining dense and sparse embeddings
Reranking with cross-encoders for improved relevance
Evaluation using standard metrics and benchmarks
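
Hybrid search needs some way to merge the dense and sparse result lists before reranking. The sketch below uses reciprocal rank fusion (RRF), a common choice. This is an assumption for illustration, since the source does not say which fusion method the notebooks use, and the document IDs are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in, with
    rank starting at 1; k=60 is the conventional smoothing constant.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


dense = ["doc_a", "doc_b", "doc_c"]   # hypothetical dense-embedding results
sparse = ["doc_b", "doc_d", "doc_a"]  # hypothetical sparse (keyword) results
fused = reciprocal_rank_fusion([dense, sparse])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that rank well in both lists rise to the top. A cross-encoder reranker would then rescore the fused candidates against the query.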

Data

The workshop uses:

Wikipedia articles on machine learning topics (Deep learning, Transformers, etc.)
BeIR SciFact dataset for demonstrations and evaluations

Wikipedia Article Management

The repository includes pre-downloaded Wikipedia articles in data/wiki_articles/ so the workshop does not have to call the Wikipedia API repeatedly. Use the following scripts to manage articles:

List Available Articles

uv run python scripts/fetch_additional_articles.py --list-available

Fetch Additional Articles

# Fetch specific articles
uv run python scripts/fetch_additional_articles.py "Machine learning" "Computer vision"

# Fetch from extended list (30+ ML/AI topics)
uv run python scripts/fetch_additional_articles.py

# View the extended article list
uv run python scripts/fetch_additional_articles.py --list-extended

Force Re-fetch Existing Articles

uv run python scripts/fetch_additional_articles.py --force "Deep learning"

Available Pre-downloaded Articles:

Artificial neural network
BERT (language model)
Deep learning
Generative pre-trained transformer
Overfitting
Transformer (machine learning model)

Key Dependencies

openai: For embeddings and completions
qdrant-client: For vector storage and retrieval
wikipedia, beautifulsoup4: For data collection and cleaning
FlagEmbedding: For reranking functionality
cohere: For additional reranking options
ragas: For comprehensive RAG evaluation
Various utilities: tqdm, python-dotenv, etc.

Evaluation

The workshop includes evaluation scripts using RAGAS metrics to assess the quality of RAG outputs across dimensions like relevance, faithfulness, and answer quality.
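
RAGAS scores the generated answers. On a labeled benchmark such as SciFact, the retrieval step can also be checked on its own with simple rank metrics. The sketch below is not RAGAS and not the workshop's evaluation code; it is a minimal, hypothetical illustration of recall@k and reciprocal rank.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents found in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


retrieved = ["a", "b", "c"]   # hypothetical ranked results for one query
relevant = {"b", "z"}         # hypothetical ground-truth labels
print(recall_at_k(retrieved, relevant, k=2))  # 0.5
print(reciprocal_rank(retrieved, relevant))   # 0.5
```

Averaging these values over all benchmark queries gives a single number for comparing naive and advanced retrieval.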

