This guide walks you through setting up your environment for the RAG workshop. You'll need to complete these steps before running any of the workshop notebooks.
This project uses uv for fast, reliable dependency management. uv automatically handles Python version management and virtual environments.
```
# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

Once uv is installed, sync the project:

```
# This installs Python 3.11 (if needed) and all dependencies
uv sync
```

That's it! uv handles everything automatically.
Create a `.env` file in the project root with your API keys:

```
OPENAI_API_KEY=your_openai_api_key_here
QDRANT_URL=your_qdrant_cluster_url_here
QDRANT_API_KEY=your_qdrant_api_key_here

# Optional: For advanced RAG features
COHERE_API_KEY=your_cohere_api_key_here
```

You have two options for the vector database. Choose the one that works best for your environment:
**Option 1: Qdrant Cloud**

- Go to Qdrant Cloud and sign up for a free account
- Create a new cluster (the free tier is sufficient)
- Get your cluster URL and API key from the dashboard
- Add them to your `.env` file
**Option 2: Local Docker**

```
# Run Qdrant locally
docker run -d -p 6333:6333 -p 6334:6334 qdrant/qdrant:v1.13.2
```

For local setup, use these `.env` values:

```
QDRANT_URL=http://localhost:6333
# Note: No QDRANT_API_KEY needed for local setup
```

With your `.env` in place, run the ingestion script:

```
# From the project root directory
uv run python scripts/ingest_to_qdrant_cloud.py
```

The script automatically detects whether you're using a cloud or local setup based on your `QDRANT_URL`.
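The detection can be sketched as a simple check on the URL. The helper below is illustrative (its name and exact logic are assumptions, not the script's actual code):

```python
import os

def using_local_qdrant(url: str) -> bool:
    """Treat localhost-style URLs as a local Docker setup."""
    return "localhost" in url or "127.0.0.1" in url

# Read the URL the same way the script would:
qdrant_url = os.environ.get("QDRANT_URL", "http://localhost:6333")
mode = "local" if using_local_qdrant(qdrant_url) else "cloud"
```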
The ingestion script will:

- ✅ Load the extended Wikipedia dataset (61 articles)
- 🔪 Create 1,210 chunks of 300 characters each, with a 50-character overlap
- 🤖 Generate embeddings with OpenAI's text-embedding-3-small model
- 📤 Upload everything to your Qdrant instance (cloud or local)
- ⏱️ Take approximately 5-10 minutes to complete
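The chunking step (fixed-size chunks with overlap) can be sketched as a sliding window. `chunk_text` is an illustrative helper, assuming a plain character-based splitter; the actual script may use a library implementation:

```python
def chunk_text(text: str, size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks with a sliding-window overlap."""
    step = size - overlap  # advance 250 characters per chunk
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("x" * 1000)
# Each chunk shares its last 50 characters with the start of the next one
```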
Start Jupyter:

```
uv run jupyter lab
```

Then open the notebooks in order:

- `naive-rag/01-naive-rag.ipynb` - Basic RAG implementation
- `naive-rag/02-naive-rag-challenges.ipynb` - Exploring RAG limitations
- `advanced-rag/01-advanced-rag-rerank.ipynb` - Advanced RAG with reranking
- `advanced-rag/scifact/` - SciFact dataset examples (optional)
Each notebook automatically detects your setup and connects appropriately.
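A sketch of how a notebook might build its connection settings from the environment (the `qdrant_kwargs` helper is hypothetical, not the workshop's actual code):

```python
import os

def qdrant_kwargs() -> dict:
    """Build QdrantClient keyword arguments from the environment."""
    kwargs = {"url": os.environ.get("QDRANT_URL", "http://localhost:6333")}
    api_key = os.environ.get("QDRANT_API_KEY")
    if api_key:  # only cloud setups need an API key
        kwargs["api_key"] = api_key
    return kwargs

# Usage (assuming qdrant-client is installed):
# from qdrant_client import QdrantClient
# client = QdrantClient(**qdrant_kwargs())
```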
If the ingestion script fails, re-run it from the project root:

```
# Make sure you're in the project root directory
cd path/to/building-rag-app-workshop

# Run the ingestion script
uv run python scripts/ingest_to_qdrant_cloud.py
```

If local Qdrant isn't responding:

```
# Check if Qdrant container is running
docker ps

# If not running, start it again
docker run -d -p 6333:6333 -p 6334:6334 qdrant/qdrant:v1.13.2

# Test connection
curl http://localhost:6333
```

For Qdrant Cloud connection issues:

- Verify your Qdrant Cloud cluster is running in the dashboard
- Double-check your cluster URL and API key
- Make sure you're using the correct cluster region
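You can also sanity-check connectivity from Python with only the standard library. The `qdrant_is_up` helper is illustrative; note that a cloud cluster may additionally require an `api-key` header, so a `False` there does not always mean the cluster is down:

```python
import urllib.request

def qdrant_is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if the Qdrant HTTP endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except Exception:
        return False

# qdrant_is_up("http://localhost:6333") should be True once the container is running
```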
For environment variable problems:

- Double-check that your `.env` file is in the project root
- Restart your Jupyter kernel after creating or updating `.env`
- For local setup: `QDRANT_URL=http://localhost:6333` (no API key needed)
- For cloud setup: both `QDRANT_URL` and `QDRANT_API_KEY` are required
For OpenAI API errors:

- Make sure you have credits in your OpenAI account
- Verify your OpenAI API key is correct
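As a quick first check, you can verify that the key at least looks plausible. OpenAI secret keys conventionally start with `sk-`; this illustrative check is not a substitute for a real API call:

```python
def looks_like_openai_key(key: str) -> bool:
    """Cheap format check: OpenAI secret keys start with 'sk-'."""
    return key.startswith("sk-") and len(key) > 20
```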
To enable the optional advanced RAG features, set up a Cohere key:

- Sign up for a free Cohere account at cohere.ai
- Get your API key from the dashboard
- Add it to your `.env` file as `COHERE_API_KEY=your_key_here`
To debug a failed or incomplete ingestion:

- Check the `data/ingestion_summary.json` file (created after successful ingestion)
- Look at the terminal output from the ingestion script for error messages
- For Docker: check the container logs with `docker logs <container_id>`
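A sketch of checking the summary file against the expected chunk count. The `num_chunks` field name is an assumption; inspect the real file for its actual schema:

```python
import json
from pathlib import Path

def check_summary(path: str = "data/ingestion_summary.json",
                  expected_chunks: int = 1210) -> bool:
    """Compare the recorded chunk count against the expected 1,210."""
    summary = json.loads(Path(path).read_text())
    # "num_chunks" is an assumed field name, not a documented schema
    return summary.get("num_chunks") == expected_chunks
```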
After completing the setup, you should see:

- A `.env` file in your project root with the required API keys
- Output from the ingestion script showing "🎉 INGESTION COMPLETED SUCCESSFULLY!"
- A `data/ingestion_summary.json` file with ingestion details
You're ready to start the workshop once you see "Expected number of chunks found! Ingestion was successful." in any notebook!