A hands-on group learning project for building a Retrieval-Augmented Generation (RAG) system using Python, Docker, and open-source AI models.
This group project teaches students how to build a question-answering AI system that can:
- Read and process documents
- Find relevant information using semantic search
- Generate accurate answers using a local LLM
Working in teams, students practice fundamental Python skills while building with cutting-edge AI technology.
Python skills practiced:
- File I/O and text processing
- String manipulation and chunking
- Functions and data structures
- Loops and conditionals
- Working with external libraries

AI concepts covered:
- Embeddings and vector search (see the sketch after this list)
- Retrieval-Augmented Generation (RAG)
- Language model integration
- System evaluation and metrics
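To make embeddings and vector search concrete, here is a tiny self-contained demo. This is a sketch only: the `all-MiniLM-L6-v2` model is a common small default, not a project requirement, and the sample sentences are invented.

```python
# Tiny demo of embeddings + semantic search (illustrative only).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = [
    "Students must attend at least 80% of classes.",
    "Parking permits are required in Lot B.",
]
query = "How often do I have to show up to class?"

# Encode documents and query into vectors, then rank by cosine similarity
doc_vecs = model.encode(docs, convert_to_tensor=True)
q_vec = model.encode(query, convert_to_tensor=True)
scores = util.cos_sim(q_vec, doc_vecs)[0]
best = scores.argmax().item()
print(f"Best match ({scores[best].item():.2f}): {docs[best]}")
```

Note that the attendance sentence wins even though it shares no keywords with the query; that is the point of semantic (rather than keyword) search.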
```
IS640_rag_project/
├── README.md                  # This file
├── STUDENT_PROJECT_GUIDE.md   # Detailed assignment instructions
├── docker_starter.md          # Docker setup guide
├── dockerfile                 # Docker image definition for Ollama
├── COMMANDS.txt               # Docker commands reference
├── test_setup.ipynb           # Environment verification notebook
├── rag_env.yml                # Conda environment file
├── .gitignore                 # Git ignore file
├── docs/                      # Sample documents
│   ├── text/                  # Text files for testing
│   │   ├── Attendance_Rules.txt
│   │   ├── Cheating_Rules.txt
│   │   └── School_Parking_Rules.txt
│   └── pdf/                   # (Optional) PDF documents
└── chroma_db/                 # Vector database storage (created on first run)
```
- Python 3.12 or higher
- Anaconda or Miniconda
- Docker Desktop
- VS Code (recommended)
- 8GB RAM minimum (16GB recommended)
```bash
git clone <repository-url>
cd IS640_rag_project
```

Follow the instructions in docker_starter.md to:
- Install Docker Desktop
- Create a Docker account
- Set up the Docker extension for VS Code
To set up the conda environment, refer to rag_env_starter.md.
Build the Docker image using the included dockerfile:

```bash
# Build the Docker image (this will take 10-15 minutes)
docker build -f dockerfile -t ollama-mistral-img .

# Start the container (CPU only)
docker run -d --rm --name ollama-mistral-offline -p 127.0.0.1:11434:11434 ollama-mistral-img

# Or with GPU support (if available)
docker run -d --rm --gpus all --name ollama-mistral-offline -p 127.0.0.1:11434:11434 ollama-mistral-img
```

For more Docker commands, see docker_commands.md.
Open and run test_setup.ipynb to verify everything is working correctly. This notebook will test:
- Python dependencies
- Ollama LLM connection
- Embedding model
- ChromaDB vector database
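If you want to exercise the same pieces outside the notebook, a minimal manual check might look like the sketch below. The `all-MiniLM-L6-v2` model and the `smoke_test` collection name are illustrative assumptions, not project requirements.

```python
# Manual smoke test: Ollama connection, embedding model, ChromaDB round-trip.
import requests
import chromadb
from sentence_transformers import SentenceTransformer

# 1. Ollama connection: list the models the container has pulled
resp = requests.get("http://127.0.0.1:11434/api/tags", timeout=5)
print("Ollama models:", [m["name"] for m in resp.json()["models"]])

# 2. Embedding model: encode one sentence into a vector
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small, common default
vec = embedder.encode("Hello RAG")
print("Embedding dimensions:", len(vec))

# 3. ChromaDB: round-trip a single document through a local collection
client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_or_create_collection("smoke_test")
collection.add(ids=["1"], documents=["Hello RAG"], embeddings=[vec.tolist()])
print("Stored documents:", collection.count())
```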
Read STUDENT_PROJECT_GUIDE.md for complete project instructions.
All required packages are included in rag_env.yml:
- `sentence-transformers` - Text embeddings
- `chromadb` - Vector database
- `requests` - HTTP client for the LLM
- `numpy`, `pandas` - Data processing
- `streamlit` - GUI for the chat interface
- Minimum: 8GB RAM, 4-core CPU
- Recommended: 16GB RAM, 8-core CPU, GPU (optional)
- Storage: 5GB free space
The project includes a dockerfile that builds an Ollama container with the Mistral 7B model. This provides a local LLM with:
- No API keys required
- Complete privacy (runs locally)
- Offline operation (once built)
- Free usage
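For example, once the container is running you can query the model directly over Ollama's REST API. This is a minimal sketch; the model tag `mistral` is assumed from the image built above.

```python
# Ask the local LLM a single question via Ollama's /api/generate endpoint.
import requests

resp = requests.post(
    "http://127.0.0.1:11434/api/generate",
    json={"model": "mistral", "prompt": "In one sentence, what is RAG?", "stream": False},
    timeout=120,
)
print(resp.json()["response"])
```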
The dockerfile uses a multi-stage build to:
- Download and install Ollama
- Pull the Mistral 7B model
- Create a minimal runtime image
See docker_starter.md for detailed Docker Desktop setup and docker_commands.md for build/run commands.
This is a group project where teams will complete 7 TODO tasks (illustrative sketches for several of them follow the list):
1. Document Loading - Load text files from a folder
2. Text Chunking - Split documents into smaller pieces
3. Process Documents - Apply chunking to all documents
4. RAG Query Function - Build the main Q&A pipeline
5. Test Dataset - Create evaluation questions
6. Evaluation Metrics - Measure system performance
7. Run Evaluation - Test and analyze results
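As referenced above, here is a minimal sketch of Tasks 2 and 4. The function names, prompt format, and parameter defaults are assumptions for illustration, not the required design; `collection` is a ChromaDB collection and `embedder` a SentenceTransformer, as in the smoke test earlier.

```python
# Sketch of Task 2 (chunking) and Task 4 (the RAG query pipeline).
import requests

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size character chunks (Task 2)."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping some overlap
    return chunks

def rag_query(question: str, collection, embedder, n_results: int = 3) -> str:
    """Retrieve the most relevant chunks, then ask the local LLM (Task 4)."""
    q_vec = embedder.encode(question).tolist()
    hits = collection.query(query_embeddings=[q_vec], n_results=n_results)
    context = "\n\n".join(hits["documents"][0])
    prompt = (
        "Answer the question using only this context:\n"
        f"{context}\n\nQuestion: {question}"
    )
    resp = requests.post(
        "http://127.0.0.1:11434/api/generate",
        json={"model": "mistral", "prompt": prompt, "stream": False},
        timeout=120,
    )
    return resp.json()["response"]
```

The overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk; your team should experiment with chunk sizes and justify the choice.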
Group Size: 2-4 students per team
See STUDENT_PROJECT_GUIDE.md for complete details, group responsibilities, grading rubric, and submission requirements.
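Tasks 5-7 revolve around evaluation. One simple metric a team might start from is keyword recall; this is purely illustrative (the test case and the hardcoded answer are invented), and your team should design and justify its own metrics.

```python
# One possible evaluation metric: fraction of expected keywords in the answer.
def keyword_recall(answer: str, expected_keywords: list[str]) -> float:
    """Return the share of expected keywords that appear in the answer."""
    answer_lower = answer.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in answer_lower)
    return hits / len(expected_keywords) if expected_keywords else 0.0

# Example test case (values invented; in the project, the answer would
# come from your rag_query function instead of being hardcoded)
test_set = [
    {"question": "How many absences are allowed?",
     "keywords": ["absence", "three"]},
]
for case in test_set:
    answer = "Students may have at most three unexcused absences."
    print(case["question"], "->", keyword_recall(answer, case["keywords"]))
```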
The docs/text/ folder contains sample school policy documents for testing. Teams should:
- Test with the provided documents first
- Create their own document collection (5-10 files)
- Choose topics the team understands well
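Since Task 1 consumes this folder, a minimal loading sketch may help orient the team (the path and function name are illustrative):

```python
# Sketch of Task 1: load every .txt file from a folder into a dict.
from pathlib import Path

def load_documents(folder: str) -> dict[str, str]:
    """Map each text file's name to its contents."""
    docs = {}
    for path in sorted(Path(folder).glob("*.txt")):
        docs[path.name] = path.read_text(encoding="utf-8")
    return docs

documents = load_documents("docs/text")
print(f"Loaded {len(documents)} documents: {list(documents)}")
```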
Problem: Conda environment won't create

```bash
# Try creating manually
conda create -n rag3_313 python=3.13
conda activate rag3_313
pip install sentence-transformers chromadb requests
```

Problem: Docker container won't start
- Verify Docker Desktop is running
- Check WSL 2 is installed (Windows)
- Enable virtualization in BIOS
Problem: Can't connect to Ollama

```bash
# Check if container is running
docker ps

# Test connection
curl http://127.0.0.1:11434/api/tags
```

Problem: Kernel crashes when loading embedding model
- Your system may need more RAM
- Try a smaller model
- Close other applications
- Assignment Questions: See STUDENT_PROJECT_GUIDE.md
- Docker Setup: See docker_starter.md
- Docker Commands: See docker_commands.md
- Technical Issues: Check the troubleshooting section above
This project is designed for educational purposes.
Built using:
- Ollama - Local LLM runtime
- ChromaDB - Vector database
- Sentence Transformers - Embedding models
- Mistral AI - Language model
Ready to build your first RAG system? Start with STUDENT_PROJECT_GUIDE.md!