RAG System Group Project

A hands-on group learning project for building a Retrieval-Augmented Generation (RAG) system using Python, Docker, and open-source AI models.

Overview

This group project teaches students how to build a question-answering AI system that can:

Read and process documents
Find relevant information using semantic search
Generate accurate answers using a local LLM

Students work in teams and practice fundamental Python skills while building on current open-source AI tools.

What You'll Learn

Python Fundamentals

File I/O and text processing
String manipulation and chunking
Functions and data structures
Loops and conditionals
Working with external libraries

AI/ML Concepts

Embeddings and vector search
Retrieval-Augmented Generation (RAG)
Language Model integration
System evaluation and metrics
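
To make "embeddings and vector search" concrete, here is a minimal sketch of ranking text chunks by cosine similarity. The three-dimensional vectors are invented for illustration; in the project an embedding model would produce much longer vectors and ChromaDB would perform the search. The function names `cosine_similarity` and `rank_chunks` are hypothetical and not part of the project code.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec, chunk_vecs):
    """Return chunk names sorted from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in chunk_vecs.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored]


# Toy vectors, invented for this example (not real embeddings).
toy_chunks = {
    "parking": [0.9, 0.1, 0.0],
    "attendance": [0.1, 0.9, 0.0],
    "cheating": [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.1]

ranking = rank_chunks(toy_query, toy_chunks)
print(ranking)
```

The query vector points mostly in the same direction as the "parking" vector, so that chunk ranks first. Real retrieval works the same way, only with vectors produced by a model instead of by hand.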

Project Structure

IS640_rag_project/
├── README.md                      # This file
├── STUDENT_PROJECT_GUIDE.md       # Detailed assignment instructions
├── docker_starter.md              # Docker setup guide
├── dockerfile                     # Docker image definition for Ollama
├── docker_commands.md             # Docker commands reference
├── test_setup.ipynb              # Environment verification notebook
├── rag_env.yml                    # Conda environment file
├── .gitignore                     # Git ignore file
├── docs/                          # Sample documents
│   ├── text/                     # Text files for testing
│   │   ├── Attendance_Rules.txt
│   │   ├── Cheating_Rules.txt
│   │   └── School_Parking_Rules.txt
│   └── pdf/                      # (Optional) PDF documents
└── chroma_db/                     # Vector database storage (created on first run)

Quick Start

Prerequisites

Python 3.12 or higher
Anaconda or Miniconda
Docker Desktop
VS Code (recommended)
8GB RAM minimum (16GB recommended)

Step 1: Clone the Repository

git clone <repository-url>
cd IS640_rag_project

Step 2: Set Up Docker

Follow the instructions in docker_starter.md to:

Install Docker Desktop
Create a Docker account
Set up the Docker extension for VS Code

Step 3: Create Conda Environment

Follow the instructions in rag_env_starter.md to create the Conda environment from rag_env.yml.

Step 4: Build and Start the Ollama LLM

Build the Docker image using the included dockerfile:

# Build the Docker image (this will take 10-15 minutes)
docker build -f dockerfile -t ollama-mistral-img .

# Start the container (CPU only)
docker run -d --rm --name ollama-mistral-offline -p 127.0.0.1:11434:11434 ollama-mistral-img

# Or with GPU support (if available)
docker run -d --rm --gpus all --name ollama-mistral-offline -p 127.0.0.1:11434:11434 ollama-mistral-img

For more Docker commands, see docker_commands.md.

Step 5: Verify Your Setup

Open and run test_setup.ipynb to verify everything is working correctly. This notebook will test:

Python dependencies
Ollama LLM connection
Embedding model
ChromaDB vector database
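
For a quick check outside the notebook, the dependency test can be approximated with a few lines of Python. This is a sketch rather than the notebook's actual contents: the module list is assumed from the packages named in rag_env.yml, and `find_missing` is a hypothetical helper.

```python
import importlib.util

# Import names assumed from the package list in rag_env.yml.
# The sentence-transformers package is imported as sentence_transformers.
REQUIRED_MODULES = [
    "sentence_transformers",
    "chromadb",
    "requests",
    "numpy",
    "pandas",
    "streamlit",
]


def find_missing(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

Run it with the Conda environment activated. A missing package usually means the environment was not created from rag_env.yml or a different interpreter is active.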

Step 6: Start the Assignment

Read STUDENT_PROJECT_GUIDE.md for complete project instructions.

Requirements

Python Packages

All required packages are included in rag_env.yml:

sentence-transformers - Text embeddings
chromadb - Vector database
requests - HTTP client for LLM
numpy, pandas - Data processing
streamlit - GUI for chat interface

Hardware

Minimum: 8GB RAM, 4-core CPU
Recommended: 16GB RAM, 8-core CPU, GPU (optional)
Storage: 5GB free space

Docker Setup

The project includes a dockerfile that builds an Ollama image bundled with the Mistral 7B model. Running that image as a container provides a local LLM with:

No API keys required
Complete privacy (runs locally)
Offline operation (once built)
Free usage

The dockerfile uses a multi-stage build to:

Download and install Ollama
Pull the Mistral 7B model
Create a minimal runtime image

See docker_starter.md for detailed Docker Desktop setup and docker_commands.md for build/run commands.
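
Once the container is running, Python code can reach the model over HTTP on the published port. The sketch below assumes Ollama's `/api/generate` endpoint and the model tag `mistral`; the tag actually baked into the image depends on the dockerfile, so confirm it with the `/api/tags` request shown in the Troubleshooting section. `build_prompt` and `ask_ollama` are hypothetical names, and the standard library's `urllib` is used here only to keep the example self-contained (the project's `requests` package would work similarly).

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # port published by the docker run commands
MODEL_TAG = "mistral"  # assumption: check which tag your image actually contains


def build_prompt(question, context_chunks):
    """Combine retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def ask_ollama(prompt, base_url=OLLAMA_URL, model=MODEL_TAG, timeout=120):
    """Send one non-streaming generate request and return the model's text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["response"]


example_prompt = build_prompt(
    "Where can students park?",
    ["(a chunk retrieved from School_Parking_Rules.txt would go here)"],
)
print(example_prompt)
# With the container running, the answer could be fetched with:
# print(ask_ollama(example_prompt))
```

Setting `"stream": False` asks the server for a single JSON object rather than a stream of partial responses, which keeps the parsing to one `json.loads` call.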

Group Project Assignment

This is a group project where teams will complete 7 TODO tasks:

Document Loading - Load text files from a folder
Text Chunking - Split documents into smaller pieces
Process Documents - Apply chunking to all documents
RAG Query Function - Build the main Q&A pipeline
Test Dataset - Create evaluation questions
Evaluation Metrics - Measure system performance
Run Evaluation - Test and analyze results

Group Size: 2-4 students per team

See STUDENT_PROJECT_GUIDE.md for complete details, group responsibilities, grading rubric, and submission requirements.
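
As an illustration of what the Text Chunking task involves, the sketch below splits a string into fixed-size character chunks that overlap slightly, so a sentence cut at a boundary still appears whole in one of the neighbors. This is one possible approach under assumed parameters, not the required solution; the function name, the default sizes, and the choice of character-based splitting are all assumptions.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character chunks of at most chunk_size, overlapping by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be between 0 and chunk_size - 1")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = chunk_text(sample, chunk_size=10, overlap=3)
print(pieces)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts 7 characters after the previous one (10 minus an overlap of 3), which is why "hij" and "opq" appear in two chunks. Splitting on sentences or paragraphs instead of raw characters is a common alternative.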

Sample Documents

The docs/text/ folder contains sample school policy documents for testing. Teams should:

Test with the provided documents first
Create their own document collection (5-10 files)
Choose topics the team understands well
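
Loading the sample documents can be as small as the following sketch, which reads every `.txt` file in a folder into a dictionary. The helper name `load_text_files` and the UTF-8 encoding are assumptions; the assignment may ask for a different return shape.

```python
from pathlib import Path


def load_text_files(folder):
    """Return a dict mapping each .txt file name in `folder` to its contents."""
    documents = {}
    for path in sorted(Path(folder).glob("*.txt")):
        documents[path.name] = path.read_text(encoding="utf-8")
    return documents


docs = load_text_files("docs/text")  # folder from the project structure
for name, text in docs.items():
    print(f"{name}: {len(text)} characters")
```

Run from the repository root, this should list the three sample policy files. If nothing is printed, the working directory is probably not the project folder.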

Troubleshooting

Environment Issues

Problem: Conda environment won't create

# Try creating manually
conda create -n rag3_313 python=3.13
conda activate rag3_313
pip install sentence-transformers chromadb requests numpy pandas streamlit

Docker Issues

Problem: Docker container won't start

Verify Docker Desktop is running
Check WSL 2 is installed (Windows)
Enable virtualization in BIOS

Problem: Can't connect to Ollama

# Check if container is running
docker ps

# Test connection
curl http://127.0.0.1:11434/api/tags
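
The same connection test can be run from Python, which is handy inside a notebook. This sketch assumes the `/api/tags` response is a JSON object with a `models` list whose entries carry a `name` field; `model_names` and `check_ollama` are hypothetical helpers.

```python
import json
import urllib.error
import urllib.request


def model_names(tags_payload):
    """Extract model names from the JSON body returned by /api/tags."""
    return [model["name"] for model in tags_payload.get("models", [])]


def check_ollama(base_url="http://127.0.0.1:11434", timeout=5):
    """Return the available model names, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
            return model_names(json.loads(response.read().decode("utf-8")))
    except (urllib.error.URLError, OSError, ValueError):
        return None


names = check_ollama()
if names is None:
    print("Could not reach Ollama; is the container running?")
else:
    print("Ollama is up. Models:", names)
```

An empty model list with a successful connection suggests the container started but the image was built without the model.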

Notebook Issues

Problem: Kernel crashes when loading embedding model

Your system may need more RAM
Try a smaller model
Close other applications

Support

Assignment Questions: See STUDENT_PROJECT_GUIDE.md
Docker Setup: See docker_starter.md
Docker Commands: See docker_commands.md
Technical Issues: Check the troubleshooting section above

Resources

Learning Materials

Understanding RAG

License

This project is designed for educational purposes.

Acknowledgments

Built using:

Ollama - Local LLM runtime
ChromaDB - Vector database
Sentence Transformers - Embedding models
Mistral AI - Language model

Ready to build your first RAG system? Start with STUDENT_PROJECT_GUIDE.md!
