Live on: https://learnwords.duckdns.org

This is the live production version of the project. You can directly enter English words and generate smart, context-aware quizzes powered by a fine-tuned LLM and semantic search.
The AI model powering this application is not an off-the-shelf pre-trained model. I personally fine-tuned google/flan-t5-small using the Stanford SQuAD dataset to transform it into a specialized Question Generator.
View My Training Notebook: Google Colab - Fine-Tuning & Quantization
Demo video: video.mp4
This project is a full-stack, production-ready Generative AI application designed to teach English vocabulary through dynamic quizzes. Unlike standard flashcard apps, it uses a Retrieval-Augmented Generation (RAG) pipeline to fetch precise definitions from a vector database and a Fine-Tuned Google Flan-T5 model to generate context-aware questions in real-time.
The project is designed as a complete End-to-End AI application, covering:
- Custom Model Training: Transforming a general-purpose LLM into a specialized question generator.
- Edge Optimization: Running heavy NLP tasks on limited CPU resources via ONNX Quantization.
- Data Engineering: Scraping, cleaning, and indexing 10,000+ words into a Vector DB.
- Containerization: Single-container microservice architecture.
- Cloud Deployment: Secure deployment on AWS EC2 with Nginx & SSL.
The main goal of this project is to solve the "Hallucination" problem in LLMs when generating educational content and to build a cost-effective AI product. The focus is not only on the model but on the entire engineering pipeline:
- Resource Optimization: Achieving <500ms inference latency on 1GB RAM (AWS Free Tier) using INT8 Quantization.
- Accuracy: Implementing a hybrid search algorithm (Metadata Filtering + Fuzzy Search) to ensure zero-miss retrieval.
- User Experience: "Sequential Learning" logic where regenerating a quiz fetches a different definition for the same word (see the sketch after this list).
- Production Deployment: Robust Dockerized environment served via HTTPS.
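As a purely hypothetical sketch of the "Sequential Learning" idea (the actual logic lives in `app/main.py`), regeneration can simply advance through the definitions stored for a word:

```python
# Hypothetical sketch: serve a different stored definition on each
# regeneration of the same word (not the actual app/main.py code).
from collections import defaultdict

_next_idx = defaultdict(int)  # word -> index of the definition to serve next

def next_definition(word: str, definitions: list) -> str:
    i = _next_idx[word] % len(definitions)
    _next_idx[word] += 1  # the next regeneration advances to another definition
    return definitions[i]

print(next_definition("apple", ["a round fruit", "the tech company"]))  # a round fruit
print(next_definition("apple", ["a round fruit", "the tech company"]))  # the tech company
```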
AI / Machine Learning:
- Task: Text-to-Text Generation (Question Generation)
- Models: Google Flan-T5 Small (Fine-Tuned); Sentence-Transformers (Embedding Model: all-MiniLM-L6-v2)
- Optimization: ONNX Runtime & Quantization (INT8)
- Database: Pinecone (Vector Database with Metadata Filtering)

Backend:
- Language: Python 3.9
- Framework: FastAPI (Asynchronous endpoints)
- Logic: Custom RAG pipeline with "Smart Masking" (Regex-based answer hiding) and "Dynamic Distractor Generation" (sketched below)
- Environment: Python-Dotenv (Configuration management)

Frontend:
- Tech: HTML5, JavaScript (ES6+)
- Design: Tailwind CSS (Modern "Dark Tech" theme)
- Interactivity: Bulk word processing, loading states, and interactive feedback

DevOps & Deployment:
- Containerization: Docker (Multi-stage build optimization)
- Cloud Provider: AWS EC2 (Ubuntu - Free Tier Optimized)
- Web Server: Nginx (Reverse Proxy)
- Security: Certbot (Let's Encrypt SSL/TLS)
- Networking: DuckDNS (Dynamic DNS Updater)
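The "Dynamic Distractor Generation" mentioned under Logic could look roughly like this; a hypothetical sketch only (the function name and sampling strategy are illustrative, not the actual `services/` code):

```python
# Hypothetical sketch of "Dynamic Distractor Generation": draw wrong answers
# from other words in the vocabulary so each quiz gets plausible options.
import random

def make_options(correct: str, vocabulary: list, k: int = 3) -> list:
    pool = [w for w in vocabulary if w.lower() != correct.lower()]
    options = random.sample(pool, k) + [correct]  # k distractors + the answer
    random.shuffle(options)
    return options

print(make_options("apple", ["banana", "carrot", "desk", "garden", "apple"]))
```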
Project structure:
- app/
  - main.py → FastAPI entry point. Handles bulk-generate, sequential logic, and CORS.
  - services/ → Core logic for Vector DB connection and Generator inference.
  - models/onnx_quantized/ → CUSTOM MODEL FILES (Encoder/Decoder) generated via Colab.
- frontend/
  - index.html → Modern landing page for bulk input.
  - quiz.html → Dynamic quiz interface with "Regenerate" capability.
- seed_db.py → ETL script to fetch 10,000+ words from dictionary APIs and populate Pinecone.
- Dockerfile → Optimized image build steps (installing CPU-only PyTorch first).
- .env → Environment variables (Pinecone API Keys).
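For orientation, a minimal sketch of how `app/main.py` might wire the `bulk-generate` endpoint and CORS together (the request model and the `generate_quiz` placeholder are hypothetical stand-ins for the real service layer):

```python
# Minimal sketch of the app/main.py wiring (hypothetical names; the real
# generation logic lives in services/ and loads the ONNX model at startup).
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel

app = FastAPI(title="AI Quiz Generator")

# CORS so the static frontend can call the API from another origin.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)

class BulkRequest(BaseModel):
    words: list[str]

def generate_quiz(word: str) -> dict:
    # Placeholder: the real version retrieves a definition from Pinecone,
    # runs the ONNX T5 generator, and masks the answer in the question.
    return {"word": word, "question": "_______ is ..."}

@app.post("/bulk-generate")
def bulk_generate(req: BulkRequest):
    return {"quizzes": [generate_quiz(w) for w in req.words]}
```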
The core "Brain" of this project was not just downloaded; it was engineered. I performed fine-tuning and optimization to make a Small Language Model (SLM) behave like a specialized teacher.
View Training Notebook: Google Colab - Fine-Tuning & Quantization
A pre-trained FLAN-T5 model is naturally instruction-tuned, but it struggles with Question Generation.
- Default Behavior: If you ask it to "Generate a question for Apple," it might say "Apple is a technology company" (answering instead of asking) or "What is apple?" (too simple).
- Fine-Tuning Goal: I needed to force the model to format output specifically for B2-level quizzes. It learned not just to ask, but to construct a question based only on the provided context.
I utilized the Stanford Question Answering Dataset (SQuAD), which consists of 100,000+ questions based on Wikipedia articles. However, I had to reverse the data structure:
- Standard SQuAD: (Context + Question) -> Answer
- My Engineering: (Context + Answer) -> Question
JSON Structure Example:
```json
{
  "context": "Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL)...",
  "question": "Which NFL team won Super Bowl 50?",
  "answers": [ { "text": "Denver Broncos", "answer_start": 177 } ]
}
```

I processed the data to feed the model a specific prompt template during training:
- Input: `answer: Denver Broncos context: Super Bowl 50 was an American football game...`
- Target Output: `Which NFL team won Super Bowl 50?`
This forces the model to learn the relationship between a specific answer and its surrounding context to derive a question.
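A minimal sketch of this reversal, assuming the Hugging Face `datasets` library (the public `squad` dataset uses these field names; the notebook's exact preprocessing may differ):

```python
# Sketch: flip SQuAD from (context, question) -> answer
# into (answer, context) -> question for training.
from datasets import load_dataset

squad = load_dataset("squad", split="train")

def to_question_generation(example):
    answer = example["answers"]["text"][0]  # take the first gold answer
    return {
        "input_text": f"answer: {answer} context: {example['context']}",
        "target_text": example["question"],
    }

train_data = squad.map(to_question_generation, remove_columns=squad.column_names)
print(train_data[0]["input_text"][:60], "->", train_data[0]["target_text"])
```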
The standard PyTorch weights were too heavy to deploy on a free AWS instance (1 vCPU, 1GB RAM).
- Conversion: The fine-tuned model was exported to ONNX format.
- Quantization: I applied INT8 Quantization using the `optimum` library.
- Result: Reduced model size by 4x and improved CPU inference speed by ~20x.
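A minimal sketch of that export-and-quantize step with `optimum` (paths and the dynamic AVX2 quantization config are illustrative; the exact settings live in the Colab notebook):

```python
# Sketch: export the fine-tuned model to ONNX, then apply dynamic INT8
# quantization to each sub-model (illustrative paths and config).
from optimum.onnxruntime import ORTModelForSeq2SeqLM, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

model = ORTModelForSeq2SeqLM.from_pretrained("./flan-t5-small-qg", export=True)
model.save_pretrained("./onnx_model")

qconfig = AutoQuantizationConfig.avx2(is_static=False, per_channel=False)
for file_name in ("encoder_model.onnx", "decoder_model.onnx"):
    quantizer = ORTQuantizer.from_pretrained("./onnx_model", file_name=file_name)
    quantizer.quantize(save_dir="./onnx_quantized", quantization_config=qconfig)
```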
This project relies on a comprehensive Vector Database to act as the "Memory" of the AI.
I scraped and curated data from high-quality sources to ensure definitions are accurate and modern:
- Word List: MIT 10,000 Most Frequent Words
- Definitions: Free Dictionary API (sourced from Oxford/Google)
- Vector Database Content: A semantic search engine containing 7,100+ curated English words (sourced from Oxford/Google Dictionary data), using Metadata Filtering for exact matches and Fuzzy Search as a fallback.
The data is stored in Pinecone with the following JSON structure, allowing for both semantic search and metadata filtering:
```jsonc
{
  "id": "word_apple",
  "vector": [0.12, 0.54, ...], // Embedding for semantic search
  "metadata": {
    "word": "Apple",
    "definition": "A round fruit with red or green skin and a whitish inside.",
    "synonyms": ["fruit", "red pome"],
    "example_sentence": "She eats an apple every day to stay healthy.",
    "difficulty": "A1"
  }
}
```
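A minimal sketch of how `seed_db.py` might build one such record, assuming the current Pinecone Python client and `sentence-transformers` (the values mirror the structure above):

```python
# Sketch: embed a definition and upsert it with its metadata
# (assumes the v3+ Pinecone client; not the literal seed_db.py code).
import os
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index(os.environ["INDEX_NAME"])
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings

definition = "A round fruit with red or green skin and a whitish inside."
index.upsert(vectors=[{
    "id": "word_apple",
    "values": embedder.encode(definition).tolist(),
    "metadata": {
        "word": "Apple",
        "definition": definition,
        "example_sentence": "She eats an apple every day to stay healthy.",
        "difficulty": "A1",
    },
}])
```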
When a user requests a quiz for a word (e.g., "Apple"), the system executes the following Runtime Flow:
- Retrieval: The code queries the Vector DB (Pinecone). It first attempts a Metadata Filter (`word='Apple'`) for precision, falling back to Vector Search if needed.
- Prompt Engineering: The retrieved definition is injected into a specific template.
  - System Prompt: `generate question: answer: Apple context: [Retrieved Definition]`
- Generation: The ONNX-optimized T5 model processes this prompt and generates a context-aware question.
- Post-Processing: The answer ("Apple") is masked in the generated question (replaced with `_______`) to create a fill-in-the-blank style quiz.
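A minimal sketch of the Retrieval fallback and the regex-based "Smart Masking" step (function names are illustrative; match access assumes the v3+ Pinecone client):

```python
# Sketch: metadata-filtered lookup with a semantic fallback, then
# regex masking of the answer (illustrative, not the exact services/ code).
import re
from typing import Optional

def retrieve_definition(index, embedder, word: str) -> Optional[str]:
    vector = embedder.encode(word).tolist()
    # Exact match first via metadata filter...
    res = index.query(vector=vector, top_k=1,
                      filter={"word": {"$eq": word}}, include_metadata=True)
    if not res.matches:
        # ...then fall back to pure semantic search.
        res = index.query(vector=vector, top_k=1, include_metadata=True)
    return res.matches[0].metadata["definition"] if res.matches else None

def mask_answer(question: str, answer: str) -> str:
    # Hide the answer (and simple inflections like plurals) in the question.
    return re.sub(rf"\b{re.escape(answer)}\w*", "_______", question, flags=re.IGNORECASE)

print(mask_answer("Which fruit is an apple?", "apple"))  # Which fruit is an _______?
```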
You can run the system in two different ways:
- Using Docker (recommended for consistency)
- Running manually with Python
You need a Pinecone API Key (Free Tier). Create a file named `.env` in the project root:

```
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_ENV=your_region
INDEX_NAME=your_index_name
```
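For reference, a minimal sketch of how the backend can read these values with `python-dotenv`:

```python
# Sketch: load configuration from .env at startup.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root
PINECONE_API_KEY = os.environ["PINECONE_API_KEY"]
PINECONE_ENV = os.getenv("PINECONE_ENV")
INDEX_NAME = os.getenv("INDEX_NAME")
```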
Building with Docker ensures all dependencies (including the specific ONNX runtime and CPU-only PyTorch) are installed correctly.

Build the image:

```bash
docker build -t ai-quiz-app .
```

Run the container:

```bash
docker run -p 8080:80 --env-file .env ai-quiz-app
```

Open your browser: http://localhost:8080

To run manually with Python instead:

```bash
pip install -r requirements.txt
uvicorn app.main:app --reload
```

The frontend then runs at: http://127.0.0.1:8000
- Deployed on: AWS EC2 (t2.micro) running Ubuntu.
- Optimization: Configured with 4GB Swap Space to prevent OOM (Out of Memory) kills during model loading.
- Docker Optimization: Uses `torch --index-url .../cpu` to reduce image size by removing CUDA dependencies.
- Access: Served via DuckDNS with Port 80 redirection.
- Security: Traffic secured with SSL/TLS certificates via Let's Encrypt (Certbot) and Nginx Reverse Proxy.
This project demonstrates the capability to build cost-effective, high-performance AI applications. It moves beyond simple API wrappers by implementing:
- Custom Model Fine-Tuning & Quantization.
- Hybrid Search Algorithms (Keyword + Vector).
- Full-Stack Microservice Architecture.
- Real-world Cloud Deployment.