
A web-based tutoring assistant built with Flask and powered by Google’s Gemini LLM. The application uses a multi-agent architecture to handle subject-specific questions, image-to-text OCR, and AI-generated summaries from YouTube links.


whyRahull/MultiAgent-Tutor-GenAI


MultiAgent GenAI Tutor

Live Demo - Deployed Link

Live Demo Link

Video Demonstration of MultiAgent GenAI Tutor

Demo Video Link

A Flask-powered web application that implements a multi-agent tutoring assistant using Google’s Gemini LLM and specialized tools. Users can ask questions in math, physics, chemistry, or biology; upload images for OCR; or paste YouTube URLs for AI-generated summaries.


Table of Contents

  1. Features
  2. Architecture & Flow
  3. Setup & Installation
  4. Environment Variables
  5. Running Locally
  6. Deployment
  7. Folder Structure
  8. Screenshots
  9. Agent–Tool Interaction
  10. Challenges & Solutions

Features

  • Text Chat: Ask any question in math, physics, chemistry, or biology.
  • OCR: Upload an image of a question and have it converted to text.
  • YouTube Summarizer: Paste a youtube.com or youtu.be link and receive concise bullet points.
  • Calculator: Automatic math expression evaluation via CalculatorTool.
  • Email Transcript: Receive a PDF copy of your full chat via email.

Architecture & Flow

[Figure: MultiAgent-Tutor-GenAI flowchart]

  1. User Input

    • Text question → MentorAgent
    • Image upload → OCRTool → text → MentorAgent
    • YouTube URL → YouTubeSummarizerTool
  2. MentorAgent

    • Detects YouTube URLs (regex).
    • Otherwise, classifies the subject (math/physics/chemistry/biology) via a Gemini prompt.
  3. TutorAgent

    • Routes by subject to the appropriate agent (MathAgent, PhysicsAgent, ChemistryAgent, BiologyAgent).
    • If subject is unknown or unsupported, returns a friendly fallback.
  4. Subject Agents

    • MathAgent: Checks for numeric expressions, uses CalculatorTool if found; else asks Gemini.
    • PhysicsAgent/BiologyAgent/ChemistryAgent: Delegate directly to Gemini with subject-specific prompts.
  5. Tools

    • CalculatorTool: Safe eval of math expressions.
    • OCRTool: PIL + Tesseract OCR.
    • YouTubeSummarizerTool: Fetches transcript, uses Gemini to summarize.
    • EmailTool: Converts chat to PDF and emails it.
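
The "safe eval" idea behind CalculatorTool can be illustrated with a small sketch. This is not the repository's implementation; it is one common approach, assuming that "safe" means parsing the expression with Python's ast module and allowing only numeric literals and a whitelist of arithmetic operators. The names safe_eval and _eval_node are hypothetical. Exponentiation is left out on purpose, since inputs such as 9**9**9 can exhaust CPU and memory.

```python
import ast
import operator

# Whitelisted operators; any other syntax (names, calls, attributes) is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _eval_node(node):
    """Recursively evaluate one AST node, allowing only plain arithmetic."""
    if isinstance(node, ast.Constant):
        value = node.value
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return value
    elif isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left = _eval_node(node.left)
        right = _eval_node(node.right)
        return _BINARY_OPS[type(node.op)](left, right)
    elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError("unsupported expression")


def safe_eval(expression: str):
    """Evaluate an arithmetic expression without calling eval()."""
    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError as exc:
        raise ValueError("not a valid expression") from exc
    return _eval_node(tree.body)
```

Division by zero still raises ZeroDivisionError, so a caller would likely want to catch that alongside ValueError.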

Setup & Installation

  1. Clone

    git clone https://github.com/yourusername/MultiAgent-Tutor-GenAI.git
    cd MultiAgent-Tutor-GenAI
  2. Environment

    python3 -m venv venv
    source venv/bin/activate   # Windows: venv\Scripts\activate
    pip install --upgrade pip
    pip install -r requirements.txt
    
  3. Tesseract OCR

    Ubuntu: sudo apt-get install tesseract-ocr
    Windows: Download and install from https://github.com/tesseract-ocr/tesseract


Environment Variables

Create a .env file in the project root:

# Gemini LLM API
GEMINI_API_KEY=your_google_gemini_api_key

# Flask session
FLASK_SECRET_KEY=a_random_secret_key

# (Optional) EmailTool SMTP
SMTP_SERVER=smtp.gmail.com
SMTP_PORT=587
SENDER_EMAIL=your_email@example.com
SENDER_PASSWORD=your_email_password
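
As a rough illustration of how these variables might be checked at startup, the sketch below validates the environment using only the standard library. The function name load_settings and the split into required and optional variables are assumptions drawn from the comments above, not from the application code. The .env file itself would first need to be loaded into the process environment, for example with python-dotenv's load_dotenv(), if the project uses that package.

```python
import os

# Assumption: the two variables not marked "(Optional)" above are required.
REQUIRED_VARS = ("GEMINI_API_KEY", "FLASK_SECRET_KEY")
OPTIONAL_SMTP_VARS = ("SMTP_SERVER", "SMTP_PORT", "SENDER_EMAIL", "SENDER_PASSWORD")


def load_settings(env=None):
    """Return a settings dict, raising if a required variable is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    settings = {name: env[name] for name in REQUIRED_VARS}

    # Email is optional: enable it only when every SMTP value is present.
    smtp = {name: env.get(name) for name in OPTIONAL_SMTP_VARS}
    settings["email_enabled"] = all(smtp.values())
    if settings["email_enabled"]:
        settings.update(smtp)
        settings["SMTP_PORT"] = int(smtp["SMTP_PORT"])
    return settings
```

Failing fast on a missing key tends to give a clearer error than a rejected API call later in a request.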

Running Locally

export FLASK_APP=app.py
export FLASK_ENV=development
flask run

Visit http://localhost:5000 in your browser.


Deployment

Deployed on Render. Live App: https://multiagent-tutor-genai-1.onrender.com/


Folder Structure

MultiAgent-Tutor-GenAI/
├── agents/
│   ├── mentor_agent.py      # Orchestrator
│   ├── subject_classifier.py
│   ├── tutor_agent.py
│   ├── math_agent.py
│   ├── physics_agent.py
│   ├── biology_agent.py
│   └── chemistry_agent.py
├── tools/
│   ├── calculator_tool.py
│   ├── ocr_tool.py
│   ├── youtube_summarizer.py
│   └── email_tool.py
├── templates/
│   └── index.html
├── static/
│   ├── ss1.png
│   └── ss2.png
├── .env
├── requirements.txt
└── app.py

Screenshots

[Three screenshots of the MultiAgent-Tutor-GenAI interface]

Agent–Tool Interaction

Flow:

User → MentorAgent → [OCRTool | Subject Classifier] → TutorAgent → [CalculatorTool or Gemini LLM] → Response
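
The first branch of this flow, deciding whether the input is a YouTube link or a question, might look like the sketch below. The pattern and the names extract_video_id and route are hypothetical, and the regex used in the repository may differ. It assumes the two link forms named under Features (youtube.com and youtu.be) and 11-character video IDs.

```python
import re

# Matches youtube.com/watch?...v=<id> and youtu.be/<id>, capturing the 11-character ID.
YOUTUBE_URL = re.compile(
    r"(?:https?://)?(?:www\.|m\.)?"
    r"(?:youtube\.com/watch\?(?:[^\s#]*&)?v=|youtu\.be/)"
    r"([\w-]{11})"
)


def extract_video_id(text: str):
    """Return the video ID of the first YouTube link in text, or None."""
    match = YOUTUBE_URL.search(text)
    return match.group(1) if match else None


def route(text, summarize, classify_and_answer):
    """Send YouTube links to the summarizer and everything else to the tutor path."""
    video_id = extract_video_id(text)
    if video_id is not None:
        return summarize(video_id)
    return classify_and_answer(text)
```

Passing the two handlers in as callables keeps the routing decision testable without network access or an API key.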

Extensibility:

Add new agents (e.g. HistoryAgent) by implementing answer() and registering it in tutor_agent.py.
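
A minimal sketch of what such an extension could look like follows. Only the answer() method comes from the description above. The register() method, the ask_llm callable, and the fallback wording are assumptions made for illustration, and the actual registration mechanism in tutor_agent.py may be different.

```python
class HistoryAgent:
    """Hypothetical new subject agent exposing the answer() interface."""

    def __init__(self, ask_llm):
        # ask_llm: any callable that takes a prompt string and returns text.
        self._ask_llm = ask_llm

    def answer(self, question: str) -> str:
        prompt = "You are a history tutor. Answer clearly:\n" + question
        return self._ask_llm(prompt)


class TutorAgent:
    """Illustrative router: maps a subject name to a registered agent."""

    FALLBACK = "Sorry, I can only help with the subjects I have been set up for."

    def __init__(self):
        self._agents = {}

    def register(self, subject: str, agent) -> None:
        self._agents[subject.lower()] = agent

    def answer(self, subject: str, question: str) -> str:
        agent = self._agents.get(subject.strip().lower())
        if agent is None:
            return self.FALLBACK  # unknown or unsupported subject
        return agent.answer(question)
```

With a dictionary-based registry like this, adding a subject is one register() call and the routing logic itself stays unchanged.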

Prompt Control:

Prompts constrain the model to fixed output formats:

  • Single-word classification
  • 250-word bullet summaries
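
To show how single-word classification could be enforced on the application side, here is a sketch of a prompt template and a reply normalizer. The prompt wording and the names CLASSIFY_PROMPT and normalize_subject are hypothetical, and the repository's actual prompt is not reproduced here. The idea is that any reply outside the known categories collapses to "unknown", so the router never sees free-form text.

```python
SUBJECTS = ("math", "physics", "chemistry", "biology")

# Hypothetical prompt wording; fill in with CLASSIFY_PROMPT.format(question=...).
CLASSIFY_PROMPT = (
    "Classify the question into exactly one of: math, physics, chemistry, "
    "biology, unknown. Reply with that single word only.\n\nQuestion: {question}"
)


def normalize_subject(raw_reply: str) -> str:
    """Map a raw model reply onto a known subject, or 'unknown'."""
    # Lowercase, then drop surrounding whitespace, quotes, and end punctuation.
    cleaned = raw_reply.strip().lower().strip(".!\"'` ")
    words = cleaned.split()
    if len(words) == 1 and words[0] in SUBJECTS:
        return words[0]
    return "unknown"
```

For example, a reply of "Physics." normalizes to "physics", while a full sentence such as "I think it is math" is treated as "unknown" rather than guessed at.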

Challenges & Solutions

Subject Classification Errors

Fix: Prompt refinement to enforce exact category keywords and handle “unknown” fallback.

OCR Misreads

Fix: Pre-processing images (contrast adjustments) and error handling for blank extractions.
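
One way this fix could be realized is sketched below, assuming Pillow and pytesseract are the OCR stack (the README names PIL and Tesseract but not the Python binding). Grayscale conversion plus auto-contrast stands in for "contrast adjustments", and the exact steps in ocr_tool.py may differ. The function names are hypothetical.

```python
def normalize_ocr_text(raw: str):
    """Collapse whitespace; return None when nothing readable was extracted."""
    text = " ".join(raw.split())
    return text or None


def extract_question(image_path: str):
    """OCR an image file, returning cleaned text or None for a blank result."""
    # Imported here so the rest of the app still loads without the OCR stack.
    from PIL import Image, ImageOps
    import pytesseract

    with Image.open(image_path) as image:
        # Grayscale and stretched contrast usually help Tesseract on photos.
        prepared = ImageOps.autocontrast(ImageOps.grayscale(image))
        raw = pytesseract.image_to_string(prepared)
    return normalize_ocr_text(raw)
```

A caller can then treat None as "no text found" and ask the user for a clearer image instead of sending an empty question to the LLM.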

LLM Hallucinations in Math

Fix: Offload pure computations to the CalculatorTool whenever possible.
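
The sketch below illustrates one conservative way to decide what counts as a "pure computation". It is an assumption about how MathAgent might work, not its actual code. A question qualifies only if, apart from an optional lead-in such as "what is", it consists entirely of digits, operators, and parentheses. Anything with variables or words goes to the LLM. The calculate and ask_llm parameters are hypothetical callables standing in for CalculatorTool and Gemini.

```python
import re

# Whole-question match: optional lead-in, then only arithmetic characters.
PURE_EXPRESSION = re.compile(
    r"^\s*(?:what\s+is|calculate|compute|evaluate)?\s*"
    r"([\d\s.+\-*/()]+?)\s*[?=]?\s*$",
    re.IGNORECASE,
)


def find_expression(question: str):
    """Return the arithmetic expression if the question is nothing else, or None."""
    match = PURE_EXPRESSION.match(question)
    if not match:
        return None
    expression = match.group(1).strip()
    # Require at least one digit and one operator, so "42" alone is not routed.
    if re.search(r"\d", expression) and re.search(r"[-+*/]", expression):
        return expression
    return None


def answer_math(question: str, calculate, ask_llm) -> str:
    """Use the calculator for pure arithmetic, otherwise defer to the LLM."""
    expression = find_expression(question)
    if expression is not None:
        try:
            return f"{expression} = {calculate(expression)}"
        except (ValueError, ZeroDivisionError):
            pass  # malformed or undefined arithmetic: let the LLM explain
    return ask_llm(question)
```

Matching the whole question rather than searching inside it avoids evaluating "2 - 4" out of "Solve x^2 - 4 = 0".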

Session Size

Fix: Consider truncating very long transcripts or migrating to a database for persistence.
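
As an illustration of the truncation option, the sketch below drops the oldest messages until the chat history fits a byte budget. It assumes the history is a list of JSON-serializable messages kept in Flask's default cookie-based session, where browsers cap a cookie at roughly 4 KB. The budget value and the name trim_history are hypothetical. The real cookie is signed and encoded, so the JSON size is only an approximation of what is sent.

```python
import json

MAX_SESSION_BYTES = 3500  # assumed budget, kept below the ~4 KB cookie limit


def trim_history(history, max_bytes=MAX_SESSION_BYTES):
    """Drop the oldest messages until the JSON-encoded history fits the budget."""
    trimmed = list(history)  # copy so the caller's list is not mutated
    while trimmed and len(json.dumps(trimmed).encode("utf-8")) > max_bytes:
        trimmed.pop(0)
    return trimmed
```

Keeping the newest messages preserves the context most relevant to the next question. Note that a transcript emailed from a trimmed history would be incomplete, which is one argument for the database option instead.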

