TutorAI: Intelligent Document Retrieval and Question Answering System

TutorAI is an AI-driven document retrieval and question-answering system that uses OpenAI's GPT models, LangChain, and ChromaDB to answer questions about uploaded PDF documents. It splits text into chunks, generates embeddings for them, and retrieves the most similar chunks so that users can query their documents in natural language.

🌟 Introduction

TutorAI is designed to help users query their own documents. Upload academic papers, reports, or books, and TutorAI will split them into manageable chunks, embed them, and answer questions about their content. Answers are generated from the retrieved context and reference the original document.

🚀 Features

PDF Upload and Parsing: Load and process multiple PDFs seamlessly.
Text Chunking: Splits large documents into manageable and context-rich chunks.
Embeddings with OpenAI: Utilizes text-embedding-3-small for robust embedding generation.
Cosine Similarity: Measures the similarity between document chunks for high-quality retrieval.
Document Visualization: Visualizes embeddings in 2D using PCA for better understanding.
Multi-Query Retrieval: Generates diverse question formulations to improve search accuracy.
Context-Based QA: Answers questions strictly based on the provided document context, ensuring precision.
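
As a rough illustration of the text-chunking feature, the sketch below splits text into overlapping chunks and prefers to break at whitespace. It is a simplified stand-in written in plain Python, not LangChain's RecursiveCharacterTextSplitter; the function name `split_text` and the `chunk_size` and `overlap` defaults are assumptions chosen for the example.

```python
from typing import List


def split_text(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share up to `overlap` characters so that a sentence
    cut at a boundary still appears with some context in the next chunk.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    chunks: List[str] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Prefer to cut at the last space inside the window, but only
            # past the overlap region so every step makes forward progress.
            cut = text.rfind(" ", start + overlap + 1, end)
            if cut != -1:
                end = cut
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end == len(text):
            break
        # Step back by `overlap` characters to create the shared context.
        start = end - overlap
    return chunks
```

Larger overlaps make it less likely that a relevant sentence is split across two chunks, at the cost of storing and embedding more text.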

💻 Technologies Used

Python: Core programming language.
LangChain: Framework for building modular applications using LLMs.
OpenAI Models: gpt-4o-mini for answer generation and text-embedding-3-small for embeddings.
ChromaDB: Vector database for efficient storage and retrieval of document embeddings.
Scikit-Learn: PCA for dimensionality reduction and cosine similarity calculations.
Matplotlib: For data visualization.

🛠 Setup Instructions

Follow these steps to set up and run the project on your local system:

Clone the Repository:

```
git clone https://github.com/yourusername/TutorAI.git
cd TutorAI
```

Install Dependencies: Ensure Python 3.8+ is installed, then run:
```
pip install -r requirements.txt
```
Set Up OpenAI API Key: Create a .env file in the project root and add your OpenAI API key:
```
OPENAI_API_KEY=your_openai_api_key
```
Load PDF Files: Place your PDFs in the TutorAI_Data/ directory.
Run the Project: Execute the script:
```
python main.py
```
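
This README does not show how the project reads the `.env` file. As a minimal stdlib-only sketch, the snippet below parses simple `KEY=value` lines and fails early when the key is missing. The helper names `load_env_file` and `require_api_key` are hypothetical; many projects use the python-dotenv package for this step instead.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=value lines; blank lines and # comments are skipped."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_api_key(path: str = ".env") -> str:
    """Return OPENAI_API_KEY from the process environment or the .env file."""
    key = os.environ.get("OPENAI_API_KEY") or load_env_file(path).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or export it")
    return key
```

Checking for the key before loading any PDFs surfaces a configuration mistake immediately rather than at the first API call.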

🔎 How It Works

Document Loading:
- Loads all PDFs from the specified directory.
Text Splitting:
- Uses LangChain's RecursiveCharacterTextSplitter to split text into chunks.
Embedding Generation:
- Embeds chunks using OpenAI's text-embedding-3-small.
Similarity Computation:
- Applies cosine similarity for efficient information retrieval.
Query Handling:
- Processes user questions, formulates diverse variations, and retrieves the best-matching context.
Answer Generation:
- Answers are generated strictly based on the retrieved context.
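
The retrieval steps above can be sketched in plain Python. This is a toy under stated assumptions: `embed` is a bag-of-words stand-in for OpenAI's text-embedding-3-small, the query variants are supplied by hand rather than generated by an LLM, and all function names are hypothetical. Only the cosine-similarity ranking and the de-duplicated union over query variants mirror the approach described.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Dict[str, float]:
    """Toy embedding: word counts. A real pipeline would call an embedding model."""
    return {token: float(n) for token, n in Counter(re.findall(r"\w+", text.lower())).items()}


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(token, 0.0) for token, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query_vec, embed(c)), reverse=True)
    return ranked[:k]


def multi_query_retrieve(queries: List[str], chunks: List[str], k: int = 2) -> List[str]:
    """Union of the top-k results over all query variants, first occurrence kept."""
    seen = set()
    merged: List[str] = []
    for query in queries:
        for chunk in retrieve(query, chunks, k):
            if chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
    return merged
```

Merging results from several phrasings of the same question helps when the user's wording differs from the document's, since each variant can surface chunks the others miss.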

🧑‍💻 Usage

Upload PDFs to TutorAI_Data/.
Modify and run chain.invoke() to start querying your documents.
Visualize embeddings using the provided PCA functionality.
Retrieve answers with document references for traceability.
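
The prompt passed to chain.invoke() is not shown in this README. As a hedged sketch, a context-restricted prompt with source references could be assembled as follows. The `Chunk` dataclass, the `build_prompt` function, and the instruction wording are all assumptions for illustration, not the project's actual code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    text: str
    source: str  # hypothetical field: name of the PDF the chunk came from
    page: int    # hypothetical field: page number within that PDF


def build_prompt(question: str, chunks: List[Chunk]) -> str:
    """Build a prompt that limits the model to the numbered context passages."""
    context = "\n\n".join(
        f"[{i}] ({chunk.source}, p. {chunk.page})\n{chunk.text}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "Cite sources by their bracketed number. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Numbering each passage and carrying its file name and page lets the answer point back to a specific location in the original document.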

📊 Visualization

This project visualizes document embeddings by projecting them to 2D with PCA. In the resulting scatter plot, each point is one embedding positioned along the first two principal components.
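
To make the projection step concrete, here is a dependency-free sketch of a 2D PCA using power iteration with deflation. It is an approximation meant for illustration only: the project lists scikit-learn's PCA for this step, and the function names here are hypothetical. The returned pairs are the coordinates one would hand to a scatter plot.

```python
import math
import random
from typing import List

Vector = List[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def _center(rows: List[Vector]) -> List[Vector]:
    """Subtract the per-column mean so each feature has zero mean."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    return [[r[j] - means[j] for j in range(dim)] for r in rows]


def _top_component(rows: List[Vector], iterations: int = 200, seed: int = 0) -> Vector:
    """Power iteration on X^T X (applied implicitly) to find the leading direction."""
    rng = random.Random(seed)
    dim = len(rows[0])
    v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(_dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        scores = [_dot(r, v) for r in rows]                      # X v
        w = [sum(s * r[j] for s, r in zip(scores, rows)) for j in range(dim)]  # X^T (X v)
        norm = math.sqrt(_dot(w, w))
        if norm == 0.0:
            break  # no variance left in the data
        v = [x / norm for x in w]
    return v


def pca_2d(rows: List[Vector]) -> List[Vector]:
    """Project each row onto the first two (approximate) principal components."""
    if not rows:
        raise ValueError("need at least one row")
    centered = _center(rows)
    data = centered
    components: List[Vector] = []
    for _ in range(2):
        v = _top_component(data)
        components.append(v)
        # Deflate: remove the part of each row that lies along v.
        data = [[x - _dot(r, v) * vj for x, vj in zip(r, v)] for r in data]
    return [[_dot(r, c) for c in components] for r in centered]
```

For real embeddings with over a thousand dimensions this pure-Python loop is slow, so a library implementation is the practical choice; the sketch only shows what the projection computes.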

🌟 Future Enhancements

Add support for more file formats (e.g., DOCX, TXT).
Enable real-time querying through a web-based interface.
Expand multi-query retrieval with more capable LLMs.
Implement user-friendly feedback loops for iterative improvements.
