RAG Bot

A Python chatbot that uses Retrieval-Augmented Generation (RAG) to answer questions about a folder of PDF documents. It combines semantic search over the extracted text with a large language model, so that answers are grounded in the document content.

Features

Process multiple PDF documents from a folder
Extract and process text from PDFs efficiently
Split text into semantic chunks for better context
Use sentence transformers for semantic search
Generate answers with a FLAN-T5 model (google/flan-t5-xl)
Command-line interface for easy interaction
Detailed logging for monitoring and debugging

Requirements

Python 3.9+
PyPDF2
sentence-transformers
transformers
torch
Other dependencies listed in requirements.txt

Installation

Clone this repository:

git clone <repository-url>
cd 02-rag-bot

Create and activate a virtual environment (recommended):

python -m venv .venv
source .venv/bin/activate  # On Windows, use: .venv\Scripts\activate

Install the required packages:

pip install -r requirements.txt

Usage

You can use the RAG bot in two ways:

1. Command Line Arguments

python main.py --folder_path "path/to/your/pdf/folder" --question "Your question here?"

2. Interactive Mode

Simply run the script and follow the prompts:

python main.py

The program will ask you to:

Enter the path to your folder containing PDF files
Enter your question about the PDF content
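Both modes can share one entry point: parse the optional flags, then prompt for whatever is missing. The following is a minimal sketch of that idea, not the code in main.py. The flag names come from the usage example above; the function names `parse_args` and `resolve_inputs` and the prompt wording are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags from the usage example; both are optional."""
    parser = argparse.ArgumentParser(description="Ask questions about a folder of PDFs.")
    parser.add_argument("--folder_path", help="Folder containing PDF files")
    parser.add_argument("--question", help="Question about the PDF content")
    return parser.parse_args(argv)


def resolve_inputs(args, prompt=input):
    """Fall back to an interactive prompt for any value not given as a flag."""
    folder = args.folder_path or prompt("Enter the path to your PDF folder: ")
    question = args.question or prompt("Enter your question: ")
    return folder.strip(), question.strip()


# Command-line mode: both values come from the flags, so nothing is prompted.
args = parse_args(["--folder_path", "data", "--question", "What is RAG?"])
print(resolve_inputs(args))
```

Passing the prompt function as a parameter keeps the fallback logic testable without a terminal.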

How It Works

PDF Processing: The system processes all PDF files in the specified folder using PyPDF2.
Text Extraction: Text is extracted from each PDF and combined for processing.
Text Chunking: The combined text is split into semantic chunks for efficient processing.
Embedding Generation: Each text chunk is converted into embeddings using sentence transformers.
Semantic Search: When a question is asked, the system finds the most relevant chunks using cosine similarity.
Answer Generation: The retrieved chunks and the question are passed to a T5 model (google/flan-t5-xl), which generates the answer.
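The semantic search step can be illustrated with plain Python. This is a sketch under stated assumptions: the vectors are short hand-written lists, whereas in the real pipeline they would be the embeddings produced by the sentence-transformers model. The helper names `cosine_similarity` and `top_k_chunks` are hypothetical and may not match main.py.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(question_vec, chunk_vecs, chunks, k=3):
    """Return the k chunks whose embeddings are most similar to the question."""
    scored = [
        (cosine_similarity(question_vec, vec), chunk)
        for vec, chunk in zip(chunk_vecs, chunks)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


# Toy 2-dimensional "embeddings" standing in for real model output.
chunks = ["refund policy", "shipping times", "warranty terms"]
chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(top_k_chunks([0.9, 0.1], chunk_vecs, chunks, k=2))
# → ['refund policy', 'warranty terms']
```

The selected chunks would then be joined into the context that is passed, together with the question, to the generation model.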

Technical Details

Uses google/flan-t5-xl for question answering
Employs all-MiniLM-L6-v2 for semantic text similarity
Implements efficient text chunking with configurable chunk sizes
Processes multiple PDFs in a single folder
Includes comprehensive logging for monitoring and debugging
Handles PDF reading errors gracefully
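To make "configurable chunk sizes" concrete, here is a minimal sketch of one common approach: fixed-size word windows with a small overlap. It is a stand-in, not necessarily the chunking strategy in main.py, and the function name `chunk_text`, its parameters, and the default values are assumptions.

```python
def chunk_text(text, chunk_size=200, overlap=20):
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


print(chunk_text("a b c d e f g h i j", chunk_size=4, overlap=1))
# → ['a b c d', 'd e f g', 'g h i j']
```

Smaller chunks make retrieval more precise but give the generation model less context per chunk; the overlap trades a little redundancy for fewer broken sentences.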

Error Handling

The system handles the following error conditions:

Missing folders
Invalid folder paths
PDF reading errors
No PDF files found in folder
Processing errors
Model generation errors
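The folder-related cases above could be handled roughly as follows. This is a sketch, not the project's implementation: `find_pdfs` and `extract_all` are hypothetical names, and `read_pdf` is a placeholder for whatever function extracts text from one PDF (for example a PyPDF2-based reader).

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def find_pdfs(folder_path):
    """Return the PDF files in folder_path, raising a specific error per failure case."""
    folder = Path(folder_path)
    if not folder.exists():
        raise FileNotFoundError(f"Folder not found: {folder}")
    if not folder.is_dir():
        raise NotADirectoryError(f"Not a folder: {folder}")
    pdfs = sorted(p for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ".pdf")
    if not pdfs:
        raise ValueError(f"No PDF files found in {folder}")
    return pdfs


def extract_all(pdfs, read_pdf):
    """Apply read_pdf to each file, logging and skipping files that fail to read."""
    texts = []
    for pdf in pdfs:
        try:
            texts.append(read_pdf(pdf))
        except Exception as exc:  # broad on purpose: one bad file should not stop the run
            logger.warning("Skipping %s: %s", pdf, exc)
    return texts
```

Raising distinct exception types lets the caller print a targeted message for each case, while skipping unreadable files keeps one corrupt PDF from aborting the whole run.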

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT

Acknowledgments

PyPDF2 for PDF processing
Hugging Face for the transformer models
Sentence Transformers for semantic search capabilities
