PDF Chatbot

A Python command-line chatbot that answers questions about the content of a PDF document. It combines semantic search over the extracted text (sentence-transformers) with answer generation by a FLAN-T5 model.

Features

Extract text from PDF documents
Process and chunk text for efficient analysis
Use semantic search to find relevant content
Generate answers to questions about the PDF content
Command-line interface for easy interaction

Requirements

Python 3.7+
PyPDF2
sentence-transformers
transformers
torch
Other dependencies listed in requirements.txt

Installation

Clone this repository:

git clone <repository-url>
cd 01-pdf-chatbot

Create and activate a virtual environment (recommended):

python -m venv .venv
source .venv/bin/activate  # On Windows, use: .venv\Scripts\activate

Install the required packages:

pip install -r requirements.txt

Usage

You can use the chatbot in two ways:

1. Command Line Arguments

python main.py --pdf_path "path/to/your/document.pdf" --question "Your question here?"

2. Interactive Mode

Simply run the script and follow the prompts:

python main.py

The program will ask you to:

Enter the path to your PDF file
Enter your question about the PDF content
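
The two modes above could share a single entry point. The following is a minimal sketch of how that might look, not the actual contents of main.py: the function name get_inputs and the prompt wording are assumptions, and only the --pdf_path and --question flags come from this README.

```python
import argparse
from typing import Callable, List, Optional, Tuple


def get_inputs(
    argv: Optional[List[str]] = None,
    ask: Callable[[str], str] = input,
) -> Tuple[str, str]:
    """Return (pdf_path, question), prompting for any flag that was omitted.

    get_inputs is a hypothetical helper; main.py may be organized differently.
    """
    parser = argparse.ArgumentParser(description="Ask a question about a PDF.")
    parser.add_argument("--pdf_path", help="Path to the PDF file")
    parser.add_argument("--question", help="Question about the PDF content")
    args = parser.parse_args(argv)

    # Interactive mode: fall back to a prompt when a flag is missing.
    pdf_path = args.pdf_path or ask("Enter the path to your PDF file: ").strip()
    question = args.question or ask("Enter your question about the PDF content: ").strip()
    return pdf_path, question
```

Passing ask as a parameter keeps the function easy to exercise without a terminal; calling get_inputs() with no arguments reads sys.argv and prompts through input().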

How It Works

Text Extraction: The system extracts text from the provided PDF file using PyPDF2.
Text Processing: The extracted text is split into manageable chunks for efficient processing.
Semantic Search: When a question is asked, the system embeds the question and the text chunks with a sentence-transformers model and selects the chunk most similar to the question.
Answer Generation: The selected chunk and the question are passed to a FLAN-T5 model, which generates the answer.
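
The first two steps can be sketched as follows. This is an illustration under stated assumptions rather than the code in main.py: the function names are hypothetical, chunks are measured in words with an assumed default of 200, and the PdfReader API assumes PyPDF2 2.0 or later.

```python
from typing import List


def extract_text(pdf_path: str) -> str:
    """Concatenate the text of every page in the PDF (hypothetical helper)."""
    # Imported lazily so that chunk_text can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader  # assumes PyPDF2 >= 2.0

    reader = PdfReader(pdf_path)
    # extract_text() may return an empty value for image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def chunk_text(text: str, chunk_size: int = 200) -> List[str]:
    """Split text into chunks of at most chunk_size words.

    Word-based chunking and the default size are assumptions; the project
    may chunk by characters or sentences instead.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]
```

A smaller chunk_size gives the search step more precise targets but less context per chunk, so the value is worth tuning per document.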

Technical Details

Uses google/flan-t5-xl for question answering
Employs all-MiniLM-L6-v2 for semantic text similarity
Implements efficient text chunking with configurable chunk sizes
Handles PDF reading errors gracefully
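
To show how the two models named above might fit together, here is a hedged sketch of the search and generation steps. The function names, the prompt wording, the 512-token input limit, and the max_new_tokens default are all assumptions; only the model identifiers come from this README.

```python
from typing import List


def build_prompt(context: str, question: str) -> str:
    """Combine the retrieved chunk and the question (prompt wording is assumed)."""
    return (
        "Answer the question using the context.\n\n"
        f"Context: {context}\n\n"
        f"Question: {question}"
    )


def find_best_chunk(question: str, chunks: List[str]) -> str:
    """Return the chunk whose embedding has the highest cosine similarity."""
    from sentence_transformers import SentenceTransformer, util

    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    chunk_embeddings = embedder.encode(chunks, convert_to_tensor=True)
    question_embedding = embedder.encode(question, convert_to_tensor=True)
    # cos_sim returns a 1 x len(chunks) matrix; take its only row.
    scores = util.cos_sim(question_embedding, chunk_embeddings)[0]
    return chunks[int(scores.argmax())]


def generate_answer(context: str, question: str, max_new_tokens: int = 128) -> str:
    """Generate an answer with FLAN-T5 from the retrieved context."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    name = "google/flan-t5-xl"
    # A real script would load the tokenizer and model once and reuse them.
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSeq2SeqLM.from_pretrained(name)
    inputs = tokenizer(
        build_prompt(context, question),
        return_tensors="pt",
        truncation=True,
        max_length=512,
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Both models are downloaded from the Hugging Face Hub on first use, so the first run needs network access and takes noticeably longer than later ones.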

Error Handling

The system handles the following error cases:

Missing PDF files
PDF reading errors
Invalid input
Processing errors
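
One way to cover these cases is to validate inputs up front and wrap each pipeline step so that failures surface as a single user-facing error type. The sketch below is illustrative only: PDFChatbotError, validate_inputs, and run_step are hypothetical names, and the project's actual handling may differ.

```python
import os
from typing import Any, Callable


class PDFChatbotError(Exception):
    """Hypothetical error type for failures that should be shown to the user."""


def validate_inputs(pdf_path: str, question: str) -> None:
    """Reject invalid input and missing files before any heavy work starts."""
    if not question.strip():
        raise PDFChatbotError("The question must not be empty.")
    if not os.path.isfile(pdf_path):
        raise PDFChatbotError(f"PDF file not found: {pdf_path}")


def run_step(description: str, func: Callable[..., Any], *args: Any) -> Any:
    """Run one pipeline step, converting unexpected errors to PDFChatbotError."""
    try:
        return func(*args)
    except PDFChatbotError:
        raise  # already user-facing, pass through unchanged
    except Exception as exc:
        # Covers PDF reading errors and processing errors alike.
        raise PDFChatbotError(f"{description} failed: {exc}") from exc
```

A caller could then catch PDFChatbotError once at the top level, print its message, and exit with a non-zero status instead of showing a traceback.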

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT

Acknowledgments

PyPDF2 for PDF processing
Hugging Face for the transformer models
Sentence Transformers for semantic search capabilities
