DocuChat is a powerful and extensible command-line chatbot that can answer questions from local PDF documents as well as engage in general conversations using a language model. It leverages PDF parsing, vector embeddings, and retrieval-augmented generation (RAG) to provide accurate, context-aware responses.
- Ingest local PDF documents
- Ask questions about uploaded PDFs
- Chat with LLM about general topics
- Uses vector embeddings for document retrieval
- Easily extendable with additional tools or agents
- Clone the repository
git clone https://github.com/your-username/docuchat.git
cd docuchat- Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate- Install dependencies
pip install -r requirements.txtdocuchat/
├── data/
│ └── pdfs/ # Place your PDF files here
├── docuchat/
│ ├── __init__.py
│ ├── agent.py # Main agent logic
│ ├── config.py # Configuration file
│ ├── data_loader.py # PDF loader and splitter
│ ├── vector.py # Vector store handling
│ └── cli.py # Typer CLI entry point
├── requirements.txt
└── README.md
Put your PDF files in data/pdfs/ and run:
python main.py ingestThis will:
- Load all PDFs from the folder
- Chunk them
- Create vector embeddings and store them
python main.py chatYou'll be able to:
- Ask questions about your documents
- Have general conversations with the LLM
- Add tools or agents in
agent.py - Customize prompts or retrieval strategies
- Plug in new data sources or formats
$ python main.py chat
Docuchat ready! Type your questions (exit to quit)
You: Give me summary of my pdfs
=== ANSWER ===
Based on the results of the tool call, it appears that the PDFs contain information about a university competition and a team that won an award. Here is a summary of the content:
The Xylem Global Student Innovation Challenge is a competition that encourages students to design innovative solutions to water-related issues. This year, over 4,400 students from around 150 countries submitted projects, with a focus on flood response, climate resilience, and clean water access. Team Khudrorin won the 'Best Project' award, and Team ResQMap won the 'Community Choice Award' in the university track and received a prize money of $500.