DeepResearcher - Advanced AI Research Assistant with Local LLM Integration and RAG Pipeline
- 🤖 Local LLM Integration: Uses local language models via llama-cpp-python for privacy and control
- 📚 RAG Pipeline: Retrieval-Augmented Generation with vector embeddings using FAISS
- 🌐 Web Search: Live web search integration using DuckDuckGo
- 📄 Multi-modal File Support:
  - PDF and DOCX document processing
  - Image OCR using Tesseract
  - Audio transcription with Faster-Whisper
  - Video analysis (audio transcription + frame OCR)
- 💻 Streamlit Web UI: Intuitive interface with drag & drop, chat, and settings
- 💾 Persistent Storage: Save and load knowledge bases
- 📊 Export Options: Download chat history and results
Prerequisites:
- Python 3.8+
- Git
- Tesseract OCR (for image processing)
- FFmpeg (for audio/video processing)
Installation:

- Clone the repository:
  ```bash
  git clone https://github.com/Drxmukesh/DeepResearch.git
  cd DeepResearch
  ```
- Install Python dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Install system dependencies:
  Ubuntu/Debian:
  ```bash
  sudo apt-get update
  sudo apt-get install tesseract-ocr ffmpeg
  ```
  macOS:
  ```bash
  brew install tesseract ffmpeg
  ```
  Windows (requires the Visual Studio C++ Build Tools to compile llama-cpp-python):
  - Download Tesseract from: https://github.com/UB-Mannheim/tesseract/wiki
  - Download FFmpeg from: https://ffmpeg.org/download.html
  - Alternatively, Windows users can run the application under WSL.
- Download the LLM models (a quick load test is sketched after these steps):
  ```bash
  python scripts/setup_models.py
  ```
  Or manually download models to the models/ directory:
  - Llama-2-7B-Chat: Download Link
  - Mistral-7B-Instruct: Download Link
- Run the application:
  ```bash
  streamlit run app.py
  ```
- Open your browser and navigate to http://localhost:8501
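Once a model is downloaded, you can sanity-check that it loads with llama-cpp-python; a minimal sketch, assuming a GGUF file in models/ (the filename below is an example, substitute the file you actually downloaded):

```python
# Quick sanity check that a downloaded GGUF model loads; the filename is an
# example, use whichever file you placed in models/.
from llama_cpp import Llama

llm = Llama(model_path="models/llama-2-7b-chat.Q4_0.gguf")
out = llm("Q: What is retrieval-augmented generation? A:", max_tokens=64)
print(out["choices"][0]["text"])
```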
Uploading files:
- Drag and drop files into the upload area
- Supported formats: PDF, DOCX, TXT, PNG, JPG, MP3, WAV, MP4, AVI
- Click "Process Files" to add them to your knowledge base
Use the sidebar to adjust:
- Model Selection: Choose your local LLM
- Generation Parameters: Temperature, max tokens, context length
- RAG Settings: Web search, similarity threshold
- File Processing: Chunk size and overlap (see the chunking sketch below)
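For intuition, fixed-size chunking with overlap works roughly like this sketch (the sizes are illustrative; the actual splitter in utils/file_processor.py may use different boundaries):

```python
# Illustrative fixed-size chunking with overlap; the real splitter in
# utils/file_processor.py may use different sizes or token-based boundaries.
from typing import List

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    # Each chunk starts `chunk_size - overlap` characters after the previous
    # one, so consecutive chunks share `overlap` characters of context.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

print(len(chunk_text("word " * 400)))  # 5 chunks for 2000 characters
```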
Asking research questions:
- Type your research question in the chat interface
- DeepResearcher will search your documents and the web
- Get comprehensive answers with source citations
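Under the hood, a research turn follows roughly this retrieve-then-generate shape (a simplified sketch; `vector_store`, `llm`, and the function names are placeholders, the real implementation lives in utils/rag_pipeline.py):

```python
# Simplified outline of one RAG turn; `vector_store` and `llm` are placeholder
# objects, not the actual interfaces in utils/rag_pipeline.py.
def answer_question(question, vector_store, llm, k=4):
    hits = vector_store.search(question, k=k)           # 1. retrieve the k most similar chunks
    context = "\n\n".join(hit["text"] for hit in hits)  # 2. pack them into the prompt
    prompt = (
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer using the context above and cite your sources:"
    )
    return llm(prompt, max_tokens=512)                  # 3. generate a grounded answer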
Managing the knowledge base:
- View document count in the Knowledge Base tab
- Clear or backup your vector store
- Export chat history as Markdown
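A Markdown export amounts to serializing the chat turns, as in this minimal sketch (the history structure shown is an assumption, not the app's internal format):

```python
# Minimal sketch of a Markdown export; the history structure here is an
# assumption, not the app's internal representation.
history = [
    {"role": "user", "content": "What does the uploaded paper conclude?"},
    {"role": "assistant", "content": "The paper concludes that ..."},
]

with open("chat_history.md", "w", encoding="utf-8") as f:
    for turn in history:
        f.write(f"**{turn['role'].title()}**: {turn['content']}\n\n")
```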
Project structure:

```
DeepResearcher/
├── app.py                   # Main Streamlit application
├── config.py                # Configuration settings
├── requirements.txt         # Python dependencies
├── utils/
│   ├── llm_handler.py       # Local LLM management
│   ├── rag_pipeline.py      # RAG implementation
│   ├── file_processor.py    # Multi-modal file processing
│   ├── web_search.py        # Web search functionality
│   └── vector_store.py      # FAISS vector database
├── models/                  # Local LLM models
├── vector_store/            # Persistent vector storage
└── scripts/
    └── setup_models.py      # Model download script
```
Edit config.py to customize:
- Model paths and settings
- RAG parameters
- File processing options
- Web search settings
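As an illustration, entries in config.py might look like the following (the option names and values here are hypothetical; check the shipped file for the real ones):

```python
# Illustrative config.py entries; these option names and values are examples,
# not necessarily the real ones shipped with the project.
MODEL_PATH = "models/mistral-7b-instruct.Q4_0.gguf"
N_CTX = 2048                # context length passed to the model
CHUNK_SIZE = 500            # characters per document chunk
CHUNK_OVERLAP = 50          # characters shared between consecutive chunks
SIMILARITY_THRESHOLD = 0.7  # minimum similarity score for retrieved chunks
ENABLE_WEB_SEARCH = True    # toggle live DuckDuckGo search
```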
Create a .env file for sensitive settings:
```
OPENAI_API_KEY=your_key_here # Optional: for embedding models
HUGGINGFACE_TOKEN=your_token # Optional: for model downloads
```
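If you want to read these values in your own scripts, python-dotenv is the usual approach (a minimal sketch, assuming the python-dotenv package is installed):

```python
# Load .env into the process environment (requires the python-dotenv package).
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
token = os.getenv("HUGGINGFACE_TOKEN")  # None if the variable is not set
```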
Supported file formats:

| Type | Extensions | Processing Method |
|---|---|---|
| Documents | PDF, DOCX, TXT | Text extraction |
| Images | PNG, JPG, JPEG, GIF | OCR with Tesseract |
| Audio | MP3, WAV, M4A, FLAC | Transcription with Whisper |
| Video | MP4, AVI, MOV, MKV | Audio transcription + frame OCR |
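Internally, processing can be routed by file extension along these lines (a simplified sketch; the handler names are placeholders for the actual functions in utils/file_processor.py):

```python
# Simplified extension-based routing; handler names are placeholders for the
# actual functions in utils/file_processor.py.
from pathlib import Path

HANDLERS = {
    ".pdf": "extract_text",  ".docx": "extract_text",  ".txt": "extract_text",
    ".png": "run_ocr",       ".jpg": "run_ocr",        ".jpeg": "run_ocr",      ".gif": "run_ocr",
    ".mp3": "transcribe",    ".wav": "transcribe",     ".m4a": "transcribe",    ".flac": "transcribe",
    ".mp4": "process_video", ".avi": "process_video",  ".mov": "process_video", ".mkv": "process_video",
}

def handler_for(path: str) -> str:
    ext = Path(path).suffix.lower()
    if ext not in HANDLERS:
        raise ValueError(f"Unsupported file format: {ext}")
    return HANDLERS[ext]

print(handler_for("report.PDF"))  # -> extract_text
```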
Modify system prompts in utils/rag_pipeline.py to customize AI behavior.
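For example, a custom system prompt might read as follows (the wording is an example only, not the shipped default):

```python
# Example replacement system prompt; the default wording in
# utils/rag_pipeline.py will differ.
SYSTEM_PROMPT = (
    "You are DeepResearcher, a careful research assistant. Answer only from "
    "the provided context and cite the source document for every claim. "
    "If the context does not contain the answer, say so explicitly."
)
```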
Add new LLM models by:
- Placing the model files in the models/ directory
- Updating config.py with the model paths
- Restarting the application
Switch between FAISS and Chroma by modifying utils/vector_store.py.
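At its core, the vector store embeds text chunks and runs nearest-neighbor search, as in this sketch (the embedding model and index type are common defaults, not necessarily what utils/vector_store.py uses):

```python
# Minimal embed-and-search sketch with FAISS and sentence-transformers; the
# embedding model and index type are common defaults, not necessarily the
# ones utils/vector_store.py uses.
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
chunks = [
    "RAG combines document retrieval with text generation.",
    "FAISS performs fast vector similarity search.",
]

embeddings = model.encode(chunks).astype("float32")  # shape: (2, 384)
index = faiss.IndexFlatL2(embeddings.shape[1])       # exact L2-distance index
index.add(embeddings)

query = model.encode(["How does similarity search work?"]).astype("float32")
distances, ids = index.search(query, 1)              # nearest chunk for the query
print(chunks[ids[0][0]])
```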
Model Loading Errors:
- Ensure model files are in the correct format (GGML/GGUF)
- Check available RAM (models require 4-8GB)
- Verify model paths in configuration
File Processing Errors:
- Install system dependencies (Tesseract, FFmpeg)
- Check file permissions and sizes
- Ensure supported file formats
Web Search Issues:
- Check internet connection
- Verify DuckDuckGo search is not blocked
- Try reducing max web results
For Better Speed:
- Use quantized models (Q4_0, Q5_0)
- Reduce context length for faster inference
- Enable GPU acceleration if available
For Better Quality:
- Use larger models (13B, 30B parameters)
- Increase context length
- Adjust temperature for more creative responses
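These tips map onto llama-cpp-python parameters roughly as follows (a sketch; the model file and parameter values are illustrative, not recommendations):

```python
# Illustrative llama-cpp-python settings for the tuning tips above; the model
# file and parameter values are examples only.
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-7b-instruct.Q4_0.gguf",  # Q4_0 quantization trades quality for speed
    n_ctx=2048,        # smaller context window -> lower memory use, faster inference
    n_gpu_layers=-1,   # offload all layers to the GPU if the build supports it (0 = CPU only)
)

out = llm(
    "Explain vector embeddings in two sentences.",
    max_tokens=256,
    temperature=0.9,   # higher values give more creative, varied output
)
print(out["choices"][0]["text"])
```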
Contributing:
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Make your changes and test thoroughly
- Submit a pull request with a detailed description
Development setup:

```bash
pip install -r requirements-dev.txt
python -m pytest tests/
black .
isort .
```
This project is licensed under the MIT License - see the LICENSE file for details.
Built with:
- llama-cpp-python for local LLM inference
- FAISS for vector similarity search
- Streamlit for the web interface
- Sentence Transformers for embeddings
- Faster-Whisper for audio transcription
Support:
- 📧 Email: drxmukeshchoudhary@gmail.com
- 🐛 Issues: GitHub Issues
DeepResearcher - Empowering research with local AI and advanced RAG capabilities! 🔬✨