A comprehensive, privacy-focused local LLM backend with advanced document processing, web research, and agentic report generation capabilities. Run powerful AI models entirely on your own hardware without sending data to external services.
- Multi-Model Support: Seamlessly switch between specialized models for different tasks
  - Chat/Report Generation: Qwen2.5-14B-Instruct (Q6_K) for high-quality text generation
  - Vision Analysis: Qwen2.5-VL-3B-Instruct for image understanding and analysis
  - Code Generation: Qwen2.5-Coder-7B for intelligent code writing and analysis
  - Embeddings: Qwen3-Embedding-0.6B for semantic search and similarity
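Routing a request to the right model can be sketched as a simple lookup table. This is a hypothetical illustration (the registry name, paths, and fallback policy are assumptions, not the project's actual `llm_manager.py` code), with the filenames mirroring the download commands later in this README:

```python
# Hypothetical task-to-model routing table; paths mirror the Models section below.
MODEL_REGISTRY = {
    "report_generation": "models/report_generation/qwen2.5-14b-instruct-q6_k.gguf",
    "vision": "Qwen/Qwen2.5-VL-3B-Instruct",  # loaded via transformers, not GGUF
    "coding": "models/coding/qwen2.5-coder-7b-instruct-q6_k.gguf",
    "embedding": "models/embedding/Qwen3-Embedding-0.6B-Q8_0.gguf",
}


def resolve_model(task: str) -> str:
    """Return the model path for a task, falling back to the chat model."""
    return MODEL_REGISTRY.get(task, MODEL_REGISTRY["report_generation"])
```

A fallback to the general-purpose chat model keeps unknown task names from failing outright.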
Advanced Document Processing
- Multi-format support (PDF, DOCX, TXT, MD, RTF)
- Intelligent document structure analysis
- Automatic section and heading detection
- Table and list extraction
- Multi-file batch processing
Deep Web Research
- Privacy-focused web search integration
- Multi-depth research with source verification
- Automatic query variant generation
- Source credibility analysis
- Related topic discovery
Agentic Report Generation
- AI-powered iterative report writing
- Dynamic template parsing
- Automatic chart and visualization generation
- Multi-section report structuring
- Progress tracking and streaming updates
Vision Capabilities
- Image analysis and understanding
- OCR and text extraction
- Visual question answering
- Multi-image processing
Interactive Chat
- Context-aware conversations
- Streaming responses
- Chat history with embeddings
- Model-specific optimizations
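Retrieving related chat history by embedding typically relies on cosine similarity between message vectors. A minimal sketch of that scoring function (the helper name is hypothetical; the project's `embedding_service.py` may do this differently):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Score used to retrieve related chat history by embedding vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Scores range from -1 to 1; past messages whose embeddings score highest against the current query would be pulled into context.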
- 100% Local Processing: All data stays on your machine
- Privacy Mode: Anonymizes search queries and removes tracking
- No External API Calls: Complete control over your data
- Automatic Cleanup: Temporary files are managed securely
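One common piece of the "removes tracking" behavior is stripping tracker query parameters from URLs before they are fetched or stored. A minimal sketch, assuming a blocklist of well-known parameters (the helper and the exact parameter set are illustrative, not the project's actual implementation):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Common tracking parameters; an illustrative, not exhaustive, blocklist.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"}


def strip_tracking(url: str) -> str:
    """Remove known tracking query parameters from a URL."""
    parts = urlsplit(url)
    clean = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(clean)))
```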
private-gpt-agents/
├── main.py                            # FastAPI application entry point
├── config.py                          # Configuration and settings
├── models/                            # LLM model management
│   ├── llm_manager.py                 # Model loading and inference
│   └── qwen_vision_manager.py         # Vision model handling
├── services/                          # Core business logic
│   ├── deep_research.py               # Web research service
│   ├── web_search.py                  # Privacy-focused search
│   ├── file_processor.py              # Document handling
│   ├── report_generator.py            # Basic report generation
│   ├── agentic_report_generator.py    # Advanced AI reports
│   ├── advanced_document_processor.py # PDF/DOCX analysis
│   ├── embedding_service.py           # Vector embeddings
│   └── enhanced_chart_generator.py    # Data visualization
├── frontend/                          # React-based UI
│   ├── src/
│   │   ├── components/                # Reusable UI components
│   │   ├── pages/                     # Application pages
│   │   └── services/                  # API client
│   └── public/
├── training/                          # Model fine-tuning tools
│   ├── train_dpo.py                   # DPO training
│   ├── merge_and_quantise.py          # Model optimization
│   └── serve.py                       # Model serving
├── templates/                         # Report templates
├── scripts/                           # Utility scripts
└── docker-compose.yml                 # Container orchestration
Hardware:
- Minimum: 16GB RAM, 20GB free disk space
- Recommended: 32GB RAM, 50GB SSD, GPU with 8GB+ VRAM
- Optimized for: Apple Silicon (M1/M2/M3), NVIDIA GPUs, or modern CPUs
Software:
- Docker and Docker Compose (recommended)
- OR Python 3.11+, Node.js 18+
# Clone the repository
git clone https://github.com/piyushgit011/private-gpt-agents.git
cd private-gpt-agents
# Setup environment
make setup
# OR manually:
mkdir -p models/{report_generation,vision,coding,embedding}
mkdir -p templates temp_files logs
cp .env.example .env
# Download models (see Models section)
python scripts/download_models.py
# Build and start services
make build
make start
# Access the application
# Frontend: http://localhost:3000
# Backend API: http://localhost:8000
# API Docs: http://localhost:8000/docs

# Linux setup script
chmod +x setup_linux.sh
./setup_linux.sh

# Windows setup script
.\setup_windows.ps1

# Backend
pip install -r requirements.txt
uvicorn main:app --reload --host 0.0.0.0 --port 8000
# Frontend (in another terminal)
cd frontend
npm install
npm run dev

The application uses specialized GGUF models for optimal performance. Download models to their respective directories:
Report Generation Model
# Download Qwen2.5-14B-Instruct Q6_K
mkdir -p models/report_generation
cd models/report_generation
wget https://huggingface.co/Qwen/Qwen2.5-14B-Instruct-GGUF/resolve/main/qwen2.5-14b-instruct-q6_k.gguf
Vision Model (Auto-downloads via HuggingFace Transformers)
# Qwen2.5-VL-3B-Instruct will download automatically on first use
# Ensure you have transformers and torch installed
Coding Model
mkdir -p models/coding
cd models/coding
wget https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF/resolve/main/qwen2.5-coder-7b-instruct-q6_k.gguf
Embedding Model
mkdir -p models/embedding
cd models/embedding
wget https://huggingface.co/Qwen/Qwen3-Embedding-0.6B-GGUF/resolve/main/Qwen3-Embedding-0.6B-Q8_0.gguf
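After downloading, a quick check that the GGUF files landed where the backend expects them can save a confusing startup failure. A small sketch (the helper is hypothetical; the authoritative check is `scripts/download_models.py` and `diagnose_models.py`):

```python
from pathlib import Path

# Expected GGUF locations, matching the download commands above.
EXPECTED_MODELS = [
    "models/report_generation/qwen2.5-14b-instruct-q6_k.gguf",
    "models/coding/qwen2.5-coder-7b-instruct-q6_k.gguf",
    "models/embedding/Qwen3-Embedding-0.6B-Q8_0.gguf",
]


def missing_models(base: str = ".") -> list[str]:
    """Return the expected model files that are not present under base."""
    return [p for p in EXPECTED_MODELS if not (Path(base) / p).is_file()]
```

An empty return value means every expected file is in place.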
python scripts/download_models.py

Edit the .env file to customize settings:
# Model Configuration
MODELS_BASE_PATH=./models/
N_GPU_LAYERS=35 # Adjust for your GPU
N_THREADS=8 # Adjust for your CPU
# Web Search
WEB_SEARCH_ENABLED=true
PRIVACY_MODE=true
ANONYMIZE_QUERIES=true
# Research Settings
MAX_RESEARCH_DEPTH=3
MAX_SOURCES_PER_RESEARCH=20
# Performance (M1/M2 Mac optimized)
CONTEXT_LENGTH_CHAT=4096
CONTEXT_LENGTH_CODING=16384
MAX_MEMORY_USAGE=0.75

See .env.example for all available options.
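Since .env values arrive as strings, the backend has to coerce them into typed settings. A minimal sketch of that parsing, assuming the defaults shown above (the function is illustrative; the project's actual logic lives in `config.py`):

```python
def load_settings(env: dict[str, str]) -> dict:
    """Parse .env-style string values into typed settings with defaults."""
    return {
        "models_base_path": env.get("MODELS_BASE_PATH", "./models/"),
        "n_gpu_layers": int(env.get("N_GPU_LAYERS", "35")),
        "privacy_mode": env.get("PRIVACY_MODE", "true").lower() == "true",
        "max_research_depth": int(env.get("MAX_RESEARCH_DEPTH", "3")),
        "max_memory_usage": float(env.get("MAX_MEMORY_USAGE", "0.75")),
    }
```

Passing a dict rather than reading `os.environ` directly keeps the function easy to test.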
Once running, visit http://localhost:8000/docs for interactive API documentation.
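The endpoints below can be exercised from any HTTP client. As an example, a minimal Python client for the chat endpoint might look like this (the payload shape matches the /chat example below; the response JSON shape is not specified here, so the function returns the parsed body as-is):

```python
import json
from urllib import request

API_URL = "http://localhost:8000"  # default backend address from the Quick Start


def build_chat_payload(message: str,
                       model_type: str = "report_generation",
                       stream: bool = False) -> dict:
    """Build the request body shown in the /chat example."""
    return {"message": message, "model_type": model_type, "stream": stream}


def chat(message: str, model_type: str = "report_generation") -> dict:
    """POST a non-streaming chat request and return the parsed JSON reply."""
    data = json.dumps(build_chat_payload(message, model_type)).encode()
    req = request.Request(f"{API_URL}/chat", data=data,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)
```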
POST /chat
{
"message": "Explain quantum computing",
"model_type": "report_generation",
"stream": false
}

POST /vision/analyze
Content-Type: multipart/form-data
image: <file>
prompt: "Describe this image in detail"

POST /research
{
"query": "Latest developments in AI",
"max_sources": 10,
"depth": 2
}

POST /reports/generate
{
"topic": "Climate Change Impact",
"template": "research_report",
"sections": ["introduction", "analysis", "conclusion"],
"use_web_search": true
}

POST /files/upload/multiple
Content-Type: multipart/form-data
files: <file1>, <file2>, ...

- Conduct deep research on any topic
- Generate comprehensive reports with citations
- Analyze multiple documents simultaneously
- Extract and summarize key information
- Process PDFs, Word documents, and text files
- Extract structured information
- Generate summaries and insights
- Compare multiple documents
- Generate code in multiple languages
- Explain and analyze existing code
- Debug and optimize code
- Generate documentation
- Analyze images and screenshots
- Extract text from images (OCR)
- Describe visual content
- Answer questions about images
- Generate articles and reports
- Create structured documents
- Generate visualizations and charts
- Multi-format output (PDF, DOCX, HTML)
# Backend tests
pytest
# Frontend tests
cd frontend
npm test

make help     # Show all available commands
make setup # Initial setup
make install # Install dependencies
make build # Build Docker containers
make start # Start services
make stop # Stop services
make restart # Restart services
make logs # View logs
make clean # Clean temporary files
make test     # Run tests
Backend (Python/FastAPI)
- Models use llama-cpp-python for GGUF support
- Vision models use HuggingFace Transformers
- Async/await for concurrent operations
- Streaming responses for real-time feedback
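The streaming pattern the backend uses can be sketched with a plain async generator: tokens are yielded as they are produced, and the client consumes them incrementally. This is an illustrative reduction, not the actual FastAPI endpoint code:

```python
import asyncio
from typing import AsyncIterator


async def stream_tokens(reply: str) -> AsyncIterator[str]:
    """Yield a reply chunk by chunk, the way a streaming endpoint would."""
    for token in reply.split():
        await asyncio.sleep(0)  # hand control back to the event loop
        yield token + " "


async def collect(reply: str) -> str:
    """Drain the stream into one string (what a non-streaming client sees)."""
    return "".join([chunk async for chunk in stream_tokens(reply)])
```

In the real service the generator would wrap the model's token callback and be returned as a streaming HTTP response.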
Frontend (React/TypeScript)
- Modern React with hooks
- TailwindCSS for styling
- React Query for data fetching
- Dark mode support
# Check model files
ls -lh models/*/
# Verify model paths in config.py
python diagnose_models.py

- Reduce `N_GPU_LAYERS` in `.env`
- Use smaller quantized models (Q4_K_M instead of Q6_K)
- Decrease `CONTEXT_LENGTH` values
- Enable model unloading after use
# Rebuild containers
docker-compose down
docker-compose build --no-cache
docker-compose up
# Check logs
docker-compose logs -f backend
docker-compose logs -f frontend

- Apple Silicon: Set `N_GPU_LAYERS=-1` for Metal acceleration
- NVIDIA GPU: Ensure CUDA is properly configured
- CPU Only: Reduce model size and context length
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Qwen Team for excellent open-source models
- llama.cpp for efficient model inference
- FastAPI for the robust backend framework
- React and TailwindCSS for the modern frontend
For issues, questions, or suggestions:
- Open an issue on GitHub
- Check existing documentation
- Review closed issues for solutions
- Multi-user support with authentication
- Model fine-tuning interface
- RAG (Retrieval Augmented Generation)
- Plugin system for extensions
- Mobile responsive UI improvements
- Advanced analytics dashboard
- Export formats (Markdown, LaTeX, EPUB)
- Voice input/output support
- Collaborative editing features
Built with ❤️ for privacy-conscious AI enthusiasts