A comprehensive, privacy-focused local LLM backend with advanced document processing, web research, and agentic report generation capabilities. Run powerful AI models entirely on your own hardware without sending data to external services.
- Multi-Model Support: Seamlessly switch between specialized models for different tasks
  - Chat/Report Generation: Qwen2.5-14B-Instruct (Q6_K) for high-quality text generation
  - Vision Analysis: Qwen2.5-VL-3B-Instruct for image understanding and analysis
  - Code Generation: Qwen2.5-Coder-7B for intelligent code writing and analysis
  - Embeddings: Qwen3-Embedding-0.6B for semantic search and similarity
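Routing a request to the right model can be sketched as a simple lookup table. This is a hypothetical illustration (the registry name, paths, and fallback policy are assumptions, not the project's actual `llm_manager.py` code), with the filenames mirroring the download commands later in this README:

```python
# Hypothetical task-to-model routing table; paths mirror the Models section below.
MODEL_REGISTRY = {
    "report_generation": "models/report_generation/qwen2.5-14b-instruct-q6_k.gguf",
    "vision": "Qwen/Qwen2.5-VL-3B-Instruct",  # loaded via transformers, not GGUF
    "coding": "models/coding/qwen2.5-coder-7b-instruct-q6_k.gguf",
    "embedding": "models/embedding/Qwen3-Embedding-0.6B-Q8_0.gguf",
}


def resolve_model(task: str) -> str:
    """Return the model path for a task, falling back to the chat model."""
    return MODEL_REGISTRY.get(task, MODEL_REGISTRY["report_generation"])
```

A fallback to the general-purpose chat model keeps unknown task names from failing outright.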
Advanced Document Processing
- Multi-format support (PDF, DOCX, TXT, MD, RTF)
- Intelligent document structure analysis
- Automatic section and heading detection
- Table and list extraction
- Multi-file batch processing
Deep Web Research
- Privacy-focused web search integration
- Multi-depth research with source verification
- Automatic query variant generation
- Source credibility analysis
- Related topic discovery
Agentic Report Generation
- AI-powered iterative report writing
- Dynamic template parsing
- Automatic chart and visualization generation
- Multi-section report structuring
- Progress tracking and streaming updates
Vision Capabilities
- Image analysis and understanding
- OCR and text extraction
- Visual question answering
- Multi-image processing
Interactive Chat
- Context-aware conversations
- Streaming responses
- Chat history with embeddings
- Model-specific optimizations
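Retrieving related chat history by embedding typically relies on cosine similarity between message vectors. A minimal sketch of that scoring function (the helper name is hypothetical; the project's `embedding_service.py` may do this differently):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Score used to retrieve related chat history by embedding vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Scores range from -1 to 1; past messages whose embeddings score highest against the current query would be pulled into context.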
- 100% Local Processing: All data stays on your machine
- Privacy Mode: Anonymizes search queries and removes tracking
- No External API Calls: Complete control over your data
- Automatic Cleanup: Temporary files are managed securely
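One common piece of the "removes tracking" behavior is stripping tracker query parameters from URLs before they are fetched or stored. A minimal sketch, assuming a blocklist of well-known parameters (the helper and the exact parameter set are illustrative, not the project's actual implementation):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Common tracking parameters; an illustrative, not exhaustive, blocklist.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"}


def strip_tracking(url: str) -> str:
    """Remove known tracking query parameters from a URL."""
    parts = urlsplit(url)
    clean = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(clean)))
```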
private-gpt-agents/
├── main.py                            # FastAPI application entry point
├── config.py                          # Configuration and settings
├── models/                            # LLM model management
│   ├── llm_manager.py                 # Model loading and inference
│   └── qwen_vision_manager.py         # Vision model handling
├── services/                          # Core business logic
│   ├── deep_research.py               # Web research service
│   ├── web_search.py                  # Privacy-focused search
│   ├── file_processor.py              # Document handling
│   ├── report_generator.py            # Basic report generation
│   ├── agentic_report_generator.py    # Advanced AI reports
│   ├── advanced_document_processor.py # PDF/DOCX analysis
│   ├── embedding_service.py           # Vector embeddings
│   └── enhanced_chart_generator.py    # Data visualization
├── frontend/                          # React-based UI
│   ├── src/
│   │   ├── components/                # Reusable UI components
│   │   ├── pages/                     # Application pages
│   │   └── services/                  # API client
│   └── public/
├── training/                          # Model fine-tuning tools
│   ├── train_dpo.py                   # DPO training
│   ├── merge_and_quantise.py          # Model optimization
│   └── serve.py                       # Model serving
├── templates/                         # Report templates
├── scripts/                           # Utility scripts
└── docker-compose.yml                 # Container orchestration
Hardware:
- Minimum: 16GB RAM, 20GB free disk space
- Recommended: 32GB RAM, 50GB SSD, GPU with 8GB+ VRAM
- Optimized for: Apple Silicon (M1/M2/M3), NVIDIA GPUs, or modern CPUs
Software:
- Docker and Docker Compose (recommended)
- OR Python 3.11+, Node.js 18+
# Clone the repository
git clone https://github.com/piyushgit011/private-gpt-agents.git
cd private-gpt-agents
# Setup environment
make setup
# OR manually:
mkdir -p models/{report_generation,vision,coding,embedding}
mkdir -p templates temp_files logs
cp .env.example .env
# Download models (see Models section)
python scripts/download_models.py
# Build and start services
make build
make start
# Access the application
# Frontend: http://localhost:3000
# Backend API: http://localhost:8000
# API Docs: http://localhost:8000/docs

# Linux setup script
chmod +x setup_linux.sh
./setup_linux.sh

# Windows setup script
.\setup_windows.ps1

# Backend
pip install -r requirements.txt
uvicorn main:app --reload --host 0.0.0.0 --port 8000
# Frontend (in another terminal)
cd frontend
npm install
npm run dev

The application uses specialized GGUF models for optimal performance. Download models to their respective directories:
Report Generation Model
# Download Qwen2.5-14B-Instruct Q6_K
mkdir -p models/report_generation
cd models/report_generation
wget https://huggingface.co/Qwen/Qwen2.5-14B-Instruct-GGUF/resolve/main/qwen2.5-14b-instruct-q6_k.gguf
Vision Model (Auto-downloads via HuggingFace Transformers)
# Qwen2.5-VL-3B-Instruct will download automatically on first use
# Ensure you have transformers and torch installed
Coding Model
mkdir -p models/coding
cd models/coding
wget https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF/resolve/main/qwen2.5-coder-7b-instruct-q6_k.gguf
Embedding Model
mkdir -p models/embedding
cd models/embedding
wget https://huggingface.co/Qwen/Qwen3-Embedding-0.6B-GGUF/resolve/main/Qwen3-Embedding-0.6B-Q8_0.gguf
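After downloading, a quick check that the GGUF files landed where the backend expects them can save a confusing startup failure. A small sketch (the helper is hypothetical; the authoritative check is `scripts/download_models.py` and `diagnose_models.py`):

```python
from pathlib import Path

# Expected GGUF locations, matching the download commands above.
EXPECTED_MODELS = [
    "models/report_generation/qwen2.5-14b-instruct-q6_k.gguf",
    "models/coding/qwen2.5-coder-7b-instruct-q6_k.gguf",
    "models/embedding/Qwen3-Embedding-0.6B-Q8_0.gguf",
]


def missing_models(base: str = ".") -> list[str]:
    """Return the expected model files that are not present under base."""
    return [p for p in EXPECTED_MODELS if not (Path(base) / p).is_file()]
```

An empty return value means every expected file is in place.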
python scripts/download_models.py

Edit the .env file to customize settings:
# Model Configuration
MODELS_BASE_PATH=./models/
N_GPU_LAYERS=35 # Adjust for your GPU
N_THREADS=8 # Adjust for your CPU
# Web Search
WEB_SEARCH_ENABLED=true
PRIVACY_MODE=true
ANONYMIZE_QUERIES=true
# Research Settings
MAX_RESEARCH_DEPTH=3
MAX_SOURCES_PER_RESEARCH=20
# Performance (M1/M2 Mac optimized)
CONTEXT_LENGTH_CHAT=4096
CONTEXT_LENGTH_CODING=16384
MAX_MEMORY_USAGE=0.75

See .env.example for all available options.
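Since .env values arrive as strings, the backend has to coerce them into typed settings. A minimal sketch of that parsing, assuming the defaults shown above (the function is illustrative; the project's actual logic lives in `config.py`):

```python
def load_settings(env: dict[str, str]) -> dict:
    """Parse .env-style string values into typed settings with defaults."""
    return {
        "models_base_path": env.get("MODELS_BASE_PATH", "./models/"),
        "n_gpu_layers": int(env.get("N_GPU_LAYERS", "35")),
        "privacy_mode": env.get("PRIVACY_MODE", "true").lower() == "true",
        "max_research_depth": int(env.get("MAX_RESEARCH_DEPTH", "3")),
        "max_memory_usage": float(env.get("MAX_MEMORY_USAGE", "0.75")),
    }
```

Passing a dict rather than reading `os.environ` directly keeps the function easy to test.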
Once running, visit http://localhost:8000/docs for interactive API documentation.
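The endpoints below can be exercised from any HTTP client. As an example, a minimal Python client for the chat endpoint might look like this (the payload shape matches the /chat example below; the response JSON shape is not specified here, so the function returns the parsed body as-is):

```python
import json
from urllib import request

API_URL = "http://localhost:8000"  # default backend address from the Quick Start


def build_chat_payload(message: str,
                       model_type: str = "report_generation",
                       stream: bool = False) -> dict:
    """Build the request body shown in the /chat example."""
    return {"message": message, "model_type": model_type, "stream": stream}


def chat(message: str, model_type: str = "report_generation") -> dict:
    """POST a non-streaming chat request and return the parsed JSON reply."""
    data = json.dumps(build_chat_payload(message, model_type)).encode()
    req = request.Request(f"{API_URL}/chat", data=data,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)
```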
POST /chat
{
"message": "Explain quantum computing",
"model_type": "report_generation",
"stream": false
}

POST /vision/analyze
Content-Type: multipart/form-data
image: <file>
prompt: "Describe this image in detail"

POST /research
{
"query": "Latest developments in AI",
"max_sources": 10,
"depth": 2
}

POST /reports/generate
{
"topic": "Climate Change Impact",
"template": "research_report",
"sections": ["introduction", "analysis", "conclusion"],
"use_web_search": true
}

POST /files/upload/multiple
Content-Type: multipart/form-data
files: <file1>, <file2>, ...

- Conduct deep research on any topic
- Generate comprehensive reports with citations
- Analyze multiple documents simultaneously
- Extract and summarize key information
- Process PDFs, Word documents, and text files
- Extract structured information
- Generate summaries and insights
- Compare multiple documents
- Generate code in multiple languages
- Explain and analyze existing code
- Debug and optimize code
- Generate documentation
- Analyze images and screenshots
- Extract text from images (OCR)
- Describe visual content
- Answer questions about images
- Generate articles and reports
- Create structured documents
- Generate visualizations and charts
- Multi-format output (PDF, DOCX, HTML)
# Backend tests
pytest
# Frontend tests
cd frontend
npm test

make help     # Show all available commands
make setup # Initial setup
make install # Install dependencies
make build # Build Docker containers
make start # Start services
make stop # Stop services
make restart # Restart services
make logs # View logs
make clean # Clean temporary files
make test     # Run tests
Backend (Python/FastAPI)
- Models use llama-cpp-python for GGUF support
- Vision models use HuggingFace Transformers
- Async/await for concurrent operations
- Streaming responses for real-time feedback
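The streaming pattern the backend uses can be sketched with a plain async generator: tokens are yielded as they are produced, and the client consumes them incrementally. This is an illustrative reduction, not the actual FastAPI endpoint code:

```python
import asyncio
from typing import AsyncIterator


async def stream_tokens(reply: str) -> AsyncIterator[str]:
    """Yield a reply chunk by chunk, the way a streaming endpoint would."""
    for token in reply.split():
        await asyncio.sleep(0)  # hand control back to the event loop
        yield token + " "


async def collect(reply: str) -> str:
    """Drain the stream into one string (what a non-streaming client sees)."""
    return "".join([chunk async for chunk in stream_tokens(reply)])
```

In the real service the generator would wrap the model's token callback and be returned as a streaming HTTP response.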
Frontend (React/TypeScript)
- Modern React with hooks
- TailwindCSS for styling
- React Query for data fetching
- Dark mode support
# Check model files
ls -lh models/*/
# Verify model paths in config.py
python diagnose_models.py

- Reduce `N_GPU_LAYERS` in `.env`
- Use smaller quantized models (Q4_K_M instead of Q6_K)
- Decrease `CONTEXT_LENGTH` values
- Enable model unloading after use
# Rebuild containers
docker-compose down
docker-compose build --no-cache
docker-compose up
# Check logs
docker-compose logs -f backend
docker-compose logs -f frontend

- Apple Silicon: Set `N_GPU_LAYERS=-1` for Metal acceleration
- NVIDIA GPU: Ensure CUDA is properly configured
- CPU Only: Reduce model size and context length
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Qwen Team for excellent open-source models
- llama.cpp for efficient model inference
- FastAPI for the robust backend framework
- React and TailwindCSS for the modern frontend
For issues, questions, or suggestions:
- Open an issue on GitHub
- Check existing documentation
- Review closed issues for solutions
- Multi-user support with authentication
- Model fine-tuning interface
- RAG (Retrieval Augmented Generation)
- Plugin system for extensions
- Mobile responsive UI improvements
- Advanced analytics dashboard
- Export formats (Markdown, LaTeX, EPUB)
- Voice input/output support
- Collaborative editing features
Built with ❤️ for privacy-conscious AI enthusiasts