This document provides comprehensive deployment instructions for QuData LLM Processing System across different environments and deployment methods.
# Clone repository
git clone https://github.com/qubasehq/qudata.git
cd qudata
# Development deployment
docker-compose up -d
# Production deployment
cp .env.example .env
# Edit .env with your configuration
mkdir -p secrets
echo "your-secure-password" > secrets/db_password.txt
echo "your-redis-password" > secrets/redis_password.txt
openssl rand -base64 32 > secrets/secret_key.txt
chmod 600 secrets/*
docker-compose -f docker-compose.prod.yml up -d# System dependencies (Ubuntu/Debian)
sudo apt update && sudo apt install -y \
python3 python3-pip python3-venv \
postgresql redis-server nginx supervisor
# Application setup
git clone https://github.com/qubasehq/qudata.git
cd qudata
python3 -m venv venv
source venv/bin/activate
pip install -e ".[ml,web,dev]"
# Configure and start services
# See docs/deployment/manual-deployment.md for detailed instructionsBest for: Development, testing, and production environments requiring containerization.
Advantages:
- Consistent environment across different systems
- Easy scaling and orchestration
- Built-in monitoring and logging
- Simplified dependency management
Files:
Dockerfile- Multi-stage production-ready imagedocker-compose.yml- Development environmentdocker-compose.prod.yml- Production environment with scaling
Documentation: Docker Deployment Guide
Best for: Environments requiring fine-grained control or where Docker is not available.
Advantages:
- Full control over system configuration
- Direct access to system resources
- Custom optimization possibilities
- Integration with existing infrastructure
Documentation: Manual Deployment Guide
Best for: Scalable cloud-native deployments.
Supported Platforms:
- AWS (ECS, EKS, EC2)
- Google Cloud (GKE, Compute Engine)
- Azure (AKS, Container Instances)
- DigitalOcean (Kubernetes, Droplets)
Documentation: Cloud Deployment Guide
graph TB
LB[Load Balancer/Nginx] --> API[QuData API Servers]
API --> DB[(PostgreSQL)]
API --> REDIS[(Redis Cache)]
API --> WORKER[Background Workers]
WORKER --> DB
WORKER --> REDIS
WORKER --> STORAGE[File Storage]
MON[Monitoring] --> API
MON --> DB
MON --> REDIS
MON --> WORKER
LOG[Logging] --> API
LOG --> WORKER
subgraph "Data Processing"
STORAGE --> INGEST[Ingestion]
INGEST --> CLEAN[Cleaning]
CLEAN --> ANNOTATE[Annotation]
ANNOTATE --> SCORE[Quality Scoring]
SCORE --> EXPORT[Export]
end
- Purpose: Local development and testing
- Resources: Minimal (2GB RAM, 2 CPU cores)
- Features: Hot reload, debug logging, development tools
- Services: API, Database, Cache, Optional monitoring
- Purpose: Pre-production testing and validation
- Resources: Medium (8GB RAM, 4 CPU cores)
- Features: Production-like configuration, comprehensive testing
- Services: All production services with reduced scaling
- Purpose: Live production workloads
- Resources: High (16GB+ RAM, 8+ CPU cores)
- Features: High availability, monitoring, backup, security
- Services: Load-balanced API, workers, monitoring, logging
Key configuration through environment variables:
# Application
ENVIRONMENT=production
DEBUG=false
LOG_LEVEL=INFO
# Database
DB_HOST=postgres
DB_NAME=qudata
DB_USER=qudata
DB_PASSWORD=secure_password
# Performance
WORKERS=4
MAX_WORKERS=8
BATCH_SIZE=200
MAX_MEMORY=8GB
# Security
SECRET_KEY=your-secret-key
ALLOWED_HOSTS=your-domain.com.env- Environment-specific variablesconfigs/pipeline.yaml- Processing pipeline configurationconfigs/quality.yaml- Quality scoring thresholdsconfigs/taxonomy.yaml- Content classification rules
Production deployments use secure secret management:
# Docker Secrets
secrets/
├── db_password.txt
├── redis_password.txt
├── secret_key.txt
└── grafana_password.txt
# Kubernetes Secrets
kubectl create secret generic qudata-secrets \
--from-file=db-password=secrets/db_password.txt \
--from-file=redis-password=secrets/redis_password.txtScale services based on load:
# Docker Compose
docker-compose -f docker-compose.prod.yml up -d --scale qudata-api=5 --scale qudata-worker=3
# Kubernetes
kubectl scale deployment qudata-api --replicas=5
kubectl scale deployment qudata-worker --replicas=3Key performance parameters:
# API Server
WORKERS: 4 # Number of API workers
MAX_WORKERS: 8 # Maximum workers
WORKER_TIMEOUT: 300 # Request timeout
# Processing
BATCH_SIZE: 200 # Documents per batch
MAX_MEMORY: 8GB # Memory limit
PARALLEL_WORKERS: 4 # Parallel processing workers
# Database
DB_POOL_SIZE: 10 # Connection pool size
DB_MAX_OVERFLOW: 20 # Additional connections
# Cache
REDIS_MAX_CONNECTIONS: 50 # Redis connection pool
CACHE_TTL: 3600 # Cache expiration| Environment | CPU | RAM | Storage | Network |
|---|---|---|---|---|
| Development | 2 cores | 4GB | 20GB | 100Mbps |
| Staging | 4 cores | 8GB | 100GB | 1Gbps |
| Production | 8+ cores | 16GB+ | 500GB+ | 1Gbps+ |
Built-in health check endpoints:
# Application health
curl http://localhost:8000/health
# Component health
curl http://localhost:8000/health/db
curl http://localhost:8000/health/redis
curl http://localhost:8000/health/storage
# Detailed status
curl http://localhost:8000/statusPrometheus metrics available at /metrics:
- Request/response metrics
- Processing performance
- Resource utilization
- Business metrics (documents processed, quality scores)
Structured JSON logging with multiple levels:
{
"timestamp": "2024-01-15T10:30:00Z",
"level": "INFO",
"service": "qudata-api",
"message": "Document processed successfully",
"document_id": "uuid-here",
"processing_time": 1.23,
"quality_score": 0.85
}Pre-configured Grafana dashboards:
- System Overview
- API Performance
- Processing Metrics
- Quality Analytics
- Error Tracking
- TLS/SSL encryption for all external communication
- Network isolation between services
- Rate limiting and DDoS protection
- Firewall configuration
- Input validation and sanitization
- SQL injection prevention
- XSS protection
- CSRF protection
- Secure headers
- Encryption at rest and in transit
- PII detection and removal
- Access control and authentication
- Audit logging
- Non-root user execution
- Minimal base images
- Regular security updates
- Vulnerability scanning
# Database backup
docker-compose exec postgres pg_dump -U qudata qudata > backup.sql
# Data backup
tar -czf data_backup.tar.gz data/processed/
# Configuration backup
tar -czf config_backup.tar.gz configs/ .env# Database restore
docker-compose exec -T postgres psql -U qudata -d qudata < backup.sql
# Data restore
tar -xzf data_backup.tar.gz
# Service restart
docker-compose restart- Database: Daily at 2 AM
- Processed Data: Weekly
- Configuration: On changes
- Retention: 30 days for database, 90 days for data
-
Service won't start
- Check logs:
docker-compose logs service-name - Verify configuration
- Check resource availability
- Check logs:
-
Database connection issues
- Verify database is running
- Check connection parameters
- Test network connectivity
-
Performance issues
- Monitor resource usage
- Check for bottlenecks
- Scale services if needed
-
Memory issues
- Adjust batch sizes
- Increase memory limits
- Enable streaming processing
Enable debug logging:
export DEBUG=true
export LOG_LEVEL=DEBUG
docker-compose restart qudata- Documentation: docs/
- GitHub Issues: Issues
- Discussions: GitHub Discussions
# Pull latest changes
git pull origin main
# Update dependencies
pip install -e ".[ml,web,dev]" --upgrade
# Restart services
docker-compose restartRegular health checks:
# System health
curl http://localhost:8000/health
# Performance metrics
curl http://localhost:8000/metrics
# Service status
docker-compose psAutomatic log rotation configured:
- Application logs: 30 days retention
- System logs: 7 days retention
- Access logs: 90 days retention
Before deploying to production:
- SSL certificates configured
- Secrets properly secured
- Environment variables set
- Resource limits configured
- Monitoring enabled
- Backup procedures tested
- Security hardening applied
- Load testing completed
- Documentation updated
- Team training completed
If you encounter issues during deployment:
- Check the troubleshooting section
- Review the relevant deployment guide
- Search existing GitHub issues
- Create a new issue with deployment details
- Join the community discussions
To contribute to deployment documentation:
- Fork the repository
- Create a feature branch
- Make your changes
- Test the deployment process
- Submit a pull request
For more information, see CONTRIBUTING.md.