This guide covers deploying the UAP (Unified Agentic Platform) to production using SkyPilot for multi-cloud deployment, together with monitoring and secrets management.
- SkyPilot - Multi-cloud orchestration
- Teller - Secrets management
- Docker - Containerization
- DevBox - Development environment
Before deployment, ensure you have:
- Valid cloud provider credentials (AWS, GCP, or Azure)
- Sufficient quota for GPU instances
- Secrets properly configured in your chosen secrets provider
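The prerequisites above can be partly checked before running any deployment. The following is a minimal sketch, not part of the repository's scripts: `missing_tools` is a hypothetical helper that reports which command-line tools are absent from `PATH`. It does not validate credentials or quota, which still need to be confirmed with each cloud provider.

```shell
# Print every command from the argument list that is not on PATH.
# Returns 0 when all are present, 1 when at least one is missing.
# (Hypothetical helper for illustration; not provided by the repository.)
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}
```

A call such as `missing_tools sky teller docker` prints nothing and succeeds when all three CLIs are installed; which tools to list depends on the cloud and secrets provider you plan to use.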
# Deploy to auto-selected cloud with health checks
./scripts/deploy-production.sh --cloud auto --test --monitor
# Deploy to specific cloud provider
./scripts/deploy-production.sh --cloud gcp --env production --backup
# Cost-optimized deployment
./scripts/deploy-production.sh --cloud cost-optimized --monitor

# Setup comprehensive monitoring stack
./scripts/setup-monitoring.sh
# Start monitoring services
docker-compose -f docker-compose.monitoring.yml up -d

# Verify deployment health
./scripts/health-check.sh

Uses the general production configuration that supports failover across all clouds:
./scripts/deploy-production.sh --cloud auto
Optimized for AWS with spot instance handling:
./scripts/deploy-production.sh --cloud aws --region us-west-2
Optimized for GCP with preemptible instances:
./scripts/deploy-production.sh --cloud gcp --region us-central1
Optimized for Azure with spot VM handling:
./scripts/deploy-production.sh --cloud azure --region eastus
Automatically selects the cheapest resources across all clouds:
./scripts/deploy-production.sh --cloud cost-optimized

- skypilot/uap-production.yaml - General production (multi-cloud)
- skypilot/uap-aws.yaml - AWS-specific optimizations
- skypilot/uap-gcp.yaml - GCP-specific optimizations
- skypilot/uap-azure.yaml - Azure-specific optimizations
- skypilot/uap-cost-optimized.yaml - Cost optimization priority
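The relationship between the `--cloud` values and the SkyPilot configuration files can be sketched as a simple lookup. This is an illustration under assumptions: the function name `config_for_cloud` is hypothetical, and the mapping of `auto` to `uap-production.yaml` is inferred from the descriptions above rather than taken from `deploy-production.sh`, whose internals are not shown here.

```shell
# Map a --cloud value to the SkyPilot config file it is assumed to use.
# Prints the path on success; prints an error to stderr and returns 1
# for a value outside aws|gcp|azure|auto|cost-optimized.
# (Hypothetical helper; the real script may resolve files differently.)
config_for_cloud() {
    case "$1" in
        auto)           echo "skypilot/uap-production.yaml" ;;
        aws|gcp|azure)  echo "skypilot/uap-$1.yaml" ;;
        cost-optimized) echo "skypilot/uap-cost-optimized.yaml" ;;
        *)              echo "unknown cloud target: $1" >&2; return 1 ;;
    esac
}
```

For example, `config_for_cloud gcp` prints `skypilot/uap-gcp.yaml`, which is the file to edit when tuning GCP-specific resources.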
- .env.production.template - Production environment variables
- .env.staging.template - Staging environment variables
- Dockerfile - Multi-stage production build
- docker-compose.production.yml - Complete production stack
- docker-compose.monitoring.yml - Monitoring services
The .teller.yml file configures multi-provider secrets management:
providers:
  google_secret_manager: # Primary
  hashicorp_vault: # Secondary
  aws_secret_manager: # Tertiary

Core secrets that must be configured:
- COPILOTKIT_API_KEY - CopilotKit framework access
- AGNO_API_KEY - Agno framework access
- MASTRA_API_KEY - Mastra framework access
- OPENAI_API_KEY - OpenAI API access
- ANTHROPIC_API_KEY - Anthropic API access
- DATABASE_URL - PostgreSQL connection string
- REDIS_URL - Redis connection string
- JWT_SECRET - JWT signing secret
- GOOGLE_APPLICATION_CREDENTIALS_JSON - GCP service account
- AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY - AWS credentials
- AZURE_CLIENT_ID / AZURE_CLIENT_SECRET - Azure credentials
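One way to confirm that the secrets listed above actually reach the process environment is to check each variable name without ever printing its value. The sketch below assumes a POSIX shell; `missing_secrets` is a hypothetical helper, not something shipped with the repository or with Teller. Because it uses `eval` for the indirect lookup, pass it only fixed, trusted variable names.

```shell
# Print the name of every listed environment variable that is unset or empty.
# Values are never echoed, only names. Returns 1 if anything is missing.
# (Hypothetical helper; call it only with hard-coded variable names.)
missing_secrets() {
    rc=0
    for name in "$@"; do
        # Indirect lookup: read the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
            rc=1
        fi
    done
    return "$rc"
}
```

Run inside an environment populated by your secrets provider, a call such as `missing_secrets OPENAI_API_KEY ANTHROPIC_API_KEY DATABASE_URL REDIS_URL JWT_SECRET` lists whatever is still unset.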
# Create secrets in Google Secret Manager
gcloud secrets create openai-api-key --data-file=openai-key.txt
gcloud secrets create anthropic-api-key --data-file=anthropic-key.txt
# ... create other secrets

# Test secrets configuration
teller run echo "Secrets loaded successfully"

./scripts/deploy-production.sh [OPTIONS]
Options:
  -c, --cloud CLOUD     Target cloud (aws|gcp|azure|auto|cost-optimized)
  -e, --env ENV         Environment (production|staging)
  -r, --region REGION   Target region
  -t, --test            Run tests before deployment
  -d, --dry-run         Show deployment plan without executing
  -f, --force           Force deployment even if health checks fail
  -b, --backup          Create backup before deployment
  -m, --monitor         Enable monitoring setup
  -h, --help            Show help

Key environment variables for customization:
# Resource Configuration
export UVICORN_WORKERS=4
export AGNO_GPU_MEMORY="8GB"
export MASTRA_WORKER_COUNT=4
# Performance Tuning
export MAX_CONCURRENT_REQUESTS=1000
export REQUEST_TIMEOUT=300
export RATE_LIMIT_PER_MINUTE=100
# Feature Flags
export ENABLE_METRICS=true
export ENABLE_TRACING=true
export ENABLE_RATE_LIMITING=true

The monitoring setup includes:
- Prometheus - Metrics collection and alerting
- Grafana - Visualization and dashboards
- Node Exporter - System metrics
- cAdvisor - Container metrics
- Custom exporters - Redis, PostgreSQL, NGINX metrics
Monitor these critical metrics:
- API response time (target: <2s p95)
- Request rate and error rate
- Active WebSocket connections
- Agent framework health status
- CPU usage (alert: >80%)
- Memory usage (alert: >85%)
- Disk space (alert: <10% free)
- GPU utilization
- Agent interaction count
- Framework routing efficiency
- User session duration
# Grafana Dashboard
http://your-deployment-ip:3001
# Default: admin/admin
# Prometheus
http://your-deployment-ip:9090
# Direct metrics endpoint
http://your-deployment-ip:8000/metrics

Configured alerts include:
- High response time (>2s for 2 minutes)
- High error rate (>5% for 1 minute)
- Agent framework down (>30 seconds)
- Resource exhaustion (CPU >80%, Memory >85%)
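The resource thresholds above (CPU >80%, memory >85%, disk <10% free) can be expressed as a small local check, which is handy for sanity-testing readings by hand. This is only a sketch: the real alert rules live in the monitoring configuration, the duration conditions (for example "for 2 minutes") are not modeled, and `resource_alerts` is a hypothetical name that expects whole-number percentages.

```shell
# resource_alerts CPU_PCT MEM_PCT DISK_FREE_PCT
# Prints one line per breached threshold, using the limits from this guide:
# CPU above 80, memory above 85, free disk below 10 (integer percentages).
# (Hypothetical helper; it does not replace the Prometheus alert rules.)
resource_alerts() {
    cpu_pct=$1
    mem_pct=$2
    disk_free_pct=$3
    if [ "$cpu_pct" -gt 80 ]; then echo "cpu"; fi
    if [ "$mem_pct" -gt 85 ]; then echo "memory"; fi
    if [ "$disk_free_pct" -lt 10 ]; then echo "disk"; fi
    return 0
}
```

The comparisons are strict, so a reading of exactly 80% CPU or exactly 10% free disk does not count as a breach, matching the `>` and `<` wording of the alerts.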
# Check SkyPilot status
sky status --refresh
# View deployment logs
sky logs uap
# Check secrets access
teller run env | grep -E "API_KEY|SECRET"

# Detailed health check
./scripts/health-check.sh
# Check individual services
curl http://your-ip:8000/health
curl http://your-ip:8000/agents/status

# Check framework logs
ssh uap "tail -f /app/logs/backend.log" # SkyPilot adds the cluster to ~/.ssh/config
# Restart specific service
ssh uap "sudo systemctl restart uap"

# Check resource usage
ssh -t uap htop # -t allocates the terminal htop needs
ssh uap "nvidia-smi" # If GPU available
# Review metrics in Grafana
# Navigate to UAP Overview dashboard

# Stop current deployment
sky down uap -y
# Restore from backup (if created)
# Manual restore using backup files in /tmp/uap-backup-*

# Update resource requirements in config
vim skypilot/uap-production.yaml
# Redeploy with new resources
./scripts/deploy-production.sh --force

# Access deployment directly
ssh uap
# Check service status
sudo systemctl status uap
# View logs
journalctl -u uap -f

- Spot Instances - All configurations use spot/preemptible instances
- Multi-Cloud - Automatic selection of cheapest provider
- Resource Right-Sizing - Configurable CPU/memory/GPU requirements
- Auto-Shutdown - Configured idle detection and shutdown
# Check current costs
sky cost-report
# Optimize for cost
./scripts/deploy-production.sh --cloud cost-optimized

- Secrets stored in secure provider (not environment files)
- Non-root container execution
- Network segmentation configured
- TLS/SSL certificates configured
- Access logging enabled
- Rate limiting configured
- Regular security updates scheduled
- Firewall rules limiting access to necessary ports only
- Internal service communication over private networks
- External access through load balancer/reverse proxy only
# Deploy multiple instances
sky launch -c uap-west skypilot/uap-production.yaml
sky launch -c uap-east skypilot/uap-production.yaml
# Configure load balancing between instances

# Update resource requirements
# Edit skypilot/*.yaml files to increase CPU/memory/GPU
# Redeploy with new resources
./scripts/deploy-production.sh --force

The deployment infrastructure is ready for real framework implementations:
- CopilotKit: Ready for integration (currently mock implementation)
- Agno: Ready for integration (currently mock implementation)
- Mastra: Ready for integration (currently mock implementation)
When Agents 3, 4, and 5 complete their framework integrations:
- Update backend/requirements.txt with real framework dependencies
- Uncomment framework installations in SkyPilot configurations
- Update secrets with real API keys and configurations
- Redeploy with real framework implementations
- Check deployment logs: sky logs uap
- Review health checks: ./scripts/health-check.sh
- Monitor metrics in Grafana dashboard
- Check this documentation for troubleshooting steps
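When working through the checks above right after a deployment, services may need a short while to come up, so a single failed probe is not conclusive. A generic retry wrapper, sketched here under the assumption of a POSIX shell, repeats any command until it succeeds or the attempts run out. The name `retry` and its argument order are illustrative and not part of the repository's scripts.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND up to ATTEMPTS times, sleeping DELAY_SECONDS between tries.
# Returns 0 as soon as COMMAND succeeds, 1 if every attempt fails.
# (Hypothetical helper for illustration.)
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Skip the pause after the final failed attempt.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

For instance, `retry 10 5 curl -fsS http://your-ip:8000/health` probes the health endpoint up to ten times, five seconds apart; the `-f` flag makes curl exit non-zero on HTTP error responses so the retry logic can see the failure.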
- Regular backup creation before deployments
- Monitor resource usage and costs
- Update secrets rotation schedule
- Review and update alerting thresholds
skypilot/
├── uap-production.yaml # Multi-cloud production
├── uap-aws.yaml # AWS-specific
├── uap-gcp.yaml # GCP-specific
├── uap-azure.yaml # Azure-specific
└── uap-cost-optimized.yaml # Cost optimization
scripts/
├── deploy-production.sh # Main deployment script
├── setup-monitoring.sh # Monitoring setup
├── start-production.sh # Production startup
└── health-check.sh # Health verification
monitoring/
├── prometheus.yml # Metrics collection
├── alerts.yml # Alert rules
├── grafana/ # Dashboards and datasources
└── nginx/ # Load balancer config
- CPU: 4+ cores
- Memory: 16+ GB RAM
- Storage: 100 GB SSD
- GPU: Optional (T4/V100/A100 supported)
- CPU: 8+ cores
- Memory: 32+ GB RAM
- Storage: 200 GB SSD
- GPU: A100 or V100 for optimal performance
- CPU: 4+ cores
- Memory: 16+ GB RAM
- Storage: 100 GB standard disk
- GPU: T4 or L4 for cost efficiency