This directory contains Docker configurations for a complete Prometheus monitoring stack to support the Prometheus Metrics Emulator (PromEmu).
- Architecture
- Services
- Quick Start
- Management Script
- Configuration Files
- Data Persistence
- Networking
- Health Checks
- Monitoring Emulated Metrics
- Alerting Rules
- Troubleshooting
- Integration with Emulation
- Production Considerations
┌─────────────────┐    ┌────────────────────┐    ┌─────────────────┐
│   Pushgateway   │◄───│  Emulation Script  │    │     Grafana     │
│   Port: 9091    │    │      (Python)      │    │   Port: 3000    │
└─────────┬───────┘    └────────────────────┘    └─────────┬───────┘
          │                                                │
          │                                                │
          ▼                                                ▼
┌─────────────────┐                              ┌─────────────────┐
│   Prometheus    │◄─────────────────────────────│  Data Storage   │
│   Port: 9090    │                              │    (Volumes)    │
└─────────────────┘                              └─────────────────┘
- Image: prom/pushgateway:v1.11.1
- Purpose: Receives metrics from emulation script
- Features:
  - Persistent storage with 5-minute intervals
  - Health checks
  - Data persistence in Docker volume
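
To make "receives metrics" concrete, here is a minimal sketch that pushes one gauge sample to the Pushgateway over its HTTP API using only the Python standard library. It is not how PromEmu itself pushes metrics. The job name `promemu_demo` and the helper names are hypothetical; `cpu_usage_percent` and `worker-01` are borrowed from the example queries in this README.

```python
import urllib.request
from urllib.parse import quote


def build_push_url(base_url: str, job: str, instance: str) -> str:
    """Build the grouping-key URL: /metrics/job/<job>/instance/<instance>.

    Assumes simple label values without '/' characters.
    """
    return f"{base_url.rstrip('/')}/metrics/job/{quote(job)}/instance/{quote(instance)}"


def build_payload(name: str, value: float) -> bytes:
    """Render one gauge sample in the Prometheus text exposition format."""
    lines = [f"# TYPE {name} gauge", f"{name} {value}"]
    # The text format expects the body to end with a newline.
    return ("\n".join(lines) + "\n").encode("utf-8")


def push_gauge(base_url: str, job: str, instance: str, name: str, value: float) -> int:
    """PUT the sample to the Pushgateway and return the HTTP status code."""
    request = urllib.request.Request(
        build_push_url(base_url, job, instance),
        data=build_payload(name, value),
        method="PUT",  # PUT replaces all metrics in this job/instance group
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


# Example (requires the stack to be running; job name is hypothetical):
# push_gauge("http://localhost:9091", "promemu_demo", "worker-01", "cpu_usage_percent", 42.5)
```

After a successful push, the sample should appear at http://localhost:9091/metrics until the group is overwritten or deleted.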

- Image: prom/prometheus:v3.5.0
- Purpose: Scrapes metrics from Pushgateway, stores time-series data
- Features:
  - 7-day retention policy
  - 10GB storage limit
  - Custom alerting rules for emulation metrics
  - API access for external tools
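
Whether 7 days of data fits inside the 10GB limit depends on how many series the emulation produces. The sketch below applies the rule of thumb from the Prometheus storage documentation (disk ≈ retention seconds × samples per second × bytes per sample, at roughly 1 to 2 bytes per sample). The function names are hypothetical, and the estimate ignores the write-ahead log and index overhead.

```python
SECONDS_PER_DAY = 86_400


def estimate_tsdb_bytes(active_series: int, scrape_interval_s: float,
                        retention_days: float, bytes_per_sample: float = 2.0) -> float:
    """Rough disk estimate: retention seconds * samples per second * bytes per sample."""
    retention_seconds = retention_days * SECONDS_PER_DAY
    # Each series contributes one sample per scrape interval.
    return active_series * retention_seconds * bytes_per_sample / scrape_interval_s


def max_series_for_budget(budget_bytes: float, scrape_interval_s: float,
                          retention_days: float, bytes_per_sample: float = 2.0) -> int:
    """Largest series count whose estimate stays within the size budget."""
    per_series = estimate_tsdb_bytes(1, scrape_interval_s, retention_days, bytes_per_sample)
    return int(budget_bytes // per_series)
```

With a 15-second scrape interval and the pessimistic 2 bytes per sample, one series costs about 80 KB over 7 days, so small emulations sit far below the size limit and time-based retention will usually apply first.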

- Image: grafana/grafana:12.1.0
- Purpose: Visualization and dashboards
- Credentials: admin/admin
- Features:
  - Pre-configured Prometheus datasource
  - Automatic dashboard provisioning
  - Additional plugins (piechart, worldmap)
  - Anonymous viewer access

- Docker and Docker Compose installed
- Ports 3000, 9090, 9091 available
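
If you want to verify the port prerequisite from a script, one option is to try connecting to each port. This is a sketch with hypothetical helper names; it only detects a process that is actively accepting TCP connections on localhost, so treat a clean result as a hint rather than a guarantee.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def busy_ports(ports=(3000, 9090, 9091)) -> list[int]:
    """List the required ports that are already taken."""
    return [port for port in ports if port_in_use(port)]


# Example: an empty list means Grafana, Prometheus and Pushgateway can bind.
# print(busy_ports())
```
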
# Start all services
./manage.sh start
# Or using docker-compose directly
docker-compose up -d --build
- Pushgateway: http://localhost:9091
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000 (admin/admin)
./manage.sh status
The manage.sh script provides convenient management commands:
./manage.sh start # Start all services
./manage.sh stop # Stop all services
./manage.sh restart # Restart all services
./manage.sh status # Show service status
./manage.sh config # Show current configuration
./manage.sh logs # Show all logs
./manage.sh logs grafana # Show specific service logs
./manage.sh backup # Backup data volumes
./manage.sh cleanup # Complete cleanup (removes all data)
- docker-compose.yml - Main orchestration file
- Defines services, networks, volumes, and dependencies
- pushgateway/Dockerfile - Custom build with health checks
- Persistent storage in /data volume
- Runs as non-root user for security
- prometheus/Dockerfile - Custom build with config files
- prometheus/prometheus.yml - Main configuration
- prometheus/rules/emulation_alerts.yml - Alerting rules
- Scrapes Pushgateway every 15 seconds (configurable)
- grafana/Dockerfile - Custom build with plugins
- grafana/provisioning/datasources/ - Auto-configured datasources
- grafana/provisioning/dashboards/ - Auto-provisioned dashboards
- grafana/dashboards/emulation_overview.json - Main dashboard
All data is stored in Docker named volumes:
- max-prometheus-data - Prometheus time-series data
- max-grafana-data - Grafana dashboards and settings
- max-pushgateway-data - Pushgateway metric buffer
./manage.sh backup
Creates a timestamped backup in the backups/ directory.
Services communicate via a dedicated Docker network:
- Network: max-monitoring
- Internal DNS resolution between services
- Only necessary ports exposed to host
All services include health checks:
- Pushgateway: HTTP GET /metrics
- Prometheus: HTTP GET /-/healthy
- Grafana: HTTP GET /api/health
Health check parameters:
- Interval: 30 seconds
- Timeout: 10 seconds
- Retries: 3
- Start period: 10-30 seconds
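
The same endpoints can be probed from the host, for example before starting an emulation run. The sketch below is a host-side approximation with hypothetical helper names, not the container health check itself: it retries each endpoint a few times using the timeout, interval, and retry values listed above.

```python
import time
import urllib.error
import urllib.request

HEALTH_ENDPOINTS = {
    "pushgateway": "http://localhost:9091/metrics",
    "prometheus": "http://localhost:9090/-/healthy",
    "grafana": "http://localhost:3000/api/health",
}


def http_ok(url: str, timeout: float = 10.0) -> bool:
    """Return True if a GET on url answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        return False


def check_service(url: str, retries: int = 3, interval: float = 30.0,
                  probe=http_ok, sleep=time.sleep) -> bool:
    """Probe url up to `retries` times, pausing `interval` seconds between attempts."""
    for attempt in range(retries):
        if probe(url):
            return True
        if attempt < retries - 1:
            sleep(interval)
    return False


# Example (requires the stack to be running):
# for name, url in HEALTH_ENDPOINTS.items():
#     print(name, "healthy" if check_service(url, interval=5.0) else "unhealthy")
```
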
# CPU usage across all hosts
cpu_usage_percent
# Memory usage for specific host
memory_usage_percent{host="worker-01"}
# Heavy task status
heavy_task_active
# Database connections
db_connections
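
These queries can also be run outside the Prometheus UI through the instant-query endpoint of the Prometheus HTTP API (`/api/v1/query`). The following is a minimal sketch using only the standard library; the helper names are hypothetical, and it assumes Prometheus is reachable on localhost:9090 as configured in this stack.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_query_url(promql: str, base_url: str = "http://localhost:9090") -> str:
    """Build an instant-query URL with the PromQL expression URL-encoded."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(payload: dict) -> list[tuple[dict, float]]:
    """Extract (labels, value) pairs from an instant-query response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample's value is a [timestamp, "string value"] pair.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def run_query(promql: str, base_url: str = "http://localhost:9090") -> list[tuple[dict, float]]:
    """Run an instant query against Prometheus and return the parsed samples."""
    with urllib.request.urlopen(build_query_url(promql, base_url), timeout=10) as response:
        return parse_instant_vector(json.load(response))


# Example (requires the stack to be running and metrics to be pushed):
# for labels, value in run_query('memory_usage_percent{host="worker-01"}'):
#     print(labels.get("host"), value)
```
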
The included dashboard shows:
- CPU usage by host (time series)
- Memory usage by host (time series)
- Heavy task status (stat panel)
- Host distribution by role (pie chart)
- Database connections (time series)
- Network throughput (time series)
Pre-configured alerts:
- HighCPUUsage: CPU > 80% for 2 minutes
- CriticalCPUUsage: CPU > 95% for 1 minute
- HighMemoryUsage: Memory > 85% for 3 minutes
- HeavyTaskActive: Heavy task started
- PushgatewayDown: Service unavailable
- PrometheusDown: Service unavailable
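
The "for N minutes" part of each threshold alert means the condition has to hold continuously before the alert fires. The sketch below is a simplified model of that behavior, not the Prometheus rule engine and not the contents of emulation_alerts.yml; the function name and sample data are hypothetical.

```python
def firing_timestamps(samples, threshold, for_seconds):
    """Return the timestamps at which a 'value > threshold for N seconds' alert fires.

    samples: list of (timestamp_seconds, value) pairs sorted by time.
    """
    firing = []
    breach_start = None  # when the current run of breaching samples began
    for timestamp, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp
            # Fire only once the breach has lasted the full duration.
            if timestamp - breach_start >= for_seconds:
                firing.append(timestamp)
        else:
            # Any sample back under the threshold resets the timer.
            breach_start = None
    return firing


# CPU samples every 30 seconds, checked against HighCPUUsage (> 80% for 2 minutes).
cpu = [(0, 70), (30, 85), (60, 90), (90, 88), (120, 92), (150, 95), (180, 60), (210, 99)]
print(firing_timestamps(cpu, threshold=80, for_seconds=120))  # [150]
```

In this run the breach starts at t=30, so the alert first fires at t=150; the dip at t=180 resets the timer, and the spike at t=210 has not yet lasted long enough to fire again.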
# Check Docker is running
docker info
# Check port availability
netstat -tulpn | grep -E ':(3000|9090|9091)'
# View service logs
./manage.sh logs [service]
# Verify Pushgateway has metrics
curl http://localhost:9091/metrics
# Check Prometheus targets
curl http://localhost:9090/api/v1/targets
# Verify Grafana datasource (documented default credentials: admin/admin)
curl -u admin:admin http://localhost:3000/api/datasources
# Monitor resource usage
docker stats
# Check volume sizes
docker system df -v
# Reduce Prometheus retention (current default: 7d)
# Edit .docker-env: PROMETHEUS_RETENTION_TIME=3d
# Full reset: removes all data, then starts the stack again
./manage.sh cleanup
./manage.sh start
The emulation script pushes metrics to Pushgateway:
import asyncio

from core.emulator import MetricsEmulator
from core.config_example import get_example_config


async def main() -> None:
    # Configure to use local Pushgateway
    config = get_example_config()
    config.pushgateway_url = 'http://localhost:9091'

    # Run emulation for 30 minutes (1800 seconds)
    emulator = MetricsEmulator(config)
    await emulator.run_for_duration(1800)


# run_for_duration is awaited, so it needs a running event loop
asyncio.run(main())
For production use, consider:
- Security:
  - Change default Grafana password
  - Enable HTTPS with reverse proxy
  - Configure authentication
- Persistence:
  - Use external storage for volumes
  - Set up backup automation
  - Configure retention policies
- Monitoring:
  - Add Alertmanager for notifications
  - Configure external receivers (email, Slack)
  - Monitor the monitoring stack itself
- Performance:
  - Tune Prometheus storage settings
  - Configure appropriate resource limits
  - Use SSD storage for better performance