A production-grade cloud native platform demonstrating SRE best practices, comprehensive observability, and DevSecOps principles.
┌─────────────────┐      ┌──────────────────┐
│   API Gateway   │─────▶│  Go API Service  │
│    (Ingress)    │      │   (2 replicas)   │
└─────────────────┘      └──────────────────┘
                                  │
                                  ▼
                            ┌───────────┐
                            │  Python   │
                            │  Worker   │
                            └───────────┘
┌───────────────────────────────────────────┐
│            Observability Stack            │
├───────────────────────────────────────────┤
│  Prometheus  │  Grafana   │ Loki/Promtail │
└───────────────────────────────────────────┘
- Go API Service: High-performance REST API with health checks, metrics, and graceful shutdown
- Python Worker: Background job processor with structured logging and error handling
- Monitoring Stack: Full observability with Prometheus, Grafana, and Loki
- CI/CD Pipeline: Automated testing, security scanning, and deployment
- Comprehensive health checks (liveness/readiness probes)
- Structured logging with JSON output
- Built-in metrics endpoints
- Graceful shutdown handling
- Resource limits and requests
- Pod disruption budgets for high availability
- Security contexts (non-root, read-only filesystem)
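
Two of the features listed above, structured JSON logging and graceful shutdown, can be illustrated for the Python worker with the standard library alone. This is a minimal sketch, not the project's actual implementation: `JsonFormatter`, `build_logger`, `install_signal_handlers`, `run`, and the JSON field names are all hypothetical.

```python
import json
import logging
import signal
import threading


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name: str = "worker") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


# Set by the signal handler; the work loop checks it between jobs.
stop_requested = threading.Event()


def install_signal_handlers() -> None:
    """Request a clean stop on SIGTERM (sent by Kubernetes on pod deletion) or SIGINT."""
    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, lambda signum, frame: stop_requested.set())


def run(job, interval_seconds: float, logger: logging.Logger) -> None:
    """Call `job` repeatedly until a stop is requested; a running job always finishes."""
    while not stop_requested.is_set():
        job()
        # wait() returns as soon as the event is set, so shutdown is not delayed
        # by the sleep between jobs.
        stop_requested.wait(interval_seconds)
    logger.info("shutdown complete")
```

Using an event instead of `time.sleep` means a termination signal interrupts the idle wait immediately, while a job already in progress is allowed to complete.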
- Prometheus metrics collection
- Grafana dashboards
- Centralized logging with Loki
- Alert rules for common issues
- Custom application metrics
- Container security scanning with Trivy
- Non-root containers
- Read-only root filesystems
- Dropped Linux capabilities
- Automated security scans in CI/CD
- Docker and Docker Compose
- Kubernetes cluster (K3s, minikube, or cloud provider)
- kubectl configured
- Go 1.21+ (for local development)
- Python 3.12+ (for local development)
# Start all services
docker-compose up -d
# Check status
docker-compose ps
# View logs
docker-compose logs -f api-go
docker-compose logs -f worker-python
# Access services
# API: http://localhost:3030
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000 (admin/admin123)

# Using Makefile (recommended)
make k8s-deploy
# Or manually
kubectl apply -f kubernetes/monitoring/
kubectl apply -f kubernetes/base/
# Check deployment status
make k8s-status
# Port forward services
make port-forward-api # API on :3030
make port-forward-grafana # Grafana on :3000
make port-forward-prometheus # Prometheus on :9090

make help # Show all available targets
# Development
make build-go # Build Go API binary
make test-go # Test Go API
make test-python # Test Python worker
make dev-api # Run API locally
make dev-worker # Run worker locally
# Docker
make docker-build # Build all Docker images
make docker-compose-up # Start services with docker-compose
make docker-compose-down # Stop services
# Kubernetes
make k8s-deploy # Deploy everything to Kubernetes
make k8s-deploy-monitoring # Deploy monitoring stack only
make k8s-deploy-apps # Deploy applications only
make k8s-delete # Delete all Kubernetes resources
make k8s-status # Show deployment status
# Logs
make logs-api # Show API logs
make logs-worker # Show worker logs
make logs-prometheus # Show Prometheus logs
make logs-grafana # Show Grafana logs
# Security
make trivy-scan # Run security scans
# Utilities
make clean # Clean build artifacts

- APP_PORT: API listening port (default: 3030)
- APP_VERSION: Application version

- LOG_LEVEL: Logging level (default: INFO)
- APP_VERSION: Application version
- JOB_INTERVAL: Seconds between jobs (default: 8)
- METRICS_INTERVAL: Seconds between metrics logs (default: 60)
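
As an illustration of how the environment variables above might be consumed, the sketch below reads `LOG_LEVEL`, `JOB_INTERVAL`, and `METRICS_INTERVAL` with the documented defaults. It assumes these three belong to the Python worker; `WorkerConfig` and `load_config` are hypothetical names, and the validation rule (intervals must be positive) is an assumption rather than documented behavior.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerConfig:
    log_level: str
    job_interval: float      # seconds between jobs
    metrics_interval: float  # seconds between metrics log lines


def load_config(env=None) -> WorkerConfig:
    """Build a config from environment variables, falling back to the README defaults."""
    env = os.environ if env is None else env
    config = WorkerConfig(
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        job_interval=float(env.get("JOB_INTERVAL", "8")),
        metrics_interval=float(env.get("METRICS_INTERVAL", "60")),
    )
    # Assumed rule: a zero or negative interval is treated as a configuration error.
    if config.job_interval <= 0 or config.metrics_interval <= 0:
        raise ValueError("JOB_INTERVAL and METRICS_INTERVAL must be positive")
    return config
```

Passing a plain dictionary as `env` keeps the function easy to test without touching the real process environment, for example `load_config({"JOB_INTERVAL": "2"})`.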
Prometheus: Query metrics and view targets
make port-forward-prometheus
# Open http://localhost:9090

Grafana: Visualize metrics and logs
make port-forward-grafana
# Open http://localhost:3000
# Login: admin / admin123

API metrics available at /metrics:
- request_count: Total requests handled
- error_count: Total errors encountered
- avg_duration_ms: Average request duration
- uptime: Service uptime
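
To make the relationship between these four values concrete, here is a small in-memory tracker that derives them. It is a sketch only: the real API service is written in Go, the `Metrics` class is hypothetical, and reporting `uptime` in seconds is an assumption, since the unit is not stated above.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Metrics:
    """Accumulates request statistics; not thread-safe as written."""

    started_at: float = field(default_factory=time.monotonic)
    request_count: int = 0
    error_count: int = 0
    total_duration_ms: float = 0.0

    def observe(self, duration_ms: float, error: bool = False) -> None:
        """Record one handled request and whether it ended in an error."""
        self.request_count += 1
        self.total_duration_ms += duration_ms
        if error:
            self.error_count += 1

    def snapshot(self) -> dict:
        """Return the values a metrics endpoint could expose."""
        if self.request_count:
            avg = self.total_duration_ms / self.request_count
        else:
            avg = 0.0  # avoid dividing by zero before the first request
        return {
            "request_count": self.request_count,
            "error_count": self.error_count,
            "avg_duration_ms": avg,
            "uptime": time.monotonic() - self.started_at,  # assumed unit: seconds
        }
```

Storing the running total instead of the average keeps each update constant-time and avoids accumulating rounding error in the mean.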
Configured alerts in Prometheus:
- High error rate (>5% for 5 minutes)
- API service down (>2 minutes)
- High latency (P95 > 1s for 5 minutes)
- Pod crash looping
- High memory/CPU usage
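
Several of these alerts pair a threshold with a duration, which in Prometheus is expressed with a rule's `for` clause: the condition must hold continuously for the whole period before the alert fires. The sketch below mimics that behavior for the error-rate alert. It is a simplified illustration, not the project's rule definitions; `Sample`, `error_rate`, and `should_fire` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    timestamp: float  # seconds
    requests: int     # requests seen in this sampling interval
    errors: int       # errors seen in this sampling interval


def error_rate(sample: Sample) -> float:
    """Fraction of requests that failed; zero when there was no traffic."""
    return sample.errors / sample.requests if sample.requests else 0.0


def should_fire(samples, threshold: float = 0.05, hold_seconds: float = 300.0) -> bool:
    """Return True if the error rate has stayed above `threshold` for `hold_seconds`.

    `samples` must be ordered by timestamp. Any sample at or below the
    threshold resets the clock, as a `for` clause does.
    """
    breach_start = None
    for sample in samples:
        if error_rate(sample) > threshold:
            if breach_start is None:
                breach_start = sample.timestamp
        else:
            breach_start = None
    if breach_start is None:
        return False
    return samples[-1].timestamp - breach_start >= hold_seconds
```

With one sample per minute, six consecutive samples at a 10% error rate spanning 300 seconds would fire, while a single healthy sample in the middle would restart the wait.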
- All containers run as non-root user (UID 1000)
- Read-only root filesystems
- All Linux capabilities dropped
- Trivy scanning in CI/CD pipeline
# Local security scan
make trivy-scan
# Scan specific component
trivy fs --severity HIGH,CRITICAL ./app/api-go

make test-go
# or
cd app/api-go && go test -v -race ./...

make test-python
# or
cd app/worker-python && pytest -v

# Go
make fmt-go # Format Go code
make vet-go # Vet Go code
# Python
make lint-python # Run black and flake8

GitHub Actions workflow includes:
- Test: Unit tests for Go and Python
- Lint: Code quality checks
- Security Scan: Trivy vulnerability scanning
- Build: Docker images for both services
- Push: Images to GitHub Container Registry
- Deploy: Automated Kubernetes deployment
| Endpoint | Method | Description |
|---|---|---|
| / | GET | Root endpoint |
| /health | GET | Health check (liveness) |
| /ready | GET | Readiness check |
| /metrics | GET | Prometheus metrics |
| /api/* | GET | API endpoints |
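
The endpoints in the table can also be checked from a script, for instance as a smoke test after deployment. The following is a minimal sketch using only the Python standard library; `endpoint_url` and `probe` are hypothetical helpers, and the base URL assumes the API is reachable on the documented local port 3030.

```python
import urllib.error
import urllib.request

# Paths taken from the endpoint table.
ENDPOINTS = ("/health", "/ready", "/metrics")


def endpoint_url(base_url: str, path: str) -> str:
    """Join a base URL and a path with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def probe(base_url: str, path: str, timeout: float = 2.0):
    """Return the HTTP status code for a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(endpoint_url(base_url, path), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return None


if __name__ == "__main__":
    for path in ENDPOINTS:
        print(path, probe("http://localhost:3030", path))
```

Distinguishing "answered with an error status" from "did not answer" is useful here, because a failing readiness check and an unreachable service usually call for different follow-up.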
# Health check
curl http://localhost:3030/health
# Metrics
curl http://localhost:3030/metrics
# API endpoint
curl http://localhost:3030/api/

- Reliability
  - High availability with 2+ replicas
  - Pod disruption budgets
  - Health checks and auto-healing
- Observability
  - Comprehensive logging
  - Metrics collection
  - Distributed tracing ready
- Performance
  - Resource limits and requests
  - Efficient container images
  - Rolling updates with zero downtime
- Security
  - Security scanning
  - Least privilege containers
  - Regular dependency updates

- Fork the repository
- Create a feature branch
- Make your changes
- Run tests: make test
- Run security scan: make trivy-scan
- Submit a pull request

This project is licensed under the MIT License - see the LICENSE file for details.
Built with:
- Go for high-performance API
- Python for flexible worker tasks
- Kubernetes for container orchestration
- Prometheus + Grafana for observability
- Loki for log aggregation