Cloud Native SRE Lab

A production-grade, cloud-native platform that demonstrates SRE best practices, comprehensive observability, and DevSecOps principles.

🏗️ Architecture

┌─────────────────┐     ┌──────────────────┐
│   API Gateway   │────▶│   Go API Service │
│   (Ingress)     │     │   (2 replicas)   │
└─────────────────┘     └──────────────────┘
                               │
                               ▼
                        ┌─────────────┐
                        │   Python    │
                        │   Worker    │
                        └─────────────┘

┌───────────────────────────────────────────┐
│          Observability Stack              │
├───────────────────────────────────────────┤
│  Prometheus  │  Grafana  │  Loki/Promtail│
└───────────────────────────────────────────┘

Components

Go API Service: High-performance REST API with health checks, metrics, and graceful shutdown
Python Worker: Background job processor with structured logging and error handling
Monitoring Stack: Full observability with Prometheus, Grafana, and Loki
CI/CD Pipeline: Automated testing, security scanning, and deployment
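
The worker's structured logging could look roughly like the sketch below. It uses only the Python standard library. The `JsonFormatter` class, the `build_logger` helper, and the JSON field names are assumptions made for illustration, not the project's actual code.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object (hypothetical field names)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Attach the formatted traceback so errors stay machine-readable.
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name: str, stream) -> logging.Logger:
    """Create a logger that writes one JSON line per record to `stream`."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    buffer = io.StringIO()
    build_logger("worker", buffer).info("job %s finished", 42)
    print(buffer.getvalue().strip())
```

One JSON object per line is a format that log shippers such as Promtail can typically parse without custom rules.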

✨ Features

SRE Best Practices

✅ Comprehensive health checks (liveness/readiness probes)
✅ Structured logging with JSON output
✅ Built-in metrics endpoints
✅ Graceful shutdown handling
✅ Resource limits and requests
✅ Pod disruption budgets for high availability
✅ Security contexts (non-root, read-only filesystem)
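
Graceful shutdown in a background worker can be sketched as follows: a signal handler sets a flag, and the loop finishes its current job before exiting. The `GracefulWorker` class and its method names are hypothetical; the project's real worker may be structured differently.

```python
import signal
import threading


class GracefulWorker:
    """Run jobs until a stop is requested, then exit after the current job."""

    def __init__(self, job, interval_seconds: float):
        self._job = job
        self._interval = interval_seconds
        self._stop = threading.Event()
        self.completed = 0

    def request_stop(self, signum=None, frame=None) -> None:
        # The (signum, frame) signature matches what signal.signal() expects.
        self._stop.set()

    def install_signal_handlers(self) -> None:
        # Kubernetes sends SIGTERM before killing a pod; Ctrl+C sends SIGINT.
        # Must be called from the main thread.
        signal.signal(signal.SIGTERM, self.request_stop)
        signal.signal(signal.SIGINT, self.request_stop)

    def run(self) -> None:
        while not self._stop.is_set():
            self._job()
            self.completed += 1
            # wait() returns early if a stop arrives during the pause.
            self._stop.wait(self._interval)
```

Using `Event.wait` instead of `time.sleep` means a stop request during the pause between jobs takes effect immediately rather than after the full interval.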

Observability

📊 Prometheus metrics collection
📈 Grafana dashboards
📝 Centralized logging with Loki
🔔 Alert rules for common issues
📉 Custom application metrics

Security

🔒 Container security scanning with Trivy
🔐 Non-root containers
🛡️ Read-only root filesystems
🔑 Dropped Linux capabilities
🚨 Automated security scans in CI/CD

🚀 Quick Start

Prerequisites

Docker and Docker Compose
Kubernetes cluster (K3s, minikube, or cloud provider)
kubectl configured
Go 1.21+ (for local development)
Python 3.12+ (for local development)

Local Development with Docker Compose

# Start all services
docker-compose up -d

# Check status
docker-compose ps

# View logs
docker-compose logs -f api-go
docker-compose logs -f worker-python

# Access services
# API: http://localhost:3030
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000 (admin/admin123)

Kubernetes Deployment

# Using Makefile (recommended)
make k8s-deploy

# Or manually
kubectl apply -f kubernetes/monitoring/
kubectl apply -f kubernetes/base/

# Check deployment status
make k8s-status

# Port forward services
make port-forward-api        # API on :3030
make port-forward-grafana    # Grafana on :3000
make port-forward-prometheus # Prometheus on :9090

📝 Available Make Targets

make help                 # Show all available targets

# Development
make build-go            # Build Go API binary
make test-go             # Test Go API
make test-python         # Test Python worker
make dev-api             # Run API locally
make dev-worker          # Run worker locally

# Docker
make docker-build        # Build all Docker images
make docker-compose-up   # Start services with docker-compose
make docker-compose-down # Stop services

# Kubernetes
make k8s-deploy          # Deploy everything to Kubernetes
make k8s-deploy-monitoring # Deploy monitoring stack only
make k8s-deploy-apps     # Deploy applications only
make k8s-delete          # Delete all Kubernetes resources
make k8s-status          # Show deployment status

# Logs
make logs-api            # Show API logs
make logs-worker         # Show worker logs
make logs-prometheus     # Show Prometheus logs
make logs-grafana        # Show Grafana logs

# Security
make trivy-scan          # Run security scans

# Utilities
make clean               # Clean build artifacts

🔧 Configuration

Environment Variables

Go API

APP_PORT: API listening port (default: 3030)
APP_VERSION: Application version

Python Worker

LOG_LEVEL: Logging level (default: INFO)
APP_VERSION: Application version
JOB_INTERVAL: Seconds between jobs (default: 8)
METRICS_INTERVAL: Seconds between metrics logs (default: 60)
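
As an illustration of how the worker might read these variables, here is a minimal sketch. The `WorkerConfig` dataclass and `load_worker_config` function are hypothetical names; only the variable names and the defaults for `LOG_LEVEL`, `JOB_INTERVAL`, and `METRICS_INTERVAL` come from the list above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerConfig:
    log_level: str
    app_version: str
    job_interval: int       # seconds between jobs
    metrics_interval: int   # seconds between metrics logs


def load_worker_config(env=None) -> WorkerConfig:
    """Read worker settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return WorkerConfig(
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        # No default is documented for APP_VERSION; "unknown" is a placeholder.
        app_version=env.get("APP_VERSION", "unknown"),
        job_interval=int(env.get("JOB_INTERVAL", "8")),
        metrics_interval=int(env.get("METRICS_INTERVAL", "60")),
    )


if __name__ == "__main__":
    print(load_worker_config())
```

Accepting a plain mapping keeps the loader easy to test without touching the real process environment.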

📊 Monitoring & Observability

Accessing Dashboards

Prometheus: Query metrics and view targets

make port-forward-prometheus
# Open http://localhost:9090

Grafana: Visualize metrics and logs

make port-forward-grafana
# Open http://localhost:3000
# Login: admin / admin123

Key Metrics

API metrics available at /metrics:

request_count: Total requests handled
error_count: Total errors encountered
avg_duration_ms: Average request duration
uptime: Service uptime
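
To make the relationship between these four values concrete, the sketch below tracks them in memory. It is written in Python for brevity even though the API itself is a Go service. The `RequestMetrics` class is hypothetical, and reporting `uptime` in seconds is an assumption, since the unit is not stated above.

```python
import time


class RequestMetrics:
    """In-memory counters mirroring the metric names listed above."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._started = clock()
        self.request_count = 0
        self.error_count = 0
        self._total_duration_ms = 0.0

    def observe(self, duration_ms: float, is_error: bool = False) -> None:
        """Record one handled request."""
        self.request_count += 1
        self._total_duration_ms += duration_ms
        if is_error:
            self.error_count += 1

    def snapshot(self) -> dict:
        """Return the current values; the average is 0.0 before any request."""
        if self.request_count:
            avg = self._total_duration_ms / self.request_count
        else:
            avg = 0.0
        return {
            "request_count": self.request_count,
            "error_count": self.error_count,
            "avg_duration_ms": avg,
            "uptime": self._clock() - self._started,  # seconds (assumed unit)
        }
```

A running average like `avg_duration_ms` hides outliers, which is why a percentile-based alert such as the P95 latency rule needs histogram-style data rather than this single value.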

Alert Rules

Configured alerts in Prometheus:

High error rate (>5% for 5 minutes)
API service down (>2 minutes)
High latency (P95 > 1s for 5 minutes)
Pod crash looping
High memory/CPU usage
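
The "for N minutes" part of these rules means the condition must hold continuously for that long before the alert fires. The sketch below illustrates that logic for the error-rate rule in plain Python. It is not the project's Prometheus rule definition: the function names, the sample format, and the evaluation approach are all assumptions.

```python
def error_rate(requests: int, errors: int) -> float:
    """Fraction of failed requests; 0.0 when there was no traffic."""
    return errors / requests if requests else 0.0


def should_fire(samples, threshold=0.05, hold_seconds=300) -> bool:
    """Decide whether a 'high error rate' alert would fire.

    `samples` is a list of (timestamp_seconds, requests, errors) tuples,
    oldest first, where the counts cover one scrape interval each.
    The alert fires only if every sample since some point at least
    `hold_seconds` before the latest sample exceeded the threshold.
    """
    breach_start = None
    for timestamp, requests, errors in samples:
        if error_rate(requests, errors) > threshold:
            if breach_start is None:
                breach_start = timestamp
        else:
            # Any healthy sample resets the pending period.
            breach_start = None
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start >= hold_seconds
```

The hold period keeps a single bad scrape from paging anyone, at the cost of delaying detection by up to five minutes.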

🔒 Security

Container Security

All containers run as a non-root user (UID 1000)
Read-only root filesystems
All Linux capabilities dropped
Trivy scanning in CI/CD pipeline

Security Scanning

# Local security scan
make trivy-scan

# Scan specific component
trivy fs --severity HIGH,CRITICAL ./app/api-go

🧪 Testing

Go API Tests

make test-go
# or
cd app/api-go && go test -v -race ./...

Python Worker Tests

make test-python
# or
cd app/worker-python && pytest -v

Code Quality

# Go
make fmt-go    # Format Go code
make vet-go    # Vet Go code

# Python
make lint-python  # Run black and flake8

🚢 CI/CD Pipeline

GitHub Actions workflow includes:

Test: Unit tests for Go and Python
Lint: Code quality checks
Security Scan: Trivy vulnerability scanning
Build: Docker images for both services
Push: Images to GitHub Container Registry
Deploy: Automated Kubernetes deployment

📚 API Endpoints

Go API Service

| Endpoint   | Method | Description             |
|------------|--------|-------------------------|
| `/`        | GET    | Root endpoint           |
| `/health`  | GET    | Health check (liveness) |
| `/ready`   | GET    | Readiness check         |
| `/metrics` | GET    | Prometheus metrics      |
| `/api/*`   | GET    | API endpoints           |

Example Requests

# Health check
curl http://localhost:3030/health

# Metrics
curl http://localhost:3030/metrics

# API endpoint
curl http://localhost:3030/api/
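
The same checks can be scripted, for example as a smoke test after deployment. The sketch below assumes that `/health` and `/ready` answer with HTTP 200 when the service is up, which the table above does not state explicitly. The `is_healthy` function and its `opener` parameter are hypothetical names.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str, path: str = "/health", timeout: float = 2.0,
               opener=urllib.request.urlopen) -> bool:
    """Return True if GET base_url + path answers with HTTP 200 (assumed)."""
    url = base_url.rstrip("/") + path
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers connection refused, timeouts, and HTTP error responses.
        return False


if __name__ == "__main__":
    for probe in ("/health", "/ready"):
        print(probe, is_healthy("http://localhost:3030", path=probe))
```

The `opener` parameter exists only so the function can be exercised with a fake in tests; normal callers can ignore it.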

🎯 SRE Principles Demonstrated

Reliability
- High availability with 2+ replicas
- Pod disruption budgets
- Health checks and auto-healing
Observability
- Comprehensive logging
- Metrics collection
- Distributed tracing ready
Performance
- Resource limits and requests
- Efficient container images
- Rolling updates with zero downtime
Security
- Security scanning
- Least privilege containers
- Regular dependency updates

🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Run tests: make test
5. Run security scan: make trivy-scan
6. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Built with:

Go for high-performance API
Python for flexible worker tasks
Kubernetes for container orchestration
Prometheus + Grafana for observability
Loki for log aggregation
