
qyqyardy/cloud-native-sre-lab

Cloud Native SRE Lab

A production-like cloud native lab platform demonstrating SRE best practices, comprehensive observability, and DevSecOps principles.

🏗️ Architecture

┌─────────────────┐     ┌──────────────────┐
│   API Gateway   │────▶│   Go API Service │
│   (Ingress)     │     │   (2 replicas)   │
└─────────────────┘     └──────────────────┘
                               │
                               ▼
                        ┌─────────────┐
                        │   Python    │
                        │   Worker    │
                        └─────────────┘

┌───────────────────────────────────────────┐
│          Observability Stack              │
├───────────────────────────────────────────┤
│  Prometheus  │  Grafana  │  Loki/Promtail │
└───────────────────────────────────────────┘

Components

  • Go API Service: High-performance REST API with health checks, metrics, and graceful shutdown
  • Python Worker: Background job processor with structured logging and error handling
  • Monitoring Stack: Full observability with Prometheus, Grafana, and Loki
  • CI/CD Pipeline: Automated testing, security scanning, and deployment
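
The worker's job loop is not shown in this README. As a rough illustration of a background job processor with error handling and a clean stop, here is a minimal Python sketch. The names `run_worker` and `process_job`, the `max_jobs` parameter, and the use of `threading.Event` are assumptions made for illustration, not the project's actual code.

```python
import logging
import threading

logger = logging.getLogger("worker")


def run_worker(process_job, stop_event, interval_seconds=8.0, max_jobs=None):
    """Run jobs until stop_event is set; return (succeeded, failed) counts.

    process_job is a hypothetical callable that receives the job number.
    max_jobs exists only to make the sketch easy to exercise in a test.
    """
    succeeded = failed = 0
    job_number = 0
    while not stop_event.is_set():
        if max_jobs is not None and job_number >= max_jobs:
            break
        job_number += 1
        try:
            process_job(job_number)
            succeeded += 1
        except Exception:
            # A failing job is logged and counted, but does not kill the loop.
            failed += 1
            logger.exception("job %d failed", job_number)
        # wait() returns early if a shutdown is requested mid-interval.
        stop_event.wait(interval_seconds)
    return succeeded, failed
```

In a long-running process, a SIGTERM handler would typically call `stop_event.set()` so the loop exits after the current job instead of being killed mid-job.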

✨ Features

SRE Best Practices

  • ✅ Comprehensive health checks (liveness/readiness probes)
  • ✅ Structured logging with JSON output
  • ✅ Built-in metrics endpoints
  • ✅ Graceful shutdown handling
  • ✅ Resource limits and requests
  • ✅ Pod disruption budgets for high availability
  • ✅ Security contexts (non-root, read-only filesystem)
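
To make "structured logging with JSON output" concrete, the following is a minimal sketch using only Python's standard `logging` module. The field names (`timestamp`, `level`, `logger`, `message`) and the helpers `JsonFormatter` and `build_logger` are assumptions chosen for illustration; the services in this repository may use a different schema or library.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name, stream, level="INFO"):
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(level)
    return logger
```

One JSON object per line is the shape that log shippers such as Promtail can forward without multi-line parsing, which is the main reason to prefer it over free-form text.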

Observability

  • 📊 Prometheus metrics collection
  • 📈 Grafana dashboards
  • 📝 Centralized logging with Loki
  • 🔔 Alert rules for common issues
  • 📉 Custom application metrics

Security

  • 🔒 Container security scanning with Trivy
  • 🔐 Non-root containers
  • 🛡️ Read-only root filesystems
  • 🔑 Dropped Linux capabilities
  • 🚨 Automated security scans in CI/CD

🚀 Quick Start

Prerequisites

  • Docker and Docker Compose
  • Kubernetes cluster (K3s, minikube, or cloud provider)
  • kubectl configured
  • Go 1.21+ (for local development)
  • Python 3.12+ (for local development)

Local Development with Docker Compose

# Start all services
docker-compose up -d

# Check status
docker-compose ps

# View logs
docker-compose logs -f api-go
docker-compose logs -f worker-python

# Access services
# API: http://localhost:3030
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000 (admin/admin123)

Kubernetes Deployment

# Using Makefile (recommended)
make k8s-deploy

# Or manually
kubectl apply -f kubernetes/monitoring/
kubectl apply -f kubernetes/base/

# Check deployment status
make k8s-status

# Port forward services
make port-forward-api        # API on :3030
make port-forward-grafana    # Grafana on :3000
make port-forward-prometheus # Prometheus on :9090

📝 Available Make Targets

make help                 # Show all available targets

# Development
make build-go            # Build Go API binary
make test-go             # Test Go API
make test-python         # Test Python worker
make dev-api             # Run API locally
make dev-worker          # Run worker locally

# Docker
make docker-build        # Build all Docker images
make docker-compose-up   # Start services with docker-compose
make docker-compose-down # Stop services

# Kubernetes
make k8s-deploy          # Deploy everything to Kubernetes
make k8s-deploy-monitoring # Deploy monitoring stack only
make k8s-deploy-apps     # Deploy applications only
make k8s-delete          # Delete all Kubernetes resources
make k8s-status          # Show deployment status

# Logs
make logs-api            # Show API logs
make logs-worker         # Show worker logs
make logs-prometheus     # Show Prometheus logs
make logs-grafana        # Show Grafana logs

# Security
make trivy-scan          # Run security scans

# Utilities
make clean               # Clean build artifacts

🔧 Configuration

Environment Variables

Go API

  • APP_PORT: API listening port (default: 3030)
  • APP_VERSION: Application version

Python Worker

  • LOG_LEVEL: Logging level (default: INFO)
  • APP_VERSION: Application version
  • JOB_INTERVAL: Seconds between jobs (default: 8)
  • METRICS_INTERVAL: Seconds between metrics logs (default: 60)
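
As a sketch of how the worker variables above could be read with their documented defaults, consider the following. `WorkerConfig` and `load_config` are hypothetical names, and the `"dev"` fallback for `APP_VERSION` is an assumption, since the README states no default for it.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerConfig:
    log_level: str
    app_version: str
    job_interval: float
    metrics_interval: float


def load_config(env=None):
    """Build a WorkerConfig from environment variables, applying defaults."""
    env = os.environ if env is None else env
    return WorkerConfig(
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        # The README gives no default for APP_VERSION; "dev" is a placeholder.
        app_version=env.get("APP_VERSION", "dev"),
        job_interval=float(env.get("JOB_INTERVAL", "8")),
        metrics_interval=float(env.get("METRICS_INTERVAL", "60")),
    )
```

Accepting an optional `env` mapping keeps the function testable without mutating the real process environment.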

📊 Monitoring & Observability

Accessing Dashboards

Prometheus: Query metrics and view targets

make port-forward-prometheus
# Open http://localhost:9090

Grafana: Visualize metrics and logs

make port-forward-grafana
# Open http://localhost:3000
# Login: admin / admin123

Key Metrics

API metrics available at /metrics:

  • request_count: Total requests handled
  • error_count: Total errors encountered
  • avg_duration_ms: Average request duration
  • uptime: Service uptime
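
How these four values relate can be shown with a small in-process tracker. This is an illustrative Python sketch, not the Go service's implementation: the class `RequestStats`, its method names, and reporting `uptime` in seconds are all assumptions.

```python
import time


class RequestStats:
    """Track the four values listed above for a single process."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._started = clock()
        self.request_count = 0
        self.error_count = 0
        self._total_duration_ms = 0.0

    def observe(self, duration_ms, is_error=False):
        """Record one finished request."""
        self.request_count += 1
        self._total_duration_ms += duration_ms
        if is_error:
            self.error_count += 1

    def snapshot(self):
        """Return the current values as a plain dict."""
        if self.request_count:
            avg = self._total_duration_ms / self.request_count
        else:
            avg = 0.0
        return {
            "request_count": self.request_count,
            "error_count": self.error_count,
            "avg_duration_ms": avg,
            # Assumed unit: seconds since the tracker was created.
            "uptime": self._clock() - self._started,
        }
```

Note that an average over the whole process lifetime hides recent spikes, which is one reason percentile-based alerts are usually built from histograms rather than from a single running average.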

Alert Rules

Configured alerts in Prometheus:

  • High error rate (>5% for 5 minutes)
  • API service down (>2 minutes)
  • High latency (P95 > 1s for 5 minutes)
  • Pod crash looping
  • High memory/CPU usage
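
The "threshold for a duration" wording in these rules means the condition must hold continuously before the alert fires. Prometheus evaluates this itself, and the actual rule expressions are not shown in this README; the Python sketch below only illustrates the semantics for the error-rate rule. The function name `firing_since` and the sample format are assumptions.

```python
def firing_since(samples, threshold=0.05, hold_seconds=300):
    """Return the timestamp at which the alert would start firing, or None.

    samples is a list of (timestamp_seconds, error_ratio) pairs in time order.
    The ratio must stay above the threshold for hold_seconds without a gap.
    """
    pending_since = None
    for timestamp, ratio in samples:
        if ratio > threshold:
            if pending_since is None:
                pending_since = timestamp
            if timestamp - pending_since >= hold_seconds:
                return timestamp
        else:
            # Any sample at or below the threshold resets the pending timer.
            pending_since = None
    return None
```

The reset on a single healthy sample is what keeps short error bursts from paging anyone, at the cost of a delay of at least `hold_seconds` before a sustained problem is reported.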

🔒 Security

Container Security

  • All containers run as non-root user (UID 1000)
  • Read-only root filesystems
  • All Linux capabilities dropped
  • Trivy scanning in CI/CD pipeline

Security Scanning

# Local security scan
make trivy-scan

# Scan specific component
trivy fs --severity HIGH,CRITICAL ./app/api-go

🧪 Testing

Go API Tests

make test-go
# or
cd app/api-go && go test -v -race ./...

Python Worker Tests

make test-python
# or
cd app/worker-python && pytest -v

Code Quality

# Go
make fmt-go    # Format Go code
make vet-go    # Vet Go code

# Python
make lint-python  # Run black and flake8

🚒 CI/CD Pipeline

GitHub Actions workflow includes:

  1. Test: Unit tests for Go and Python
  2. Lint: Code quality checks
  3. Security Scan: Trivy vulnerability scanning
  4. Build: Docker images for both services
  5. Push: Images to GitHub Container Registry
  6. Deploy: Automated Kubernetes deployment

📚 API Endpoints

Go API Service

| Endpoint | Method | Description             |
|----------|--------|-------------------------|
| /        | GET    | Root endpoint           |
| /health  | GET    | Health check (liveness) |
| /ready   | GET    | Readiness check         |
| /metrics | GET    | Prometheus metrics      |
| /api/*   | GET    | API endpoints           |

Example Requests

# Health check
curl http://localhost:3030/health

# Metrics
curl http://localhost:3030/metrics

# API endpoint
curl http://localhost:3030/api/
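
For scripted checks, the same liveness and readiness probes can be issued from Python's standard library. This is a minimal sketch: `probe` and `check_service` are hypothetical helper names, and the base URL assumes the default port 3030 from the Quick Start.

```python
import urllib.error
import urllib.request


def probe(base_url, path, timeout=2.0):
    """Return the HTTP status for GET base_url + path, or None if unreachable."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None


def check_service(base_url="http://localhost:3030"):
    """Probe the liveness and readiness endpoints of the Go API."""
    return {path: probe(base_url, path) for path in ("/health", "/ready")}
```

Keeping the two results separate matters because a service can be alive but not yet ready to take traffic, for example while it is still starting up.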

🎯 SRE Principles Demonstrated

  1. Reliability

    • High availability with 2+ replicas
    • Pod disruption budgets
    • Health checks and auto-healing
  2. Observability

    • Comprehensive logging
    • Metrics collection
    • Distributed tracing ready
  3. Performance

    • Resource limits and requests
    • Efficient container images
    • Rolling updates with zero downtime
  4. Security

    • Security scanning
    • Least privilege containers
    • Regular dependency updates

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests: make test
  5. Run security scan: make trivy-scan
  6. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Built with:

  • Go for high-performance API
  • Python for flexible worker tasks
  • Kubernetes for container orchestration
  • Prometheus + Grafana for observability
  • Loki for log aggregation

About

Production-like Cloud Native SRE lab demonstrating real-world practices using Go & Python microservices, Docker, Kubernetes (K3s), CI/CD, observability, and DevSecOps fundamentals. Designed to simulate how modern systems are built, deployed, and operated in production.
