A demonstration of a multi-cloud, multi-region high-availability architecture using Amazon EKS and Azure AKS, showcasing cross-region container registry replication, DNS-based failover with Route53, and optional cross-cloud failover between AWS and Azure.
This project deploys a fully redundant infrastructure across two AWS regions with optional cross-cloud failover to Azure:
```
                   ┌─────────────────────┐
                   │    Route53 (DNS)    │
                   │ multi-cloud.domain  │
                   │  FAILOVER ROUTING   │
                   └──────────┬──────────┘
                              │
          ┌───────────────────┴────────────────────┐
          │                                        │
    ┌─────┴─────┐                            ┌─────┴─────┐
    │  PRIMARY  │                            │ SECONDARY │
    └─────┬─────┘                            └─────┬─────┘
          │                                        │
          ▼                                        ▼
┌────────────────────────┐               ┌──────────────────┐
│  aws-pool.multi-cloud  │               │   Azure Cloud    │
│    WEIGHTED ROUTING    │               │    (Standby)     │
│      (50% / 50%)       │               │                  │
└───────────┬────────────┘               │ eastus + westus2 │
            │                            │  LoadBalancers   │
        ┌───┴───────────────────┐        └──────────────────┘
        │                       │
        ▼                       ▼
┌─────────────────┐     ┌─────────────────┐
│    AWS Cloud    │     │    AWS Cloud    │
│    us-east-1    │     │    us-west-2    │
│                 │     │                 │
│ ┌─────────────┐ │     │ ┌─────────────┐ │
│ │     ECR     │─┼─────┼─│     ECR     │ │
│ │  (Primary)  │ │Repl │ │  (Replica)  │ │
│ └─────────────┘ │     │ └─────────────┘ │
│                 │     │                 │
│ ┌─────────────┐ │     │ ┌─────────────┐ │
│ │ EKS Cluster │ │     │ │ EKS Cluster │ │
│ │  (3 nodes)  │ │     │ │  (3 nodes)  │ │
│ │ Demo App x3 │ │     │ │ Demo App x3 │ │
│ │LoadBalancer │ │     │ │LoadBalancer │ │
│ └─────────────┘ │     │ └─────────────┘ │
└─────────────────┘     └─────────────────┘
         │                       │
         └───────────┬───────────┘
                     ▼
          ┌─────────────────────┐
          │   GitHub Actions    │
          │     (OIDC Auth)     │
          └─────────────────────┘
```
| Component | Description |
|---|---|
| ECR Repositories | Container registries in both AWS regions with automatic cross-region replication |
| EKS Clusters | Kubernetes 1.34 clusters with managed node groups (3-6 nodes each) |
| Route53 | Nested routing: weighted AWS pool (50/50 East/West) as PRIMARY, Azure as SECONDARY failover |
| CI/CD IAM | OIDC-based authentication for GitHub Actions (no stored credentials) |
| ACR (Azure) | Azure Container Registry with Premium SKU and geo-replication |
| AKS Clusters (Azure) | Kubernetes 1.34 clusters in eastus and westus2 (optional failover target) |
| Demo Application | Sample microservice deployed with 3 replicas and LoadBalancer |
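The cross-region ECR replication listed above is configured once per registry rather than per repository; a minimal Terraform sketch of what that could look like (the resource name and `aws_account_id` variable are illustrative, not necessarily the module's actual code):

```hcl
# Replicates images pushed to us-east-1 repositories into us-west-2.
# Applied against the us-east-1 (primary) provider; one per registry.
resource "aws_ecr_replication_configuration" "east_to_west" {
  replication_configuration {
    rule {
      destination {
        region      = "us-west-2"
        registry_id = var.aws_account_id # same-account replication
      }
    }
  }
}
```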
- AWS CLI configured with appropriate credentials
- Azure CLI (optional, for Azure deployment)
- Terraform >= 1.12.0
- kubectl for Kubernetes cluster management
- make for running workflow commands
- An AWS account with permissions to create EKS, ECR, IAM, Route53, and VPC resources
- An Azure subscription (optional, for cross-cloud failover)
```
aws_infrastructure/
├── environments/
│   └── development/
│       ├── main.tf        # Main config (ECR, EKS, Route53 failover)
│       ├── variables.tf   # Input variables with defaults
│       ├── outputs.tf     # Exported values
│       ├── providers.tf   # AWS provider configuration
│       ├── data.tf        # Data sources (VPCs, subnets, Route53 zone)
│       ├── backend.tf     # S3/DynamoDB remote state config
│       ├── Makefile       # Development workflow commands
│       ├── k8s/
│       │   ├── app.yaml            # Demo application deployment
│       │   ├── debug.yaml          # Debug pod configuration
│       │   └── node-reader-sa.yaml # Service account for node info
│       ├── load-time.sh   # Response time measurement script
│       └── empty-ecr.sh   # ECR cleanup utility
│
└── modules/
    ├── ecr/           # ECR repository module
    ├── eks/           # EKS cluster module (wraps terraform-aws-modules/eks)
    ├── cicd-iam/      # GitHub Actions OIDC authentication
    └── remote_state/  # Terraform state backend (S3 + DynamoDB)
```
```
azure_infrastructure/
├── environments/
│   └── development/
│       ├── main.tf        # Main config (ACR, AKS clusters)
│       ├── variables.tf   # Input variables with defaults
│       ├── outputs.tf     # Exported values
│       ├── providers.tf   # Azure provider configuration
│       ├── backend.tf     # Azure Storage remote state config
│       ├── Makefile       # Development workflow commands
│       └── k8s/
│           ├── app.yaml            # Demo application deployment
│           └── node-reader-sa.yaml # Service account for node info
│
└── modules/
    ├── acr/           # Azure Container Registry with geo-replication
    ├── aks/           # AKS cluster module
    └── remote-state/  # Terraform state backend (Azure Storage)
```
The root Makefile provides orchestration commands that run AWS and Azure operations in parallel:
```shell
# Configure environment variables in .env (see Configuration section)

# Deploy all infrastructure (registries first, then clusters)
make init                 # Initialize Terraform for all components
make plan                 # Plan all changes
make apply                # Apply all infrastructure

# Or deploy step-by-step
make apply-registries     # Deploy ECR + ACR in parallel
make apply-clusters       # Deploy EKS + AKS in parallel

# Switch kubectl context and deploy apps
make use-aws-east         # Switch to EKS East
make deploy-app-aws-east  # Deploy app to EKS East
make deploy-app-all       # Deploy to all 4 clusters in parallel

# Failover testing
make failover             # Scale AWS deployments to 0 (simulate failure)
make revert               # Scale AWS deployments back to 3
make validate-traffic     # Validate DNS, health checks, and all endpoints

# Cleanup
make destroy              # Destroy all (clusters first, then registries)
```

Run `make help` at the project root to see all available commands.
Create a .env file in the project root with your configuration:
```shell
# AWS Configuration
AWS_DEFAULT_REGION=us-east-1
AWS_REMOTE_BUCKET_NAME=your-terraform-state-bucket
AWS_REMOTE_DYNAMODB_TABLE=your-terraform-lock-table
AWS_ACCOUNT_ID=your-aws-account-id
AWS_CLUSTER_EAST=eks-cluster-dev-east
AWS_CLUSTER_WEST=eks-cluster-dev-west

# Azure Configuration
AZURE_STORAGE_ACCOUNT_NAME=your-storage-account
AZURE_CONTAINER_NAME=tfstate
AZURE_STATE_RESOURCE_GROUP=your-state-rg
AZURE_CLUSTER_EAST=aks-cluster-dev-east
AZURE_CLUSTER_WEST=aks-cluster-dev-west
AZURE_RG_EAST=your-resource-group-east
AZURE_RG_WEST=your-resource-group-west

# Kubernetes Application
K8S_DEPLOYMENT_NAME=basic-demo-microservice-01
K8S_NAMESPACE=default
K8S_REPLICAS=3

# DNS Configuration (for traffic validation)
DOMAIN_NAME=your-domain.com
SUBDOMAIN=multi-cloud
AWS_POOL_SUBDOMAIN=aws-pool
```

```shell
cd aws_infrastructure/environments/development
make init

# Review changes
make plan

# Apply infrastructure
make apply
```

This will create:
- ECR repositories in us-east-1 and us-west-2
- Cross-region replication from east to west
- EKS clusters in both regions
- IAM policies for ECR access and cluster administration
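As an illustration of the ECR access side of those IAM policies, a pull-only policy document might look roughly like this; the resource names are a sketch, and the module's actual policies may differ:

```hcl
data "aws_iam_policy_document" "ecr_pull" {
  # GetAuthorizationToken is account-wide and cannot be scoped to a repository
  statement {
    actions   = ["ecr:GetAuthorizationToken"]
    resources = ["*"]
  }

  # Pull permissions scoped to the demo repository
  statement {
    actions = [
      "ecr:BatchGetImage",
      "ecr:GetDownloadUrlForLayer",
      "ecr:BatchCheckLayerAvailability",
    ]
    resources = [aws_ecr_repository.this.arn] # illustrative reference
  }
}
```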
```shell
# Set up kubeconfig for east cluster
make kubeconfig-east

# Set up kubeconfig for west cluster
make kubeconfig-west
```

```shell
# Switch to east cluster
make use-east

# Deploy the application
make deploy-app

# Repeat for west cluster
make use-west
make deploy-app
```

After deploying the application, get the LoadBalancer hostnames and update `variables.tf`:

```hcl
lb_hostname_east = "your-east-lb-hostname.elb.amazonaws.com"
lb_hostname_west = "your-west-lb-hostname.elb.amazonaws.com"
```

Re-run `make plan && make apply` to create the Route53 health checks and weighted routing records.
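For orientation, the East half of that weighted pool roughly corresponds to Terraform like the following sketch; the resource names and the hosted-zone data source are illustrative, and the West record mirrors it with `aws_west_weight` and the us-west-2 ELB zone ID:

```hcl
# HTTP health check against the East LoadBalancer
resource "aws_route53_health_check" "east" {
  fqdn              = var.lb_hostname_east
  port              = 80
  type              = "HTTP"
  resource_path     = "/"
  request_interval  = 30
  failure_threshold = 3
}

# Weighted alias record inside the AWS pool
resource "aws_route53_record" "aws_pool_east" {
  zone_id        = data.aws_route53_zone.main.zone_id
  name           = "${var.aws_pool_subdomain}.${var.subdomain}.${var.domain_name}"
  type           = "A"
  set_identifier = "aws-east"

  weighted_routing_policy {
    weight = var.aws_east_weight # 50 by default
  }

  alias {
    name                   = var.lb_hostname_east
    zone_id                = var.elb_zone_id_east # ELB hosted zone for us-east-1
    evaluate_target_health = true
  }
}
```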
Ensure the Azure variables are set in the root .env file (see AWS section above).
```shell
cd azure_infrastructure/environments/development
make init
make plan
make apply
```

This will create:

- ACR with Premium SKU and geo-replication (eastus → westus2)
- AKS clusters in both regions
- AcrPull role assignments for cluster identities

```shell
# Switch to east cluster and deploy
make use-east
make deploy-app

# Repeat for west cluster
make use-west
make deploy-app
```

After deploying to both AWS and Azure, get the Azure LoadBalancer IPs:

```shell
make get-lb-ips
```

Update the AWS `variables.tf` to enable cross-cloud failover:

```hcl
enable_cross_cloud_failover = true
azure_lb_ip_east            = "x.x.x.x"
azure_lb_ip_west            = "x.x.x.x"
```

Re-run AWS Terraform to create failover routing (AWS primary, Azure secondary).
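A sketch of the resulting failover tier, under illustrative resource names: a calculated health check over both regional checks, a PRIMARY alias record pointing at the AWS pool, and a SECONDARY record holding the Azure IPs:

```hcl
# Healthy if at least one of the two AWS regional checks passes
resource "aws_route53_health_check" "aws_pool" {
  type                   = "CALCULATED"
  child_health_threshold = 1
  child_healthchecks = [
    aws_route53_health_check.east.id,
    aws_route53_health_check.west.id,
  ]
}

# PRIMARY: alias to the weighted AWS pool, guarded by the calculated check
resource "aws_route53_record" "primary" {
  zone_id         = data.aws_route53_zone.main.zone_id
  name            = "${var.subdomain}.${var.domain_name}"
  type            = "A"
  set_identifier  = "aws-primary"
  health_check_id = aws_route53_health_check.aws_pool.id

  failover_routing_policy {
    type = "PRIMARY"
  }

  alias {
    name                   = "${var.aws_pool_subdomain}.${var.subdomain}.${var.domain_name}"
    zone_id                = data.aws_route53_zone.main.zone_id
    evaluate_target_health = true
  }
}

# SECONDARY: plain A records pointing at the Azure LoadBalancer IPs
resource "aws_route53_record" "secondary" {
  zone_id        = data.aws_route53_zone.main.zone_id
  name           = "${var.subdomain}.${var.domain_name}"
  type           = "A"
  set_identifier = "azure-secondary"
  ttl            = 60
  records        = [var.azure_lb_ip_east, var.azure_lb_ip_west]

  failover_routing_policy {
    type = "SECONDARY"
  }
}
```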
From project root:
```shell
make use-aws-east    # Switch to EKS East (us-east-1)
make use-aws-west    # Switch to EKS West (us-west-2)
make use-azure-east  # Switch to AKS East (eastus)
make use-azure-west  # Switch to AKS West (westus2)
```

From cloud-specific directories:

```shell
# AWS (from aws_infrastructure/environments/development/)
make use-east  # Switch to us-east-1 cluster
make use-west  # Switch to us-west-2 cluster

# Azure (from azure_infrastructure/environments/development/)
make use-east  # Switch to eastus cluster
make use-west  # Switch to westus2 cluster
```

From project root (recommended):

```shell
make deploy-app-aws-east    # Deploy to EKS East
make deploy-app-aws-west    # Deploy to EKS West
make deploy-app-azure-east  # Deploy to AKS East
make deploy-app-azure-west  # Deploy to AKS West
make deploy-app-all         # Deploy to all 4 clusters in parallel
```

From cloud-specific directories:

```shell
# Ensure you're in the right context first
make use-east && make deploy-app
make use-west && make deploy-app
```

```shell
# Ping the LoadBalancer service endpoint
make ping-service

# Test via DNS record (if configured)
make ping-dns-record
```

```shell
# Run 10 requests and calculate average response time
./load-time.sh
```

```shell
make lint     # Format and validate Terraform
make plan     # Preview changes
make apply    # Apply changes
make refresh  # Refresh state
make destroy  # Tear down all infrastructure
make clean    # Remove local Terraform files
```

```shell
# Empty ECR repositories (required before destroy)
make empty-ecr
```

| Variable | Default | Description |
|---|---|---|
| `aws_region` | `us-east-1` | Primary AWS region |
| `cluster_name` | `eks-cluster-dev` | Base name for EKS clusters |
| `cluster_version` | `1.34` | Kubernetes version |
| `repository_name` | `basic-demo-microservice-01` | ECR repository name |
| `domain_name` | - | Root domain for Route53 hosted zone |
| `subdomain` | `multi-cloud` | Subdomain for failover record |
| `lb_hostname_east` | `""` | East region LoadBalancer hostname |
| `lb_hostname_west` | `""` | West region LoadBalancer hostname |
| `enable_cross_cloud_failover` | `false` | Enable nested routing with Azure failover |
| `azure_lb_ip_east` | `""` | Azure East US LoadBalancer IP |
| `azure_lb_ip_west` | `""` | Azure West US 2 LoadBalancer IP |
| `aws_pool_subdomain` | `aws-pool` | Subdomain for AWS weighted pool |
| `aws_east_weight` | `50` | Traffic weight for East region (0-255) |
| `aws_west_weight` | `50` | Traffic weight for West region (0-255) |
| `elb_zone_id_east` | `Z35SXDOTRQ7X7K` | ELB hosted zone ID for us-east-1 |
| `elb_zone_id_west` | `Z1H1FL5HABSF5` | ELB hosted zone ID for us-west-2 |
| Variable | Default | Description |
|---|---|---|
| `azure_region_east` | `eastus` | Primary Azure region |
| `azure_region_west` | `westus2` | Secondary Azure region |
| `cluster_name` | `aks-cluster-dev` | Base name for AKS clusters |
| `kubernetes_version` | `1.29` | Kubernetes version |
| `registry_name` | `aksmultiregiondemoacr` | ACR name (globally unique) |
| `vm_size` | `Standard_B2s` | VM size for AKS nodes |
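To make the ACR defaults above concrete, a Premium registry with a westus2 geo-replica and an AcrPull grant to a cluster identity can be sketched like this; the resource names and cross-resource references are illustrative:

```hcl
resource "azurerm_container_registry" "acr" {
  name                = "aksmultiregiondemoacr" # must be globally unique
  resource_group_name = azurerm_resource_group.east.name
  location            = "eastus"
  sku                 = "Premium" # geo-replication requires the Premium SKU

  georeplications {
    location = "westus2"
  }
}

# Lets the AKS kubelet identity pull images without stored credentials
resource "azurerm_role_assignment" "acr_pull_east" {
  scope                = azurerm_container_registry.acr.id
  role_definition_name = "AcrPull"
  principal_id         = azurerm_kubernetes_cluster.east.kubelet_identity[0].object_id
}
```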
AWS EKS:

```hcl
eks_managed_node_groups = {
  eks_nodes = {
    desired_size   = 3
    max_size       = 6
    min_size       = 3
    instance_types = ["t3.small"]
  }
}
```

Azure AKS:

```hcl
node_count = 3
min_count  = 3
max_count  = 6
vm_size    = "Standard_B2s"
```

The included demo application (`basic-demo-microservice-01`) demonstrates:
- Multi-replica deployment (3 pods per cluster)
- LoadBalancer service with `externalTrafficPolicy: Local` for reduced latency
- Node awareness via environment variables exposing the underlying node name
- Service account with permissions to read node information
```yaml
spec:
  replicas: 3
  template:
    spec:
      serviceAccountName: node-reader-sa
      containers:
        - name: basic-demo-microservice-01
          image: <account>.dkr.ecr.us-east-1.amazonaws.com/basic-demo-microservice-01:latest
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
```

- **Cross-Region ECR Replication**: Images pushed to the primary repository (us-east-1) automatically replicate to the secondary (us-west-2), ensuring both clusters can pull images locally.
- **Independent EKS Clusters**: Each region has its own fully functional Kubernetes cluster, providing isolation from regional failures.
- **Route53 Nested Routing**: Two-tier DNS architecture ensures full multi-region coverage:
  - AWS Pool (`aws-pool.multi-cloud.domain`): Weighted routing distributes traffic 50/50 between East and West
  - Main Record (`multi-cloud.domain`): Failover routing with the AWS pool as PRIMARY
  - Health checks monitor each LoadBalancer; unhealthy endpoints are automatically removed
  - A calculated health check aggregates both AWS regions (healthy if at least one region is up)
- **Cross-Cloud Failover**: When `enable_cross_cloud_failover = true`, the nested routing enables automatic failover to Azure. Both AWS regions participate in normal traffic distribution; failover to Azure occurs only when BOTH AWS regions are unhealthy.
- **Azure Geo-Replication**: ACR with the Premium SKU provides automatic geo-replication between eastus and westus2, ensuring local image pulls for both AKS clusters.
- **Secure CI/CD Pipeline**: GitHub Actions authenticates via OIDC, eliminating the need for long-lived AWS credentials in your repository.
- **Local Traffic Policy**: LoadBalancer services use `externalTrafficPolicy: Local` to route traffic to pods on the same node, reducing cross-AZ latency.
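The OIDC trust behind that CI/CD pipeline can be sketched as follows; the repository filter and resource names are illustrative placeholders, not the module's actual values:

```hcl
# GitHub's OIDC identity provider, registered once per AWS account
resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"]
}

# Trust policy: only workflows from the named repository may assume the role
data "aws_iam_policy_document" "github_trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:your-org/your-repo:*"] # placeholder repository
    }
  }
}
```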
Test the cross-cloud failover functionality:
```shell
# 1. Validate all endpoints are healthy
make validate-traffic

# 2. Simulate AWS failure (scales deployments to 0)
make failover

# 3. Wait ~3 minutes for Route53 to detect failure
#    DNS will automatically fail over to Azure endpoints

# 4. Validate traffic is now served by Azure
make validate-traffic

# 5. Restore AWS deployments
make revert

# 6. Wait ~3 minutes for Route53 to restore AWS as primary
make validate-traffic
```

The `validate-traffic` command provides a comprehensive report including:
- DNS resolution for main record and AWS pool
- Route53 health check status for all 4 regions
- HTTP health checks for each LoadBalancer endpoint
- Main DNS endpoint connectivity test
```shell
# From project root - destroys clusters first, then registries
make destroy
```

```shell
# From project root
make destroy-clusters    # Destroy EKS + AKS only
make destroy-registries  # Destroy ECR + ACR only

# Or from cloud-specific directories
cd aws_infrastructure/environments/development && make destroy
cd azure_infrastructure/environments/development && make destroy
```

See LICENSE for details.