This design document outlines the modernization of a legacy Java Spring Boot monolith into a cloud-native application running on Kubernetes, with enterprise-grade security, observability, and developer experience. The design builds on existing infrastructure: HashiCorp Vault, the observability stack (Prometheus, Grafana, Loki, Tempo, Mimir), GitHub Actions for CI, and ArgoCD for GitOps-based deployment. Work is delivered in phases to limit risk while adopting modern DevOps practices incrementally.
graph TB
subgraph "External"
DEV[Developer]
USER[End Users]
ADMIN[Administrators]
end
subgraph "CI/CD Pipeline"
GIT[Git Repository]
CICD[CI/CD System]
REG[Container Registry]
VAULT[HashiCorp Vault]
end
subgraph "Kubernetes Cluster"
subgraph "Ingress Layer"
ING[Ingress Controller]
CERT[cert-manager]
end
subgraph "Application Layer"
APP[Spring Boot App]
HPA[Horizontal Pod Autoscaler]
end
subgraph "Data Layer"
PG[CloudNativePG Cluster]
BAK[Automated Backups]
OBJ[S3 ObjectStore]
end
subgraph "Observability"
PROM[Prometheus]
GRAF[Grafana]
LOKI[Loki]
TEMPO[Tempo]
MIMIR[Mimir]
end
subgraph "Security"
RBAC[RBAC]
PSP[Pod Security]
NP[Network Policies]
end
end
DEV --> GIT
GIT --> CICD
CICD --> REG
CICD --> VAULT
USER --> ING
ING --> APP
APP --> PG
PG --> BAK
BAK --> OBJ
APP --> PROM
PROM --> GRAF
APP --> LOKI
APP --> TEMPO
PROM --> MIMIR
The containerization strategy uses a multi-stage build approach for optimal security and performance:
# Multi-stage Dockerfile structure
FROM eclipse-temurin:21-jdk-jammy AS builder
FROM gcr.io/distroless/java21-debian12:nonroot AS runtime
The application follows GitHub Flow with semantic versioning (SemVer) extracted from the Maven project version:
Branching Strategy:
- main branch: Production-ready code, protected branch
- Feature branches: Short-lived branches for development (feature/user-auth, bugfix/login-issue)
- Pull Requests: All changes go through PR review process
Image Tagging Strategy:
- main branch: {version} (e.g., 1.2.3)
- Pull requests: pr-{number} (e.g., pr-123)
- Feature branches: {branch-name}-{commit-sha} (e.g., feature-auth-abc1234)
Deployment Flow:
- PR Creation: Ephemeral environment created automatically
- PR Merge: Direct deployment to production after approval
- Container Labels: Include version, revision, and build timestamp
- Kustomize Integration: Automatic image tag updates in environment overlays
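The automatic image tag update could be expressed in an environment overlay roughly as follows. This is a minimal sketch: the file path, the `../../base` layout, and the image name `ghcr.io/example-org/spring-app` are assumptions for illustration; only the namespace and the `{version}` tag format come from this design.

```yaml
# overlays/production/kustomization.yaml (hypothetical path)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: spring-app-production
resources:
  - ../../base                     # assumed location of the base manifests
images:
  # "name" must match the image reference used in the base Deployment (assumed here)
  - name: ghcr.io/example-org/spring-app
    newTag: "1.2.3"                # main branch tag format: {version}
```

A CI job can rewrite the `newTag` value by running `kustomize edit set image ghcr.io/example-org/spring-app:1.2.3` inside the overlay directory and committing the result to the GitOps repository.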
graph TB
subgraph "Kustomize Structure"
subgraph "Base Layer"
BASE[Base Manifests]
DBBASE[Database Base]
end
subgraph "Overlay Layer"
DEV[Development Overlay]
STAGING[Staging Overlay]
PROD[Production Overlay]
end
subgraph "External Secrets"
ESO[External Secrets Operator]
VAULT[HashiCorp Vault]
SS[SecretStore]
ES[ExternalSecret]
end
end
BASE --> DEV
BASE --> STAGING
BASE --> PROD
DBBASE --> DEV
DBBASE --> STAGING
DBBASE --> PROD
VAULT --> ESO
ESO --> SS
SS --> ES
ES --> DEV
ES --> STAGING
ES --> PROD
| Component | PR | Staging | Production |
|---|---|---|---|
| Namespace | spring-app-dev | spring-app-staging | spring-app-production |
| Replicas | 1 | 2 | 5 |
| Memory Request | 256Mi | 512Mi | 1Gi |
| Memory Limit | 512Mi | 1Gi | 2Gi |
| CPU Request | 250m | 250m | 500m |
| CPU Limit | 500m | 500m | 1000m |
| HPA Min/Max | 1/3 | 2/5 | 5/20 |
| PDB Min Available | 1 | 1 | 3 |
| Database Storage | 50Gi | 50Gi | 100Gi |
| Read Replicas | No | No | Yes (2) |
| Log Level | DEBUG | INFO | WARN |
| Vault Server | vault-dev | vault-staging | vault |
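As an illustration of how the production column of the matrix might translate into manifests, the sketch below defines the autoscaler and the disruption budget. The resource names, the `app: spring-app` label, and the 70% CPU target are assumptions; the replica bounds and `minAvailable` value are taken from the table.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: spring-app                   # hypothetical name
  namespace: spring-app-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: spring-app                 # assumed Deployment name
  minReplicas: 5                     # HPA Min from the table
  maxReplicas: 20                    # HPA Max from the table
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70     # assumed target, not specified in this design
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: spring-app                   # hypothetical name
  namespace: spring-app-production
spec:
  minAvailable: 3                    # PDB Min Available from the table
  selector:
    matchLabels:
      app: spring-app                # assumed pod label
```

One point to keep in mind for the PR column: with a single replica and a PDB of `minAvailable: 1`, voluntary evictions such as node drains are blocked until the budget is relaxed or the replica count is raised.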
Base Image Strategy:
- Use eclipse-temurin:21-jdk-jammy as builder image for compilation
- Use Google Distroless Java 21 for minimal runtime attack surface
- Multi-stage build to separate build and runtime environments
- Non-root user execution for security compliance
Configuration Management:
- External configuration through ConfigMaps and environment variables
- Secret management through Kubernetes Secrets integrated with HashiCorp Vault
- Configuration validation at startup
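A pod template could consume this configuration along the following lines. The sketch assumes a Secret named `app-secrets` with the keys `database_username` and `database_password`, a ConfigMap named `app-config` holding `application.yml`, and a hypothetical image name; Spring Boot's relaxed binding maps `SPRING_DATASOURCE_USERNAME` to `spring.datasource.username`.

```yaml
# Fragment of a Deployment pod template (spec.template.spec), not a complete manifest
containers:
  - name: spring-app                               # assumed container name
    image: ghcr.io/example-org/spring-app:1.2.3    # hypothetical image
    env:
      - name: SPRING_DATASOURCE_USERNAME
        valueFrom:
          secretKeyRef:
            name: app-secrets
            key: database_username
      - name: SPRING_DATASOURCE_PASSWORD
        valueFrom:
          secretKeyRef:
            name: app-secrets
            key: database_password
      # Tell Spring Boot to load the mounted application.yml in addition to its defaults
      - name: SPRING_CONFIG_ADDITIONAL_LOCATION
        value: file:/config/
    volumeMounts:
      - name: config
        mountPath: /config
        readOnly: true
volumes:
  - name: config
    configMap:
      name: app-config
```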
PostgreSQL Deployment Strategy: Instead of manually managing StatefulSets, Services, and backup CronJobs, we leverage CloudNativePG, a specialized Kubernetes Operator that automates the PostgreSQL lifecycle through declarative custom resources.
CloudNativePG Benefits:
- Automated Failover: Continuous monitoring with automatic replica promotion within seconds
- Simplified Backups & PITR: Declarative backup configuration with WAL archiving to S3-compatible storage
- Managed Read Replicas: Scaling read capacity by changing the instances number in the manifest
- Zero-Downtime Upgrades: Rolling updates for PostgreSQL minor versions
- Integrated Monitoring: Automatic PodMonitor creation for Prometheus metrics
Backup Strategy:
- Automated daily backups using ScheduledBackup CRD
- Point-in-time recovery with WAL archiving to MinIO/Ceph S3
- 7-day retention policy with automatic cleanup
- Cross-region backup replication for disaster recovery
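Point-in-time recovery with CloudNativePG is done by bootstrapping a new cluster from the archived backups and WAL rather than rewinding the existing one. The sketch below assumes the Barman Cloud plugin with an ObjectStore named `s3local-eu-central` and an original cluster named `postgres-app-cluster`; the new cluster name and the timestamp are illustrative, and the field layout should be checked against the plugin version in use.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres-app-cluster-restore      # hypothetical name for the recovered cluster
spec:
  instances: 3
  storage:
    size: 100Gi
  bootstrap:
    recovery:
      source: origin
      recoveryTarget:
        # Example timestamp; it must fall inside the 7-day retention window
        targetTime: "2025-01-15 08:00:00+00"
  externalClusters:
    - name: origin
      plugin:
        name: barman-cloud.cloudnative-pg.io
        parameters:
          barmanObjectName: s3local-eu-central
          serverName: postgres-app-cluster   # server whose backups and WAL are replayed
```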
Network Architecture:
- Ingress controller with TLS termination
- Network policies for micro-segmentation
- Service mesh consideration for advanced traffic management
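Micro-segmentation for the application pods might look like the sketch below. It assumes the pods carry an `app: spring-app` label, the ingress controller runs in a namespace called `ingress-nginx`, and the database cluster `postgres-app-cluster` lives in the same namespace as the application; CloudNativePG labels its pods with `cnpg.io/cluster`.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: spring-app                     # hypothetical name
  namespace: spring-app-production
spec:
  podSelector:
    matchLabels:
      app: spring-app                  # assumed pod label
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Only the ingress controller namespace may reach the application port
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx   # assumed namespace
      ports:
        - protocol: TCP
          port: 8080
  egress:
    # PostgreSQL pods managed by CloudNativePG
    - to:
        - podSelector:
            matchLabels:
              cnpg.io/cluster: postgres-app-cluster
      ports:
        - protocol: TCP
          port: 5432
    # DNS resolution
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

A policy this strict also blocks Prometheus scrapes and trace or log export, so additional rules for the monitoring namespace would be needed in practice.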
Security Controls:
- Pod Security Standards enforcement
- RBAC with principle of least privilege
- Admission controllers for policy enforcement
- Runtime security monitoring
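Pod Security Standards are enforced per namespace through labels, and the workload has to declare a matching security context. The following sketch assumes the `restricted` profile is the target, uses hypothetical resource and image names, and sets UID 65532, which is the `nonroot` user in distroless images.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: spring-app-production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-app                     # hypothetical name
  namespace: spring-app-production
spec:
  selector:
    matchLabels:
      app: spring-app
  template:
    metadata:
      labels:
        app: spring-app
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 65532               # distroless "nonroot" UID
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: spring-app
          image: ghcr.io/example-org/spring-app:1.2.3   # hypothetical image
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          volumeMounts:
            - name: tmp                # writable scratch space for the JVM
              mountPath: /tmp
      volumes:
        - name: tmp
          emptyDir: {}
```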
GitHub Flow CI Pipeline:
graph TB
A[Feature Branch] --> B[Pull Request]
B --> C[scan-and-lint]
C --> D[build-and-sast]
D --> E[pre-tests]
E --> F[image-and-push]
F --> G[deploy]
C --> C1[Secrets Scan]
C --> C2[Trivy FS Scan]
C --> C3[Checkstyle/PMD]
D --> D1[Maven Package]
D --> D2[OWASP Dependency Check]
D --> D3[CodeQL Analysis]
D --> D4[SonarQube Scan]
D --> D5[Security Gates: 0 critical, ≤5 high]
E --> E1[Unit Tests with JUnit]
E --> E2[Integration Tests with Testcontainers]
E --> E3[JaCoCo Coverage Analysis]
E --> E4[Quality Gates: ≥80% coverage]
F --> F1[Multi-arch Build: AMD64/ARM64]
F --> F2[Semantic Versioning Tags]
F --> F3[Container Security Scan]
F --> F4[GitHub Container Registry Push]
G --> G1[PR: Preview Label for ArgoCD]
G --> G2[Main: Create PR to k8s Repository]
G2 --> H[post-tests - Main Branch Only]
H --> H1[Smoke Tests]
H --> H2[Load Tests]
H --> H3[End-to-End Tests]
H --> H4[UAT Tests]
GitHub Flow Pipeline Stages:
- scan-and-lint: Secrets scan, Trivy filesystem scan, Checkstyle/PMD validation
- build-and-sast: Maven package, OWASP dependency check, CodeQL analysis, security gates
- pre-tests: Unit tests with JUnit/JaCoCo, integration tests with Testcontainers, self-hosted SonarQube analysis (after test execution), quality gates (≥80% coverage)
- image-and-push: Multi-architecture builds (AMD64/ARM64), semantic versioning, container security scanning, registry push
- deploy: PR environments via ArgoCD ApplicationSet, staging deployment via GitOps PR
- post-tests: Smoke tests, load tests, end-to-end tests, UAT tests (main branch only)
CD Pipeline (ArgoCD):
- GitOps-based deployment to staging environment
- Automated testing and validation
- Production deployment with canary strategy
- Automated synchronization and drift detection
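Ephemeral PR environments can be generated with an ApplicationSet that uses the pull request generator. The sketch below is one possible shape: the GitHub owner, repository names, overlay path, token Secret, and image name are assumptions, while the `preview` label, the `pr-{number}` tag, and the `spring-app-pr-{number}` namespace follow this design.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: spring-app-pr-environments     # hypothetical name
  namespace: argocd
spec:
  generators:
    - pullRequest:
        github:
          owner: example-org           # assumed GitHub organization
          repo: spring-app             # assumed application repository
          tokenRef:
            secretName: github-token   # assumed Secret holding an access token
            key: token
          labels:
            - preview                  # only PRs carrying this label get an environment
        requeueAfterSeconds: 300
  template:
    metadata:
      name: 'spring-app-pr-{{number}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/example-org/spring-app-k8s.git   # assumed k8s repository
        targetRevision: main
        path: overlays/pr              # assumed overlay path
        kustomize:
          images:
            - 'ghcr.io/example-org/spring-app:pr-{{number}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: 'spring-app-pr-{{number}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
```

When a pull request is closed or loses the label, the generator stops producing its entry and the ApplicationSet controller removes the corresponding Application, which gives the automatic cleanup described for PR environments.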
# Application Configuration Structure
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  application.yml: |
    server:
      port: 8080
    spring:
      datasource:
        url: jdbc:postgresql://postgres-service:5432/appdb
        hikari:
          maximum-pool-size: 20
          minimum-idle: 5
    management:
      endpoints:
        web:
          exposure:
            include: health,metrics,prometheus
      endpoint:
        health:
          show-details: always
# SecretStore for Vault integration
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: vault-secret-store
  namespace: spring-app
spec:
  provider:
    vault:
      server: "https://vault.domain.local"
      path: "secret"
      version: "v2"
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "spring-app"
          serviceAccountRef:
            name: "spring-app"
---
# ExternalSecret for automatic secret synchronization
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-secrets
  namespace: spring-app
spec:
  refreshInterval: 15s
  secretStoreRef:
    name: vault-secret-store
    kind: SecretStore
  target:
    name: app-secrets
    creationPolicy: Owner
  data:
    - secretKey: database_username
      remoteRef:
        key: spring-app/database
        property: username
    - secretKey: database_password
      remoteRef:
        key: spring-app/database
        property: password
# PostgreSQL Cluster managed by CloudNativePG Operator
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres-app-cluster
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:18.0-standard-trixie
  monitoring:
    enablePodMonitor: true
  storage:
    size: 100Gi
  plugins:
    - name: barman-cloud.cloudnative-pg.io
      isWALArchiver: true
      parameters:
        barmanObjectName: s3local-eu-central
---
# Scheduled Backup with CloudNativePG Operator
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: postgres-app-cluster-backup
spec:
  cluster:
    name: postgres-app-cluster
  schedule: '1 2 3 * * *'
  backupOwnerReference: self
  method: plugin
  pluginConfiguration:
    name: barman-cloud.cloudnative-pg.io
---
# ObjectStore for backups
apiVersion: barmancloud.cnpg.io/v1
kind: ObjectStore
metadata:
  name: s3local-eu-central
spec:
  retentionPolicy: "7d"
  configuration:
    destinationPath: "s3://on-prem-s3-bucket/PGbackups/"
    endpointURL: "http://minio.storage.central.eu.local:9000"
    s3Credentials:
      accessKeyId:
        name: s3local-eu-central
        key: ACCESS_KEY_ID
      secretAccessKey:
        name: s3local-eu-central
        key: ACCESS_SECRET_KEY
    wal:
      compression: gzip
      encryption: AES256
Graceful Degradation:
- Circuit breaker patterns for external dependencies
- Fallback mechanisms for non-critical features
- Proper error logging and monitoring
Database Connection Handling:
- Connection pool monitoring and alerting
- Automatic retry mechanisms with exponential backoff
- Health check endpoints for database connectivity
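The circuit breaker and retry-with-backoff behaviour described above could be configured declaratively if a library such as Resilience4j with its Spring Boot starter were used. This design does not name a library, so that choice, the instance names `externalApi` and `database`, and every threshold below are assumptions.

```yaml
# Addition to application.yml, assuming the Resilience4j Spring Boot starter is on the classpath
resilience4j:
  circuitbreaker:
    instances:
      externalApi:                                  # hypothetical instance name
        slidingWindowSize: 20                       # calls considered when computing the failure rate
        failureRateThreshold: 50                    # open the circuit at 50% failures
        waitDurationInOpenState: 30s                # time before trial calls are allowed again
        permittedNumberOfCallsInHalfOpenState: 5
  retry:
    instances:
      database:                                     # hypothetical instance name
        maxAttempts: 4
        waitDuration: 500ms                         # initial wait between attempts
        enableExponentialBackoff: true
        exponentialBackoffMultiplier: 2             # each wait doubles the previous one
```

Methods would then opt in through annotations such as `@CircuitBreaker(name = "externalApi", fallbackMethod = "...")` and `@Retry(name = "database")`, with the fallback method providing the degraded response for non-critical features.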
Kubernetes-Level Error Handling:
- Readiness and liveness probes configuration
- Pod disruption budgets for maintenance scenarios
- Automatic restart policies for failed containers
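Probe configuration for the Spring Boot container might look like the sketch below. It relies on the Actuator liveness and readiness health groups, which Spring Boot exposes under `/actuator/health/liveness` and `/actuator/health/readiness` when it detects Kubernetes or when `management.endpoint.health.probes.enabled=true` is set. The container name, port name, and all timing values are assumptions to be tuned against real startup times.

```yaml
# Fragment of a Deployment pod template (spec.template.spec), not a complete manifest
containers:
  - name: spring-app                   # assumed container name
    ports:
      - name: http
        containerPort: 8080
    # Gives the JVM up to 30 x 5s = 150s to start before liveness checks begin
    startupProbe:
      httpGet:
        path: /actuator/health/liveness
        port: http
      periodSeconds: 5
      failureThreshold: 30
    # Restarts the container when the application reports a broken internal state
    livenessProbe:
      httpGet:
        path: /actuator/health/liveness
        port: http
      periodSeconds: 10
      failureThreshold: 3
    # Removes the pod from Service endpoints while it cannot serve traffic
    readinessProbe:
      httpGet:
        path: /actuator/health/readiness
        port: http
      periodSeconds: 5
      failureThreshold: 3
```

Keeping database checks out of the liveness group avoids restart loops during a database outage; they fit better in the readiness group or in alerting.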
The application integrates with existing observability infrastructure:
- Prometheus: Metrics collection with remote write to Mimir
- Grafana: Dashboard visualization and alerting
- Loki: Centralized log aggregation and querying
- Tempo: Distributed tracing with correlation
- Mimir: Long-term metrics storage and federation
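With the Prometheus Operator, scraping the application is typically declared through a ServiceMonitor. The sketch below assumes a Service labelled `app: spring-app` that exposes a port named `http`, and that the application includes the Micrometer Prometheus registry so that `/actuator/prometheus` is served.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: spring-app                     # hypothetical name
  namespace: spring-app-production
spec:
  selector:
    matchLabels:
      app: spring-app                  # assumed Service label
  endpoints:
    - port: http                       # name of the Service port, not a number
      path: /actuator/prometheus
      interval: 30s                    # assumed scrape interval
```

By default the `job` label of the scraped series is the name of the Service, so alert expressions that filter on a job name need the Service to carry that name or a `jobLabel` to be configured.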
# Example Alert Rules for existing AlertManager
groups:
  - name: application.rules
    rules:
      - alert: ApplicationDown
        expr: up{job="spring-boot-app"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Application is down"
      - alert: DatabaseConnectionHigh
        expr: hikaricp_connections_active / hikaricp_connections_max > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Database connection pool usage is high"
- JUnit 5 for unit tests with minimum 80% code coverage
- Mockito for dependency mocking
- Testcontainers for integration testing with real databases
- Testcontainers for real database integration testing
- PostgreSQL container for repository layer testing
- Container structure tests for Dockerfile validation
- Security scanning with Trivy or Snyk
- Performance testing of containerized application
- Helm chart testing with chart-testing tool
- Kubernetes manifest validation with kubeval
- End-to-end testing with Ginkgo and Gomega
- Pipeline as code validation
- Deployment smoke tests
- Rollback testing scenarios
- Static Application Security Testing (SAST)
- Dynamic Application Security Testing (DAST)
- Container image vulnerability scanning
- Kubernetes security benchmarks (CIS)
- Create multi-stage Dockerfile
- Implement configuration externalization
- Set up local development environment
- Container security hardening
- Create basic Kubernetes manifests
- Implement database connectivity
- Set up monitoring and logging
- Basic CI/CD pipeline
- Implement autoscaling and high availability
- Advanced security controls
- Comprehensive observability
- Performance optimization
- Production deployment
- Disaster recovery testing
- Documentation and training
- Performance tuning and optimization
- Distroless base images
- Non-root user execution
- Minimal package installation
- Regular security updates
- Pod Security Standards
- Network policies for micro-segmentation
- RBAC with least privilege
- Admission controllers
- Encryption at rest and in transit
- Secret management with external systems
- Database access controls
- Audit logging
- Secure artifact storage
- Signed container images
- Security scanning integration
- Compliance validation
# Security Gate Configuration
security_gates:
  trivy_filesystem:
    severity_threshold: ["CRITICAL", "HIGH", "MEDIUM"]
    max_critical: 0
    max_high: 5
    fail_on_timeout: true
  owasp_dependency_check:
    cvss_threshold: 9.0
    formats: ["JSON", "HTML", "SARIF"]
    suppression_file: ".github/dependency-check-suppressions.xml"
  codeql:
    languages: ["java"]
    queries: ["security-extended", "security-and-quality"]
    fail_on_error: true
  sonarqube:
    quality_gate: "Sonar way"
    coverage_threshold: 80
    duplicated_lines_density: 3
    maintainability_rating: "A"
    reliability_rating: "A"
    security_rating: "A"
# Test Configuration
test_strategy:
  unit_tests:
    framework: "JUnit 5"
    coverage_tool: "JaCoCo"
    minimum_coverage: 80
    parallel_execution: true
  integration_tests:
    framework: "Testcontainers"
    database: "PostgreSQL 15"
    container_reuse: true
    test_profiles: ["integration-test"]
  quality_gates:
    coverage_threshold: 80
    test_failure_threshold: 0
    performance_regression: false
# Build Configuration
build_strategy:
  semantic_versioning:
    source: "maven_project_version"
  github_flow_strategy:
    main: "{version}"
    pull_request: "pr-{number}"
    feature_branch: "{branch-name}-{commit-sha}"
  container_build:
    base_image: "eclipse-temurin:21-jdk-jammy"
    runtime_image: "gcr.io/distroless/java21-debian12:nonroot"
    platforms: ["linux/amd64", "linux/arm64"]
    security_scan: true
deployment_strategy:
  pull_request:
    environment: "ephemeral"
    namespace: "spring-app-pr-{number}"
    resources: "minimal"
    cleanup: "automatic"
    smoke_tests: true
  main_branch:
    environment: "production"
    namespace: "spring-app-production"
    sync_policy: "automatic"
    approval_required: true
    canary_deployment: true
    rollback_enabled: true
# Notification Configuration
notifications:
  github_comments:
    security_summary: true
    coverage_report: true
    deployment_status: true
  pr_environments:
    creation_notification: true
    url_sharing: true
    monitoring_links: true
    cleanup_notification: true
  security_alerts:
    critical_vulnerabilities: "immediate"
    high_vulnerabilities: "daily_digest"
    dependency_updates: "weekly"
- Prometheus: Application metrics with ServiceMonitor configuration
- Mimir: Long-term metrics storage via remote write
- Grafana: Custom dashboards for application monitoring
- Loki: Log aggregation with structured logging
- Tempo: Distributed tracing with OpenTelemetry integration
- GitHub Actions: Multi-stage security and quality pipeline
- ArgoCD: GitOps-based deployment with ApplicationSets
- Container Registry: Secure image storage with vulnerability scanning
- SonarQube: Code quality and security analysis
- HashiCorp Vault: Secret management with External Secrets Operator
- Pod Security Standards: Existing security policies enforcement
- Network Policies: Integration with existing micro-segmentation
- OWASP Tools: Dependency checking and vulnerability management
This design provides a comprehensive, secure, and scalable solution for modernizing the legacy Java application while leveraging existing infrastructure and maintaining high availability and developer productivity.