Legacy Java Modernization Design Document

Overview

This design document describes the modernization of a legacy Java Spring Boot monolith into a cloud-native deployment on Kubernetes, with a focus on security, observability, and developer experience. The design reuses existing infrastructure: HashiCorp Vault, the observability stack (Prometheus, Grafana, Loki, Tempo, Mimir), GitHub Actions for CI, and ArgoCD for GitOps-based deployment. Work proceeds in phases so that each step carries limited risk.

Architecture

High-Level Architecture

graph TB
    subgraph "External"
        DEV[Developer]
        USER[End Users]
        ADMIN[Administrators]
    end
    
    subgraph "CI/CD Pipeline"
        GIT[Git Repository]
        CICD[CI/CD System]
        REG[Container Registry]
        VAULT[HashiCorp Vault]
    end
    
    subgraph "Kubernetes Cluster"
        subgraph "Ingress Layer"
            ING[Ingress Controller]
            CERT[cert-manager]
        end
        
        subgraph "Application Layer"
            APP[Spring Boot App]
            HPA[Horizontal Pod Autoscaler]
        end
        
        subgraph "Data Layer"
            PG[CloudNativePG Cluster]
            BAK[Automated Backups]
            OBJ[S3 ObjectStore]
        end
        
        subgraph "Observability"
            PROM[Prometheus]
            GRAF[Grafana]
            LOKI[Loki]
            TEMPO[Tempo]
            MIMIR[Mimir]
        end
        
        subgraph "Security"
            RBAC[RBAC]
            PSP[Pod Security]
            NP[Network Policies]
        end
    end
    
    DEV --> GIT
    GIT --> CICD
    CICD --> REG
    CICD --> VAULT
    USER --> ING
    ING --> APP
    APP --> PG
    PG --> BAK
    BAK --> OBJ
    APP --> PROM
    PROM --> GRAF
    APP --> LOKI
    APP --> TEMPO
    PROM --> MIMIR

Container Architecture

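The containerization strategy uses a multi-stage build: the JDK image compiles the application, and only the resulting artifact is copied into a minimal distroless runtime image that runs as a non-root user: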

# Multi-stage Dockerfile structure
FROM eclipse-temurin:21-jdk-jammy AS builder
FROM gcr.io/distroless/java21-debian12:nonroot AS runtime

GitHub Flow and Semantic Versioning Strategy

The application follows GitHub Flow with semantic versioning (SemVer) extracted from the Maven project version:

Branching Strategy:

  • main branch: Production-ready code, protected branch
  • Feature branches: Short-lived branches for development (feature/user-auth, bugfix/login-issue)
  • Pull Requests: All changes go through PR review process

Image Tagging Strategy:

  • main branch: {version} (e.g., 1.2.3)
  • Pull requests: pr-{number} (e.g., pr-123)
  • Feature branches: {branch-name}-{commit-sha} (e.g., feature-auth-abc1234)

Deployment Flow:

  • PR Creation: Ephemeral environment created automatically
  • PR Merge: Direct deployment to production after approval
  • Container Labels: Include version, revision, and build timestamp
  • Kustomize Integration: Automatic image tag updates in environment overlays
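
The image tag update in an environment overlay can be sketched as follows. This is a minimal example under assumptions: the repository layout (overlays/production next to base) and the image name ghcr.io/example-org/spring-app are hypothetical and not specified in this design.

```yaml
# overlays/production/kustomization.yaml (hypothetical path)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# Production namespace used elsewhere in this document
namespace: spring-app-production

resources:
  - ../../base

# The CI pipeline rewrites newTag after each successful build on main
images:
  - name: ghcr.io/example-org/spring-app   # hypothetical image name
    newTag: "1.2.3"                        # {version} taken from the Maven project
```

A CI job could apply the change by running `kustomize edit set image ghcr.io/example-org/spring-app:1.2.3` inside the overlay directory and committing the result to the Kubernetes repository.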

Kustomize-based Deployment Architecture

graph TB
    subgraph "Kustomize Structure"
        subgraph "Base Layer"
            BASE[Base Manifests]
            DBBASE[Database Base]
        end
        
        subgraph "Overlay Layer"
            DEV[PR/Dev Overlay]
            STAGING[Staging Overlay]
            PROD[Production Overlay]
        end
        
        subgraph "External Secrets"
            ESO[External Secrets Operator]
            VAULT[HashiCorp Vault]
            SS[SecretStore]
            ES[ExternalSecret]
        end
    end
    
    BASE --> DEV
    BASE --> STAGING
    BASE --> PROD
    DBBASE --> DEV
    DBBASE --> STAGING
    DBBASE --> PROD
    VAULT --> ESO
    ESO --> SS
    SS --> ES
    ES --> DEV
    ES --> STAGING
    ES --> PROD

Environment-Specific Configurations

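| Component         | PR             | Staging            | Production            |
|-------------------|----------------|--------------------|-----------------------|
| Namespace         | spring-app-dev | spring-app-staging | spring-app-production |
| Replicas          | 1              | 2                  | 5                     |
| Memory Request    | 256Mi          | 512Mi              | 1Gi                   |
| Memory Limit      | 512Mi          | 1Gi                | 2Gi                   |
| CPU Request       | 250m           | 250m               | 500m                  |
| CPU Limit         | 500m           | 500m               | 1000m                 |
| HPA Min/Max       | 1/3            | 2/5                | 5/20                  |
| PDB Min Available | 1              | 1                  | 3                     |
| Database Storage  | 50Gi           | 50Gi               | 100Gi                 |
| Read Replicas     | No             | No                 | Yes (2)               |
| Log Level         | DEBUG          | INFO               | WARN                  |
| Vault Server      | vault-dev      | vault-staging      | vault                 |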
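
As an illustration of how the production column could translate into manifests, the sketch below defines an HPA and a PodDisruptionBudget. The resource names, the pod label, and the 70% CPU target are assumptions; only the replica bounds, the PDB minimum, and the namespace come from the table.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: spring-app                 # hypothetical resource name
  namespace: spring-app-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: spring-app               # hypothetical Deployment name
  minReplicas: 5                   # HPA Min (production)
  maxReplicas: 20                  # HPA Max (production)
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # assumed target, not specified in this design
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: spring-app
  namespace: spring-app-production
spec:
  minAvailable: 3                  # PDB Min Available (production)
  selector:
    matchLabels:
      app: spring-app              # hypothetical pod label
```

When an HPA owns the replica count, it is usually safer to leave `spec.replicas` out of the Deployment so that GitOps synchronization does not reset the value the autoscaler has chosen.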

Components and Interfaces

1. Application Container

Base Image Strategy:

  • Use eclipse-temurin:21-jdk-jammy as builder image for compilation
  • Use Google Distroless Java 21 for minimal runtime attack surface
  • Multi-stage build to separate build and runtime environments
  • Non-root user execution for security compliance

Configuration Management:

  • External configuration through ConfigMaps and environment variables
  • Secret management through Kubernetes Secrets integrated with HashiCorp Vault
  • Configuration validation at startup
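
A minimal sketch of how the Deployment could consume external configuration and secrets. It reuses the `app-config` ConfigMap and the `app-secrets` Secret defined in the Data Models section; the Deployment name, labels, image, and mount path are hypothetical.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-app                       # hypothetical name
spec:
  selector:
    matchLabels:
      app: spring-app
  template:
    metadata:
      labels:
        app: spring-app
    spec:
      securityContext:
        runAsNonRoot: true               # matches the distroless nonroot image
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: app
          image: ghcr.io/example-org/spring-app:1.2.3   # hypothetical image
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
          ports:
            - name: http
              containerPort: 8080
          env:
            # Ask Spring Boot to load application.yml from the mounted directory too
            - name: SPRING_CONFIG_ADDITIONAL_LOCATION
              value: file:/config/
            # Relaxed binding maps these onto spring.datasource.username/password
            - name: SPRING_DATASOURCE_USERNAME
              valueFrom:
                secretKeyRef:
                  name: app-secrets
                  key: database_username
            - name: SPRING_DATASOURCE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: app-secrets
                  key: database_password
          volumeMounts:
            - name: app-config
              mountPath: /config         # assumed mount path
              readOnly: true
      volumes:
        - name: app-config
          configMap:
            name: app-config
```

Keeping credentials in environment variables sourced from the Secret means the ConfigMap never has to contain them, and a Vault rotation only requires the pods to be restarted.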

2. Database Integration with CloudNativePG

PostgreSQL Deployment Strategy: Instead of manually managing StatefulSets, Services, and backup CronJobs, we leverage CloudNativePG, a specialized Kubernetes Operator that automates the PostgreSQL lifecycle through declarative custom resources.

CloudNativePG Benefits:

  • Automated Failover: Continuous monitoring with automatic replica promotion within seconds
  • Simplified Backups & PITR: Declarative backup configuration with WAL archiving to S3-compatible storage
  • Managed Read Replicas: Scale read capacity by changing the instances count in the Cluster manifest
  • Zero-Downtime Upgrades: Rolling updates for PostgreSQL minor versions
  • Integrated Monitoring: Automatic PodMonitor creation for Prometheus metrics

Backup Strategy:

  • Automated daily backups using ScheduledBackup CRD
  • Point-in-time recovery with WAL archiving to MinIO/Ceph S3
  • 7-day retention policy with automatic cleanup
  • Cross-region backup replication for disaster recovery

3. Networking and Security

Network Architecture:

  • Ingress controller with TLS termination
  • Network policies for micro-segmentation
  • Service mesh consideration for advanced traffic management
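
TLS termination at the ingress could look like the sketch below. It assumes an NGINX ingress class and a cert-manager ClusterIssuer named `letsencrypt-prod`; the hostname, Service name, and TLS secret name are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: spring-app
  annotations:
    # cert-manager watches this annotation and issues the certificate
    cert-manager.io/cluster-issuer: letsencrypt-prod   # hypothetical ClusterIssuer
spec:
  ingressClassName: nginx                # assumed ingress class
  tls:
    - hosts:
        - app.example.com                # hypothetical hostname
      secretName: spring-app-tls         # populated by cert-manager
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: spring-app         # hypothetical Service name
                port:
                  number: 8080
```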

Security Controls:

  • Pod Security Standards enforcement
  • RBAC with principle of least privilege
  • Admission controllers for policy enforcement
  • Runtime security monitoring
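
The first two controls can be sketched with a namespace-level Pod Security Standards label and a NetworkPolicy. The `restricted` level, the `ingress-nginx` and `monitoring` namespace names, and the `app: spring-app` label are assumptions; the `cnpg.io/cluster` label is the one CloudNativePG sets on the pods of a cluster.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: spring-app-production
  labels:
    # Pod Security Standards, enforced by the built-in admission controller
    pod-security.kubernetes.io/enforce: restricted   # assumed level
    pod-security.kubernetes.io/warn: restricted
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: spring-app
  namespace: spring-app-production
spec:
  podSelector:
    matchLabels:
      app: spring-app                    # hypothetical pod label
  policyTypes: [Ingress, Egress]
  ingress:
    - from:
        # Traffic from the ingress controller and from Prometheus scrapes
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx   # assumed namespace
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring      # assumed namespace
      ports:
        - protocol: TCP
          port: 8080
  egress:
    # PostgreSQL pods managed by CloudNativePG
    - to:
        - podSelector:
            matchLabels:
              cnpg.io/cluster: postgres-app-cluster
      ports:
        - protocol: TCP
          port: 5432
    # DNS resolution
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

Additional egress rules would be needed for anything else the application calls, for example a trace or log endpoint if the application pushes telemetry itself.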

4. CI/CD Pipeline

GitHub Flow CI Pipeline:

graph TB
    A[Feature Branch] --> B[Pull Request]
    B --> C[scan-and-lint]
    C --> D[build-and-sast]
    D --> E[pre-tests]
    E --> F[image-and-push]
    F --> G[deploy]
    
    C --> C1[Secrets Scan]
    C --> C2[Trivy FS Scan]
    C --> C3[Checkstyle/PMD]
    
    D --> D1[Maven Package]
    D --> D2[OWASP Dependency Check]
    D --> D3[CodeQL Analysis]
    E --> E5[SonarQube Scan]
    D --> D5[Security Gates: 0 critical, ≤5 high]
    
    E --> E1[Unit Tests with JUnit]
    E --> E2[Integration Tests with Testcontainers]
    E --> E3[JaCoCo Coverage Analysis]
    E --> E4[Quality Gates: ≥80% coverage]
    
    F --> F1[Multi-arch Build: AMD64/ARM64]
    F --> F2[Semantic Versioning Tags]
    F --> F3[Container Security Scan]
    F --> F4[GitHub Container Registry Push]
    
    G --> G1[PR: Preview Label for ArgoCD]
    G --> G2[Main: Create PR to k8s Repository]
    
    G2 --> H[post-tests - Main Branch Only]
    H --> H1[Smoke Tests]
    H --> H2[Load Tests]
    H --> H3[End-to-End Tests]
    H --> H4[UAT Tests]

GitHub Flow Pipeline Stages:

  1. scan-and-lint: Secrets scan, Trivy filesystem scan, Checkstyle/PMD validation
  2. build-and-sast: Maven package, OWASP dependency check, CodeQL analysis, security gates
  3. pre-tests: Unit tests with JUnit/JaCoCo, integration tests with Testcontainers, self-hosted SonarQube analysis (after test execution), quality gates (≥80% coverage)
  4. image-and-push: Multi-architecture builds (AMD64/ARM64), semantic versioning, container security scanning, registry push
  5. deploy: PR environments via ArgoCD ApplicationSet, staging deployment via GitOps PR
  6. post-tests: Smoke tests, load tests, end-to-end tests, UAT tests (main branch only)
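
The stage ordering above maps naturally onto GitHub Actions jobs chained with `needs`. The following is a partial sketch for pull requests only: the workflow path is hypothetical, the scanner steps (secrets scan, Trivy, OWASP Dependency Check, CodeQL, SonarQube) are left out, and it assumes Checkstyle and PMD are configured in pom.xml.

```yaml
# .github/workflows/ci.yml (hypothetical path)
name: ci
on:
  pull_request:

jobs:
  scan-and-lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: "21"
          cache: maven
      - run: mvn -B checkstyle:check pmd:check

  build-and-sast:
    needs: scan-and-lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: "21"
          cache: maven
      - run: mvn -B package -DskipTests

  pre-tests:
    needs: build-and-sast
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: "21"
          cache: maven
      # verify runs unit tests and, if bound to the lifecycle, integration tests
      - run: mvn -B verify

  image-and-push:
    needs: pre-tests
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-qemu-action@v3
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          push: true
          # ghcr.io requires a lowercase repository path
          tags: ghcr.io/${{ github.repository }}:pr-${{ github.event.pull_request.number }}
```

Because each job declares `needs`, a failed lint or test stage stops the pipeline before any image is built or pushed.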

CD Pipeline (ArgoCD):

  1. GitOps-based deployment to staging environment
  2. Automated testing and validation
  3. Production deployment with canary strategy
  4. Automated synchronization and drift detection
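
For the PR preview environments, an ArgoCD ApplicationSet with the pull request generator is one possible shape. This sketch assumes hypothetical GitHub organization, repository, and path names, and a `preview` label on the pull request; a private repository would also need a token reference, which is omitted here.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: spring-app-pr-previews           # hypothetical name
  namespace: argocd
spec:
  generators:
    - pullRequest:
        github:
          owner: example-org             # hypothetical organization
          repo: spring-app               # hypothetical application repository
          labels:
            - preview                    # only labelled PRs get an environment
        requeueAfterSeconds: 300
  template:
    metadata:
      name: 'spring-app-pr-{{number}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/example-org/spring-app-k8s.git   # hypothetical
        targetRevision: main
        path: overlays/pr                # hypothetical overlay path
        kustomize:
          images:
            - 'ghcr.io/example-org/spring-app:pr-{{number}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: 'spring-app-pr-{{number}}'
      syncPolicy:
        automated:
          prune: true                    # remove resources deleted from Git
          selfHeal: true                 # revert manual drift
        syncOptions:
          - CreateNamespace=true
```

When the pull request is closed or loses the label, the generator stops producing the Application and ArgoCD removes it, which gives the automatic cleanup of ephemeral environments.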

Data Models

Configuration Schema

# Application Configuration Structure
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  application.yml: |
    server:
      port: 8080
    spring:
      datasource:
        # CloudNativePG exposes the primary through the <cluster-name>-rw Service
        url: jdbc:postgresql://postgres-app-cluster-rw:5432/appdb
        hikari:
          maximum-pool-size: 20
          minimum-idle: 5
    management:
      endpoints:
        web:
          exposure:
            include: health,metrics,prometheus
      endpoint:
        health:
          show-details: always

External Secrets Operator Schema

# SecretStore for Vault integration
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: vault-secret-store
  namespace: spring-app
spec:
  provider:
    vault:
      server: "https://vault.domain.local"
      path: "secret"
      version: "v2"
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "spring-app"
          serviceAccountRef:
            name: "spring-app"

---
# ExternalSecret for automatic secret synchronization
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-secrets
  namespace: spring-app
spec:
  refreshInterval: 15s
  secretStoreRef:
    name: vault-secret-store
    kind: SecretStore
  target:
    name: app-secrets
    creationPolicy: Owner
  data:
  - secretKey: database_username
    remoteRef:
      key: spring-app/database
      property: username
  - secretKey: database_password
    remoteRef:
      key: spring-app/database
      property: password

CloudNativePG Schema

# PostgreSQL Cluster managed by CloudNativePG Operator
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres-app-cluster
spec:
  instances: 3 
  imageName: ghcr.io/cloudnative-pg/postgresql:18.0-standard-trixie
  monitoring:
    enablePodMonitor: true
  storage:
    size: 100Gi
  plugins:
  - name: barman-cloud.cloudnative-pg.io
    isWALArchiver: true
    parameters:
      barmanObjectName: s3local-eu-central

---
# Scheduled Backup with CloudNativePG Operator
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: postgres-app-cluster-backup
spec:
  cluster:
    name: postgres-app-cluster
  # Six-field cron expression (seconds first): runs daily at 03:02:01
  schedule: '1 2 3 * * *'
  backupOwnerReference: self
  method: plugin
  pluginConfiguration:
    name: barman-cloud.cloudnative-pg.io

---
# ObjectStore for backups
apiVersion: barmancloud.cnpg.io/v1
kind: ObjectStore
metadata:
  name: s3local-eu-central
spec:
  retentionPolicy: "7d"
  configuration:
    destinationPath: "s3://on-prem-s3-bucket/PGbackups/"
    endpointURL: "http://minio.storage.central.eu.local:9000"
    s3Credentials:
      accessKeyId:
        name: s3local-eu-central
        key: ACCESS_KEY_ID
      secretAccessKey:
        name: s3local-eu-central
        key: ACCESS_SECRET_KEY
    wal:
      compression: gzip
      encryption: AES256
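
Point-in-time recovery from this object store could be sketched as a new Cluster bootstrapped in recovery mode. The restored cluster name and the target timestamp are placeholders, and the field layout reflects my understanding of the Barman Cloud plugin's recovery configuration, so it should be checked against the plugin documentation for the version in use.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres-app-cluster-restore     # hypothetical name for the restored cluster
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:18.0-standard-trixie
  storage:
    size: 100Gi
  bootstrap:
    recovery:
      source: origin
      recoveryTarget:
        # Placeholder: replay WAL up to this moment
        targetTime: "2025-01-01 12:00:00.00000+00"
  externalClusters:
    - name: origin
      plugin:
        name: barman-cloud.cloudnative-pg.io
        parameters:
          barmanObjectName: s3local-eu-central
          # Name under which the original cluster archived its backups and WAL
          serverName: postgres-app-cluster
```

Restoring into a separately named cluster leaves the original untouched, so the recovery can be verified before application traffic is switched over.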

Error Handling

Application-Level Error Handling

  1. Graceful Degradation:

    • Circuit breaker patterns for external dependencies
    • Fallback mechanisms for non-critical features
    • Proper error logging and monitoring
  2. Database Connection Handling:

    • Connection pool monitoring and alerting
    • Automatic retry mechanisms with exponential backoff
    • Health check endpoints for database connectivity
  3. Kubernetes-Level Error Handling:

    • Readiness and liveness probes configuration
    • Pod disruption budgets for maintenance scenarios
    • Automatic restart policies for failed containers
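
The probe configuration can be sketched as a Kustomize strategic-merge patch. The Deployment and container names are hypothetical and the timing values are assumptions; the paths are the liveness and readiness health groups that Spring Boot Actuator exposes when it runs on Kubernetes.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-app                       # hypothetical Deployment name
spec:
  template:
    spec:
      containers:
        - name: app                      # hypothetical container name
          # Gives the JVM up to 150s (30 x 5s) to start before liveness applies
          startupProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            periodSeconds: 5
            failureThreshold: 30
          # Restarts the container when the application reports a broken state
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            periodSeconds: 10
            failureThreshold: 3
          # Removes the pod from Service endpoints while it cannot serve traffic
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            periodSeconds: 5
            failureThreshold: 3
```

By default the readiness group does not include the database health indicator. If database connectivity should gate traffic, it can be added with `management.endpoint.health.group.readiness.include: readinessState,db`, at the cost of all pods leaving the load balancer together during a database outage.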

Monitoring and Alerting

The application integrates with existing observability infrastructure:

  • Prometheus: Metrics collection with remote write to Mimir
  • Grafana: Dashboard visualization and alerting
  • Loki: Centralized log aggregation and querying
  • Tempo: Distributed tracing with correlation
  • Mimir: Long-term metrics storage and federation

# Example Prometheus alerting rules; firing alerts are routed through the existing Alertmanager
groups:
- name: application.rules
  rules:
  - alert: ApplicationDown
    expr: up{job="spring-boot-app"} == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Application is down"
      
  - alert: DatabaseConnectionHigh
    expr: hikaricp_connections_active / hikaricp_connections_max > 0.8
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Database connection pool usage is high"

Testing Strategy

1. Unit Testing

  • JUnit 5 for unit tests with minimum 80% code coverage
  • Mockito for dependency mocking
  • Testcontainers for integration testing with real databases

2. Integration Testing with Testcontainers

  • Testcontainers for real database integration testing
  • PostgreSQL container for repository layer testing
  • Container structure tests for Dockerfile validation
  • Security scanning with Trivy or Snyk
  • Performance testing of containerized application

3. Kubernetes Testing

  • Kustomize overlay validation by rendering each overlay with kustomize build
  • Kubernetes manifest validation with kubeconform (kubeval is no longer maintained)
  • End-to-end testing with Ginkgo and Gomega

4. Pipeline Testing

  • Pipeline as code validation
  • Deployment smoke tests
  • Rollback testing scenarios

5. Security Testing

  • Static Application Security Testing (SAST)
  • Dynamic Application Security Testing (DAST)
  • Container image vulnerability scanning
  • Kubernetes security benchmarks (CIS)

Implementation Phases

Phase 1: Containerization (Days 1-2)

  • Create multi-stage Dockerfile
  • Implement configuration externalization
  • Set up local development environment
  • Container security hardening

Phase 2: Basic Kubernetes Deployment (Days 3-4)

  • Create basic Kubernetes manifests
  • Implement database connectivity
  • Set up monitoring and logging
  • Basic CI/CD pipeline

Phase 3: Advanced Features (Days 5-6)

  • Implement autoscaling and high availability
  • Advanced security controls
  • Comprehensive observability
  • Performance optimization

Phase 4: Production Readiness (Days 7-8)

  • Production deployment
  • Disaster recovery testing
  • Documentation and training
  • Performance tuning and optimization

Security Considerations

1. Container Security

  • Distroless base images
  • Non-root user execution
  • Minimal package installation
  • Regular security updates

2. Kubernetes Security

  • Pod Security Standards
  • Network policies for micro-segmentation
  • RBAC with least privilege
  • Admission controllers

3. Data Security

  • Encryption at rest and in transit
  • Secret management with external systems
  • Database access controls
  • Audit logging

4. Pipeline Security

  • Secure artifact storage
  • Signed container images
  • Security scanning integration
  • Compliance validation

CI/CD Pipeline Schema

Security Scanning Schema

# Security Gate Configuration
security_gates:
  trivy_filesystem:
    severity_threshold: ["CRITICAL", "HIGH", "MEDIUM"]
    max_critical: 0
    max_high: 5
    fail_on_timeout: true
    
  owasp_dependency_check:
    cvss_threshold: 9.0
    formats: ["JSON", "HTML", "SARIF"]
    suppression_file: ".github/dependency-check-suppressions.xml"
    
  codeql:
    languages: ["java"]
    queries: ["security-extended", "security-and-quality"]
    fail_on_error: true
    
  sonarqube:
    quality_gate: "Sonar way"
    coverage_threshold: 80
    duplicated_lines_density: 3
    maintainability_rating: "A"
    reliability_rating: "A"
    security_rating: "A"

Testing Schema

# Test Configuration
test_strategy:
  unit_tests:
    framework: "JUnit 5"
    coverage_tool: "JaCoCo"
    minimum_coverage: 80
    parallel_execution: true
    
  integration_tests:
    framework: "Testcontainers"
    database: "PostgreSQL 18"  # same major version as the CloudNativePG cluster image
    container_reuse: true
    test_profiles: ["integration-test"]
    
  quality_gates:
    coverage_threshold: 80
    test_failure_threshold: 0
    performance_regression: false

Build and Deployment Schema

# Build Configuration
build_strategy:
  semantic_versioning:
    source: "maven_project_version"
    github_flow_strategy:
      main: "{version}"
      pull_request: "pr-{number}"
      feature_branch: "{branch-name}-{commit-sha}"
      
  container_build:
    base_image: "eclipse-temurin:21-jdk-jammy"
    runtime_image: "gcr.io/distroless/java21-debian12:nonroot"
    platforms: ["linux/amd64", "linux/arm64"]
    security_scan: true
    
  deployment_strategy:
    pull_request:
      environment: "ephemeral"
      namespace: "spring-app-pr-{number}"
      resources: "minimal"
      cleanup: "automatic"
      smoke_tests: true
      
    main_branch:
      environment: "production"
      namespace: "spring-app-production"
      sync_policy: "automatic"
      approval_required: true
      canary_deployment: true
      rollback_enabled: true

Notification Schema

# Notification Configuration
notifications:
  github_comments:
    security_summary: true
    coverage_report: true
    deployment_status: true
    
  pr_environments:
    creation_notification: true
    url_sharing: true
    monitoring_links: true
    cleanup_notification: true
    
  security_alerts:
    critical_vulnerabilities: "immediate"
    high_vulnerabilities: "daily_digest"
    dependency_updates: "weekly"

Integration with Existing Infrastructure

Observability Stack Integration

  • Prometheus: Application metrics with ServiceMonitor configuration
  • Mimir: Long-term metrics storage via remote write
  • Grafana: Custom dashboards for application monitoring
  • Loki: Log aggregation with structured logging
  • Tempo: Distributed tracing with OpenTelemetry integration
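
A ServiceMonitor for the application might look like the sketch below. The `release: prometheus` label, the Service label, and the named port `http` are assumptions about the existing Prometheus Operator setup; the metrics path is the standard Actuator Prometheus endpoint.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: spring-boot-app
  labels:
    release: prometheus            # assumed label the existing Prometheus selects on
spec:
  selector:
    matchLabels:
      app: spring-app              # hypothetical Service label
  endpoints:
    - port: http                   # named port on the Service (assumed)
      path: /actuator/prometheus
      interval: 30s
```

With a ServiceMonitor, the `job` label defaults to the name of the Service, so an alert expression such as `up{job="spring-boot-app"}` only matches if the Service carries that name or `jobLabel` is set accordingly.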

CI/CD Integration

  • GitHub Actions: Multi-stage security and quality pipeline
  • ArgoCD: GitOps-based deployment with ApplicationSets
  • Container Registry: Secure image storage with vulnerability scanning
  • SonarQube: Code quality and security analysis

Security Integration

  • HashiCorp Vault: Secret management with External Secrets Operator
  • Pod Security Standards: Existing security policies enforcement
  • Network Policies: Integration with existing micro-segmentation
  • OWASP Tools: Dependency checking and vulnerability management

This design modernizes the legacy Java application on Kubernetes while reusing the existing Vault, observability, and CI/CD infrastructure, and it is structured to preserve high availability and developer productivity throughout the migration.