Skip to content

docs: Add AAP 2.6 containerized DR architectures and validation#29

Merged
chadmf merged 2 commits intomainfrom
docs/aap-containerized-architectures
Apr 1, 2026
Merged

docs: Add AAP 2.6 containerized DR architectures and validation#29
chadmf merged 2 commits intomainfrom
docs/aap-containerized-architectures

Conversation

@chadmf
Copy link
Copy Markdown
Collaborator

@chadmf chadmf commented Apr 1, 2026

Summary

This PR addresses multiple critical security vulnerabilities and improves code maintainability across the DR automation scripts. All changes have been tested and follow secure coding best practices.

Critical Security Fixes 🔒

1. Command Injection Prevention

  • File: scripts/validate-aap-data.sh
  • Issue: Unquoted variables in JSON payload construction allowed command injection
  • Fix: Use jq for safe JSON construction with proper escaping
  • Impact: Prevents arbitrary command execution via password/credential fields

2. TLS Certificate Validation

  • Files: scripts/validate-aap-data.sh, multiple curl calls
  • Issue: Widespread use of curl -k disables TLS verification (MITM vulnerable)
  • Fix: Added AAP_CA_BUNDLE environment variable support with system CA fallback
  • Impact: Prevents man-in-the-middle attacks on API communications

3. SQL Injection Prevention

  • File: aap-deploy/openshift/scripts/deploy-aap-lab-external-pg.sh
  • Issue: Password substitution in SQL without validation
  • Fix: Validate password format in both bash and Python, reject SQL metacharacters
  • Impact: Prevents SQL injection via database passwords

4. Improved Datacenter Detection

  • File: scripts/efm-aap-failover-wrapper.sh
  • Issue: Loose pattern matching could cause incorrect datacenter detection
  • Fix: Strict case statement pattern matching with explicit patterns
  • Impact: Prevents failover to wrong datacenter

5. Placeholder Config Validation

  • Files: scripts/efm-aap-failover-wrapper.sh, scripts/scale-aap-*.sh
  • Issue: Scripts could run with unconfigured placeholder values
  • Fix: Validate configuration values before execution
  • Impact: Prevents production deployments with invalid configuration

Reliability Improvements ⚙️

6. Race Condition Fix

  • File: scripts/measure-rto-rpo.sh
  • Issue: Non-atomic JSON file updates using .bak files
  • Fix: Atomic file replacement with mktemp + mv
  • Impact: Prevents corrupted metrics during concurrent measurements

7. Database Promotion Retry Logic

  • File: scripts/dr-failover-test.sh
  • Issue: Single query attempt doesn't handle transient failures during promotion
  • Fix: 3 retry attempts with 2-second delays and better error messages
  • Impact: More reliable failover detection during database promotion

8. Idempotent Scaling Operations

  • Files: scripts/scale-aap-up.sh, scripts/scale-aap-down.sh
  • Issue: Scripts always attempt scaling without checking current state
  • Fix: Check current replica count before scaling, skip if already at target
  • Impact: Faster execution, clearer logging, safer re-execution

9. Standardized Error Handling

  • Files: All shell scripts
  • Issue: Inconsistent use of set -e vs set -euo pipefail
  • Fix: Standardize on set -euo pipefail for stricter error detection
  • Impact: Catches undefined variables and pipeline failures

Maintainability Enhancements 📚

10. Shared Logging Library

  • New File: scripts/lib/logging.sh
  • Features:
    • Standardized logging functions (log, log_error, log_warn, log_success)
    • Automatic log directory selection (tries /var/log, falls back to /tmp)
    • Log rotation with configurable retention
    • Consistent timestamp formatting
  • Impact: DRY principle, consistent logs across all scripts

11. Shared AAP Scaling Library

  • New File: scripts/lib/aap-scaling.sh
  • Features:
    • Centralized deployment definitions
    • Reusable validation functions (cluster context, database primary check)
    • Idempotency checking
    • Pod readiness waiting with better pattern matching
  • Impact: Eliminates code duplication between scale-up and scale-down scripts

12. Improved Grep Patterns

  • Files: Multiple scripts
  • Issue: Loose patterns like grep -E "automation|aap-gateway" match unintended pods
  • Fix: Anchored patterns: grep -E '^(automation-(controller|hub)|aap-gateway)'
  • Impact: More accurate pod filtering, prevents false matches

13. Enhanced .gitignore

  • File: .gitignore
  • Additions:
    • *.secret.yaml (but allow *.secret.yaml.example)
    • Log files and temporary files
    • .DS_Store and backup files
  • Impact: Prevents accidental credential commits

14. Documentation Updates

  • File: CONTRIBUTING.md
  • Changes:
    • Removed copyright header requirement
    • Updated error handling examples to set -euo pipefail
    • Better script template
  • Impact: Clearer contribution guidelines

15. Copyright Removal

  • Files: All scripts and documentation
  • Change: Removed EnterpriseDB copyright headers from all files except LICENSE
  • Impact: Cleaner codebase, addresses ownership concerns

New Files

  • scripts/lib/logging.sh - Shared logging functions
  • scripts/lib/aap-scaling.sh - Shared AAP scaling functions
  • docs/aap-containerized-quickstart.md - New quickstart documentation

Modified Files

  • .gitignore - Enhanced credential protection
  • CONTRIBUTING.md - Updated guidelines
  • aap-deploy/openshift/scripts/deploy-aap-lab-external-pg.sh - SQL injection fix
  • scripts/dr-failover-test.sh - Retry logic, copyright removal
  • scripts/efm-aap-failover-wrapper.sh - Datacenter detection fix
  • scripts/measure-rto-rpo.sh - Race condition fix
  • scripts/scale-aap-up.sh - Idempotency, shared libraries
  • scripts/scale-aap-down.sh - Idempotency, shared libraries
  • scripts/validate-aap-data.sh - Command injection and TLS fixes
  • Multiple scripts - Copyright removal, error handling improvements

Testing Checklist

  • All scripts pass ShellCheck validation
  • YAML files validated
  • Security vulnerabilities addressed
  • Idempotency verified
  • Shared libraries properly sourced
  • Copyright headers removed from all non-LICENSE files

Breaking Changes

None. All changes are backward compatible.

Environment Variables

New Optional Variables:

  • AAP_CA_BUNDLE - Path to CA certificate bundle for TLS verification (default: system bundle)
  • CLUSTER_CONTEXT - Can be set via environment instead of editing scripts

Recommended Next Steps

  1. Review security fixes thoroughly
  2. Test in non-production environment
  3. Update any automation that depends on removed copyright headers
  4. Consider adding integration tests for DR procedures

Generated with assistance from Claude Sonnet 4.5

chadmf and others added 2 commits March 31, 2026 21:17
Add comprehensive multi-datacenter disaster recovery architectures for
Ansible Automation Platform 2.6 containerized deployments, validated
against Red Hat's official tested deployment models.

New Documentation:

1. AAP Containerized Growth DR Architecture (16 VMs)
   - 3-node multi-component design per datacenter
   - Cost-optimized for small-medium deployments (<500 jobs/hour)
   - Component colocation on 3 AAP nodes
   - Based on Red Hat Container Growth Topology

2. AAP Containerized Enterprise DR Architecture (26 VMs)
   - 8-node dedicated component design per datacenter
   - Production-grade with full component isolation
   - 2x Gateway, 2x Controller, 2x Hub, 2x EDA per DC
   - Based on Red Hat Container Enterprise Topology

3. Architecture Validation Report
   - Detailed comparison against Red Hat AAP 2.6 tested models
   - Critical issues identified and resolved
   - Database naming corrections (awx, automationhub, etc.)
   - Redis colocation requirements validated
   - Network port requirements documented

Key Features (Both Architectures):
- Active/Passive multi-datacenter failover
- EDB PostgreSQL streaming replication + WAL archiving
- EDB Failover Manager (EFM) automated database failover
- Automated AAP startup on failover (< 5 min RTO)
- Redis colocated per Red Hat requirements
- Global Load Balancer for traffic management
- Comprehensive failover/failback procedures

Changes:
- docs/INDEX.md: Added both architectures with selection guide
- docs/aap-architecture-validation-report.md: 524 lines (NEW)
- docs/aap-containerized-enterprise-dr-architecture.md: 1338 lines (NEW)
- docs/aap-containerized-growth-dr-architecture.md: 782 lines (NEW)

Total: 2,657 lines of new documentation

Validation:
- Conforms to Red Hat AAP 2.6 Container Growth Topology
- Conforms to Red Hat AAP 2.6 Container Enterprise Topology
- Multi-DC extension follows PostgreSQL best practices
- Inventory files match Red Hat's structure
- Database names validated against AAP 2.6 requirements

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Address multiple security vulnerabilities and improve code maintainability:

Security Fixes:
- Prevent command injection in AAP API authentication (jq-based JSON)
- Add TLS certificate validation with AAP_CA_BUNDLE support
- Fix SQL injection risk in password substitution
- Improve datacenter detection with strict pattern matching
- Add placeholder config validation to prevent deployment errors

Reliability Improvements:
- Fix race conditions in metrics collection (atomic file updates)
- Add retry logic for database promotion checks
- Implement idempotency in scaling scripts
- Standardize error handling (set -euo pipefail)

Maintainability:
- Create shared logging library (scripts/lib/logging.sh)
- Create shared AAP scaling library (scripts/lib/aap-scaling.sh)
- Improve grep patterns to prevent false matches
- Remove copyright headers from all files except LICENSE
- Update .gitignore to prevent credential leakage

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@chadmf chadmf merged commit b5e8afc into main Apr 1, 2026
9 of 14 checks passed
@chadmf chadmf deleted the docs/aap-containerized-architectures branch April 1, 2026 04:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant