Merged
196 changes: 48 additions & 148 deletions README.md
@@ -1,5 +1,9 @@
# AAP with EDB Postgres Multi-Datacenter Architecture

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
![Status](https://img.shields.io/badge/Status-Production--Ready-green)
![Last Updated](https://img.shields.io/badge/Updated-March%202026-blue)

> **🚀 NEW: [Quick Start Guide](docs/quick-start-guide.md)** - Deploy in 15-30 minutes
> Choose your path: [OpenShift (15 min)](docs/quick-start-guide.md#quick-start-openshift-15-minutes) | [RHEL with TPA (20 min)](docs/quick-start-guide.md#quick-start-rhel-with-tpa-20-minutes) | [Local CRC (30 min)](docs/quick-start-guide.md#quick-start-local-testing-with-crc-30-minutes)

@@ -8,11 +12,9 @@
## Table of Contents

- [Overview](#overview)
- [Prerequisites](#prerequisites)
- [Quick Links](#quick-links)
- [Installation](#installation)
- [Architecture](#architecture)
- [Operations](#operations)
- [Contributing](#contributing)
- [Repository Structure](#repository-structure)

## Overview

@@ -30,9 +32,21 @@ datacenters.
- ✅ **Production-ready** - Security, monitoring, backup strategies

**Target RTO/RPO:**
- **In-datacenter failover:** RTO <1 minute, RPO <5 seconds
- **In-datacenter failover:** RTO (Recovery Time Objective) <1 minute, RPO (Recovery Point Objective) <5 seconds
- **Cross-datacenter failover:** RTO <5 minutes, RPO <5 seconds
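
The targets above can be turned into a simple pass/fail check once a drill has produced measured numbers. The following is a minimal sketch, not the repository's `measure-rto-rpo.sh`; the function names `within_target` and `check_failover` and the scope labels `in-dc` / `cross-dc` are assumptions made for illustration, and measurements are assumed to be whole seconds.

```shell
#!/usr/bin/env bash
# Sketch: compare measured failover numbers against the README targets.
# All names here are hypothetical; they are not part of the repository scripts.

# within_target MEASURED_SECONDS LIMIT_SECONDS
# Succeeds when the measured value is strictly below the limit,
# matching the "<" used in the stated targets. Whole seconds only.
within_target() {
  local measured="$1" limit="$2"
  [ "$measured" -lt "$limit" ]
}

# check_failover SCOPE RTO_SECONDS RPO_SECONDS
# SCOPE "in-dc" uses RTO < 60 s, "cross-dc" uses RTO < 300 s.
# RPO < 5 s applies to both scopes.
check_failover() {
  local scope="$1" rto="$2" rpo="$3" rto_limit
  case "$scope" in
    in-dc)    rto_limit=60 ;;
    cross-dc) rto_limit=300 ;;
    *) echo "unknown scope: $scope" >&2; return 2 ;;
  esac
  if within_target "$rto" "$rto_limit" && within_target "$rpo" 5; then
    echo "PASS"
  else
    echo "FAIL"
    return 1
  fi
}
```

For example, `check_failover in-dc 45 3` reports `PASS`, while a cross-datacenter failover that took exactly 300 seconds reports `FAIL` because the target is strictly less than 5 minutes.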

## Prerequisites

Before getting started, ensure you have:

- **Platform**: OpenShift 4.12+ OR RHEL 8+ with root access
- **Database**: EnterpriseDB subscription for EDB Postgres Advanced Server
- **Storage**: S3-compatible storage for WAL archiving and backups
- **Network**: Network connectivity between datacenters (for replication)
- **Tools**: `oc` or `kubectl` CLI tools installed

📋 See [detailed requirements](docs/quick-start-guide.md#prerequisites) in the Quick Start Guide
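
The CLI requirement ("`oc` or `kubectl`") can be verified before running anything else. This is a small sketch using only the POSIX `command -v` builtin; the helper name `first_available` is hypothetical and does not exist in the repository.

```shell
#!/usr/bin/env bash
# Sketch: find which of several acceptable CLI tools is installed.
# first_available is a hypothetical helper, not a repository script.

# first_available CMD...
# Prints the first command name found on PATH; fails if none is installed.
first_available() {
  local cmd
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      return 0
    fi
  done
  return 1
}

# Example use (commented out so that sourcing this file has no side effects):
# cli="$(first_available oc kubectl)" || { echo "Install oc or kubectl" >&2; exit 1; }
```

Listing `oc` first means it is preferred when both tools are present, which suits OpenShift clusters; reverse the order if `kubectl` should win.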

## Quick Links

### Getting Started
@@ -52,154 +66,40 @@ datacenters.
- **[DR Testing Guide](docs/dr-testing-guide.md)** - Complete DR testing framework
- **[Troubleshooting](docs/troubleshooting.md)** - Common issues and solutions

## Installation

**Preferred automation:** Use **[Trusted Postgres Architect (TPA)](https://github.com/EnterpriseDB/tpa)**
from EnterpriseDB for Postgres on **bare metal, cloud instances, or SSH-managed hosts**—see
[docs/install-tpa.md](docs/install-tpa.md) and [EDB TPA documentation](https://www.enterprisedb.com/docs/tpa/latest/).

TPA does **not** deploy the **EDB Postgres on OpenShift** operator; for Postgres **on OpenShift
as pods**, use the operator and manual/GitOps steps in this repo.

### Installation Quick Reference

| Platform | Time | Guide |
|----------|------|-------|
| **OpenShift** | 15 min | [Quick Start - OpenShift](docs/quick-start-guide.md#quick-start-openshift-15-minutes) |
| **RHEL with TPA** | 20 min | [Quick Start - RHEL](docs/quick-start-guide.md#quick-start-rhel-with-tpa-20-minutes) |
| **Local CRC** | 30 min | [Quick Start - CRC](docs/quick-start-guide.md#quick-start-local-testing-with-crc-30-minutes) |

### Detailed Installation Guides

| Area | Description | Guide |
|------|-------------|--------|
| **RHEL / hosts (TPA)** *(recommended)* | `tpaexec` workflows for supported platforms (bare metal, cloud, Docker for testing) | [TPA install](docs/install-tpa.md)<br>[RHEL / Ansible entry](docs/install-tpa.md#rhel-tpa-ansible)<br>[TPA on GitHub](https://github.com/EnterpriseDB/tpa)<br>[EDB TPA docs](https://www.enterprisedb.com/docs/tpa/latest/) |
| **OpenShift** | Operator install, `Cluster` CRs, passive cross-cluster replica (streaming), AAP operator with external EDB Postgres | [Ansible / GitOps pointers](docs/install-kubernetes-manual.md#ansible-gitops)<br>[Manual `oc` / YAML](docs/install-kubernetes-manual.md)<br>[Kustomize EDB Install (`db-deploy/`)](db-deploy/README.md)<br>[Cross-cluster replica](db-deploy/cross-cluster/README.md)<br>[AAP deploy (`aap-deploy/`)](aap-deploy/README.md)<br>[AAP OpenShift manifests](aap-deploy/openshift/README.md)<br>[Operator smoke test](docs/openshift-edb-operator-smoke-test.md)<br>[EDB Postgres on OpenShift architecture](docs/install-kubernetes-manual.md#edb-postgres-on-openshift-architecture)<br>[Scaling (OpenShift)](docs/install-kubernetes-manual.md#scaling-considerations) |
| RHEL EDB Install (manual) | Traditional VM-based install without TPA | [RHEL — Manual](docs/install-rhel-manual.md) |
| OpenShift (manual) | Operator + YAML/`oc` only | [OpenShift — Manual](docs/install-kubernetes-manual.md) |
| **AAP architecture** | Reference layouts for AAP on RHEL vs OpenShift | [RHEL AAP](docs/rhel-aap-architecture.md)<br>[OpenShift AAP](docs/openshift-aap-architecture.md) |
| **Disaster recovery** | DR scenarios and failover planning | [DR scenarios](docs/dr-scenarios.md) |
| **EDB Failover Manager (EFM)** | EFM integration with Postgres | [EFM Integration](docs/enterprisefailovermanager.md) |
| **Troubleshooting** | Diagnostics and issue resolution | [Troubleshooting](docs/troubleshooting.md) |
| **AAP cluster scripts & runbook** | Automation and operational procedures | [Scripts](scripts/README.md)<br>[Runbook](docs/manual-scripts-doc.md) |

## Architecture

### Architecture Overview

The solution implements a **multi-datacenter Active/Passive architecture** with:

- **Two datacenters:** DC1 (active), DC2 (passive/DR)
- **PostgreSQL replication:** Physical streaming replication + WAL archiving to S3
- **AAP deployment:** Separate clusters in each datacenter, scaled based on active/passive state
- **Failover orchestration:** EDB Failover Manager (EFM) integration with AAP scaling scripts
- **Global load balancer:** Routes traffic to active datacenter
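
The "scaled based on active/passive state" rule can be illustrated by deriving the desired AAP state from the local database role. This sketch assumes the role is read with PostgreSQL's `pg_is_in_recovery()` through `psql -At`, which prints `f` on a primary and `t` on a standby. The function name `desired_aap_state` is hypothetical, and how the repository's `scale-aap-up.sh` and `scale-aap-down.sh` scripts are actually invoked is not shown here.

```shell
#!/usr/bin/env bash
# Sketch: map the local database role to the desired AAP state.
# desired_aap_state is a hypothetical helper, not a repository script.

# desired_aap_state IN_RECOVERY
# IN_RECOVERY is the tuples-only, unaligned output of:
#   psql -At -c "SELECT pg_is_in_recovery();"
# "f" means the local database is a primary (active datacenter),
# "t" means it is a standby (passive datacenter).
desired_aap_state() {
  case "$1" in
    f) echo "up" ;;
    t) echo "down" ;;
    *) echo "unknown"; return 1 ;;
  esac
}
```

Treating any unexpected value as `unknown` with a non-zero exit status keeps a failed or ambiguous role query from being read as permission to scale AAP up, which is the conservative choice when guarding against split-brain.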

![EDB Postgres Multi-Datacenter Architecture](images/AAP_EDB.drawio.png)

### Key Components

1. **Global Load Balancer** - Single entry point with health check-based routing
2. **Ansible Automation Platform (AAP)** - Deployed in both datacenters
3. **PostgreSQL Clusters** - EDB Postgres Advanced with CloudNativePG operator
4. **Replication** - Streaming replication DC1→DC2 with S3 WAL archive fallback
5. **Backup** - Barman Cloud to S3 with 30-day retention and PITR capability

### Architecture Documentation

**📖 [Complete Architecture Documentation](docs/architecture.md)**

Detailed documentation includes:
- Component details (GLB, AAP, PostgreSQL)
- Network connectivity and data flow
- Replication topology and configuration
- Backup and restore strategies
- Scaling considerations
- Deployment architecture for RHEL and OpenShift

**Platform-Specific Architecture:**
- **[RHEL AAP Architecture](docs/rhel-aap-architecture.md)** - Systemd services, HAProxy, manual orchestration
- **[OpenShift AAP Architecture](docs/openshift-aap-architecture.md)** - Operators, native services, automated orchestration

## Operations

### Day-to-Day Operations

- **[Operations Runbook](docs/manual-scripts-doc.md)** - Step-by-step operational procedures
- **[Script Reference](scripts/README.md)** - All automation scripts with usage examples
- **[Troubleshooting Guide](docs/troubleshooting.md)** - Common issues and diagnostics

### Disaster Recovery

- **[DR Scenarios](docs/dr-scenarios.md)** - 6 documented failure scenarios with procedures
- **[DR Testing Guide](docs/dr-testing-guide.md)** - Complete testing framework with quarterly drills
- **[Split-Brain Prevention](docs/split-brain-prevention.md)** - Database role validation and fencing
- **[EDB Failover Manager](docs/enterprisefailovermanager.md)** - EFM integration and configuration

### Automation Scripts

Located in [`scripts/`](scripts/):

**AAP Management:**
- `scale-aap-up.sh` - Scale AAP to operational state
- `scale-aap-down.sh` - Scale AAP to zero (maintenance/DR)

**DR Orchestration:**
- `efm-orchestrated-failover.sh` - Full automated failover
- `dr-failover-test.sh` - DR testing automation
- `validate-aap-data.sh` - Post-failover validation
- `measure-rto-rpo.sh` - RTO/RPO measurement
- `generate-dr-report.sh` - Automated DR test reporting

**Pre-commit Hooks:**
- `hooks/check-script-permissions.sh` - Verify executable permissions
- `hooks/validate-openshift-manifests.sh` - Validate YAML manifests

## Contributing

We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for:

- Documentation standards
- Code standards (shell scripts, YAML)
- Testing requirements
- Pull request process
- Commit message guidelines

### Documentation

All documentation is in [`docs/`](docs/):
### Community
- **[Contributing Guide](CONTRIBUTING.md)** - How to contribute to this project
- **[License](LICENSE)** - Apache 2.0 License

- **[Documentation Index](docs/INDEX.md)** - Complete documentation organized by topic
- **[Quick Start Guide](docs/quick-start-guide.md)** - 15-30 minute deployment paths
- **[Architecture](docs/architecture.md)** - Comprehensive architecture documentation
## Repository Structure

### Repository Structure
<details>
<summary>📁 Click to expand repository structure</summary>

```
EDB_Testing/
├── docs/                      # All documentation
│   ├── INDEX.md               # Documentation index
│   ├── quick-start-guide.md   # Quick start (15-30 min)
│   ├── architecture.md        # Architecture details
│   ├── dr-testing-guide.md    # DR testing framework
│   └── ...                    # Additional guides
├── db-deploy/                 # PostgreSQL deployment manifests
│   ├── operator/              # CloudNativePG operator
│   ├── sample-cluster/        # Base cluster manifests
│   └── cross-cluster/         # DC1→DC2 replication
├── aap-deploy/                # AAP deployment
│   ├── openshift/             # OpenShift manifests
│   └── edb-bootstrap/         # Database initialization
├── scripts/                   # Automation scripts
│   ├── scale-aap-*.sh         # AAP scaling
│   ├── dr-*.sh                # DR orchestration
│   └── validate-*.sh          # Validation scripts
├── openshift/                 # OpenShift-specific configs
│   └── dr-testing/            # DR testing CronJob
└── .github/                   # CI/CD workflows
    └── workflows/             # GitHub Actions
├── aap-deploy/                # AAP deployment manifests
│   ├── openshift/             # OpenShift manifests
│   └── edb-bootstrap/         # Database initialization
├── db-deploy/                 # PostgreSQL deployment manifests
│   ├── operator/              # CloudNativePG operator
│   ├── sample-cluster/        # Base cluster manifests
│   └── cross-cluster/         # DC1→DC2 replication
├── docs/                      # Comprehensive documentation
│   ├── INDEX.md               # Documentation index
│   ├── quick-start-guide.md   # 15-30 min deployment guide
│   ├── architecture.md        # Architecture details
│   └── ...                    # Additional guides
├── scripts/                   # Operational automation scripts
│   ├── lib/                   # Shared libraries (logging, scaling)
│   ├── scale-aap-*.sh         # AAP scaling scripts
│   ├── dr-*.sh                # DR orchestration
│   └── validate-*.sh          # Validation scripts
├── openshift/                 # OpenShift-specific resources
│   └── dr-testing/            # DR testing CronJob
└── .github/                   # CI/CD workflows
    └── workflows/             # GitHub Actions
```

---
See [complete structure](docs/INDEX.md#documentation-structure) in the documentation index.

**Questions?** See [docs/INDEX.md](docs/INDEX.md) for complete documentation or open an issue.
</details>