3 changes: 2 additions & 1 deletion .gitignore
@@ -1,3 +1,4 @@
__pycache__/
site/

docs/assets/.DS_Store
.DS_Store
94 changes: 58 additions & 36 deletions docs/architecture.md
@@ -1,12 +1,10 @@
# Design overview
# Architecture

The Percona Operator for PostgreSQL automates and simplifies
deploying and managing open source PostgreSQL clusters on Kubernetes.
The Operator is based on [CrunchyData’s PostgreSQL Operator :octicons-link-external-16:](https://access.crunchydata.com/documentation/postgres-operator/v5/).
This document provides a high-level overview of Percona Operator for PostgreSQL architecture, explaining how the various components connect to create a production-ready PostgreSQL cluster on Kubernetes. See also [How the Operator works](operator-how-it-works.md).

![image](assets/images/pgo.svg)
## Components

PostgreSQL containers deployed with the Operator include the following components:
The StatefulSet deployed with the Operator includes the following components:

* The [PostgreSQL :octicons-link-external-16:](https://www.postgresql.org/) database management system, including:

@@ -22,46 +20,70 @@ PostgreSQL containers deployed with the Operator include the following component

* The [pgBouncer :octicons-link-external-16:](http://pgbouncer.github.io/) connection pooler for PostgreSQL,

* The PostgreSQL high-availability implementation based on the [Patroni template :octicons-link-external-16:](https://patroni.readthedocs.io/),
* The PostgreSQL high-availability implementation based on the [Patroni :octicons-link-external-16:](https://patroni.readthedocs.io/) template,

* the [pg_stat_monitor :octicons-link-external-16:](https://github.com/percona/pg_stat_monitor/) PostgreSQL Query Performance Monitoring utility,
* The [pg_stat_monitor :octicons-link-external-16:](https://github.com/percona/pg_stat_monitor/) PostgreSQL query performance monitoring utility,

* LLVM (for JIT compilation).
* PMM Client for observability
Comment on lines 27 to +28
Copilot AI Mar 26, 2026

This component list has inconsistent punctuation/formatting (e.g., “LLVM (for JIT compilation).” followed immediately by “PMM Client for observability” without a period). Make the bullets consistent (capitalization and ending punctuation) to improve readability.


Each PostgreSQL cluster includes one member available for read/write transactions (PostgreSQL primary instance, or leader in terms of Patroni) and a number of replicas which can serve read requests only (standby members of the cluster).
![Operator overview](assets/images/pgo.svg)

To provide high availability from the Kubernetes side the Operator involves [node affinity :octicons-link-external-16:](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
to run PostgreSQL Cluster instances on separate worker nodes if possible. If
some node fails, the Pod with it is automatically re-created on another node.
### Role of each component

![image](assets/images/operator.svg)
* **Percona Distribution for PostgreSQL** is a suite of open source software, tools and services required to deploy and maintain a reliable production cluster for PostgreSQL.

* **Patroni** — a high-availability solution for PostgreSQL that automates replication and **failover**. It maintains the cluster state and coordinates leader election to ensure that a healthy primary node is always available. Patroni simplifies building and operating resilient PostgreSQL clusters by handling node monitoring, failover, and recovery automatically. See the [Patroni documentation :octicons-link-external-16:](https://patroni.readthedocs.io/) for how it integrates with your environment.

To provide data storage for stateful applications, Kubernetes uses
Persistent Volumes. A *PersistentVolumeClaim* (PVC) is used to implement
the automatic storage provisioning to pods. If a failure occurs, the
Container Storage Interface (CSI) should be able to re-mount storage on
a different node.
* **pgBouncer** — A **lightweight connection pooler** in front of PostgreSQL. It sits between client applications and the database server to manage and reuse connections efficiently. Instead of each client opening its own database connection, pgBouncer maintains a pool of connections and serves them to clients on demand, significantly reducing connection overhead and improving performance, especially for applications with many short-lived or concurrent connections.

The Operator functionality extends the Kubernetes API with [Custom Resources
Definitions :octicons-link-external-16:](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#customresourcedefinitions).
These CRDs provide extensions to the Kubernetes API, and, in the case of the
Operator, allow you to perform actions such as creating a PostgreSQL Cluster,
updating PostgreSQL Cluster resource allocations, adding additional utilities to
a PostgreSQL cluster, e.g. [pgBouncer :octicons-link-external-16:](https://www.pgbouncer.org/) for
connection pooling and more.
* **pgBackRest** — Handles **full, incremental, and differential** backups, compression and encryption, parallel processing, and point-in-time recovery using WAL archives. pgBackRest helps ensure data safety by providing efficient, consistent backups and fast restores for both small and large PostgreSQL environments. Backup and restore are integrated with Custom Resources (`PerconaPGBackup`, `PerconaPGRestore`). See [About backups](backups.md) to learn more.

When a new Custom Resource is created or an existing one undergoes some changes
or deletion, the Operator automatically creates/changes/deletes all needed
Kubernetes objects with the appropriate settings to provide a proper Percona
PostgreSQL Cluster operation.
* **pg_stat_monitor** — Collects **query performance** statistics.
* **PMM Client** - a lightweight agent installed on database hosts to collect metrics, logs, and performance data and send them to Percona Monitoring and Management (PMM) Server. It gathers detailed insights from databases and the operating system such as query performance, resource usage, and health metrics. It enables centralized monitoring, troubleshooting, and performance optimization for PostgreSQL clusters.

Following CRDs are created while the Operator installation:
### How components work together

* `perconapgclusters` stores information required to manage a PostgreSQL cluster.
This includes things like the cluster name, what storage and resource classes
to use, which version of PostgreSQL to run, information about how to maintain
a high-availability cluster, etc.
This workflow shows how cluster components work together:

* `perconapgbackups` and `perconapgrestores` are in charge for making backups
and restore them.
1. Your **application** uses a Kubernetes **Service** aimed at pgBouncer.
2. **pgBouncer** accepts many client connections and forwards work through a smaller set of server connections to PostgreSQL Pods.
3. **PostgreSQL** executes queries. **Writes** go to the **primary**. **Reads** can target the primary or **replicas**.
4. The primary streams WAL to the replicas via instance Services.
5. Patroni monitors the cluster state and coordinates leader election if the primary node fails.
6. pgBackRest makes backups according to the schedule that you defined or when you manually create a backup object. pgBackRest saves backups to the backup storage that you configured. To learn more about backups, their workflow, and setup, refer to [About backups](backups.md).
7. PMM Client collects performance metrics and sends them to the PMM Server for you to see and analyze. See [Monitor the database with PMM](monitoring.md) to learn more.
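
The first three steps of the workflow above can be sketched from the application's side. This is a minimal illustration, assuming a cluster named `cluster1` in a namespace named `postgres`, and Service names of the form `<cluster>-pgbouncer` and `<cluster>-replicas`. These names, the user, and the database are assumptions for illustration only; check the Services that exist in your own deployment (for example, with `kubectl get svc`).

```python
from urllib.parse import quote

# Hypothetical names for illustration; verify them in your own cluster.
CLUSTER = "cluster1"
NAMESPACE = "postgres"


def service_host(suffix: str) -> str:
    """Return the in-cluster DNS name of a Service, e.g. cluster1-pgbouncer."""
    return f"{CLUSTER}-{suffix}.{NAMESPACE}.svc"


def build_dsn(user: str, password: str, database: str, read_only: bool = False) -> str:
    """Build a PostgreSQL connection URI.

    Read/write traffic goes through the pgBouncer Service (steps 1 and 2).
    Read-only traffic may target a replica Service instead (step 3).
    """
    host = service_host("replicas" if read_only else "pgbouncer")
    # Percent-encode credentials so characters such as "@" do not break the URI.
    return f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}@{host}:5432/{database}"


if __name__ == "__main__":
    print(build_dsn("app", "s3cr3t", "appdb"))
    print(build_dsn("app", "s3cr3t", "appdb", read_only=True))
```

Keeping the host selection in one function makes it easy to switch read-only workloads between the pooler and the replicas without touching the rest of the application.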


## Default cluster configuration

The default Percona Distribution for PostgreSQL configuration includes:

* 3 PostgreSQL servers, one primary and two replicas.
* 3 pgBouncer instances.
* a pgBackRest repository host instance - a dedicated instance in your cluster that stores filesystem backups made with pgBackRest.
* a PMM Client instance - a lightweight agent that collects metrics from the database and sends them to PMM Server. It runs as a sidecar container in the database Pods.

### Primary, replicas, and high availability

Each PostgreSQL cluster has **one primary** instance that accepts read/write transactions. **Replicas** are standbys: they replicate from the primary and typically serve **read-only** traffic (depending on how you expose them).

The Operator provides high availability through multiple layers of protection:

#### Pod distribution

The Operator uses [node affinity and anti-affinity :octicons-link-external-16:](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity) to distribute PostgreSQL instances across separate worker nodes when possible. This prevents a single node failure from taking down multiple database instances.
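
To illustrate what such a scheduling rule looks like, the sketch below builds a Kubernetes `podAntiAffinity` stanza as a Python dictionary. The field names follow the Kubernetes Pod spec. The label key and cluster name are placeholders, and the rule the Operator actually generates may differ.

```python
import json


def preferred_anti_affinity(label_key: str, label_value: str, weight: int = 1) -> dict:
    """Build a soft anti-affinity rule that spreads matching Pods across nodes."""
    return {
        "podAntiAffinity": {
            # "preferred" means the scheduler tries to honor the rule but can
            # still place two Pods on one node if no other node is available.
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": weight,
                    "podAffinityTerm": {
                        "labelSelector": {"matchLabels": {label_key: label_value}},
                        # Treat each worker node (hostname) as a separate domain.
                        "topologyKey": "kubernetes.io/hostname",
                    },
                }
            ]
        }
    }


if __name__ == "__main__":
    # "example.com/cluster" is a placeholder label, not the Operator's real label.
    print(json.dumps(preferred_anti_affinity("example.com/cluster", "cluster1"), indent=2))
```

A preferred (soft) rule matches the "when possible" behavior described above: the cluster can still start on a small Kubernetes cluster that has fewer nodes than database instances.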

#### Automatic recovery

If a node fails, Kubernetes automatically reschedules the affected Pod on another healthy node. Patroni determines which PostgreSQL instance is the primary, promotes a replica if the primary is lost, and keeps replication running. For more on HA behavior and operations, see [High-availability](ha-deploy.md).
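
The failover decision can be pictured with a toy model. The sketch below is not Patroni's actual algorithm (Patroni coordinates through a distributed lock and has many more safeguards); it only illustrates the idea of keeping a healthy primary and otherwise promoting the most caught-up healthy member. The `Member` class, its fields, and the member names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Member:
    name: str
    healthy: bool
    replayed_lsn: int  # toy integer standing in for how much WAL was applied


def elect_primary(members: list[Member], current: str) -> Optional[str]:
    """Keep the current primary if it is healthy; otherwise pick the
    healthy member that has replayed the most WAL."""
    by_name = {m.name: m for m in members}
    if current in by_name and by_name[current].healthy:
        return current
    candidates = [m for m in members if m.healthy]
    if not candidates:
        return None  # no member can take over
    return max(candidates, key=lambda m: m.replayed_lsn).name


if __name__ == "__main__":
    cluster = [
        Member("pg-0", healthy=False, replayed_lsn=100),
        Member("pg-1", healthy=True, replayed_lsn=90),
        Member("pg-2", healthy=True, replayed_lsn=95),
    ]
    print(elect_primary(cluster, current="pg-0"))
```

Preferring the member with the highest replayed position minimizes the amount of data that could be lost when the old primary disappears.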

## Storage and persistent volumes

Stateful workloads rely on durable disk. Kubernetes attaches storage through **PersistentVolumeClaims (PVCs)**; the cluster’s CSI driver provisions **PersistentVolumes** and can **reattach** storage if a Pod moves to another node (subject to your storage class and platform).

If a node fails, the expectation is that the volume can be mounted elsewhere and the Pod recreated, while Patroni and PostgreSQL recover the database layer. For storage troubleshooting, see [Check storage](debug-storage.md).

## Next steps

For a comparison of Percona’s approach with other deployment models, see [Comparison with other solutions](compare.md).
Binary file removed docs/assets/.DS_Store
Binary file not shown.
169 changes: 169 additions & 0 deletions docs/assets/diagrams/operator_flow.py
@@ -0,0 +1,169 @@
#!/usr/bin/env python3
"""
Uses the mingrammer *diagrams* library with Kubernetes node icons:
https://diagrams.mingrammer.com/docs/nodes/k8s

Prerequisites:
    pip install diagrams
    # Graphviz must be installed (provides the `dot` binary):
    #   macOS: brew install graphviz
    #   Debian/Ubuntu: apt-get install graphviz

Run from repo root:
    python docs/assets/diagrams/operator_flow.py
    python docs/assets/diagrams/operator_flow.py --format png
    python docs/assets/diagrams/operator_flow.py --format both

Default output: docs/assets/images/operator-flow-diagram.png.

Colors and cluster framing follow docs/assets/images/operator.svg (light blue fill #d4edfb,
cluster stroke #729fcf, connector blue #3465a4, label text #092256).
"""

from __future__ import annotations

import argparse
from pathlib import Path

from diagrams import Cluster, Diagram, Edge
from diagrams.k8s.compute import Pod
from diagrams.k8s.controlplane import CM
from diagrams.k8s.others import CRD

# Resolve paths so the script works when run from any cwd
_HERE = Path(__file__).resolve().parent
_REPO_ROOT = _HERE.parents[2]
_OUT_DIR = _REPO_ROOT / "docs" / "assets" / "images"
_OUT_NAME = "operator-flow-diagram"

# Palette aligned with docs/assets/images/operator.svg (Percona operator diagram)
_BG = "#ffffff"
_CLUSTER_FILL = "#d4edfb"  # rgb(212, 237, 251)
_CLUSTER_BORDER = "#729fcf"  # rgb(114, 159, 207)
# Primary UI blue in operator.svg (trapezoids / hex icons): rgb(50, 108, 229); built-in K8s PNG icons match this.
_EDGE = "#3465a4"  # rgb(52, 101, 164): connectors and arrows
_TEXT = "#092256"  # rgb(9, 34, 87): titles and labels


def _cluster_graph_attr() -> dict[str, str]:
    """Rounded cluster box like the large light-blue area in operator.svg."""
    return {
        "bgcolor": _CLUSTER_FILL,
        "style": "rounded",
        "color": _CLUSTER_BORDER,
        "penwidth": "2",
        "fontcolor": _TEXT,
        "fontsize": "13",
    }


def _graph_attr_for_format(outformat: str) -> dict[str, str]:
    """Graphviz graph attributes; PNG sets dpi for readable raster output."""
    ga: dict[str, str] = {
        "bgcolor": _BG,
        "pad": "0.45",
        "fontsize": "13",
        "fontname": "Helvetica",
    }
    if outformat == "png":
        ga["dpi"] = "150"
    return ga


def _build(outformat: str) -> Path:
    """Write operator-flow-diagram.{svg|png} to docs/assets/images/."""
    if outformat not in ("svg", "png"):
        raise ValueError(f"unsupported format: {outformat!r}")

    _OUT_DIR.mkdir(parents=True, exist_ok=True)
    out_path = str(_OUT_DIR / _OUT_NAME)

    graph_attr = _graph_attr_for_format(outformat)

    node_attr = {
        "fontcolor": _TEXT,
        "fontsize": "12",
        "fontname": "Helvetica",
    }

    edge_attr = {
        "color": _EDGE,
        "fontcolor": _TEXT,
        "fontsize": "10",
        "fontname": "Helvetica",
        "penwidth": "1.5",
    }

    # Left-to-right main flow (CRD → … → Application); colors match operator.svg
    with Diagram(
        "",
        filename=out_path,
        show=False,
        direction="LR",
        graph_attr=graph_attr,
        node_attr=node_attr,
        edge_attr=edge_attr,
        outformat=outformat,
    ):
        with Cluster("Kubernetes Cluster", graph_attr=_cluster_graph_attr()):
            crd = CRD("Custom Resource\nDefinition (CRD)")
            # No dedicated "Custom Resource instance" icon in k8s provider; Pod denotes workload API objects.
            custom_resources = Pod("Custom Resources")
            operator = CM("Operator Deployment")

            # Stack pods horizontally inside the group.
            with Cluster(
                "Application",
                direction="LR",
                graph_attr=_cluster_graph_attr(),
            ):
                # Three workload replicas (labels echo operator.svg DB Pod 1 / 2 / N pattern).
                app_pods = [
                    Pod("DB Pod 1"),
                    Pod("DB Pod 2"),
                    Pod("DB Pod N"),
                ]

            # CRD defines schema available to the API; users then create CR instances.
            crd >> Edge(label="defines schema", color=_EDGE) >> custom_resources

            # Bidirectional: Operator watches CRs and reconciles cluster state.
            custom_resources << Edge(
                forward=True,
                reverse=True,
                label="watch / reconcile",
                color=_EDGE,
            ) >> operator

            # Operator drives the managed application (StatefulSets, Services, etc.).
            operator >> Edge(label="manages", color=_EDGE) >> app_pods[1]

    return _OUT_DIR / f"{_OUT_NAME}.{outformat}"


def _parse_args() -> argparse.Namespace:
    p = argparse.ArgumentParser(
        description="Render the operator control-flow diagram (CRD → … → Application).",
    )
    p.add_argument(
        "-f",
        "--format",
        choices=("svg", "png", "both"),
        default="png",
        help="Output format: vector SVG, raster PNG, or both (default: png).",
    )
    return p.parse_args()


def main() -> None:
    args = _parse_args()
    if args.format == "both":
        paths = [_build("svg"), _build("png")]
    else:
        paths = [_build(args.format)]
    for path in paths:
        print(f"Wrote {path}")


if __name__ == "__main__":
    main()
2 changes: 1 addition & 1 deletion docs/assets/fragments/what-you-install.txt
@@ -15,4 +15,4 @@ The default Percona Distribution for PostgreSQL configuration includes:
* a pgBackRest repository host instance – a dedicated instance in your cluster that stores filesystem backups made with pgBackRest - a backup and restore utility.
* a PMM client instance - a monitoring and management tool for PostgreSQL that provides a way to monitor and manage your database. It runs as a sidecar container in the database Pods.

Read more about the default components in the [Architecture](architecture.md) section.
Read more about the default components in [Architecture](architecture.md).
Binary file added docs/assets/images/operator-flow-diagram.png