@nogueiraanderson (Contributor) commented Dec 1, 2025

Summary

Add Jenkins pipelines and shared library for PMM HA testing on Red Hat OpenShift Service on AWS (ROSA) with Hosted Control Plane (HCP).

This implements a complete testing environment for PMM High Availability on OpenShift, addressing the requirements in PMM-14347.

Changes

New Files

  • pmm/v3/pmm3-ha-rosa.groovy - Main pipeline for creating ROSA HCP clusters and deploying PMM HA
  • pmm/v3/pmm3-ha-rosa-cleanup.groovy - Cleanup pipeline with cron support for cost management
  • vars/pmmHaRosa.groovy - Shared library with reusable functions for ROSA operations

Features

  • Creates ROSA HCP clusters with configurable OpenShift version (4.16, 4.17, 4.18)
  • Installs PMM HA using Helm charts from percona-helm-charts repository
  • Supports configurable worker node count and instance types
  • Creates Route53 DNS entries for external access
  • Implements cluster quota management (max 5 clusters)
  • Automated cleanup via cron (twice daily) for clusters older than 24h
  • Custom Security Context Constraints (SCC) for ROSA HCP compatibility
  • Global OpenShift pull secret for Docker Hub authentication
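The quota and age-based cleanup rules in the list above can be sketched as follows. This is a minimal illustration in Python, not the pipeline's Groovy code: the `Cluster` type and the function names are hypothetical, and only the limits (5 clusters, 24 hours) come from the description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_CLUSTERS = 5               # quota stated in the PR description
MAX_AGE = timedelta(hours=24)  # cleanup threshold stated in the PR description


@dataclass
class Cluster:
    """Hypothetical record for one ROSA cluster."""
    name: str
    created_at: datetime  # timezone-aware creation time


def can_create(clusters: list[Cluster]) -> bool:
    """Return True while the number of existing clusters is below the quota."""
    return len(clusters) < MAX_CLUSTERS


def stale_clusters(clusters: list[Cluster], now: datetime) -> list[Cluster]:
    """Return clusters strictly older than MAX_AGE, oldest first."""
    old = [c for c in clusters if now - c.created_at > MAX_AGE]
    return sorted(old, key=lambda c: c.created_at)
```

A cron-triggered cleanup job would call something like `stale_clusters(clusters, datetime.now(timezone.utc))` and delete each result, while the create pipeline would check `can_create` before provisioning.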

Related

Add Jenkins pipelines and shared libraries for PMM High Availability
testing on Red Hat OpenShift Service on AWS (ROSA) with Hosted Control
Planes (HCP).

New files:
- pmm/openshift/rosa_cluster_create.groovy - ROSA HCP cluster creation
- pmm/openshift/rosa_cluster_destroy.groovy - Cluster cleanup
- pmm/openshift/rosa_cluster_list.groovy - List existing clusters
- pmm/v3/pmm3-ha-rosa.groovy - PMM HA deployment pipeline
- pmm/v3/pmm3-ha-rosa-cleanup.groovy - PMM HA cleanup with cron
- vars/openshiftRosa.groovy - Generic ROSA operations library
- vars/pmmHaRosa.groovy - PMM HA specific operations library

Features:
- ROSA HCP cluster provisioning (~15 min vs ~45 min for IPI)
- PMM HA deployment via Helm charts
- PMM-specific OIDC config and operator roles (avoids conflicts)
- ECR pull-through cache support
- Cluster quota management (max 5 clusters)
- Automated cleanup via cron

Jira: PMM-14347

Account roles (Installer, Support, Worker) must be created before
OIDC config and operator roles. This fixes the permission error when
trying to use another team's installer roles.

Fixes: rosa create cluster failing with ListRoleTags permission error

The --role-arn, --support-role-arn, and --worker-iam-role flags
must be explicitly set to use PMM-specific account roles instead
of ROSA trying to find roles from the OIDC config metadata.
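As a rough sketch of what passing the account roles explicitly might look like, the Python helper below assembles an argument list for `rosa create cluster`. The three role flags are the ones named in the commit message. The helper name, the role ARNs in the usage note, and the choice of the other flags (`--cluster-name`, `--sts`, `--hosted-cp`) are assumptions for illustration. A real HCP invocation needs further flags (for example subnets, OIDC config, and operator role prefix), which are omitted here.

```python
def rosa_create_cluster_args(cluster_name, installer_role_arn,
                             support_role_arn, worker_role_arn):
    """Build an argv list that pins the three account roles explicitly.

    Hypothetical helper: it only assembles arguments and does not run rosa.
    """
    roles = {
        "--role-arn": installer_role_arn,
        "--support-role-arn": support_role_arn,
        "--worker-iam-role": worker_role_arn,
    }
    for flag, arn in roles.items():
        # Cheap sanity check so a missing or mistyped ARN fails early.
        if not (arn.startswith("arn:aws:iam::") and ":role/" in arn):
            raise ValueError(f"{flag} expects an IAM role ARN, got {arn!r}")

    args = ["rosa", "create", "cluster",
            "--cluster-name", cluster_name,
            "--sts", "--hosted-cp"]
    for flag, arn in roles.items():
        args += [flag, arn]
    return args
```

For example, `rosa_create_cluster_args("pmm-ha-test", "arn:aws:iam::123456789012:role/pmm-Installer", "arn:aws:iam::123456789012:role/pmm-Support", "arn:aws:iam::123456789012:role/pmm-Worker")` (placeholder account ID and role names) returns a list that could be passed to `subprocess.run`.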

Extract OIDC config ID from existing operator role trust policies
instead of creating new configs. This prevents the mismatch between
operator roles and OIDC config that caused the trusted relationship
error.
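One way to recover the OIDC config ID from an operator role's trust policy is sketched below in Python. It assumes the ID is the last path segment of the federated OIDC provider ARN in the policy (for example `.../oidc-provider/<host>/<id>`). That layout, the function name, and the sample values are assumptions, not taken from this PR.

```python
import json


def oidc_config_id_from_trust_policy(policy):
    """Return the OIDC config ID found in a role trust policy, or None.

    `policy` may be a JSON string or an already-parsed dict.
    Assumption: the ID is the final path segment of the provider ARN.
    """
    doc = json.loads(policy) if isinstance(policy, str) else policy
    statements = doc.get("Statement", [])
    if isinstance(statements, dict):  # IAM allows a single statement object
        statements = [statements]
    for statement in statements:
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):  # e.g. "*"
            continue
        federated = principal.get("Federated", [])
        if isinstance(federated, str):
            federated = [federated]
        for arn in federated:
            if ":oidc-provider/" in arn:
                return arn.rstrip("/").rsplit("/", 1)[-1]
    return None
```

Reusing the ID found this way, rather than creating a new OIDC config, keeps the operator roles and the cluster pointed at the same provider.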

- Add IaC/PerconaOpenShiftIAM.yml CloudFormation template for ROSA
  cluster management IAM user (percona-openshift-user)
- Include OIDC provider permissions required by rosa create cluster
- Add helm repo setup in pmm3-ha-rosa pipeline to fix dependency build
  (victoriametrics, altinity, percona repos)

Add Kyverno policy engine to automatically rewrite Docker Hub images
to use Percona DevServices registry (reg-19jf01na.percona.com), avoiding
rate limits.

Changes:
- Add 'Install Kyverno' stage with ClusterPolicy for image rewriting
- Update default OpenShift version to 4.18 (required for Kyverno 3.6.1)
- Update default helm chart branch to PMM-14324-pmm-ha-monitoring
- Support Docker Hub images: explicit (docker.io/), org/, and library

The ClusterPolicy handles:
- Containers and init containers
- Explicit docker.io/ prefixed images
- Implicit org/image format (bitnami/redis)
- Implicit library images (nginx, redis)
- Excludes kube-system, kyverno, openshift-* namespaces

Move openshiftRosa.groovy and pmmHaRosa.groovy from vars/ to pmm/v3/vars/
as requested in PR review. These are PMM-specific helpers, not general
shared library functions.

Update all 5 ROSA pipeline files to load the helpers from pmm/v3/vars/
using checkout + load pattern instead of relying on @Library.

Files moved:
- vars/openshiftRosa.groovy -> pmm/v3/vars/openshiftRosa.groovy
- vars/pmmHaRosa.groovy -> pmm/v3/vars/pmmHaRosa.groovy

Pipelines updated:
- pmm/v3/pmm3-ha-rosa.groovy
- pmm/v3/pmm3-ha-rosa-cleanup.groovy
- pmm/openshift/rosa_cluster_create.groovy
- pmm/openshift/rosa_cluster_destroy.groovy
- pmm/openshift/rosa_cluster_list.groovy

- Add all PMM HA service accounts to SCC in pmm3-ha-rosa.groovy
- Remove ECR pull-through cache code from openshiftRosa.groovy
- Remove ECR pull-through cache code from pmmHaRosa.groovy
- Kyverno policy handles image rewriting to DevServices registry

Docker Hub rate limits are now handled by the Kyverno ClusterPolicy
that rewrites images to reg-19jf01na.percona.com/dockerhub-cache/.
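The rewriting behavior described for the ClusterPolicy (explicit `docker.io/` images, implicit `org/image`, implicit library images, and excluded namespaces) can be modeled in plain Python as below. This is an illustrative sketch of the mapping only, not the Kyverno policy itself. The function names are hypothetical, and keeping the `library/` segment in the cache path is an assumption.

```python
CACHE_PREFIX = "reg-19jf01na.percona.com/dockerhub-cache"  # prefix from the commit message
EXCLUDED_NAMESPACES = ("kube-system", "kyverno")
DOCKER_HUB_HOSTS = ("docker.io", "index.docker.io")


def is_excluded_namespace(namespace: str) -> bool:
    """Namespaces the policy is described as skipping."""
    return namespace in EXCLUDED_NAMESPACES or namespace.startswith("openshift-")


def split_registry(image: str) -> tuple[str, str]:
    """Split an image reference into (registry, remainder).

    The first path component counts as a registry only if it looks like
    a host: it contains '.' or ':' or is 'localhost'. Otherwise the image
    is an implicit Docker Hub reference.
    """
    first, sep, rest = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first, rest
    return "docker.io", image


def rewrite_image(image: str) -> str:
    """Map Docker Hub images to the cache registry; leave others untouched."""
    registry, path = split_registry(image)
    if registry not in DOCKER_HUB_HOSTS:
        return image
    if "/" not in path:
        # Assumption: official images keep their implicit "library/" namespace.
        path = "library/" + path
    return f"{CACHE_PREFIX}/{path}"
```

Under these assumptions, `rewrite_image("bitnami/redis")` yields `reg-19jf01na.percona.com/dockerhub-cache/bitnami/redis`, while an image such as `gcr.io/project/app` is returned unchanged.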

Remove redundant files:
- IaC/PerconaOpenShiftIAM.yml (moved to separate PR #3732)
- pmm/openshift/rosa_cluster_*.groovy (duplicates main pipeline)
- pmm/v3/vars/pmmHaRosa.groovy (thin wrapper, merged into main)
Update pmm3-ha-rosa.groovy to use openshiftRosa directly.

Final PR contains only 3 files:
- pmm/v3/pmm3-ha-rosa.groovy (deploy PMM HA)
- pmm/v3/pmm3-ha-rosa-cleanup.groovy (cleanup clusters)
- pmm/v3/vars/openshiftRosa.groovy (ROSA operations library)

The previous policy checked for dots in image names to exclude registry
prefixes (e.g., gcr.io/). However, this also excluded images with dots
in their tags (e.g., altinity/clickhouse-operator:0.25.4).

The registry check using Kyverno's images context is sufficient to
identify Docker Hub images, so the dot check is no longer needed.
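The failure mode can be reproduced with a small Python comparison. Both functions are reconstructions for illustration: `naive_is_docker_hub` approximates the removed dot check, and `is_docker_hub` approximates a registry-aware check by inspecting only the first path component. Neither is the policy's actual expression.

```python
def naive_is_docker_hub(image: str) -> bool:
    """Approximation of the removed check: any dot means 'has a registry'."""
    return "." not in image


def is_docker_hub(image: str) -> bool:
    """Registry-aware check: only the first path component can name a host."""
    first, sep, _ = image.partition("/")
    has_registry = bool(sep) and ("." in first or ":" in first or first == "localhost")
    return (not has_registry) or first in ("docker.io", "index.docker.io")


image = "altinity/clickhouse-operator:0.25.4"
print(naive_is_docker_hub(image))  # False: the dots in the tag hide the image
print(is_docker_hub(image))        # True: no registry host in the first component
```

Both checks agree that `gcr.io/project/app` is not a Docker Hub image; they differ only when the dot appears somewhere other than the registry host.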

Changed service name from 'pmm-ha' to 'monitoring-service' to match
the actual service name created by the pmm-ha Helm chart.

Also added progress output during LoadBalancer wait loop.

Add explicit waits for VictoriaMetrics, ClickHouse, and PostgreSQL
operators to be ready before running helm install. This prevents
webhook timeout errors during PMM HA chart installation.

Prevents "Waiting for LoadBalancer..." messages from polluting the
build description. Only the final URL or "pending" is now captured
by returnStdout.
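The general idea, keeping progress messages off the captured output stream, can be illustrated in Python. This is a sketch of the pattern rather than the pipeline's shell step: the function name, its parameters, and the `lookup` callable are hypothetical. Progress goes to a separate log stream (stderr by default), and only the final value is returned for the caller to print on stdout.

```python
import sys
import time


def wait_for_load_balancer(lookup, attempts=30, delay=10.0,
                           log=sys.stderr, sleep=time.sleep):
    """Poll `lookup()` until it returns a hostname or attempts run out.

    Progress lines go to `log`, never to stdout, so a caller that captures
    stdout sees only the final result: the hostname or "pending".
    """
    for attempt in range(1, attempts + 1):
        host = lookup()
        if host:
            return host
        print(f"Waiting for LoadBalancer... ({attempt}/{attempts})", file=log)
        sleep(delay)
    return "pending"
```

A script would end with a single `print(wait_for_load_balancer(...))`, so anything capturing stdout receives exactly one line.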