Customization Guide

How to adapt saif-gitops for your environment.

Overview

This repository is designed for the SAIF environment but can be customized for:

- Different registries (air-gap, enterprise)
- Different Splunk endpoints
- Different cluster configurations
- Different operator versions

Registry Configuration

Changing Image Registry

All images are pulled from the internal registry. To use a different registry:

Update the ImageDigestMirrorSet (IDMS) manifests in apps/platform-idms/

# apps/platform-idms/idms-*.yaml
spec:
  imageDigestMirrors:
    - mirrors:
        - your-registry.example.com:5000/openshift4
      source: quay.io/openshift-release-dev/ocp-v4.0-art-dev

Update Helm values in cluster overlays

# clusters/_base/tier3-nvidia/gpu-operator.yaml
spec:
  source:
    helm:
      values: |
        operator:
          repository: your-registry.example.com:5000/nvidia

Air-Gap Requirements

For fully disconnected environments:

- Mirror images using saif-sys-admin/sync-images.yaml
- Update IDMS manifests
- Ensure CatalogSources point to mirrored operator indexes
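
As a rough illustration of the last item, a CatalogSource that points at a mirrored operator index might look like the sketch below. The resource name, the index image path, and the tag are assumptions for illustration only; substitute whatever your mirroring run actually produced.

```yaml
# Hypothetical example: name, image path, and tag are placeholders.
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: mirrored-certified-operators      # assumed name
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  # Assumed location of the mirrored index in your registry
  image: your-registry.example.com:5000/redhat/certified-operator-index:v4.16
  displayName: Mirrored Certified Operators
  publisher: internal
```

Subscriptions that should install from the mirror then reference this CatalogSource through their source and sourceNamespace fields.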

Splunk Configuration

Changing Splunk Endpoint

Edit the Splunk OTEL configuration:

# clusters/_base/tier4-observability/splunk-otel-values.yaml
splunkObservability:
  realm: us1                    # Change to your realm
  accessToken: "${SPLUNK_ACCESS_TOKEN}"

clusterReceiver:
  config:
    exporters:
      signalfx:
        api_url: https://api.us1.signalfx.com  # Change endpoint
        ingest_url: https://ingest.us1.signalfx.com

Using a Different Observability Backend

Replace Splunk OTEL with your preferred collector:

- Remove splunk-otel.yaml from cluster folders
- Create new Application pointing to your collector config
- Ensure metrics endpoints are scraped:
  - Cilium: :9962
  - Hubble: :9965
  - Tetragon: :2112
  - DCGM: :9400
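
A minimal sketch of such an Application follows, assuming ArgoCD runs in the openshift-gitops namespace (as in the patch command under Testing Changes). The Application name, the apps/my-collector path, the project, and the destination namespace are hypothetical and not taken from this repository.

```yaml
# Hypothetical example: adjust name, path, and namespaces to your setup.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-collector                      # assumed name
  namespace: openshift-gitops
spec:
  project: default                        # assumed project
  source:
    repoURL: https://github.com/YOUR_ORG/saif-gitops.git
    targetRevision: main
    path: apps/my-collector               # assumed path to your collector config
  destination:
    server: https://kubernetes.default.svc
    namespace: observability              # assumed target namespace
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```

Place the file in the cluster folder in place of splunk-otel.yaml so that it is picked up the same way as the other Applications.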

Cluster Configuration

Adding a New Cluster

Create cluster folder:

mkdir clusters/my-cluster

Add base applications:

# Copy from existing cluster
cp clusters/ai-pod-1/platform-idms.yaml clusters/my-cluster/
cp clusters/ai-pod-1/tetragon.yaml clusters/my-cluster/
# Add others as needed

Bootstrap ArgoCD (in saif-ai-pod):

gh workflow run openshift-post-install.yaml \
  -f cluster_name=my-cluster \
  -f apply_argocd=true

Inject secrets:

gh workflow run gitops-sync.yaml \
  -f cluster=my-cluster \
  -f inject_secrets=true

GPU vs Non-GPU Clusters

For clusters without GPUs, omit the GPU-related manifests:

# Don't include these in cluster folder:
# - gpu-operator.yaml
# - nim-operator.yaml
# - nim-llm.yaml (model deployment)

Cluster-Specific Values

Use Kustomize patches for per-cluster configuration:

# clusters/my-cluster/hubble-timescape.yaml
spec:
  source:
    path: apps/hubble-timescape
    kustomize:
      patches:
        - target:
            kind: Service
            name: hubble-timescape-ui
          patch: |
            - op: replace
              path: /spec/loadBalancerIP
              value: "10.0.1.100"  # Your IP

Operator Versions

Upgrading Operators

Check compatibility with your OpenShift version
Update subscription channel:

# apps/gpu-operator/subscription.yaml
spec:
  channel: v24.6    # Replace the current channel (v25.10 here) with your target channel

Commit and push. ArgoCD syncs the change automatically.

Pinning Specific Versions

Use startingCSV for exact version control:

spec:
  channel: v25.10
  startingCSV: gpu-operator-certified.v25.10.0
  installPlanApproval: Manual  # Prevent auto-upgrades
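
For context, the fragment above could sit in a full Subscription along these lines. This is a sketch: the namespace and catalog source shown are the usual values for the certified NVIDIA GPU Operator on OpenShift and are assumptions here; in an air-gapped cluster, source would name your mirrored CatalogSource instead.

```yaml
# Sketch of a complete Subscription; namespace and source are assumptions.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: gpu-operator-certified
  namespace: nvidia-gpu-operator          # assumed namespace
spec:
  name: gpu-operator-certified
  channel: v25.10
  source: certified-operators             # assumed; use your mirrored catalog if air-gapped
  sourceNamespace: openshift-marketplace
  startingCSV: gpu-operator-certified.v25.10.0
  installPlanApproval: Manual             # Prevent auto-upgrades
```

With installPlanApproval set to Manual, each generated InstallPlan stays pending until someone approves it, including the first one.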

Secrets Management

Using External Secrets

Replace Sealed Secrets with External Secrets Operator:

- Remove sealed-secrets-controller Application
- Add External Secrets Operator subscription
- Create ExternalSecret resources pointing to your vault

Using Vault

# apps/external-secrets/vault-secret-store.yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: vault
spec:
  provider:
    vault:
      server: "https://vault.example.com"
      path: "secret"
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "external-secrets"
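
An ExternalSecret that reads from this store could then look like the following sketch. The secret name, namespace, Vault key, and property are hypothetical; only the reference to the ClusterSecretStore named vault comes from the manifest above.

```yaml
# Hypothetical example: names, namespace, and Vault key are placeholders.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: splunk-access-token               # assumed name
  namespace: splunk-otel                  # assumed namespace
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault                           # matches the store defined above
  target:
    name: splunk-access-token             # Kubernetes Secret to create
  data:
    - secretKey: accessToken              # key inside the created Secret
      remoteRef:
        key: splunk/otel                  # assumed path under the "secret" mount
        property: access_token            # assumed field in the Vault entry
```

The operator creates and refreshes the target Secret, so workloads keep consuming an ordinary Kubernetes Secret.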

Network Configuration

LoadBalancer IPs

LoadBalancer services announced through Cilium L2 announcements need an IP assigned by Cilium LB IPAM. To request a specific IP for a service:

# Per-service annotation
metadata:
  annotations:
    io.cilium/lb-ipam-ips: "10.0.1.80"

Or use IP pools:

# apps/cilium-config/ip-pool.yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: default-pool
spec:
  cidrs:
    - cidr: 10.0.1.80/29  # Adjust for your network

Ingress Configuration

For clusters with Ingress instead of LoadBalancer:

# Replace LoadBalancer services with Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hubble-timescape-ui
spec:
  rules:
    - host: hubble.apps.my-cluster.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: hubble-timescape-ui
                port:
                  number: 80

Tetragon Configuration

Custom Tracing Policies

Add your own TracingPolicies:

# apps/tetragon/custom-policies/my-policy.yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: my-custom-policy
spec:
  kprobes:
    - call: "security_file_open"
      # Your policy configuration
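
To make the placeholder above more concrete, here is one possible policy that reports opens of files under /etc/. It is a sketch under assumptions: the policy name and the path prefix are illustrative, and the argument and selector layout follows the pattern used in Tetragon's file-monitoring examples, so check it against the Tetragon version you deploy.

```yaml
# Illustrative policy: name and path prefix are assumptions.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: watch-etc-opens                   # assumed name
spec:
  kprobes:
    - call: "security_file_open"
      syscall: false                      # kernel function, not a syscall
      args:
        - index: 0
          type: "file"                    # first argument is a struct file *
      selectors:
        - matchArgs:
            - index: 0
              operator: "Prefix"
              values:
                - "/etc/"                 # only report files under this prefix
```

Narrow selectors like this keep event volume manageable; an unfiltered hook on security_file_open fires on every file open on the node.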

Adjusting Export Settings

# apps/tetragon/operator-config.yaml
spec:
  tracingPolicy:
    exportFilename: /var/run/cilium/hubble/tetragon.log
    connectionLogFilename: /var/run/cilium/hubble/tetragon-connections.log

Hubble Timescape

Storage Configuration

Adjust ClickHouse storage for your environment:

# charts/hubble-timescape/values.yaml
clickhouse:
  persistence:
    size: 100Gi  # Adjust based on retention needs
  resources:
    requests:
      memory: 4Gi
      cpu: 2

Retention Settings

hubbleServer:
  retention:
    flowsMaxAge: 7d      # Adjust retention period
    connectionMaxAge: 7d

Development Workflow

Testing Changes

Fork the repo for testing
Update ArgoCD to point to your fork:

oc -n openshift-gitops patch application saif-apps \
  --type merge \
  --patch '{"spec":{"source":{"repoURL":"https://github.com/YOUR_ORG/saif-gitops.git"}}}'

Test changes on development cluster
Merge to main when validated

Local Validation

# Validate YAML syntax
find apps/ -name "*.yaml" -exec yamllint {} \;

# Validate Kubernetes manifests
kubectl apply --dry-run=client -f apps/my-app/
