How to adapt saif-gitops for your environment.
This repository is designed for the SAIF environment but can be customized for:
- Different registries (air-gap, enterprise)
- Different Splunk endpoints
- Different cluster configurations
- Different operator versions
All images are pulled from the internal registry. To change registries:

- Update the IDMS manifests in `apps/platform-idms/`:

```yaml
# apps/platform-idms/idms-*.yaml
spec:
  imageDigestMirrors:
    - mirrors:
        - your-registry.example.com:5000/openshift4
      source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
```

- Update Helm values in cluster overlays:
```yaml
# clusters/_base/tier3-nvidia/gpu-operator.yaml
spec:
  source:
    helm:
      values: |
        operator:
          repository: your-registry.example.com:5000/nvidia
```

For fully disconnected environments:
- Mirror images using `saif-sys-admin/sync-images.yaml`
- Update the IDMS manifests
- Ensure CatalogSources point to mirrored operator indexes
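A CatalogSource pointing at a mirrored operator index might look like the sketch below; the registry host, index image, and tag are placeholders for whatever you mirrored:

```yaml
# Hypothetical mirrored CatalogSource (adjust name, image, and tag)
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: certified-operators-mirror
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: your-registry.example.com:5000/redhat/certified-operator-index:v4.16
  displayName: Certified Operators (mirrored)
  updateStrategy:
    registryPoll:
      interval: 30m   # re-check the mirrored index periodically
```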
Edit the Splunk OTEL configuration:

```yaml
# clusters/_base/tier4-observability/splunk-otel-values.yaml
splunkObservability:
  realm: us1  # Change to your realm
  accessToken: "${SPLUNK_ACCESS_TOKEN}"

clusterReceiver:
  config:
    exporters:
      signalfx:
        api_url: https://api.us1.signalfx.com  # Change endpoint
        ingest_url: https://ingest.us1.signalfx.com
```

Replace Splunk OTEL with your preferred collector:
- Remove `splunk-otel.yaml` from cluster folders
- Create a new Application pointing to your collector config
- Ensure metrics endpoints are scraped:
  - Cilium: `:9962`
  - Hubble: `:9965`
  - Tetragon: `:2112`
  - DCGM: `:9400`
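If you bring your own collector, the endpoints above can be covered by a plain Prometheus-style scrape config. This is a sketch: the target hostnames are placeholders, and in-cluster you would normally use Kubernetes service discovery or ServiceMonitors instead of static targets:

```yaml
# Hypothetical scrape config for the metrics ports listed above
scrape_configs:
  - job_name: cilium-agent
    static_configs:
      - targets: ["cilium-node.example.internal:9962"]
  - job_name: hubble
    static_configs:
      - targets: ["cilium-node.example.internal:9965"]
  - job_name: tetragon
    static_configs:
      - targets: ["tetragon-node.example.internal:2112"]
  - job_name: dcgm-exporter
    static_configs:
      - targets: ["gpu-node.example.internal:9400"]
```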
- Create a cluster folder:

```bash
mkdir clusters/my-cluster
```

- Add base applications:

```bash
# Copy from existing cluster
cp clusters/ai-pod-1/platform-idms.yaml clusters/my-cluster/
cp clusters/ai-pod-1/tetragon.yaml clusters/my-cluster/
# Add others as needed
```

- Bootstrap ArgoCD (in saif-ai-pod):
```bash
gh workflow run openshift-post-install.yaml \
  -f cluster_name=my-cluster \
  -f apply_argocd=true
```

- Inject secrets:
```bash
gh workflow run gitops-sync.yaml \
  -f cluster=my-cluster \
  -f inject_secrets=true
```

For clusters without GPU:
```yaml
# Don't include these in the cluster folder:
# - gpu-operator.yaml
# - nim-operator.yaml
# - nim-llm.yaml (model deployment)
```

Use Kustomize patches for per-cluster configuration:
```yaml
# clusters/my-cluster/hubble-timescape.yaml
spec:
  source:
    path: apps/hubble-timescape
    kustomize:
      patches:
        - target:
            kind: Service
            name: hubble-timescape-ui
          patch: |
            - op: replace
              path: /spec/loadBalancerIP
              value: "10.0.1.100"  # Your IP
```

- Check compatibility with your OpenShift version
- Update the subscription channel:

```yaml
# apps/gpu-operator/subscription.yaml
spec:
  channel: v24.6  # Change from v25.10
```

- Commit and push; ArgoCD syncs automatically
Use `startingCSV` for exact version control:

```yaml
spec:
  channel: v25.10
  startingCSV: gpu-operator-certified.v25.10.0
  installPlanApproval: Manual  # Prevent auto-upgrades
```

Replace Sealed Secrets with External Secrets Operator:
- Remove the `sealed-secrets-controller` Application
- Add the External Secrets Operator subscription
- Create `ExternalSecret` resources pointing to your vault
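A hypothetical `ExternalSecret` that pulls the Splunk access token through a `ClusterSecretStore` named `vault` (such as the one defined below) could look like this; the namespace, secret names, and vault path are placeholders:

```yaml
# Hypothetical ExternalSecret (adjust namespace, names, and vault path)
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: splunk-access-token
  namespace: splunk-otel
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault
  target:
    name: splunk-access-token   # Kubernetes Secret to create
  data:
    - secretKey: access-token
      remoteRef:
        key: splunk/otel        # path in vault
        property: accessToken   # field within that secret
```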
```yaml
# apps/external-secrets/vault-secret-store.yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: vault
spec:
  provider:
    vault:
      server: "https://vault.example.com"
      path: "secret"
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "external-secrets"
```

Cilium L2 announcements require explicit IPs:
```yaml
# Per-service annotation
metadata:
  annotations:
    io.cilium/lb-ipam-ips: "10.0.1.80"
```

Or use IP pools:
```yaml
# apps/cilium-config/ip-pool.yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: default-pool
spec:
  cidrs:
    - cidr: 10.0.1.80/29  # Adjust for your network
```

For clusters with Ingress instead of LoadBalancer:
```yaml
# Replace LoadBalancer services with Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hubble-timescape-ui
spec:
  rules:
    - host: hubble.apps.my-cluster.example.com
      http:
        paths:
          - path: /
            pathType: Prefix   # required in networking.k8s.io/v1
            backend:
              service:
                name: hubble-timescape-ui
                port:
                  number: 80
```

Add your own TracingPolicies:
```yaml
# apps/tetragon/custom-policies/my-policy.yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: my-custom-policy
spec:
  kprobes:
    - call: "security_file_open"
      # Your policy configuration
```

Export file locations are set in the Tetragon operator config:

```yaml
# apps/tetragon/operator-config.yaml
spec:
  tracingPolicy:
    exportFilename: /var/run/cilium/hubble/tetragon.log
    connectionLogFilename: /var/run/cilium/hubble/tetragon-connections.log
```

Adjust ClickHouse storage for your environment:
```yaml
# charts/hubble-timescape/values.yaml
clickhouse:
  persistence:
    size: 100Gi  # Adjust based on retention needs
  resources:
    requests:
      memory: 4Gi
      cpu: 2

hubbleServer:
  retention:
    flowsMaxAge: 7d  # Adjust retention period
    connectionMaxAge: 7d
```

- Fork the repo for testing
- Update ArgoCD to point to your fork:

```bash
oc -n openshift-gitops patch application saif-apps \
  --type merge \
  --patch '{"spec":{"source":{"repoURL":"https://github.com/YOUR_ORG/saif-gitops.git"}}}'
```

- Test changes on a development cluster
- Merge to main when validated
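To try a branch before merging, you can also point the Application's `targetRevision` at that branch; this is a sketch, and `my-feature-branch` is a placeholder:

```yaml
# Hypothetical patch: track a test branch instead of main
spec:
  source:
    repoURL: https://github.com/YOUR_ORG/saif-gitops.git
    targetRevision: my-feature-branch  # switch back to main after validating
```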
Validate manifests before committing:

```bash
# Validate YAML syntax
find apps/ -name "*.yaml" -exec yamllint {} \;

# Validate Kubernetes manifests
kubectl apply --dry-run=client -f apps/my-app/
```

- Architecture - GitOps patterns
- saif-ai-pod CUSTOMIZATION.md - Infrastructure customization