This guide covers upgrading the Team Operator: pre-upgrade preparation, upgrade procedures, version-specific migrations, and troubleshooting.
Starting with v1.15.0, the operator automatically applies its own CRDs at startup using server-side apply. This ensures the CRD schema always matches the running operator binary, even in cases where only the container image is updated without a full Helm chart upgrade (e.g., adhoc images for testing).
The operator uses the `--manage-crds` flag (default: `true`) to control this behavior. To opt out (for example, if you manage CRDs via Flux or ArgoCD), set:

```yaml
controllerManager:
  container:
    args:
      - "--manage-crds=false"
```

When `--manage-crds=false`, the operator starts without touching CRDs, and you are responsible for keeping them in sync with the operator version.
Benefits of automatic CRD management:
- CRDs are always in sync with the operator version
- Works with adhoc images (e.g., PR branches) without requiring Helm chart changes
- Uses server-side apply (SSA) which is idempotent and only updates when schema differs
- No manual CRD management needed for most deployments
When to disable:
- GitOps workflows (Flux, ArgoCD) that manage CRDs separately
- Security policies requiring explicit CRD review before application
- Multi-tenant clusters where CRD updates require approval
RBAC Permissions: The operator requires the following RBAC permissions on its own CRDs:

- `get` - to check if CRDs exist
- `patch` - to apply schema updates via server-side apply
- `update` - to modify CRD metadata

The Helm chart automatically grants these permissions. The operator intentionally omits the `delete` verb to prevent accidental data loss.
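The granted rule is equivalent to a ClusterRole fragment along these lines (a sketch; the resource name here is illustrative, and the exact rule layout in your chart may differ):

```yaml
# Sketch of the CRD permissions the chart grants (ClusterRole name is illustrative)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: team-operator-crd-manager
rules:
  - apiGroups: ["apiextensions.k8s.io"]
    resources: ["customresourcedefinitions"]
    verbs: ["get", "patch", "update"]  # deliberately no "delete"
```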
Note on CRD deletion: Because the operator's RBAC omits the `delete` verb for CRDs, if a future operator version removes a resource type, the now-orphaned CRD will remain in the cluster and must be removed manually. Before deleting an orphaned CRD, ensure all custom resources of that type have been removed to avoid losing data:

```shell
kubectl get <resource-plural> -A   # verify no instances remain
kubectl delete crd <crd-name>.core.posit.team
```

Before performing any upgrade, create backups of critical resources:
```shell
# Backup all Site resources
kubectl get sites -A -o yaml > sites-backup.yaml

# Backup all product resources
kubectl get workbenches -A -o yaml > workbenches-backup.yaml
kubectl get connects -A -o yaml > connects-backup.yaml
kubectl get packagemanagers -A -o yaml > packagemanagers-backup.yaml
kubectl get chronicles -A -o yaml > chronicles-backup.yaml
kubectl get flightdecks -A -o yaml > flightdecks-backup.yaml
kubectl get postgresdatabases -A -o yaml > postgresdatabases-backup.yaml

# Backup all Posit Team resources at once
kubectl get sites,workbenches,connects,packagemanagers,chronicles,flightdecks,postgresdatabases -A -o yaml > posit-team-resources-backup.yaml

# Backup secrets in the Posit Team namespace
kubectl get secrets -n posit-team -o yaml > secrets-backup.yaml

# For sensitive backups, consider encrypting
kubectl get secrets -n posit-team -o yaml | gpg -c > secrets-backup.yaml.gpg
```

If using external databases for products (Connect, Workbench, Package Manager), back up those databases before upgrading. The operator manages PostgresDatabase resources that schema changes may affect.
```shell
# List managed databases
kubectl get postgresdatabases -A

# For each database, create a backup using your database backup procedures
# Example for PostgreSQL:
# pg_dump -h <host> -U <user> -d <database> > database-backup.sql
```

Verify your current installation:

```shell
# Check Helm release version
helm list -n posit-team-system

# Check operator deployment image
kubectl get deployment team-operator-controller-manager -n posit-team-system -o jsonpath='{.spec.template.spec.containers[0].image}'

# Check CRD versions
kubectl get crds | grep posit.team
```

Review the CHANGELOG.md for breaking changes between your current version and the target version. Look for:
- Breaking changes that require configuration updates
- Deprecated fields that need migration
- New required fields
Critical: Test upgrades in a non-production environment first:
- Create a staging cluster or namespace that mirrors production
- Apply the same Site configuration
- Perform the upgrade
- Verify all products function
- Test any automated integrations
The recommended upgrade method is Helm:

```shell
# Update Helm repository (if using external repo)
helm repo update

# View changes before applying (requires the helm-diff plugin)
helm diff upgrade team-operator ./dist/chart \
  --namespace posit-team-system \
  --values my-values.yaml

# Perform the upgrade
helm upgrade team-operator ./dist/chart \
  --namespace posit-team-system \
  --values my-values.yaml
```

To pin a specific operator image tag during the upgrade:

```shell
helm upgrade team-operator ./dist/chart \
  --namespace posit-team-system \
  --set controllerManager.container.image.tag=v1.2.0 \
  --values my-values.yaml
```

CRDs are updated during Helm upgrade when `crd.enable: true` (default). If you've disabled CRD management:
```shell
# Manually apply CRD updates first
kubectl apply -f dist/chart/templates/crd/

# Then upgrade the operator
helm upgrade team-operator ./dist/chart \
  --namespace posit-team-system \
  --values my-values.yaml
```

If using Kustomize for deployment:
```shell
# Update the kustomization.yaml to reference the new version
# Then apply:
kubectl apply -k config/default

# Or for specific overlays:
kubectl apply -k config/overlays/production
```

CRDs require attention during upgrades:
- CRDs Persist Across Helm Uninstall: By default (`crd.keep: true`), CRDs remain in the cluster after `helm uninstall`. This prevents accidental data loss but requires careful CRD management.
- CRD Version Compatibility: The operator manages CRDs at API version `core.posit.team/v1beta1` (and `keycloak.k8s.keycloak.org/v2alpha1` for Keycloak). Your CRs must be compatible with the CRD schema in the new version.
- Schema Validation: After CRD updates, existing CRs are validated against the new schema. Invalid CRs may prevent reconciliation.
```shell
# Verify CRDs are updated
kubectl get crds sites.core.posit.team -o jsonpath='{.metadata.resourceVersion}'

# Check for validation issues
kubectl get sites -A -o json | jq '.items[] | select(.status.conditions[]?.reason == "InvalidSpec")'
```

Breaking Change: Database Password Secret Rename
The Kubernetes Secret used to store the database password for each product component has been renamed from `<component-name>` to `<component-name>-db-password`.
If you are upgrading an existing installation that has already run the operator against live clusters, you must migrate the existing secrets before upgrading. Otherwise, the operator will create new secrets at the new name with freshly generated passwords, leaving the old secrets orphaned and causing database authentication failures.
Migration steps (run before upgrading the operator):
1. Identify the components with existing DB password secrets:

   ```shell
   for comp in workbench connect packagemanager; do
     kubectl get secret "${comp}" -n posit-team --ignore-not-found -o name
   done
   ```

2. For each component (workbench, connect, packagemanager), rename the secret.

   Warning: If `${NEW_NAME}` already exists in the cluster, do not apply this migration: the operator has already generated a new password, and you must re-synchronize the database password manually.

   ```shell
   # Get the old secret data
   OLD_NAME=<component-name>
   NEW_NAME="${OLD_NAME}-db-password"
   NAMESPACE=posit-team

   # Create new secret with old data
   kubectl get secret "${OLD_NAME}" -n "${NAMESPACE}" -o json \
     | python3 -c "import json,sys; d=json.load(sys.stdin); d['metadata']['name']='${NEW_NAME}'; [d['metadata'].pop(k,None) for k in ['resourceVersion','uid','creationTimestamp','managedFields','ownerReferences']]; print(json.dumps(d))" \
     | kubectl apply -f -

   # Delete old secret
   kubectl delete secret "${OLD_NAME}" -n "${NAMESPACE}"
   ```

3. Proceed with the operator upgrade.
If you are performing a fresh installation or upgrading a cluster that has never had the operator running against it, no migration is needed.
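The metadata handling in the inline python3 step above can be factored into a standalone helper for review before running it against a live cluster (a sketch: the field list mirrors the one-liner, and the function name is illustrative):

```python
import json

# Server-managed metadata fields that must not be carried into the new Secret,
# otherwise `kubectl apply` would try to update the old object instead of
# creating a new one.
SERVER_FIELDS = ["resourceVersion", "uid", "creationTimestamp",
                 "managedFields", "ownerReferences"]

def rename_secret(secret: dict, new_name: str) -> dict:
    """Return a copy of a Secret manifest renamed and ready for re-application."""
    out = json.loads(json.dumps(secret))  # deep copy; leaves the input untouched
    out["metadata"]["name"] = new_name
    for field in SERVER_FIELDS:
        out["metadata"].pop(field, None)
    return out

old = {"metadata": {"name": "connect", "uid": "abc", "resourceVersion": "42"},
       "data": {"password": "c2VjcmV0"}}
new = rename_secret(old, "connect-db-password")
print(new["metadata"])  # {'name': 'connect-db-password'}
```

Piping `kubectl get secret ... -o json` through such a function and into `kubectl apply -f -` is equivalent to the one-liner in step 2.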
New Features:
- Added `CreateOrUpdateResource` helper for improved reconciliation
- Post-mutation label validation for Traefik resources

Deprecations:
- `BasicCreateOrUpdate` function is deprecated in favor of `CreateOrUpdateResource`

No configuration changes required for users.
New Features:
- Added `tolerations` and `nodeSelector` support for controller manager

Migration: If you used workarounds for pod scheduling, update your values:

```yaml
controllerManager:
  tolerations:
    - key: "node-role.kubernetes.io/control-plane"
      operator: "Exists"
      effect: "NoSchedule"
  nodeSelector:
    kubernetes.io/os: linux
```

Bug Fixes:
- Removed `kustomize-adopt` hook that could fail on tainted clusters

No migration required.
Initial Release:
- Migration from `rstudio/ptd` repository

If upgrading from the legacy rstudio/ptd operator, contact Posit support for migration assistance.
The following fields are deprecated and will be removed in future versions:
| CRD | Field | Replacement | Notes |
|---|---|---|---|
| Site | `spec.secretType` | `spec.secret.type` | Use the new Secret configuration block |
| Workbench | `spec.config.databricks.conf` | `spec.secretConfig.databricks` | Databricks config moved to SecretConfig |
| PackageManager | `spec.config.CRAN` | N/A | PackageManagerCRANConfig is deprecated |
Migration Example - Databricks Configuration:
Before (deprecated):

```yaml
apiVersion: core.posit.team/v1beta1
kind: Workbench
spec:
  config:
    databricks.conf:
      workspace1:
        name: "My Workspace"
        url: "https://workspace.cloud.databricks.com"
```

After (recommended):
```yaml
apiVersion: core.posit.team/v1beta1
kind: Site
spec:
  workbench:
    databricks:
      workspace1:
        name: "My Workspace"
        url: "https://workspace.cloud.databricks.com"
        clientId: "<client-id>"
```

The operator migrates legacy UUID-format and binary-format encryption keys to the new hex256 format. This happens during reconciliation. Monitor logs for migration messages:

```shell
kubectl logs -n posit-team-system deployment/team-operator-controller-manager | grep -i "migrating"
```
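If you want to check which format a key is in before the operator migrates it, the formats can be told apart heuristically by shape. This is an illustrative sketch only, not the operator's migration code; it assumes "hex256" means 64 hexadecimal characters (32 bytes), which is not stated in this guide:

```python
import re

# Heuristics only: UUID is the canonical 8-4-4-4-12 dashed form;
# "hex256" is assumed to be 64 hex chars (32 bytes); anything else
# is treated as legacy binary key material.
UUID_RE = re.compile(r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.I)
HEX256_RE = re.compile(r"^[0-9a-f]{64}$", re.I)

def key_format(key: str) -> str:
    """Classify an encryption key string as 'uuid', 'hex256', or 'binary'."""
    if UUID_RE.match(key):
        return "uuid"
    if HEX256_RE.match(key):
        return "hex256"
    return "binary"

print(key_format("123e4567-e89b-12d3-a456-426614174000"))  # uuid
print(key_format("ab" * 32))                               # hex256
```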
After the upgrade, verify the operator is healthy:

```shell
# Verify the operator pod is running
kubectl get pods -n posit-team-system -l control-plane=controller-manager

# Check operator logs for errors
kubectl logs -n posit-team-system deployment/team-operator-controller-manager --tail=100

# Verify health endpoints
kubectl exec -n posit-team-system deployment/team-operator-controller-manager -- wget -qO- http://localhost:8081/healthz
kubectl exec -n posit-team-system deployment/team-operator-controller-manager -- wget -qO- http://localhost:8081/readyz
```
Confirm the CRD versions:

```shell
# List all Posit Team CRDs with versions
kubectl get crds -o custom-columns=NAME:.metadata.name,VERSION:.spec.versions[0].name | grep posit.team

# Expected output:
# chronicles.core.posit.team          v1beta1
# connects.core.posit.team            v1beta1
# flightdecks.core.posit.team         v1beta1
# packagemanagers.core.posit.team     v1beta1
# postgresdatabases.core.posit.team   v1beta1
# sites.core.posit.team               v1beta1
# workbenches.core.posit.team         v1beta1
```
Then check that resources are reconciling and products are reachable:

```shell
# Check all Sites are reconciling
kubectl get sites -A

# Check individual product resources
kubectl get workbenches -A
kubectl get connects -A
kubectl get packagemanagers -A

# Verify deployments are healthy
kubectl get deployments -n posit-team

# Test product endpoints
curl -I https://workbench.<your-domain>
curl -I https://connect.<your-domain>
curl -I https://packagemanager.<your-domain>
```

Watch operator logs for the first 15-30 minutes after upgrade:
```shell
kubectl logs -n posit-team-system deployment/team-operator-controller-manager -f
```

Look for:
- Reconciliation errors
- CRD validation failures
- Database connection issues
- Certificate/TLS errors
If issues occur after upgrade, roll back to the previous release:

```shell
# List release history
helm history team-operator -n posit-team-system

# Rollback to previous revision
helm rollback team-operator <revision-number> -n posit-team-system

# Example: rollback to revision 2
helm rollback team-operator 2 -n posit-team-system
```

Important: CRDs are not rolled back with Helm rollback due to the `keep` annotation. If the new CRDs added fields, older operator versions may still work but won't recognize new fields.
If CRD rollback is necessary:
```shell
# Save current CRs
kubectl get sites,workbenches,connects,packagemanagers -A -o yaml > pre-rollback-backup.yaml

# Apply old CRDs (from your backup or previous chart version)
kubectl apply -f old-crds/

# Verify CRs are still valid
kubectl get sites -A
```

Consider these data implications during rollback:
- Database Schema Changes: If the upgrade included database schema changes, rollback may require a database schema rollback as well.
- Secret Format Changes: The operator's automatic key migration is one-way. Rolled-back operators will still work with migrated keys.
- Configuration Changes: CRs modified to use new fields will need manual cleanup if rolling back to a version that doesn't support those fields.
- Use Maintenance Windows: Schedule upgrades during low-traffic periods.
- Rolling Update Strategy: The operator uses a single replica by default. During operator restarts:
  - Products continue running if the operator is briefly unavailable
  - No reconciliation occurs during operator restart (typically < 30 seconds)
- Staged Rollout:

  ```shell
  # First, upgrade operator in staging
  helm upgrade team-operator ./dist/chart -n posit-team-system-staging
  # Verify staging works

  # Then upgrade production
  helm upgrade team-operator ./dist/chart -n posit-team-system
  ```

- Health Checks:
  - Liveness probe: `/healthz` (port 8081)
  - Readiness probe: `/readyz` (port 8081)
  - These probes ensure the operator is ready before receiving reconciliation requests
- Leader Election: If running multiple operator replicas (uncommon), leader election ensures one active reconciler:

  ```yaml
  controllerManager:
    container:
      args:
        - "--leader-elect"
  ```
During an operator restart, products remain available:
- Workbench: Sessions continue running; new sessions may be delayed
- Connect: Published content remains accessible
- Package Manager: Package downloads continue working
- Flightdeck: Landing page remains accessible
Only reconciliation (applying changes) is affected during operator restart.
Symptom: CRs fail validation after CRD update

```shell
# Check for invalid CRs
kubectl get sites -A 2>&1 | grep -i error

# View validation errors
kubectl describe site <site-name> -n <namespace>
```

Solution: Update CRs to match new schema requirements or remove deprecated fields.
Symptom: Admission webhook errors after upgrade
```shell
# Check webhook configuration
kubectl get validatingwebhookconfigurations | grep posit
kubectl get mutatingwebhookconfigurations | grep posit

# If webhooks are causing issues and you need to disable them temporarily
kubectl delete validatingwebhookconfigurations <webhook-name>
```

Solution: Ensure cert-manager is properly configured if webhooks are enabled.
Symptom: Operator pod fails to start
```shell
# Check pod events
kubectl describe pod -n posit-team-system -l control-plane=controller-manager

# Check logs
kubectl logs -n posit-team-system -l control-plane=controller-manager --previous
```

Common Causes:
- Missing RBAC permissions for new resources
- Invalid environment variables
- Certificate issues
Solution: Check Helm values and ensure all required permissions are granted.
Symptom: Operator continuously reconciles resources without reaching a stable state
```shell
# Watch operator logs for repeated reconciliation
kubectl logs -n posit-team-system deployment/team-operator-controller-manager -f | grep "Reconciling"
```

Solution: Check for label/annotation conflicts or resources being modified by multiple controllers.
Symptom: Products fail to start due to database errors
```shell
# Check database connectivity
kubectl logs -n posit-team <product-pod> | grep -i database
```

Solution: Verify database credentials in secrets and ensure network policies allow database access.
If you encounter issues not covered in this guide:
- Check Operator Logs:

  ```shell
  kubectl logs -n posit-team-system deployment/team-operator-controller-manager --tail=200
  ```

- Review GitHub Issues: Check existing issues
- Contact Support: Contact Posit for enterprise support
- Collect Diagnostic Information:

  ```shell
  # Create a diagnostic bundle
  kubectl get all -n posit-team-system -o yaml > diag-system.yaml
  kubectl get sites,workbenches,connects,packagemanagers -A -o yaml > diag-resources.yaml
  kubectl logs -n posit-team-system deployment/team-operator-controller-manager > diag-logs.txt
  ```
- Helm Chart README - Installation and configuration reference
- Site Management Guide - Managing Posit Team sites
- CHANGELOG - Version history and release notes