Description
Summary
Resources configured with UseAsync = true in upjet create duplicate cloud resources after provider pod restarts or Kubernetes cluster backup/restore operations (e.g., Velero). This happens because the Terraform workspace state stored in ephemeral pod storage (/tmp/<workspace-id>/) is lost, causing the provider to think it needs to create new resources instead of managing existing ones.
Impact
- Severity: Critical
- Affected Resources: All resources with UseAsync = true configuration
- Symptoms:
  - Duplicate cloud resources created after provider pod restart
  - Duplicate resources created after Velero backup/restore
  - external-name annotation gets updated with new resource ID, losing connection to original resource
  - Original cloud resources become orphaned (not managed by Crossplane)
Environment
- upjet version: v2.2.0 (also affects latest version)
- Crossplane version: 1.x+
- Tested with: provider-ovh (OVH Managed Kubernetes Clusters)
- Reproduction scenarios:
- Provider pod restart (kubectl delete pod or pod crash)
- Kubernetes cluster backup/restore with Velero
- Node failure causing pod rescheduling
Root Cause
Technical Analysis
1. Terraform Workspace Storage: Upjet stores Terraform workspace files in ephemeral pod storage at /tmp/<workspace-id>/. This includes:
   - terraform.tfstate - Current state of managed resources
   - Provider configuration files
   - Terraform lock files
2. Async Operation State: Resources with UseAsync = true are configured for long-running operations that require tracking async operation state. This state is stored within the Terraform workspace.
3. State Loss on Pod Restart: When the provider pod restarts:
   - Ephemeral /tmp/ storage is cleared
   - Terraform workspace files are lost
   - Only Kubernetes resource metadata persists (including external-name annotation and external-create-succeeded annotation)
4. Incorrect Reconciliation Flow: After pod restart, when reconciling a resource with lost workspace state:
   - The Observe() function calls Refresh() to sync state
   - Refresh() tries to read the tfstate file, finds it missing, and treats the resource as not existing in Terraform state
   - Creates a new Terraform workspace with empty state
   - The reconciler then thinks it needs to Create the resource instead of managing the existing one
   - This triggers creation of a duplicate cloud resource
Code Flow
external.Observe()
-> workspace.Refresh()
-> FileProducer.EnsureTFState() - Creates empty state when workspace missing
-> terraform refresh - Sees no existing state, returns "doesn't exist"
-> Returns ResourceExists=false
external.Create()
-> Creates duplicate resource in cloud provider
-> Updates external-name with new resource ID
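To make the flow above concrete, here is a toy simulation in Go. It is only a sketch: the `cloud`, `managed`, and `workspace` types and the `reconcile` function are invented for illustration and do not correspond to upjet types. The one property it models is that existence is decided from local workspace state alone.

```go
package main

import "fmt"

// cloud is a stand-in for the provider API: the set of existing resource IDs.
type cloud struct {
	resources map[string]bool
	nextID    int
}

// create provisions a new resource and returns its ID.
func (c *cloud) create() string {
	c.nextID++
	id := fmt.Sprintf("cluster-%d", c.nextID)
	c.resources[id] = true
	return id
}

// managed models the Kubernetes object, which persists across pod restarts.
type managed struct {
	externalName string
}

// workspace models the ephemeral Terraform workspace: IDs tracked in state.
type workspace struct {
	state map[string]bool
}

// reconcile mimics the flawed flow: existence is decided from local state
// only, so an empty workspace always leads to Create.
func reconcile(c *cloud, mr *managed, ws *workspace) {
	if ws.state[mr.externalName] { // Observe: ResourceExists=true
		return
	}
	id := c.create() // Create: a duplicate if the resource already exists
	ws.state[id] = true
	mr.externalName = id // external-name is overwritten with the new ID
}

func main() {
	c := &cloud{resources: map[string]bool{}}
	mr := &managed{}
	ws := &workspace{state: map[string]bool{}}

	reconcile(c, mr, ws) // initial creation
	first := mr.externalName

	ws = &workspace{state: map[string]bool{}} // pod restart: /tmp is wiped
	reconcile(c, mr, ws)

	fmt.Println("original:", first, "current:", mr.externalName, "cloud resources:", len(c.resources))
}
```

Running it prints `original: cluster-1 current: cluster-2 cloud resources: 2`: the second reconcile creates a duplicate and the original ID is no longer referenced anywhere.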
Reproduction Steps
1. Setup:
   apiVersion: kube.ovh.m.example.io/v1alpha1
   kind: Cluster
   metadata:
     name: test-cluster
     namespace: default
   spec:
     forProvider:
       serviceName: "my-project-id"
       region: "EU-WEST-PAR"
       # ... other parameters
2. Create resource:
   kubectl apply -f cluster.yaml
   # Wait for resource to be created and synced
   kubectl wait --for=condition=Ready cluster/test-cluster --timeout=600s
3. Verify external-name:
   kubectl get cluster test-cluster -o jsonpath='{.metadata.annotations.crossplane\.io/external-name}'
   # Output: abc123def-original-cluster-id
4. Restart provider pod:
   kubectl delete pod -n crossplane-system -l pkg.crossplane.io/provider=provider-ovh
   # Wait for pod to restart
5. Observe duplicate creation:
   # Check external-name - it will have changed
   kubectl get cluster test-cluster -o jsonpath='{.metadata.annotations.crossplane\.io/external-name}'
   # Output: xyz789ghi-new-duplicate-cluster-id
   # Check cloud provider - two clusters now exist
   # Original: abc123def-original-cluster-id
   # Duplicate: xyz789ghi-new-duplicate-cluster-id
Proposed Solution
Approach: Import Fallback for Async Resources
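When an async resource has the external-create-succeeded annotation (indicating it was previously created successfully) but the Terraform workspace state is missing, use Import instead of Refresh to reconstruct the state directly from the cloud provider API. Note that the snippet under Implementation does not itself test whether the workspace state is missing: it takes the Import path on every Observe() call for async resources that carry the annotation.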
Implementation
File: pkg/controller/external.go
Location: In the Observe() function, before calling Refresh()
// For async resources that were previously created, use Import instead
// of Refresh if the resource has been successfully created before.
// This prevents duplicate resource creation after provider pod restarts
// when the ephemeral workspace state in /tmp is lost.
// The external-create-succeeded annotation persists in Kubernetes and
// indicates the resource was successfully created or imported previously.
if e.config.UseAsync && meta.GetExternalName(tr) != "" {
	annotations := tr.GetAnnotations()
	if _, hasCreateSucceeded := annotations["crossplane.io/external-create-succeeded"]; hasCreateSucceeded {
		e.logger.Debug("Using Import instead of Refresh for async resource with external-create-succeeded annotation",
			"external-name", meta.GetExternalName(tr))
		return e.Import(ctx, tr)
	}
	e.logger.Debug("Async resource missing external-create-succeeded annotation, using Refresh",
		"external-name", meta.GetExternalName(tr),
		"annotations", annotations)
} else {
	e.logger.Debug("Not using Import fallback",
		"useAsync", e.config.UseAsync,
		"externalName", meta.GetExternalName(tr))
}
Required import:
import (
	// ... existing imports
	"github.com/crossplane/crossplane-runtime/v2/pkg/meta"
)
How the Fix Works
- Detection: Check if resource is async AND has external-create-succeeded annotation
- Import: Use Import() to reconstruct Terraform state from cloud provider API using the external-name as the resource ID
- State Reconstruction: Import queries the cloud provider API and rebuilds the tfstate file
- Normal Flow: After Import succeeds, reconciliation continues normally with proper state
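The detection step can be isolated as a pure function, which makes the rule easy to unit test without a running provider. This is a sketch under assumptions: `chooseObserveAction` and the `observeAction` type are hypothetical names, and only the two annotation keys come from Crossplane itself.

```go
package main

import "fmt"

const (
	// Annotation keys used by Crossplane on managed resources.
	annExternalName    = "crossplane.io/external-name"
	annCreateSucceeded = "crossplane.io/external-create-succeeded"
)

// observeAction (hypothetical type) names the path Observe would take.
type observeAction string

const (
	actionImport  observeAction = "import"
	actionRefresh observeAction = "refresh"
)

// chooseObserveAction (hypothetical helper) mirrors the proposed rule:
// Import only for async resources that have a non-empty external-name and
// a recorded successful create; Refresh in every other case.
func chooseObserveAction(useAsync bool, annotations map[string]string) observeAction {
	if !useAsync || annotations[annExternalName] == "" {
		return actionRefresh
	}
	if _, ok := annotations[annCreateSucceeded]; ok {
		return actionImport
	}
	return actionRefresh
}

func main() {
	created := map[string]string{
		annExternalName:    "abc123def-original-cluster-id",
		annCreateSucceeded: "2024-01-01T00:00:00Z", // illustrative timestamp
	}
	pending := map[string]string{annExternalName: "abc123def-original-cluster-id"}

	fmt.Println("async, previously created:", chooseObserveAction(true, created))
	fmt.Println("async, create not confirmed:", chooseObserveAction(true, pending))
	fmt.Println("sync resource:", chooseObserveAction(false, created))
}
```

Keeping the decision separate from the Terraform calls also makes it straightforward to add a further condition later, such as only importing when the workspace state is actually missing.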
Why This Works
- ✅ external-name annotation persists in Kubernetes (not lost on pod restart)
- ✅ external-create-succeeded annotation persists in Kubernetes
- ✅ Import reconstructs state directly from cloud provider API
- ✅ No duplicate resources created
- ✅ No manual intervention required
- ✅ Works with Velero backup/restore (annotations are backed up)
- ✅ Minimal code change, low risk
- ✅ Only affects async resources with confirmed prior creation
Testing
Test Scenarios
1. Provider Pod Restart:
   - ✅ Create async resource
   - ✅ Verify creation succeeded
   - ✅ Delete provider pod
   - ✅ Wait for pod restart
   - ✅ Verify no duplicate created
   - ✅ Verify resource remains synced
2. Velero Backup/Restore:
   - ✅ Create async resource
   - ✅ Backup with Velero
   - ✅ Reset Kubernetes cluster
   - ✅ Restore with Velero
   - ✅ Verify no duplicate created
   - ✅ Verify resource synced with existing cloud resource
3. Node Failure:
   - ✅ Create async resource on node A
   - ✅ Drain/cordon node A
   - ✅ Pod reschedules to node B
   - ✅ Verify no duplicate created
Test Results
All scenarios tested successfully with OVH Managed Kubernetes Clusters:
- No duplicate resources created
- external-name remains stable
- Resources properly synced after recovery
Alternative Solutions Considered
1. PersistentVolume for Terraform Workspaces
Rejected: Adds complexity, requires storage provisioning, doesn't work well with pod scaling
2. Store tfstate in Kubernetes Secrets
Rejected: Large state files could exceed the 1 MiB size limit on Kubernetes Secrets, and reading and writing state through the API server raises performance concerns
3. Disable UseAsync
Rejected: Removes async operation tracking, breaks long-running operations
4. Velero Filesystem Backup
Rejected: Only solves Velero case, doesn't help with pod restarts or node failures
5. Check for Resource Existence Before Create
Partially Rejected: Doesn't handle all edge cases; Import is more robust and is already implemented
Related Issues
- Similar issue reported in provider-aws with long-running RDS operations
- Community discussions about ephemeral storage limitations in upjet
Additional Notes
External Name Configuration Considerations
When implementing this fix, ensure that your GetIDFn configurations handle empty externalName values correctly during initial resource creation:
GetIDFn: func(ctx context.Context, externalName string, parameters map[string]any, providerConfig map[string]any) (string, error) {
	// Return empty string if external-name is not set yet (resource being created)
	if externalName == "" {
		return "", nil
	}
	// ... construct composite ID
}
This prevents incomplete IDs (e.g., service_name/ instead of service_name/resource_id) from being set in tfstate before resource creation completes.
References
- Crossplane documentation on external-name: https://docs.crossplane.io/latest/concepts/managed-resources/#external-name
- Upjet architecture: https://github.com/crossplane/upjet/blob/main/docs/architecture.md
- Terraform Import: https://developer.hashicorp.com/terraform/cli/import