Async resources with UseAsync=true create duplicate cloud resources after provider pod restart #561

@fabiencastarede

Summary

Resources configured with UseAsync = true in upjet create duplicate cloud resources after provider pod restarts or Kubernetes cluster backup/restore operations (e.g., Velero). This happens because the Terraform workspace state stored in ephemeral pod storage (/tmp/<workspace-id>/) is lost, causing the provider to think it needs to create new resources instead of managing existing ones.

Impact

  • Severity: Critical
  • Affected Resources: All resources with UseAsync = true configuration
  • Symptoms:
    • Duplicate cloud resources created after provider pod restart
    • Duplicate resources created after Velero backup/restore
    • The external-name annotation is overwritten with the new resource ID, so the managed resource loses its link to the original cloud resource
    • Original cloud resources become orphaned (not managed by Crossplane)

Environment

  • upjet version: v2.2.0 (also affects latest version)
  • Crossplane version: 1.x+
  • Tested with: provider-ovh (OVH Managed Kubernetes Clusters)
  • Reproduction scenarios:
    1. Provider pod restart (kubectl delete pod or pod crash)
    2. Kubernetes cluster backup/restore with Velero
    3. Node failure causing pod rescheduling

Root Cause

Technical Analysis

  1. Terraform Workspace Storage: Upjet stores Terraform workspace files in ephemeral pod storage at /tmp/<workspace-id>/. This includes:

    • terraform.tfstate - Current state of managed resources
    • Provider configuration files
    • Terraform lock files
  2. Async Operation State: Resources with UseAsync = true are configured for long-running operations that require tracking async operation state. This state is stored within the Terraform workspace.

  3. State Loss on Pod Restart: When the provider pod restarts:

    • Ephemeral /tmp/ storage is cleared
    • Terraform workspace files are lost
    • Only Kubernetes resource metadata persists (including external-name annotation and external-create-succeeded annotation)
  4. Incorrect Reconciliation Flow: After pod restart, when reconciling a resource with lost workspace state:

    • The Observe() function calls Refresh() to sync state
    • Refresh() tries to read the tfstate file, finds it missing, and creates a new Terraform workspace with empty state
    • With empty state, terraform refresh reports that the resource does not exist
    • The reconciler then decides it must Create the resource instead of managing the existing one
    • This triggers creation of a duplicate cloud resource

Code Flow

external.Observe()
  -> workspace.Refresh()
    -> FileProducer.EnsureTFState() - Creates empty state when workspace missing
    -> terraform refresh - Sees no existing state, returns "doesn't exist"
  -> Returns ResourceExists=false

external.Create()
  -> Creates duplicate resource in cloud provider
  -> Updates external-name with new resource ID
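
The flow above can be reproduced in miniature without upjet or Terraform. The following is a minimal sketch, not upjet code: fakeCloud, managed, observe, reconcile and simulateRestart are hypothetical names, and a plain file named terraform.tfstate stands in for the real workspace. It only illustrates how treating a missing state file as "resource does not exist" leads to a second create call.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// fakeCloud stands in for the cloud provider API (hypothetical): resource ID -> name.
type fakeCloud struct {
	resources map[string]string
	nextID    int
}

// create provisions a new resource and returns its generated ID.
func (c *fakeCloud) create(name string) string {
	c.nextID++
	id := fmt.Sprintf("id-%d", c.nextID)
	c.resources[id] = name
	return id
}

// managed mimics the Kubernetes object, which survives a pod restart.
type managed struct {
	name         string
	externalName string
}

// observe reports whether the local state file exists. A missing file is
// treated as "resource does not exist", as described in the analysis.
func observe(workspace string) bool {
	_, err := os.Stat(filepath.Join(workspace, "terraform.tfstate"))
	return err == nil
}

// reconcile creates the resource whenever observe reports it missing,
// then records the new ID in the annotation stand-in and the state file.
func reconcile(c *fakeCloud, workspace string, mg *managed) error {
	if observe(workspace) {
		return nil
	}
	mg.externalName = c.create(mg.name)
	if err := os.MkdirAll(workspace, 0o755); err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(workspace, "terraform.tfstate"), []byte(mg.externalName), 0o600)
}

// simulateRestart reconciles, wipes the workspace (pod restart), and
// reconciles again. It returns the cloud resource count and final external name.
func simulateRestart() (int, string, error) {
	dir, err := os.MkdirTemp("", "upjet-sketch")
	if err != nil {
		return 0, "", err
	}
	defer os.RemoveAll(dir)

	workspace := filepath.Join(dir, "workspace-id")
	c := &fakeCloud{resources: map[string]string{}}
	mg := &managed{name: "test-cluster"}

	if err := reconcile(c, workspace, mg); err != nil {
		return 0, "", err
	}
	// Pod restart: ephemeral storage is cleared, the managed object is not.
	if err := os.RemoveAll(workspace); err != nil {
		return 0, "", err
	}
	if err := reconcile(c, workspace, mg); err != nil {
		return 0, "", err
	}
	return len(c.resources), mg.externalName, nil
}

func main() {
	n, name, err := simulateRestart()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// prints "cloud resources: 2, external-name: id-2"
	fmt.Printf("cloud resources: %d, external-name: %s\n", n, name)
}
```

The managed object still holds the original ID when the second reconcile starts, but nothing in this simplified observe step consults it, which mirrors the behaviour reported here.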

Reproduction Steps

  1. Setup:

    apiVersion: kube.ovh.m.example.io/v1alpha1
    kind: Cluster
    metadata:
      name: test-cluster
      namespace: default
    spec:
      forProvider:
        serviceName: "my-project-id"
        region: "EU-WEST-PAR"
        # ... other parameters
  2. Create resource:

    kubectl apply -f cluster.yaml
    # Wait for resource to be created and synced
    kubectl wait --for=condition=Ready cluster/test-cluster --timeout=600s
  3. Verify external-name:

    kubectl get cluster test-cluster -o jsonpath='{.metadata.annotations.crossplane\.io/external-name}'
    # Output: abc123def-original-cluster-id
  4. Restart provider pod:

    kubectl delete pod -n crossplane-system -l pkg.crossplane.io/provider=provider-ovh
    # Wait for pod to restart
  5. Observe duplicate creation:

    # Check external-name - it will have changed
    kubectl get cluster test-cluster -o jsonpath='{.metadata.annotations.crossplane\.io/external-name}'
    # Output: xyz789ghi-new-duplicate-cluster-id
    
    # Check cloud provider - two clusters now exist
    # Original: abc123def-original-cluster-id
    # Duplicate: xyz789ghi-new-duplicate-cluster-id

Proposed Solution

Approach: Import Fallback for Async Resources

When an async resource has the external-create-succeeded annotation (indicating it was previously created successfully) but the Terraform workspace state is missing, use Import instead of Refresh to reconstruct the state directly from the cloud provider API.

Implementation

File: pkg/controller/external.go

Location: In the Observe() function, before calling Refresh()

// For async resources that were previously created, use Import instead
// of Refresh if the resource has been successfully created before.
// This prevents duplicate resource creation after provider pod restarts
// when the ephemeral workspace state in /tmp is lost.
// The external-create-succeeded annotation persists in Kubernetes and
// indicates the resource was successfully created or imported previously.
if e.config.UseAsync && meta.GetExternalName(tr) != "" {
    annotations := tr.GetAnnotations()
    if _, hasCreateSucceeded := annotations[meta.AnnotationKeyExternalCreateSucceeded]; hasCreateSucceeded {
        e.logger.Debug("Using Import instead of Refresh for async resource with external-create-succeeded annotation",
            "external-name", meta.GetExternalName(tr))
        return e.Import(ctx, tr)
    }
    e.logger.Debug("Async resource missing external-create-succeeded annotation, using Refresh",
        "external-name", meta.GetExternalName(tr),
        "annotations", annotations)
} else {
    e.logger.Debug("Not using Import fallback",
        "useAsync", e.config.UseAsync,
        "externalName", meta.GetExternalName(tr))
}

Required import:

import (
    // ... existing imports
    "github.com/crossplane/crossplane-runtime/v2/pkg/meta"
)

How the Fix Works

  1. Detection: Check whether the resource is async, has a non-empty external-name, and carries the external-create-succeeded annotation
  2. Import: Use Import() to reconstruct Terraform state from cloud provider API using the external-name as the resource ID
  3. State Reconstruction: Import queries the cloud provider API and rebuilds the tfstate file
  4. Normal Flow: After Import succeeds, reconciliation continues normally with proper state
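
The four steps can be traced with in-memory stand-ins. This is an illustrative sketch only: the cloud and state types, importState, and observe are hypothetical and do not reflect upjet's or Terraform's real interfaces. A map plays the role of the provider API, and another map plays the role of the tfstate contents.

```go
package main

import (
	"errors"
	"fmt"
)

// Local copy of the annotation key named in this issue.
const annotationCreateSucceeded = "crossplane.io/external-create-succeeded"

// cloud is a stand-in for the provider API: resource ID -> attributes.
type cloud map[string]string

// state is a stand-in for the tfstate contents: resource ID -> attributes.
type state map[string]string

// importState rebuilds local state by looking the external name up in the
// cloud (steps 2 and 3). It fails if the resource no longer exists there.
func importState(c cloud, externalName string) (state, error) {
	attrs, ok := c[externalName]
	if !ok {
		return nil, errors.New("resource not found in cloud: " + externalName)
	}
	return state{externalName: attrs}, nil
}

// observe returns the state to continue with and whether the resource exists.
// Async resources with a confirmed prior creation are imported (step 1);
// everything else falls back to whatever local state is available.
func observe(c cloud, local state, useAsync bool, externalName string, annotations map[string]string) (state, bool, error) {
	_, created := annotations[annotationCreateSucceeded]
	if useAsync && externalName != "" && created {
		s, err := importState(c, externalName)
		if err != nil {
			return nil, false, err
		}
		return s, true, nil // step 4: reconciliation continues with rebuilt state
	}
	_, exists := local[externalName]
	return local, exists, nil
}

func main() {
	c := cloud{"abc123def-original-cluster-id": "test-cluster"}
	lost := state{} // local state after a pod restart
	annotations := map[string]string{annotationCreateSucceeded: "2025-01-01T00:00:00Z"}

	s, exists, err := observe(c, lost, true, "abc123def-original-cluster-id", annotations)
	fmt.Println(len(s), exists, err) // prints "1 true <nil>"
}
```

In this sketch a resource that was deleted out of band makes the import step return an error rather than reporting the resource as missing; how the real Import path handles that case is not covered here.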

Why This Works

  • ✅ external-name annotation persists in Kubernetes (not lost on pod restart)
  • ✅ external-create-succeeded annotation persists in Kubernetes
  • ✅ Import reconstructs state directly from cloud provider API
  • ✅ No duplicate resources created
  • ✅ No manual intervention required
  • ✅ Works with Velero backup/restore (annotations are backed up)
  • ✅ Minimal code change, low risk
  • ✅ Only affects async resources with confirmed prior creation

Testing

Test Scenarios

  1. Provider Pod Restart:

    • ✅ Create async resource
    • ✅ Verify creation succeeded
    • ✅ Delete provider pod
    • ✅ Wait for pod restart
    • ✅ Verify no duplicate created
    • ✅ Verify resource remains synced
  2. Velero Backup/Restore:

    • ✅ Create async resource
    • ✅ Backup with Velero
    • ✅ Reset Kubernetes cluster
    • ✅ Restore with Velero
    • ✅ Verify no duplicate created
    • ✅ Verify resource synced with existing cloud resource
  3. Node Failure:

    • ✅ Create async resource on node A
    • ✅ Drain/cordon node A
    • ✅ Pod reschedules to node B
    • ✅ Verify no duplicate created
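
The "verify no duplicate created" step in each scenario comes down to two checks: the external-name is unchanged, and the cloud holds exactly one resource with the expected name. A minimal sketch of that check follows; verifyNoDuplicate is a hypothetical helper, and the map of cloud resources is assumed to have been listed from the provider by other means.

```go
package main

import (
	"fmt"
)

// verifyNoDuplicate checks that the external name did not change across the
// disruption and that exactly one cloud resource carries the expected name.
// cloud maps resource ID -> resource name.
func verifyNoDuplicate(before, after, name string, cloud map[string]string) error {
	if before != after {
		return fmt.Errorf("external-name changed from %q to %q", before, after)
	}
	if _, ok := cloud[after]; !ok {
		return fmt.Errorf("cloud resource %q no longer exists", after)
	}
	count := 0
	for _, n := range cloud {
		if n == name {
			count++
		}
	}
	if count != 1 {
		return fmt.Errorf("expected 1 cloud resource named %q, found %d", name, count)
	}
	return nil
}

func main() {
	healthy := map[string]string{"abc123": "test-cluster"}
	fmt.Println(verifyNoDuplicate("abc123", "abc123", "test-cluster", healthy)) // prints "<nil>"

	duplicated := map[string]string{"abc123": "test-cluster", "xyz789": "test-cluster"}
	fmt.Println(verifyNoDuplicate("abc123", "xyz789", "test-cluster", duplicated))
}
```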

Test Results

All scenarios tested successfully with OVH Managed Kubernetes Clusters:

  • No duplicate resources created
  • external-name remains stable
  • Resources properly synced after recovery

Alternative Solutions Considered

1. PersistentVolume for Terraform Workspaces

Rejected: Adds complexity, requires storage provisioning, doesn't work well with pod scaling

2. Store tfstate in Kubernetes Secrets

Rejected: Large state files could exceed secret size limits, performance concerns

3. Disable UseAsync

Rejected: Removes async operation tracking, breaks long-running operations

4. Velero Filesystem Backup

Rejected: Only solves Velero case, doesn't help with pod restarts or node failures

5. Check for Resource Existence Before Create

Partially Rejected: Doesn't handle all edge cases. Import is more robust and already implemented.

Related Issues

  • Similar issue reported in provider-aws with long-running RDS operations
  • Community discussions about ephemeral storage limitations in upjet

Additional Notes

External Name Configuration Considerations

When implementing this fix, ensure that your GetIDFn configurations handle empty externalName values correctly during initial resource creation:

GetIDFn: func(ctx context.Context, externalName string, parameters map[string]any, providerConfig map[string]any) (string, error) {
    // Return empty string if external-name is not set yet (resource being created)
    if externalName == "" {
        return "", nil
    }
    // ... construct composite ID
}

This prevents incomplete IDs (e.g., service_name/ instead of service_name/resource_id) from being set in tfstate before resource creation completes.
