
Feature Request: Support for Helm Chart Testing in ClusterProfile #668

@kahirokunn

Description

Summary

Enable Sveltos to execute Helm chart tests (helm test) as part of the addon deployment lifecycle, providing automated validation and operational gating capabilities.

Motivation

Helm charts often include test hooks that validate deployments, but currently there's no native way to:

  1. Automatically run tests after Helm releases are deployed via Sveltos
  2. Gate deployments based on test results before proceeding to next clusters
  3. Detect regressions when drift corrections or upgrades occur
  4. Schedule periodic validation of running releases
  5. Integrate test results into Sveltos' observability and status reporting

This would align Helm-based addons with Sveltos' existing validation capabilities for raw manifests, providing a unified approach to deployment verification.

Use Cases

1. Progressive Rollout with Validation

Deploy a new chart version to staging clusters, run tests, and only proceed to production if tests pass.

2. Post-Drift Validation

After Sveltos corrects drift, automatically verify that the application still functions correctly.

3. Scheduled Health Checks

Run tests periodically (e.g., every 6 hours) to catch issues that might develop over time.

4. Manual Smoke Testing

Trigger on-demand tests via the CLI or a label/annotation during troubleshooting.

Proposed Solution

Basic Test Execution

Start with essential test execution capabilities that integrate naturally with Sveltos' existing reconciliation model.

apiVersion: config.projectsveltos.io/v1beta1
kind: ClusterProfile
metadata:
  name: nginx-with-tests
spec:
  clusterSelector:
    matchLabels:
      env: staging

  helmCharts:
  - repositoryURL: https://charts.example.com
    repositoryName: myrepo
    chartName: myrepo/nginx
    chartVersion: 1.2.3
    releaseName: nginx-app
    releaseNamespace: default
    helmChartAction: Install

    # New test configuration
    test:
      enabled: true
      
      # When to run tests
      phases:
      - PostInstall    # After initial installation
      - PostUpgrade    # After upgrades
      
      # Wait before running tests (allows pods to stabilize)
      delaySeconds: 30
      
      # Overall timeout for test execution
      timeoutSeconds: 300
      
      # What to do with test pods after completion
      cleanupPolicy: OnSuccess  # Always | OnSuccess | OnFailure | Never

Behavior

  • Tests run automatically in specified phases
  • Failures are recorded in status but do not block addon deployment by default
  • Test results appear in ClusterProfile status for observability (illustrative sketch below)
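
For illustration only, a test run could be reported in the ClusterProfile status roughly as sketched below. The helmTestResults field and its children are part of this proposal, not an existing Sveltos API.

status:
  helmTestResults:
  - releaseName: nginx-app
    releaseNamespace: default
    phase: PostInstall                   # Which phase triggered the run
    result: Passed                       # Passed | Failed | Skipped
    startedAt: "2025-01-01T12:00:00Z"
    completedAt: "2025-01-01T12:01:30Z"
    failedTests: []                      # Names of failed test pods, if any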

Preconditions (Optional)

Add ability to wait for specific conditions before running tests.

    test:
      enabled: true
      phases: [PostInstall, PostUpgrade]
      
      # Prerequisites before running tests
      preconditions:
      - apiVersion: apps/v1
        kind: Deployment
        namespace: default
        name: nginx-app
        condition: |
          status.conditions.exists(c, 
            c.type == "Available" && c.status == "True"
          )
        timeoutSeconds: 600
      
      # Action if preconditions aren't met
      onPreconditionFailure: Skip  # Skip | Fail

Advanced Triggers (Optional)

Add sophisticated trigger policies for different operational scenarios.

    test:
      enabled: true
      
      triggerPolicy:
        # When to execute tests
        mode: OnChange  # Always | OnChange | OnDrift | OnSchedule | Manual
        
        # For OnChange mode: what constitutes a "change"
        changeDetection:
          basis: RenderedManifests  # HelmReleaseVersion | RenderedManifests | ValuesDigest
        
        # For OnDrift mode: which resources trigger tests
        driftScope:
          kinds: ["*"]  # or specific kinds like ["Deployment", "Service"]
        
        # For OnSchedule mode
        schedule: "0 */6 * * *"  # Cron expression
        
        # Rate limiting
        cooldownSeconds: 300

Trigger Policy Details

Always: Run tests on every reconciliation (respects cooldownSeconds)

  • Use case: Critical systems requiring continuous validation
  • Works with: syncMode: Continuous

OnChange (recommended default): Run tests only when deployment actually changes

  • Use case: Most production scenarios
  • Detection basis configurable (version, rendered manifests, or values)

OnDrift: Run tests when drift is detected and corrected

  • Use case: Validation after auto-remediation
  • Requires: syncMode: ContinuousWithDriftDetection

OnSchedule: Run tests on a schedule, independent of deployments

  • Use case: Periodic health checks
  • Schedule uses cron format

Manual: Run tests only when explicitly triggered

  • Use case: On-demand troubleshooting
  • Trigger via: label or sveltosctl command
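
As a purely illustrative sketch of the manual path, a trigger annotation on the ClusterProfile could look like the following. The annotation key is hypothetical and not part of any existing Sveltos or sveltosctl API.

apiVersion: config.projectsveltos.io/v1beta1
kind: ClusterProfile
metadata:
  name: nginx-with-tests
  annotations:
    # Hypothetical annotation: names the Helm release whose tests should run once
    config.projectsveltos.io/run-helm-tests: "nginx-app"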
