Summary
Enable Sveltos to execute Helm chart tests (helm test) as part of the addon deployment lifecycle, providing automated validation and operational gating capabilities.
Motivation
Helm charts often include test hooks that validate deployments, but currently there's no native way to:
- Automatically run tests after Helm releases are deployed via Sveltos
- Gate deployments based on test results before proceeding to next clusters
- Detect regressions when drift corrections or upgrades occur
- Schedule periodic validation of running releases
- Integrate test results into Sveltos' observability and status reporting
This would align Helm-based addons with Sveltos' existing validation capabilities for raw manifests, providing a unified approach to deployment verification.
Use Cases
1. Progressive Rollout with Validation
Deploy a new chart version to staging clusters, run tests, and only proceed to production if tests pass.
2. Post-Drift Validation
After Sveltos corrects drift, automatically verify that the application still functions correctly.
3. Scheduled Health Checks
Run tests periodically (e.g., every 6 hours) to catch issues that might develop over time.
4. Manual Smoke Testing
Trigger on-demand tests via CLI or label annotation during troubleshooting.
Proposed Solution
Basic Test Execution
Start with essential test execution capabilities that integrate naturally with Sveltos' existing reconciliation model.
```yaml
apiVersion: config.projectsveltos.io/v1beta1
kind: ClusterProfile
metadata:
  name: nginx-with-tests
spec:
  clusterSelector:
    matchLabels:
      env: staging
  helmCharts:
  - repositoryURL: https://charts.example.com
    repositoryName: myrepo
    chartName: myrepo/nginx
    chartVersion: 1.2.3
    releaseName: nginx-app
    releaseNamespace: default
    helmChartAction: Install
    # New test configuration
    test:
      enabled: true
      # When to run tests
      phases:
      - PostInstall # After initial installation
      - PostUpgrade # After upgrades
      # Wait before running tests (allows pods to stabilize)
      delaySeconds: 30
      # Overall timeout for test execution
      timeoutSeconds: 300
      # What to do with test pods after completion
      cleanupPolicy: OnSuccess # Always | OnSuccess | OnFailure | Never
```
Behavior
- Tests run automatically in specified phases
- Failures are recorded in status but do not block addon deployment by default
- Test results appear in ClusterProfile status for observability
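If adopted, test results might surface in the ClusterProfile status roughly as follows. This is an illustrative sketch only; the field names are hypothetical, not an existing Sveltos API:

```yaml
status:
  helmReleaseSummaries:
  - releaseName: nginx-app
    releaseNamespace: default
    # Hypothetical test-status fields:
    testStatus:
      phase: Passed            # Pending | Running | Passed | Failed
      lastRunTime: "2024-05-01T12:00:00Z"
      failedTests: 0
```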
Preconditions (Optional)
Add the ability to wait for specific conditions before running tests.
```yaml
test:
  enabled: true
  phases: [PostInstall, PostUpgrade]
  # Prerequisites before running tests
  preconditions:
  - apiVersion: apps/v1
    kind: Deployment
    namespace: default
    name: nginx-app
    condition: |
      status.conditions.exists(c,
        c.type == "Available" && c.status == "True"
      )
    timeoutSeconds: 600
  # Action if preconditions aren't met
  onPreconditionFailure: Skip # Skip | Fail
```
Advanced Triggers (Optional)
Add sophisticated trigger policies for different operational scenarios.
```yaml
test:
  enabled: true
  triggerPolicy:
    # When to execute tests
    mode: OnChange # Always | OnChange | OnDrift | OnSchedule | Manual
    # For OnChange mode: what constitutes a "change"
    changeDetection:
      basis: RenderedManifests # HelmReleaseVersion | RenderedManifests | ValuesDigest
    # For OnDrift mode: which resources trigger tests
    driftScope:
      kinds: ["*"] # or specific kinds like ["Deployment", "Service"]
    # For OnSchedule mode
    schedule: "0 */6 * * *" # Cron expression
    # Rate limiting
    cooldownSeconds: 300
```
Trigger Policy Details
Always: Run tests on every reconciliation (respects cooldownSeconds)
- Use case: Critical systems requiring continuous validation
- Works with: syncMode: Continuous
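The cooldownSeconds rate limit could be enforced with a simple last-run timestamp check on each reconciliation. A minimal Python sketch of that idea (names are hypothetical, not Sveltos code):

```python
import time


class CooldownGate:
    """Rate-limits test runs: at most one run per cooldown window."""

    def __init__(self, cooldown_seconds: float):
        self.cooldown_seconds = cooldown_seconds
        self.last_run = None  # monotonic timestamp of the last allowed run

    def try_run(self, now=None) -> bool:
        """Return True (and record the run) only if the cooldown has elapsed."""
        now = time.monotonic() if now is None else now
        if self.last_run is not None and now - self.last_run < self.cooldown_seconds:
            return False  # still cooling down; skip tests this reconciliation
        self.last_run = now
        return True


gate = CooldownGate(300)
print(gate.try_run(now=0.0))    # True  (first run always allowed)
print(gate.try_run(now=100.0))  # False (only 100s since last run)
print(gate.try_run(now=400.0))  # True  (300s cooldown has elapsed)
```

A denied attempt does not reset the window, so a busy reconcile loop cannot starve the next eligible run.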
OnChange (recommended default): Run tests only when the deployment actually changes
- Use case: Most production scenarios
- Detection basis configurable (version, rendered manifests, or values)
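For basis: ValuesDigest, change detection could hash a canonical serialization of the chart values, so that re-ordered but identical values do not count as a change. A minimal Python sketch of that approach (not Sveltos code; function names are hypothetical):

```python
import hashlib
import json


def values_digest(values: dict) -> str:
    """Stable digest of Helm values: canonical JSON (sorted keys), then SHA-256."""
    canonical = json.dumps(values, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def should_run_tests(previous_digest: str, current_values: dict) -> bool:
    """OnChange with basis=ValuesDigest: re-run tests only if values changed."""
    return values_digest(current_values) != previous_digest


old = values_digest({"replicaCount": 2, "image": {"tag": "1.2.3"}})
# Same content, different key order: not a change.
print(should_run_tests(old, {"image": {"tag": "1.2.3"}, "replicaCount": 2}))  # False
# A real value change triggers a re-run.
print(should_run_tests(old, {"image": {"tag": "1.2.4"}, "replicaCount": 2}))  # True
```

The same pattern would work for RenderedManifests by digesting the rendered template output instead of the values.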
OnDrift: Run tests when drift is detected and corrected
- Use case: Validation after auto-remediation
- Requires: syncMode: ContinuousWithDriftDetection
OnSchedule: Run tests on a schedule, independent of deployments
- Use case: Periodic health checks
- Schedule uses cron format
Manual: Run tests only when explicitly triggered
- Use case: On-demand troubleshooting
- Trigger via a label/annotation or a sveltosctl command
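For Manual mode, the explicit trigger could be an annotation on the ClusterProfile whose value changes per request; a sveltosctl subcommand could set it on the user's behalf. The annotation key below is hypothetical, sketched only to make the workflow concrete:

```yaml
metadata:
  annotations:
    # Bumping this value requests a one-off test run (hypothetical key):
    config.projectsveltos.io/test-request: "2024-05-01T12:00:00Z"
```

Using a changing value (e.g. a timestamp) rather than a boolean lets the controller distinguish a new request from an already-handled one.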