
Consolidated preflight upgrade tests with tighter polling #11202

Open
brooke-hamilton wants to merge 2 commits into radius-project:main from brooke-hamilton:brooke-hamilton/fix-flaky-upgrade-tests


brooke-hamilton commented Feb 8, 2026

Description

Scheduled functional tests were sometimes failing due to timeouts. This PR addresses the failures by adjusting timeout values and consolidating tests. These changes should also reduce the overall duration of the tests.

Changes

Test consolidation: Reduce 4 independent test functions (FreshInstall, PreflightDisabled, JobConfiguration, PreflightOnly) to 2 sequential subtests under a single Test_PreflightContainer parent, eliminating 2 full Helm install/uninstall cycles (~3-6 min each).

  • Merge FreshInstall and JobConfiguration into the "Enabled" subtest, since FreshInstall was a strict subset (it only checked that the job existed, not its configuration) and JobConfiguration already verified everything FreshInstall did, plus the TTL settings
  • Remove the PreflightOnly test, which was identical to FreshInstall (same Helm values, same verification call)
  • Keep PreflightDisabled as the "Disabled" subtest unchanged

Extracted shared helpers: Extract shared logic into reusable helpers (helmInstall, helmUpgrade, helmUninstall, waitForControlPlane, findPreflightJob, logJobDetails, cleanupAndWait, runCommand) to eliminate ~120 lines of duplication across tests.

Tightened polling intervals: Reduce idle time during state transitions by shortening these waits:

  • Control plane and cleanup polling: 5s → 3s
  • K8s client fallback sleep: 15s → 10s
  • Post-cleanup API deregistration wait: 5s → 3s

Proper cleanup waiting: cleanupAndWait now polls until all pods in the radius-system namespace have fully terminated before returning. This prevents aggregated API service conflicts that occur when the next helm install runs before the previous resources are fully cleaned up. If the k8s client cannot be created, cleanupAndWait falls back to a static sleep.

Note: These tests cannot run in parallel since they share the radius-system namespace and radius helm release.

Type of change

  • This pull request is a minor refactor, code cleanup, test improvement, or other maintenance task and does not change the functionality of Radius (issue link optional).

Contributor checklist

Please verify that the PR meets the following requirements, where applicable:

  • An overview of proposed schema changes is included in a linked GitHub issue.
    • Yes
    • Not applicable
  • A design document PR is created in the design-notes repository, if new APIs are being introduced.
    • Yes
    • Not applicable
  • The design document has been reviewed and approved by Radius maintainers/approvers.
    • Yes
    • Not applicable
  • A PR for the samples repository is created, if existing samples are affected by the changes in this PR.
    • Yes
    • Not applicable
  • A PR for the documentation repository is created, if the changes in this PR affect the documentation or any user facing updates are made.
    • Yes
    • Not applicable
  • A PR for the recipes repository is created, if existing recipes are affected by the changes in this PR.
    • Yes
    • Not applicable

Signed-off-by: Brooke Hamilton <45323234+brooke-hamilton@users.noreply.github.com>