@ederign ederign commented Feb 2, 2026

Summary

Fixes a race condition in upload_pipeline() that caused "Cannot retrieve pipeline version" errors when creating new pipelines.

  • Removed unnecessary version creation/deletion logic that was prone to race conditions
  • New pipelines now use the default version created by upload_pipeline() directly
  • Existing pipeline uploads unchanged (still create versions with random names)
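
The simplified flow in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the actual Kale code: `FakeClient`, its methods, and the `upload` helper are hypothetical stand-ins for the real pipelines client, and the five-character random suffix is an assumption.

```python
import random
import string
from itertools import count


class FakeClient:
    """Hypothetical in-memory stand-in for a pipelines backend (not the real client)."""

    def __init__(self):
        self._ids = count(1)
        self.pipelines = {}  # pipeline name -> pipeline id
        self.versions = {}   # pipeline id -> list of (version id, version name)

    def get_pipeline_id(self, name):
        return self.pipelines.get(name)

    def upload_pipeline(self, package_path, name):
        # Creating a pipeline also creates its default version.
        pipeline_id = f"p{next(self._ids)}"
        version_id = f"v{next(self._ids)}"
        self.pipelines[name] = pipeline_id
        self.versions[pipeline_id] = [(version_id, name)]
        return pipeline_id, version_id

    def upload_pipeline_version(self, package_path, pipeline_id, version_name):
        version_id = f"v{next(self._ids)}"
        self.versions[pipeline_id].append((version_id, version_name))
        return version_id


def upload(client, package_path, name):
    """Return (pipeline_id, version_id) without ever deleting a version."""
    pipeline_id = client.get_pipeline_id(name)
    if pipeline_id is None:
        # New pipeline: use the default version directly.
        return client.upload_pipeline(package_path, name)
    # Existing pipeline: add a version with a random name (unchanged behaviour).
    suffix = "".join(random.choices(string.ascii_lowercase + string.digits, k=5))
    version_id = client.upload_pipeline_version(
        package_path, pipeline_id, f"{name}-{suffix}"
    )
    return pipeline_id, version_id
```

Because no version is ever deleted in this flow, the returned version ID always refers to a version that still exists.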

Fixes #581

Root Cause

The previous code created a pipeline (which auto-creates a default version), then created a second version with a random name, then tried to delete the default version by sorting versions by created_at timestamp.

If both versions had the same timestamp (common with second-precision timestamps), the sort order was undefined and could delete the wrong version - the one we wanted to keep. The returned version ID then pointed to a deleted version, causing the "Cannot retrieve pipeline version" error.
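
A small sketch of the failure mode described above, assuming the old code treated the first entry of a `created_at` sort as the default version to delete. The `Version` class and `pick_version_to_delete` helper are hypothetical and only model the heuristic; they are not the original implementation.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Version:
    id: str
    name: str
    created_at: datetime


def pick_version_to_delete(versions):
    """Model of the old heuristic: assume the oldest version is the default one."""
    return sorted(versions, key=lambda v: v.created_at)[0]


# Both versions are created within the same second, so the timestamps tie.
ts = datetime(2026, 2, 1, 11, 30, 11)
default = Version("v-default", "my-pipeline", ts)
named = Version("v-named", "my-pipeline-x7k2q", ts)

# With a tie, the result depends only on the order the backend listed them in.
victim_a = pick_version_to_delete([default, named])
victim_b = pick_version_to_delete([named, default])
```

Here `victim_a` is the default version, but `victim_b` is the named version that the caller wanted to keep. Nothing in the sort key can tell the two apart, so the outcome depends entirely on listing order.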

Test Plan

  • Deploy a new pipeline via Kale and click the "Done" link - should navigate to the pipeline version without error
  • Deploy to an existing pipeline - should continue to work as before

… version" Error

Fix race condition in upload_pipeline() that caused "Cannot retrieve
pipeline version" errors when creating new pipelines.

The previous code created a pipeline (which auto-creates a default version),
then created a second version with a random name, then tried to delete the
default version by sorting versions by created_at timestamp. If both versions
had the same timestamp (common with second-precision timestamps), the sort
order was undefined and could delete the wrong version - the one we wanted
to keep. The returned version ID then pointed to a deleted version.

Simplified the flow to just use the default version that upload_pipeline()
creates, eliminating the race condition entirely:

- New pipelines: use the default version created by upload_pipeline()
- Existing pipelines: upload a new version with random name (unchanged)

Signed-off-by: Eder Ignatowicz <ignatowicz@gmail.com>
@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from ederign. For more information see the Kubernetes Code Review Process.



ederign commented Feb 2, 2026

@StefanoFioravanzo do you recall why upload_pipeline() has such complex logic?

@ederign ederign added this to the Kale 2.0 milestone Feb 2, 2026

jesuino commented Feb 2, 2026

We saw this sometimes, but we were never able to reproduce it consistently! I believe this may be related: #522

@jesuino jesuino left a comment

I tested starting multiple versions and multiple new pipelines and never hit an error with this change, so I believe it is working as expected.

@google-oss-prow google-oss-prow bot removed the lgtm label Feb 3, 2026
@google-oss-prow
New changes are detected. LGTM label has been removed.


ederign commented Feb 3, 2026

@StefanoFioravanzo, please review this when you have a chance!

Development

Successfully merging this pull request may close these issues.

[backend] Pipeline Version Link Returns "Cannot retrieve pipeline version" Error
