Auto-retry failed Tasks in spawner dedup logic #428
Closed
axon-agent[bot] wants to merge 1 commit into main
When a Task created by a TaskSpawner fails, the spawner's dedup logic previously treated it the same as a succeeded task: the task name existed, so the work item was skipped. This meant failed cron tasks would block retries until TTL cleanup (up to 10 days for the strategist spawner).

Now the spawner detects failed Tasks during dedup, deletes them, and recreates them for immediate retry. Succeeded tasks remain deduplicated as before. The retry still respects maxConcurrency and maxTotalTasks limits.

Partially addresses #287 (problem 1: cron dedup prevents retry after failure).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
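The dedup behavior described above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `Phase`, `Plan`, and `planCycle` are hypothetical names, the task state is a plain map rather than real Task objects, and `maxTotalTasks` is left out for brevity. It also assumes that only non-terminal tasks count toward `maxConcurrency`.

```go
package main

import "fmt"

// Phase is a hypothetical stand-in for the Task status phase.
type Phase string

const (
	PhaseRunning   Phase = "Running"
	PhaseSucceeded Phase = "Succeeded"
	PhaseFailed    Phase = "Failed"
)

// Plan lists which task names to delete first (failed, to be retried)
// and which to create (new items plus retries), in discovery order.
type Plan struct {
	Delete []string
	Create []string
}

// planCycle dedups discovered item names against existing tasks.
// Failed tasks are scheduled for delete-and-recreate; running and
// succeeded tasks are skipped. Creation stops once the number of
// active tasks reaches maxConcurrency (0 means unlimited).
func planCycle(items []string, existing map[string]Phase, maxConcurrency int) Plan {
	// Assumption: only non-terminal tasks count as active.
	active := 0
	for _, p := range existing {
		if p != PhaseSucceeded && p != PhaseFailed {
			active++
		}
	}
	var plan Plan
	for _, name := range items {
		phase, found := existing[name]
		if found && phase != PhaseFailed {
			continue // still running or already succeeded: dedup
		}
		if maxConcurrency > 0 && active >= maxConcurrency {
			break // limit reached; remaining items wait for a later cycle
		}
		if found {
			plan.Delete = append(plan.Delete, name) // failed: retry
		}
		plan.Create = append(plan.Create, name)
		active++
	}
	return plan
}

func main() {
	existing := map[string]Phase{
		"a": PhaseSucceeded,
		"b": PhaseFailed,
		"c": PhaseRunning,
	}
	plan := planCycle([]string{"a", "b", "c", "d"}, existing, 2)
	fmt.Println(plan.Delete, plan.Create) // prints "[b] [b]"
}
```

In this run, "a" is deduplicated, "b" is retried, "c" is skipped because it is still running, and "d" is deferred because the retry of "b" already fills the concurrency limit of 2.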
🤖 Axon Agent @gjkim42
Summary
When a Task created by a TaskSpawner fails, the spawner's dedup logic previously treated it the same as a succeeded task: the task name existed in the `existingTasks` map, so the work item was skipped on subsequent poll cycles. This meant a failed Task blocked retries until TTL cleanup. For `axon-fake-strategist` (TTL: 864000s = 10 days), a single failure could waste 10 days. For `axon-workers` (TTL: 3600s), it wasted an hour.

Changes

`cmd/axon-spawner/main.go`:
- Failed Tasks found during dedup are added to the `newItems` list (eligible for retry)
- The failed Task is deleted before it is recreated, handling `NotFound` gracefully in case TTL already cleaned it up

`cmd/axon-spawner/main_test.go`:
- `TestRunCycleWithSource_FailedTaskRetriedImmediately`: verifies failed task is deleted and recreated
- `TestRunCycleWithSource_SucceededTaskNotRetried`: verifies succeeded tasks remain deduplicated
- `TestRunCycleWithSource_FailedTaskRetryRespectsMaxConcurrency`: verifies retries still respect concurrency limits
- Updated `TestRunCycleWithSource_CompletedTasksDontCountTowardsLimit` to use two succeeded tasks (the original used one failed task, which is now retried by design)

Design decisions

Why unconditional retry (no retry limit)? This is the simplest correct behavior: the spawner already has `maxTotalTasks` to cap total task creation, and `maxConcurrency` to limit parallelism. A retry-specific limit would add complexity for little gain. Users who want bounded retries can use `maxTotalTasks`. A future `retryPolicy` field (#298) could add more sophisticated control.

Why delete-and-recreate instead of updating? Task spec has an immutability validation rule (`self == oldSelf`), so the failed task cannot be updated. Deleting and recreating with the same name preserves the spawner's naming convention.

Relates to

Test plan
- `go test ./cmd/axon-spawner/ -v`: all 27 tests pass
- `make verify`: passes
- `make build`: compiles cleanly

Summary by cubic
Automatically retries failed Tasks by detecting failures during spawner dedup, deleting the failed Task, and recreating it with the same name. This prevents cron and issue spawners from getting stuck until TTL cleanup and still honors maxConcurrency and maxTotalTasks.
Written for commit cf6c3bf. Summary will update on new commits.