Labels: bug
Description
When a process started by Fleet is killed externally (e.g., via kill <pid>), Fleet occasionally ends up in a deadlock state.
In this situation, the orchestrator no longer reacts as expected:
- Pipelines/jobs depending on the killed process hang indefinitely
- Fleet doesn’t recover or release the lock without manual intervention
This seems related to how Fleet manages child processes and their async join handles.
Steps to Reproduce
- Start a project with a pipeline that launches a long-running process (e.g., `sleep 1000` or a Docker container).
- From outside Fleet, kill the process (e.g., `kill <pid>`).
- Observe Fleet's behavior:
  - Sometimes it cleans up correctly
  - Sometimes it deadlocks, leaving the pipeline stuck forever and the daemon unresponsive to the CLI
Expected Behavior
Fleet should:
- Detect when a child process is killed externally
- Gracefully handle cleanup (release locks, mark the job as failed, and continue)
- Avoid getting stuck in a deadlock state
Actual Behavior
- Deadlock occurs occasionally (not deterministic).
- Requires manual intervention (restart Fleet or stop/restart the pipeline).
Possible Cause (Hypothesis)
This may be due to:
- `tokio::process::Child` not propagating external kills properly
- Incomplete cleanup of async tasks waiting on the child's `await`
- Lock contention when the job manager tries to update state after the process disappears unexpectedly
Additional Context
- Issue is intermittent → seems related to race conditions in async handling.
- Might require using
wait_with_outputor explicit signal handling to avoid dangling futures. - Logs (if available) could help identify the exact deadlock point.