
@goasChris commented Feb 6, 2026

If we get no heartbeat from the subprocess, we cannot confirm that it is running safely.

Killing ardupilot now aggressively prunes ardupilot processes if it cannot stop the one it is attached to. Prior to this change, if we could not stop our own process, we were left in limbo with no way to recover.

Testing

  • Tested locally on my machine via Docker and SITL by pausing the binary (for example, at a breakpoint). Also tried with SIGCHLD, which could induce a zombie-like state.
  • Deployed on a Pi 4 (Linux board) running ArduRover and repeated the same tests.
  • Paused and unpaused the binary to confirm that the heartbeat failure counter resets to 0 once a heartbeat is received.
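
The pause test can be approximated without a debugger. The following is a minimal sketch, assuming a POSIX system. It uses a throwaway Python child process as a stand-in for the ArduPilot binary; the function name `pause_and_resume` is made up for illustration, and this is not necessarily how the tests above were run. It shows that a stopped process still looks alive to its parent, which is why a process check alone cannot detect a hung autopilot.

```python
import signal
import subprocess
import sys
import time


def pause_and_resume(pause_seconds=0.2):
    """Stop and resume a child process, reporting whether it looked alive each time.

    Hypothetical helper: the child is a sleeping Python interpreter,
    not the real ArduPilot binary.
    """
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        # SIGSTOP freezes the process; it cannot send anything while stopped.
        child.send_signal(signal.SIGSTOP)
        time.sleep(pause_seconds)
        alive_while_paused = child.poll() is None

        # SIGCONT lets it run again, so heartbeats would resume.
        child.send_signal(signal.SIGCONT)
        alive_after_resume = child.poll() is None
        return alive_while_paused, alive_after_resume
    finally:
        # Always clean up the stand-in process.
        child.kill()
        child.wait()
```

While the child is stopped, `poll()` still returns `None`, so from the parent's point of view the process is running even though it cannot respond.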

Summary by Sourcery

Add heartbeat-based monitoring to automatically restart Ardupilot on consecutive MAVLink heartbeat failures and harden shutdown behavior to avoid lingering or zombie Ardupilot processes.

New Features:

  • Introduce consecutive heartbeat failure tracking with a configurable threshold to trigger automatic Ardupilot restarts when MAVLink heartbeats stop.

Bug Fixes:

  • Ensure Ardupilot system processes are aggressively pruned even when the main subprocess cannot be cleanly terminated, preventing limbo or zombie states.

Enhancements:

  • Reset the heartbeat failure counter whenever a valid heartbeat is received to avoid unnecessary restarts.


sourcery-ai bot commented Feb 6, 2026

Reviewer's Guide

Adds consecutive heartbeat failure monitoring to automatically restart Ardupilot when MAVLink heartbeats stop, and makes Ardupilot shutdown more robust by always pruning processes even if graceful termination fails.

Sequence diagram for autopilot auto-restart on consecutive heartbeat failures

sequenceDiagram
    participant AutoRestartTask
    participant AutopilotManager
    participant VehicleManager
    participant ArduPilotProcess

    loop every_5_seconds
        AutoRestartTask->>AutopilotManager: auto_restart_ardupilot()
        alt should_be_running_and_process_not_running
            AutopilotManager->>AutopilotManager: start_ardupilot()
        end

        alt should_be_running_and_is_running
            AutopilotManager->>VehicleManager: is_heart_beating()
            alt heartbeat_ok
                VehicleManager-->>AutopilotManager: true
                AutopilotManager->>AutopilotManager: _heartbeat_fail_count = 0
            else heartbeat_missing
                VehicleManager-->>AutopilotManager: false
                AutopilotManager->>AutopilotManager: _heartbeat_fail_count += 1
            end
        else not_running_or_should_not_run
            AutopilotManager->>AutopilotManager: skip_heartbeat_check
        end

        alt _heartbeat_fail_count >= _max_heartbeat_failures
            AutopilotManager->>AutopilotManager: restart_ardupilot()
            AutopilotManager->>AutopilotManager: _heartbeat_fail_count = 0
        end
    end
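
The heartbeat branch of the sequence diagram can be sketched in Python as follows. This is a minimal illustration, not the code from the PR: `HeartbeatWatchdog` and `check_once` are hypothetical names, and the heartbeat and restart operations are passed in as async callables so the sketch runs on its own. The names `_heartbeat_fail_count`, `_max_heartbeat_failures`, and `is_heart_beating` are taken from the diagram. Treating an exception as a failure follows the file-level change notes.

```python
import asyncio


class HeartbeatWatchdog:
    """Hypothetical stand-in for the heartbeat part of AutopilotManager."""

    def __init__(self, is_heart_beating, restart, max_failures=10):
        self._is_heart_beating = is_heart_beating  # async callable returning bool
        self._restart = restart                    # async callable
        self._max_heartbeat_failures = max_failures
        self._heartbeat_fail_count = 0

    async def check_once(self, should_be_running, is_running):
        """One iteration of the periodic auto-restart loop."""
        if should_be_running and is_running:
            try:
                alive = await self._is_heart_beating()
            except Exception:
                # An exception during the check counts as a missed heartbeat.
                alive = False

            if alive:
                self._heartbeat_fail_count = 0
            else:
                self._heartbeat_fail_count += 1

        # Restart once enough consecutive failures have accumulated.
        if self._heartbeat_fail_count >= self._max_heartbeat_failures:
            await self._restart()
            self._heartbeat_fail_count = 0
```

Because a single successful heartbeat resets the count, only consecutive failures can reach the threshold.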

Updated class diagram for AutopilotManager heartbeat monitoring and shutdown

classDiagram
    class AutopilotManager {
        bool should_be_running
        int _heartbeat_fail_count
        int _max_heartbeat_failures
        VehicleManager vehicle_manager

        async auto_restart_ardupilot()
        async start_ardupilot()
        async restart_ardupilot()
        async kill_ardupilot()
        async terminate_ardupilot_subprocess()
        async prune_ardupilot_processes()
        bool is_running()
    }

    class VehicleManager {
        async bool is_heart_beating()
        async shutdown_vehicle()
    }

    class AutoPilotProcessKillFail {
    }

    AutopilotManager *-- VehicleManager : manages
    AutopilotManager ..> AutoPilotProcessKillFail : handles_exception

Flow diagram for robust kill_ardupilot shutdown behavior

flowchart TD
    A[Start kill_ardupilot] --> B[Log Terminating Ardupilot subprocess.]
    B --> C[Call terminate_ardupilot_subprocess]

    C -->|success| D[Log Ardupilot subprocess terminated.]
    C -->|AutoPilotProcessKillFail| E[Log error about failed termination]

    D --> F[Log Pruning Ardupilot's system processes.]
    E --> F

    F --> G[Call prune_ardupilot_processes]
    G --> H[Log Ardupilot's system processes pruned.]
    H --> I[End kill_ardupilot]
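
The flow above amounts to a try/except around the terminate call, followed by an unconditional prune. A minimal sketch under stated assumptions: the two operations are injected as async callables, the logger name is invented, and `AutoPilotProcessKillFail` is redefined here as a plain `RuntimeError` subclass because its real base class is not shown in this conversation.

```python
import asyncio
import logging

logger = logging.getLogger("autopilot-manager-sketch")  # hypothetical logger name


class AutoPilotProcessKillFail(RuntimeError):
    """Stand-in for the exception named in the diagram (real base class unknown)."""


async def kill_ardupilot(terminate_ardupilot_subprocess, prune_ardupilot_processes):
    """Try a graceful terminate, then prune regardless of the outcome."""
    logger.info("Terminating Ardupilot subprocess.")
    try:
        await terminate_ardupilot_subprocess()
        logger.info("Ardupilot subprocess terminated.")
    except AutoPilotProcessKillFail as error:
        # Do not re-raise: pruning below is the recovery path.
        logger.error("Failed to terminate Ardupilot subprocess: %s", error)

    logger.info("Pruning Ardupilot's system processes.")
    await prune_ardupilot_processes()
    logger.info("Ardupilot's system processes pruned.")
```

Only `AutoPilotProcessKillFail` is caught in this sketch, so any other exception from the terminate step would still propagate and skip the prune.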

File-Level Changes

Change: Add heartbeat monitoring with a failure counter to trigger Ardupilot restarts when MAVLink heartbeats stop.
Details:
  • Initialize internal counters for consecutive heartbeat failures and a maximum failure threshold in the manager setup.
  • Extend the auto-restart loop to periodically check the MAVLink heartbeat while Ardupilot is supposed to be running.
  • Reset the heartbeat failure counter on successful heartbeat checks and increment it on failures or exceptions.
  • Trigger an Ardupilot restart when the failure count reaches the configured threshold and reset the counter afterward, with warning logs throughout.
Files: core/services/ardupilot_manager/autopilot_manager.py

Change: Harden Ardupilot shutdown by pruning processes even if graceful subprocess termination fails.
Details:
  • Wrap terminate_ardupilot_subprocess in a try/except catching AutoPilotProcessKillFail and log an error when termination fails.
  • Ensure pruning of Ardupilot system processes always runs after a terminate attempt, providing a clean slate even if the attached subprocess cannot be controlled.
Files: core/services/ardupilot_manager/autopilot_manager.py

Possibly linked issues

  • #[autopilot-manager]: PR’s heartbeat-based restart and aggressive kill logic provide the missing watchdog recovery for failures after reboot command.


CLAassistant commented Feb 6, 2026

CLA assistant check
All committers have signed the CLA.

sourcery-ai bot left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • Consider making _max_heartbeat_failures configurable via settings instead of a hardcoded 10, so the restart sensitivity can be tuned per deployment or vehicle type.
  • The _heartbeat_fail_count is only reset on successful heartbeats and after a threshold-triggered restart; you may want to also reset it when (re)starting Ardupilot or when should_be_running transitions to false/true to avoid stale counts carrying across distinct runs.
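
One way the first suggestion could look is sketched below, assuming the settings arrive as a plain mapping. The class `WatchdogSettings`, the key `max_heartbeat_failures`, and the helper `approx_seconds_to_restart` are hypothetical; the project's real settings mechanism is not shown in this conversation. The default of 10 is the hardcoded value mentioned above, and the 5 second interval comes from the sequence diagram.

```python
from dataclasses import dataclass

DEFAULT_MAX_HEARTBEAT_FAILURES = 10  # the hardcoded value mentioned in the review


@dataclass(frozen=True)
class WatchdogSettings:
    """Hypothetical settings holder for the heartbeat watchdog."""

    max_heartbeat_failures: int = DEFAULT_MAX_HEARTBEAT_FAILURES

    @classmethod
    def from_mapping(cls, raw):
        """Build settings from a dict-like object, falling back to the default."""
        value = int(raw.get("max_heartbeat_failures", DEFAULT_MAX_HEARTBEAT_FAILURES))
        if value < 1:
            raise ValueError("max_heartbeat_failures must be at least 1")
        return cls(max_heartbeat_failures=value)


def approx_seconds_to_restart(max_failures, check_interval_seconds=5.0):
    """Rough time of continuous heartbeat loss before a restart is triggered."""
    return max_failures * check_interval_seconds
```

With the default threshold and a 5 second loop, a restart would follow roughly 50 seconds of continuous heartbeat loss, so lowering the threshold trades faster recovery for more sensitivity to slow startups.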
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Consider making `_max_heartbeat_failures` configurable via settings instead of a hardcoded `10`, so the restart sensitivity can be tuned per deployment or vehicle type.
- The `_heartbeat_fail_count` is only reset on successful heartbeats and after a threshold-triggered restart; you may want to also reset it when (re)starting Ardupilot or when `should_be_running` transitions to false/true to avoid stale counts carrying across distinct runs.

## Individual Comments

### Comment 1
<location> `core/services/ardupilot_manager/autopilot_manager.py:206-215` </location>
<code_context>
+            # Monitor MAVLink heartbeat while autopilot is supposed to be running
</code_context>

<issue_to_address>
**issue (bug_risk):** Heartbeat failure counter may carry over across stop/start cycles and trigger premature restarts.

Because `_heartbeat_fail_count` is only reset on a successful heartbeat or after a restart, any accumulated failures can persist when `should_be_running` flips to `False` (e.g., user stops Ardupilot). On the next start, normal connection delays might quickly push the stale count past `_max_heartbeat_failures`, causing premature restarts right after a clean start.

Consider resetting `_heartbeat_fail_count` when `should_be_running` transitions to `True` (start) and/or when it becomes `False` (intentional stop), so the counter only tracks failures for the current run session.
</issue_to_address>
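
The reset suggested in this comment could be sketched as below. This is an illustration only: `HeartbeatSession`, `set_should_be_running`, and `record_heartbeat` are hypothetical names, and the sketch keeps only the counter logic, with no process handling. The attribute names `should_be_running`, `_heartbeat_fail_count`, and `_max_heartbeat_failures` follow the review text.

```python
class HeartbeatSession:
    """Hypothetical holder for the per-run heartbeat failure count."""

    def __init__(self, max_heartbeat_failures=10):
        self.should_be_running = False
        self._max_heartbeat_failures = max_heartbeat_failures
        self._heartbeat_fail_count = 0

    def set_should_be_running(self, value):
        # Any start or intentional stop begins a new session, so drop stale failures.
        if value != self.should_be_running:
            self._heartbeat_fail_count = 0
        self.should_be_running = value

    def record_heartbeat(self, heartbeat_ok):
        """Update the count and return True when a restart is due."""
        if not self.should_be_running:
            return False
        if heartbeat_ok:
            self._heartbeat_fail_count = 0
            return False
        self._heartbeat_fail_count += 1
        if self._heartbeat_fail_count >= self._max_heartbeat_failures:
            self._heartbeat_fail_count = 0
            return True
        return False
```

Routing every change of `should_be_running` through one setter keeps the reset in a single place, so failures counted before a stop cannot shorten the grace period of the next start.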
