-
Notifications
You must be signed in to change notification settings - Fork 121
Restart autopilot on consecutive heartbeat failures #3773
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
If we get no heartbeat from the subprocess, there is no way to ensure it is running safely. Killing ardupilot will now agressively prune ardupilot processes if it cannot stop the one it is attached to. Prior to this change, if we could not stop our own process, we would be in limbo, with no way to recover.
Reviewer's GuideAdds consecutive heartbeat failure monitoring to automatically restart Ardupilot when MAVLink heartbeats stop, and makes Ardupilot shutdown more robust by always pruning processes even if graceful termination fails. Sequence diagram for autopilot auto-restart on consecutive heartbeat failuressequenceDiagram
participant AutoRestartTask
participant AutopilotManager
participant VehicleManager
participant ArduPilotProcess
loop every_5_seconds
AutoRestartTask->>AutopilotManager: auto_restart_ardupilot()
alt should_be_running_and_process_not_running
AutopilotManager->>AutopilotManager: start_ardupilot()
end
alt should_be_running_and_is_running
AutopilotManager->>VehicleManager: is_heart_beating()
alt heartbeat_ok
VehicleManager-->>AutopilotManager: true
AutopilotManager->>AutopilotManager: _heartbeat_fail_count = 0
else heartbeat_missing
VehicleManager-->>AutopilotManager: false
AutopilotManager->>AutopilotManager: _heartbeat_fail_count += 1
end
else not_running_or_should_not_run
AutopilotManager->>AutopilotManager: skip_heartbeat_check
end
alt _heartbeat_fail_count >= _max_heartbeat_failures
AutopilotManager->>AutopilotManager: restart_ardupilot()
AutopilotManager->>AutopilotManager: _heartbeat_fail_count = 0
end
end
Updated class diagram for AutopilotManager heartbeat monitoring and shutdownclassDiagram
class AutopilotManager {
bool should_be_running
int _heartbeat_fail_count
int _max_heartbeat_failures
VehicleManager vehicle_manager
async auto_restart_ardupilot()
async start_ardupilot()
async restart_ardupilot()
async kill_ardupilot()
async terminate_ardupilot_subprocess()
async prune_ardupilot_processes()
bool is_running()
}
class VehicleManager {
async bool is_heart_beating()
async shutdown_vehicle()
}
class AutoPilotProcessKillFail {
}
AutopilotManager *-- VehicleManager : manages
AutopilotManager ..> AutoPilotProcessKillFail : handles_exception
Flow diagram for robust kill_ardupilot shutdown behaviorflowchart TD
A[Start kill_ardupilot] --> B[Log Terminating Ardupilot subprocess.]
B --> C[Call terminate_ardupilot_subprocess]
C -->|success| D[Log Ardupilot subprocess terminated.]
C -->|AutoPilotProcessKillFail| E[Log error about failed termination]
D --> F[Log Pruning Ardupilot's system processes.]
E --> F
F --> G[Call prune_ardupilot_processes]
G --> H[Log Ardupilot's system processes pruned.]
H --> I[End kill_ardupilot]
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey - I've found 1 issue, and left some high level feedback:
- Consider making
_max_heartbeat_failuresconfigurable via settings instead of a hardcoded10, so the restart sensitivity can be tuned per deployment or vehicle type. - The
_heartbeat_fail_countis only reset on successful heartbeats and after a threshold-triggered restart; you may want to also reset it when (re)starting Ardupilot or whenshould_be_runningtransitions to false/true to avoid stale counts carrying across distinct runs.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider making `_max_heartbeat_failures` configurable via settings instead of a hardcoded `10`, so the restart sensitivity can be tuned per deployment or vehicle type.
- The `_heartbeat_fail_count` is only reset on successful heartbeats and after a threshold-triggered restart; you may want to also reset it when (re)starting Ardupilot or when `should_be_running` transitions to false/true to avoid stale counts carrying across distinct runs.
## Individual Comments
### Comment 1
<location> `core/services/ardupilot_manager/autopilot_manager.py:206-215` </location>
<code_context>
+ # Monitor MAVLink heartbeat while autopilot is supposed to be running
</code_context>
<issue_to_address>
**issue (bug_risk):** Heartbeat failure counter may carry over across stop/start cycles and trigger premature restarts.
Because `_heartbeat_fail_count` is only reset on a successful heartbeat or after a restart, any accumulated failures can persist when `should_be_running` flips to `False` (e.g., user stops Ardupilot). On the next start, normal connection delays might quickly push the stale count past `_max_heartbeat_failures`, causing premature restarts right after a clean start.
Consider resetting `_heartbeat_fail_count` when `should_be_running` transitions to `True` (start) and/or when it becomes `False` (intentional stop), so the counter only tracks failures for the current run session.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
If we get no heartbeat from the subprocess, there is no way to ensure it is running safely.
Killing ardupilot will now agressively prune ardupilot processes if it cannot stop the one it is attached to. Prior to this change, if we could not stop our own process, we would be in limbo, with no way to recover.
Testing
Summary by Sourcery
Add heartbeat-based monitoring to automatically restart Ardupilot on consecutive MAVLink heartbeat failures and harden shutdown behavior to avoid lingering or zombie Ardupilot processes.
New Features:
Bug Fixes:
Enhancements: