@p-gentili
Contributor

Description

  1. Any type of control host might affect provisioning
  2. Any type of control host might affect testing
  3. If the control host is not reachable, (1) or (2) can fail for a reason completely unrelated to the DUT

This PR proposes a programmatic way to reboot the control host just before provisioning if it is not reachable. I think ping is a good enough check for this, considering that different types of control hosts might have different ways of checking whether they are really ready.

This is still opt-in (you need to call the super() method), but I think it's safe to call it everywhere for now, since it doesn't raise any errors. We could also wrap it in try/except Exception if that makes sense.
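
For readers following along, the flow described above can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `is_pingable`, `ensure_control_host`, and the `reboot`/`probe` parameters are hypothetical names, and the `ping -c 1` invocation assumes a Linux or macOS `ping`.

```python
import logging
import subprocess
from typing import Callable

logger = logging.getLogger(__name__)


def is_pingable(host: str, timeout_s: int = 5) -> bool:
    """Return True if a single ICMP echo request to `host` succeeds.

    Assumes a Linux/macOS `ping` that accepts `-c <count>`.
    """
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ping binary missing, or no answer within the timeout
        return False
    return result.returncode == 0


def ensure_control_host(
    host: str,
    reboot: Callable[[], None],
    probe: Callable[[str], bool] = is_pingable,
) -> bool:
    """Reboot the control host only if it is unreachable. Never raises."""
    try:
        if probe(host):
            return True
        logger.warning("Control host %s is unreachable, rebooting", host)
        reboot()
        return probe(host)
    except Exception:
        # Mirrors the "wrap it in try/except Exception" idea: log and carry on
        logger.exception("Control host check failed for %s", host)
        return False
```

Because the helper only returns a boolean and swallows errors, calling it from every connector before provisioning would not change behavior for agents whose control host is already reachable.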

Resolved issues

N/A

Documentation

Pending...

Web service API changes

No!

Tests

Not yet...

@codecov

codecov bot commented Nov 27, 2025

Codecov Report

❌ Patch coverage is 7.35294% with 63 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.14%. Comparing base (edadf5b) to head (0c71ba8).

❌ Your patch check has failed because the patch coverage (7.35%) is below the target coverage (60.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #837      +/-   ##
==========================================
- Coverage   69.60%   69.14%   -0.46%     
==========================================
  Files         106      106              
  Lines        9142     9210      +68     
  Branches      841      847       +6     
==========================================
+ Hits         6363     6368       +5     
- Misses       2605     2668      +63     
  Partials      174      174              
Flag     Coverage Δ                   *Carryforward flag
agent    71.03% <ø> (ø)               Carriedforward from 0b206e8
cli      85.89% <ø> (ø)               Carriedforward from 0b206e8
device   53.48% <7.35%> (-0.74%) ⬇️
server   87.87% <ø> (ø)               Carriedforward from 0b206e8

*This pull request uses carry forward flags. Click here to find out more.

Components          Coverage Δ
Agent               71.03% <ø> (ø)
CLI                 85.89% <ø> (ø)
Common              ∅ <ø> (∅)
Device Connectors   53.48% <7.35%> (-0.74%) ⬇️
Server              87.87% <ø> (ø)

@amalinowski75 left a comment

Makes sense. Just a few minor comments.

cmd,
shell=True,
check=True,
timeout=300,

300 seconds seems like a lot.

logger.error(
"Unexpected error running command %s: %s", cmd, str(e)
)

TBH I don't know how reboot scripts are built, but isn't it better to run the entire script? What if there are some conditionals, etc.?

Contributor

It usually looks like [$POWER_OFF_CMD, "sleep N", $POWER_ON_CMD]
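
To make the discussion concrete, here is a rough sketch of running such a list of commands one by one and stopping at the first failure. It is an assumption-laden illustration rather than the PR's implementation: `run_reboot_script` is a hypothetical name, and the 60-second per-command timeout is only a placeholder for whatever value the thread settles on.

```python
import logging
import subprocess
from typing import Sequence

logger = logging.getLogger(__name__)


def run_reboot_script(commands: Sequence[str], timeout_s: int = 60) -> bool:
    """Run each shell command in order; return False on the first failure.

    `timeout_s` applies per command and is an illustrative default.
    """
    for cmd in commands:
        try:
            # shell=True so entries such as "sleep 5" or $VAR-based commands work
            subprocess.run(cmd, shell=True, check=True, timeout=timeout_s)
        except subprocess.CalledProcessError as e:
            logger.error("Command %s exited with code %d", cmd, e.returncode)
            return False
        except subprocess.TimeoutExpired:
            logger.error("Command %s timed out after %ds", cmd, timeout_s)
            return False
    return True
```

Stopping at the first failed command avoids, for example, issuing a power-on after a power-off that never happened. Whether that is the desired behavior for scripts with conditionals is exactly the open question raised above.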

self.__reboot_control_host()

# Wait for control host to be reachable via ping
self.__wait_back_alive(control_host, 60)

For some devices it takes a few minutes to become fully operational after they become pingable. I think it's good to wait here at least 3 minutes.
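
One way to read this suggestion is "poll until the host answers, then wait an extra settle period before continuing". The sketch below illustrates that idea under stated assumptions: `wait_back_alive` here is a standalone hypothetical function (not the private method in the diff), `probe` is any reachability check, and the default durations are placeholders.

```python
import time
from typing import Callable


def wait_back_alive(
    host: str,
    probe: Callable[[str], bool],
    timeout_s: float = 180.0,
    settle_s: float = 0.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `probe(host)` until it succeeds or `timeout_s` elapses.

    After the first successful probe, wait `settle_s` more seconds so that
    services on the host have time to start. Returns True on success.
    """
    deadline = clock() + timeout_s
    while True:
        if probe(host):
            if settle_s > 0:
                sleep(settle_s)
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Passing `sleep` and `clock` as parameters keeps the function easy to unit test without real delays, which may also help with the patch coverage reported earlier in the thread.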

logger.debug("No control host configured for this agent.")
return

with contextlib.suppress(FileNotFoundError):

FileNotFoundError exception seems a bit odd. ConnectionError looks better IMO.

)
time.sleep(int(timeout))

def __ping(self, host: str) -> None:

Contributor

We’ve seen cases where the control host is pingable but not reachable via SSH. I’d suggest using SSH to verify connectivity, as it provides better coverage.
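
A lightweight variant of this idea, sketched below, is to open a TCP connection to the SSH port and check for the server's identification string, which starts with `SSH-`. This is only an illustration: `ssh_banner_ok` is a hypothetical helper, it assumes the banner is the first data the server sends, and it does not verify that login actually works.

```python
import socket


def ssh_banner_ok(host: str, port: int = 22, timeout_s: float = 5.0) -> bool:
    """Return True if `host:port` accepts a TCP connection and sends an SSH banner.

    Checks only that an SSH daemon is answering; no authentication is attempted.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as sock:
            banner = sock.recv(255)
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts
        return False
    return banner.startswith(b"SSH-")
```

A stricter check would run a trivial command over an authenticated SSH session, at the cost of needing credentials for the control host at this point in the flow.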

@p-gentili changed the title from "[RFC] Reboot control host before provisioning" to "Reboot control host before provisioning" on Nov 28, 2025

4 participants