Skip to content

Conversation

@SamD2021
Copy link
Contributor

@SamD2021 SamD2021 commented Dec 4, 2024

We need unique MAC addresses for each IPU, and to make that happen, we use pre-built fixboard images that need to be flashed after the firmware is applied. To support this, I added these pre-built images to our ARM server and wrote functions to interact with them, making it easier to add unique MAC addresses. I also created functions to check if the fixboard process needs to be applied and handle the flashing if it does.

On top of that, I added a new ssh_run function to simplify running commands on remote hosts, since that’s a pretty common task and it made sense to make it easier and more consistent.

utils/fwutils.py Outdated

self.logger.info("Step 5: apply_fixboard")
if self.should_run("apply_fixboard"):
should_fixboard = self.fixboard_is_needed()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why make fixboard_is_needed() return an optional bool?

In my mind this could be simplified a bit.

if self.fixboard_is_needed():
    self.apply_fixboard()
else:
     # assume fixboard has already been applied, or the user does not want to apply this (though I'm not sure if the user has a knob to tell fwup to only run a subset of the steps anyways right now).

If we hit an exception when trying to determine if we should apply fixboard, something is wrong and we should bail out with an error code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user does not have a knob yet for should_run but here it skips if it shouldn't run, then checks if it needs to use fixboard, and inside the function when it errors out it logs the exception and returns a None, so in the reflash step i assert that it isn't none , but i am changing it to check then exit with an error code instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just added the changes, and tested it with a reflash, seems good to go!


def fixboard_is_needed(self) -> Optional[bool]:
full_address = f"root@{self.imc_address}"
try:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect this to fail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it can fail because we are interacting with JSON so i just make sure to handle the exception in that function by returning None and wrapping the return type in an Optional

This will be useful for traversing through the server for fixboard
pre-built board config binaries
We often want to avoid host checking and such so it would be nice to
have a function that abstract all those flags and runs the command on
the remote host
utils/fwutils.py Outdated

except Exception as e:
self.logger.error(f"Unexpected error: {e}")
return None

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just bail here, rather than returning None and having to handle this function returning either a boolean or None.

This is messy. If we encounter an error that we don't know how to handle, the hardware is in an unexpected state, and it is ok to just log the error, and exit immediately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with the changes and tested

utils/fwutils.py Outdated
if should_fixboard:
self.logger.info("Applying fixboard!")
self.apply_fixboard()
elif should_fixboard is None:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get rid of this (see other comment)

should_fixboard should just return true, false, or throw an exception if something unexpected happens.

@bn222 bn222 merged commit ad644de into bn222:main Dec 19, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants