[Nexthop] Distro Image ONIE installer #763
Open: travisb-nexthop wants to merge 7 commits into facebook:main from nexthop-ai:onie_installer
+1,414 −39
[Nexthop] Fix linting of Distro Image Shell scripts

The newly introduced shell linter causes the otherwise trivial rebases of the later Distro Image PRs to fail with many lint errors. Rather than adding a lot of noise to those PRs, the new shell scripts are re-linted separately here.

There are two functional fixes in build_image.sh, both of which appear to have been broken by previous linting efforts:

1. DOCKER_BUILD_ARGS optionally constructs arguments for `docker build`. It must not be quoted when used on the command line; otherwise docker receives one argument containing literal spaces, which it rejects as an invalid option.
2. DOCKER_ARGS is constructed similarly, to factor out the options common to the interactive and non-interactive cases. It must not be quoted either.

All other changes were applied automatically by pre-commit.
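A minimal Python sketch of why the quoting matters. It models the argv that docker receives in each case for a hypothetical DOCKER_BUILD_ARGS value; `str.split()` approximates the shell's default IFS word splitting but ignores globbing. Note the empty case: unquoted, an empty DOCKER_BUILD_ARGS disappears entirely, while quoted it becomes an empty argument.

```python
def argv_quoted(build_args: str) -> list:
    """Model `docker build "$DOCKER_BUILD_ARGS" .`: the whole string is one argument."""
    return ["docker", "build", build_args, "."]


def argv_unquoted(build_args: str) -> list:
    """Model `docker build $DOCKER_BUILD_ARGS .`: the shell splits on whitespace.

    str.split() approximates default IFS splitting; an empty value yields no words.
    """
    return ["docker", "build", *build_args.split(), "."]


# Hypothetical value, for illustration only.
args = "--no-cache --pull"
print(argv_quoted(args))    # docker sees a single "--no-cache --pull" option
print(argv_unquoted(args))  # docker sees "--no-cache" and "--pull" separately
```

Where the linter objects to unquoted expansion, a bash array (for example `args=(--no-cache --pull)` expanded as `"${args[@]}"`) is a common way to keep per-word splitting while staying quoted.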
[Nexthop] Build Distro Image in CI

**Pre-submission checklist**
- [x] I've run the linters locally and fixed lint errors related to the files I modified in this PR. You can install the linters by running `pip install -r requirements-dev.txt && pre-commit install`
- [x] `pre-commit run`

This is the start of the Distro Image CI, where the Distro Image is built. It simply invokes the `fboss-image build` command against the sample `from_source.json` image manifest. The resulting artifacts are not preserved in this PR.

As part of this, `-it` must be removed from the docker command because GitHub Actions jobs have no interactive stdin. Further, the sample image manifest now requires all components to be present, so they are added.

Test plan: look at the new workflow in this PR.
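This PR simply drops `-it`. One possible alternative, sketched here in Python and not what the PR does, is to add `-it` only when stdin is a terminal, so local interactive runs keep it while CI runs do not. The image name, the `--rm` flag, and the function name are hypothetical.

```python
import sys


def docker_run_args(image: str, command: list, interactive=None) -> list:
    """Build a `docker run` argv, adding -it only when a terminal is attached.

    If interactive is None, detect it from stdin; in CI jobs stdin is
    normally not a TTY, so -it is omitted there.
    """
    if interactive is None:
        interactive = sys.stdin is not None and sys.stdin.isatty()
    args = ["docker", "run", "--rm"]
    if interactive:
        args.append("-it")  # attach stdin and allocate a pseudo-TTY
    return args + [image, *command]


# Hypothetical image and command, for illustration only.
print(docker_run_args("fboss-image-builder", ["build"]))
```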
[Nexthop] Add FBOSS image builder CLI skeleton
**Pre-submission checklist**
- [X] I've run the linters locally and fixed lint errors related to the files I modified in this PR. You can install the linters by running `pip install -r requirements-dev.txt && pre-commit install`
- [X] `pre-commit run`
This lays the foundation for a manifest-driven FBOSS distro image builder.
This PR adds a CLI tool (`fboss-image`) that provides the framework for
building FBOSS distribution images based on JSON manifests. The
implementation uses only Python standard library.
**Key components:**
- Basic CLI framework using standard library
- Manifest parser for JSON-based image definitions
- Build orchestration with defined component ordering
- Stub Device commands
- Pre-commit linting
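A minimal standard-library sketch of the manifest parsing and fixed component ordering described above. The component names, the `components` key, and the `name` field are hypothetical; the real `from_source.json` schema may differ.

```python
import json
from dataclasses import dataclass

# Hypothetical component names, listed in the order they would be built.
COMPONENT_ORDER = ("kernel", "sai", "fboss", "bsps")


@dataclass
class Manifest:
    name: str
    components: dict


def parse_manifest(text: str) -> Manifest:
    """Parse a JSON manifest and require every known component to be present."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("manifest must be a JSON object")
    components = data.get("components", {})
    missing = [c for c in COMPONENT_ORDER if c not in components]
    if missing:
        raise ValueError(f"manifest is missing components: {', '.join(missing)}")
    unknown = sorted(set(components) - set(COMPONENT_ORDER))
    if unknown:
        raise ValueError(f"unknown components: {', '.join(unknown)}")
    return Manifest(name=data.get("name", "fboss-distro-image"), components=components)


def build_plan(manifest: Manifest) -> list:
    """Return (component, settings) pairs in the fixed order, ignoring JSON key order."""
    return [(c, manifest.components[c]) for c in COMPONENT_ORDER]
```

Fixing the order in code rather than trusting the manifest's key order keeps builds deterministic regardless of how the JSON was written.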
Test plan:

    cd fboss-image/distro_cli
    python3 -m unittest discover -s tests -p '*_test.py'
[Nexthop] Add PXE-boot distro image export

**Pre-submission checklist**
- [X] I've run the linters locally and fixed lint errors related to the files I modified in this PR. You can install the linters by running `pip install -r requirements-dev.txt && pre-commit install`
- [X] `pre-commit run`

Kiwi-NG already builds the PXE boot installer; we just need to move it into place. As part of this, I removed the final `ls` from build_image_in_container.sh because it shows file paths that aren't relevant to users and might be confusing.

Test plan: run `fboss-image build from_source.json` and confirm that the new file `fboss-distro-image_pxe.tar` is created.
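A minimal Python sketch of "moving the PXE installer into place": collect the Kiwi-NG PXE outputs from a build directory into `fboss-distro-image_pxe.tar`. The file suffixes and the flat directory layout are assumptions; the real build tree may name things differently.

```python
import tarfile
from pathlib import Path

# Hypothetical suffixes of the Kiwi-NG PXE artifacts (kernel, initrd, image).
PXE_SUFFIXES = (".kernel", ".initrd", ".xz")


def export_pxe_tar(build_dir: Path, out_path: Path) -> list:
    """Pack the PXE artifacts found in build_dir into an uncompressed tar."""
    artifacts = sorted(
        p for p in build_dir.iterdir()
        if p.is_file() and p.name.endswith(PXE_SUFFIXES)
    )
    if not artifacts:
        raise FileNotFoundError(f"no PXE artifacts found in {build_dir}")
    with tarfile.open(out_path, "w") as tar:
        for path in artifacts:
            tar.add(path, arcname=path.name)  # flat layout inside the tar
    return [p.name for p in artifacts]
```

Leaving the tar uncompressed is a reasonable choice here because the large image inside is already xz-compressed.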
[Nexthop] Distro Infrastructure container PXE-boot MVP
**Pre-submission checklist**
- [x] I've run the linters locally and fixed lint errors related to the files I modified in this PR. You can install the linters by running `pip install -r requirements-dev.txt && pre-commit install`
- [x] `pre-commit run`
This adds the minimum viable Distro Infrastructure container needed to
support IPv4 and IPv6 PXE boot. For IPv4, a DHCP server is expected to
exist on the network to provide IPv4 addresses to the switch. For IPv6,
the container supplies its own DHCPv6 server on the L2 segment by
default, but that can be disabled.
This is a self-contained, interactive docker container which uses
Proxy DHCP (IPv4) or DHCPv6 (IPv6) to direct PXE-booting devices to
the container's TFTP server and web server.
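A hypothetical Python generator for a dnsmasq configuration in this style: proxy DHCP for IPv4 (the site DHCP server still hands out leases), optional DHCPv6 with router advertisements, and a built-in TFTP server. The options are standard dnsmasq options chosen to match the startup log in the test output; the address range and paths come from that log, and the real container appears to serve per-MAC TFTP paths, so treat this as a sketch rather than its actual configuration.

```python
import ipaddress


def dnsmasq_conf(intf: str, ipv4: str, ipv6: str, tftp_root: str,
                 dhcpv6: bool = True) -> str:
    """Render a minimal dnsmasq config: proxy DHCP for IPv4, optional DHCPv6."""
    v4 = ipaddress.IPv4Address(ipv4)  # validates the addresses
    v6 = ipaddress.IPv6Address(ipv6)
    lines = [
        f"interface={intf}",
        "bind-interfaces",
        "port=0",                  # disable DNS; only DHCP and TFTP are served
        f"dhcp-range={v4},proxy",  # proxy DHCP: no leases, only boot info
        'pxe-service=x86-64_EFI,"FBOSS PXE",ipxe.efi',
        "enable-tftp",
        f"tftp-root={tftp_root}",
        "tftp-secure",
    ]
    if dhcpv6:
        lines += [
            "enable-ra",
            # Address template taken from the startup log, expanded per prefix.
            f"dhcp-range=::fb05:5000:1,::fb05:50ff:ffff,constructor:{intf},5m",
            f"dhcp-option=option6:bootfile-url,tftp://[{v6}]/ipxev6.efi",
        ]
    return "\n".join(lines) + "\n"


print(dnsmasq_conf("vlan1033", "10.250.33.194", "fc00:33::89", "/distro_infra/persistent"))
```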
iPXE is used to support loading the relatively large initrd image over
HTTP instead of TFTP and to support supplying changeable arguments to
the installer initrd. Currently these are hardcoded into autoexec.ipxe,
but future changes might autogenerate this file based on the needs of the
particular PXE installer.
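If autoexec.ipxe were autogenerated, a minimal Python sketch might look like the following. It uses the standard iPXE `kernel`, `initrd`, and `boot` commands; the function name is hypothetical, the file names follow the boot flows described in this PR, and `${server_ip}` is assumed to have been set before this script runs (bracketed if it is an IPv6 literal).

```python
def render_autoexec(kernel_args: str = "") -> str:
    """Render an iPXE script that loads kernel and initrd over HTTP, then boots.

    ${server_ip} is left for iPXE to expand at runtime (this is not an f-string).
    """
    kernel_line = "kernel http://${server_ip}/FBOSS-Distro-Image.kernel"
    if kernel_args:
        kernel_line += " " + kernel_args  # changeable installer arguments
    return "\n".join([
        "#!ipxe",
        kernel_line,
        "initrd http://${server_ip}/FBOSS-Distro-Image.initrd",
        "boot",
    ]) + "\n"


# Hypothetical kernel arguments, for illustration only.
print(render_autoexec("console=ttyS0,115200"))
```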
For usage details, see the included README.md. As this is an MVP, those
instructions must be followed to the letter. Future work will
integrate with the fboss-image tool to drive the Distro Infra container
in a more user-friendly way.
Once PXE boot has completed, the MAC is made ineligible for PXE booting
again until reconfigured. This is to support PXE installing, then
booting off the internal drive for every subsequent boot until
PXE-booting is explicitly requested again.
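One way the completion detection could work, sketched in Python: watch dnsmasq's TFTP log for the per-MAC `pxeboot_complete` fetch (visible in the test output) and remove that MAC's dnsmasq configuration file so the default ignore rule applies again. The regex, the function names, and the "delete the per-MAC file" mechanism are assumptions, not the container's actual code.

```python
import re
from pathlib import Path

# Matches e.g. "dnsmasq-tftp: sent /distro_infra/persistent/dc-da-4d-fc-ad-2d/pxeboot_complete to ..."
COMPLETE_RE = re.compile(
    r"dnsmasq-tftp: sent \S*/((?:[0-9a-f]{2}-){5}[0-9a-f]{2})/pxeboot_complete "
)


def completed_mac(log_line: str):
    """Return the dash-separated MAC whose installer fetched pxeboot_complete, else None."""
    m = COMPLETE_RE.search(log_line)
    return m.group(1) if m else None


def disable_pxe(mac: str, conf_dir: Path) -> bool:
    """Hypothetical: delete the per-MAC dnsmasq file so the MAC is ignored again."""
    path = conf_dir / mac
    if not path.exists():
        return False
    path.unlink()
    print(f"{mac} PXE booted, disabling future PXE boot provisioning")
    return True
```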
Under IPv4, the boot flow with iPXE is simple because iPXE receives
the next-server IP address. The IPv4 boot flow looks like:
1. BIOS
2. iPXE
3. tftp://next-server/autoexec.ipxe
4. http://next-server/FBOSS-Distro-Image.{kernel,initrd}
5. http://next-server/FBOSS-Distro-Image.xz
Unfortunately IPv6 is more complicated. iPXE does not receive
next-server or anything like it under IPv6, so we cannot follow that
simple flow.
Further, iPXE by default tries to autoconfigure its network interface
with IPv4 first then IPv6. Thus if the network were configured to
support both IPv4 PXE boot and IPv6 PXE boot (the Distro
Infrastructure default), while the BIOS would load iPXE over IPv6,
iPXE would load the PXE installer over IPv4. This protocol switching
makes for unsatisfactory testing.
To resolve these two problems, we separate iPXE into IPv4 and IPv6
versions. The IPv4 version operates as above. The IPv6 version
uses two intermediate scripts to insert the server_ip configuration.
The boot flow for IPv6 is:
1. BIOS
2. iPXEv6
3. A script embedded inside iPXEv6, which forces IPv6 and 'sources' a
generated -serverip script (e.g. ipxev6.efi-serverip)
4. The generated -serverip script, which sets the server_ip variable
before passing control on to tftp://server-ip/autoexec.ipxe, shared
with IPv4
5. tftp://next-server/autoexec.ipxe
6. http://next-server/FBOSS-Distro-Image.{kernel,initrd}
7. http://next-server/FBOSS-Distro-Image.xz
To support both paths with a common autoexec.ipxe, host-server is used
as server_ip when executing under IPv4.
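A minimal Python sketch of generating the `-serverip` script: it sets `server_ip` and chains to the shared autoexec.ipxe using the standard iPXE `set` and `chain` commands. The function name is hypothetical. IPv6 literals are bracketed because URIs require brackets around IPv6 hosts, which also lets a shared autoexec.ipxe use `${server_ip}` unchanged for both families.

```python
import ipaddress


def render_serverip_script(server: str, next_script: str = "autoexec.ipxe") -> str:
    """Render an iPXE script that sets server_ip and chains to the shared script."""
    addr = ipaddress.ip_address(server)
    # IPv6 hosts must be bracketed inside URIs, e.g. tftp://[fc00:33::89]/...
    host = f"[{addr}]" if addr.version == 6 else str(addr)
    return (
        "#!ipxe\n"
        f"set server_ip {host}\n"
        f"chain tftp://${{server_ip}}/{next_script}\n"  # ${server_ip} expanded by iPXE
    )


print(render_serverip_script("fc00:33::89"))
```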
Only the manual happy path has been tested.
This has been tested manually against fboss103. Under IPv4 that test
output is:
```
ds103:#s-image/distro_infra $ ./build.sh && ./distro_infra.sh --intf vlan1033 --persist-dir data
...
=> exporting to image 0.7s
=> => exporting layers 0.6s
=> => writing image sha256:27dec285715ddfc30a692a4fee1cb34f79a02e581df34801a8a0330e256cf0c9 0.0s
=> => naming to docker.io/library/fboss_distro_infra 0.0s
Listening on vlan1033 - 10.250.33.194 & fc00:33::89
dnsmasq: started, version 2.85 DNS disabled
dnsmasq: compile time options: IPv6 GNU-getopt DBus no-UBus no-i18n IDN2 DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth cryptohash DNSSEC loop-detect inotify dumpfile
dnsmasq-dhcp: DHCP, proxy on subnet 10.250.33.194
dnsmasq-dhcp: DHCPv6, IP range ::fb05:5000:1 -- ::fb05:50ff:ffff, lease time 5m, template for vlan1033
dnsmasq-dhcp: router advertisement on vlan1033
dnsmasq-dhcp: DHCPv6, IP range fc00:33::fb05:5000:1 -- fc00:33::fb05:50ff:ffff, lease time 5m, constructed for vlan1033
dnsmasq-dhcp: router advertisement on fc00:33::, constructed for vlan1033
dnsmasq-dhcp: RTR-ADVERT(vlan1033) fc00:33::
dnsmasq-dhcp: IPv6 router advertisement enabled
dnsmasq-dhcp: DHCP, sockets bound exclusively to interface vlan1033
dnsmasq-tftp: TFTP root is /distro_infra/persistent secure mode
dnsmasq-dhcp: read /distro_infra/dnsmasq_conf.d/default_ignore
dnsmasq-dhcp: RTR-ADVERT(vlan1033) fc00:33::
dnsmasq-dhcp: RTR-ADVERT(vlan1033) fc00:33::
dnsmasq-dhcp: RTR-ADVERT(vlan1033) fc00:33::
Enter MAC address (blank to exit): dc-da-4d-fc-ad-2d
dnsmasq: inotify, new or changed file /distro_infra/dnsmasq_conf.d/dc-da-4d-fc-ad-2d
dnsmasq-dhcp: read /distro_infra/dnsmasq_conf.d/dc-da-4d-fc-ad-2d
Enter MAC address (blank to exit):
```
Then reboot fboss103:
```
>>Checking Media Presence......
>>Media Present......
>>Start PXE over IPv4 on MAC: DC-DA-4D-FC-AD-2D. Press ESC key to abort PXE boot.
Station IP address is 10.250.33.2
Server IP address is 10.250.33.194
NBP filename is ipxe.efi
NBP filesize is 1052160 Bytes
>>Checking Media Presence......
>>Media Present......
Downloading NBP file...
NBP file downloaded successfully.
iPXE initialising devices...
autoexec.ipxe... ok
iPXE 1.21.1+ (gfcfa0) -- Open Source Network Boot Firmware -- https://ipxe.org
Features: DNS HTTP iSCSI TFTP VLAN SRP AoE EFI Menu
Configuring (net0 dc:da:4d:fc:ad:2d)...... ok
http://10.250.33.194:6969/dc-da-4d-fc-ad-2d/pxeboot.initrd... ok
tftp://10.250.33.194/pxeboot.kernel... ok
EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path
EFI stub: Measured initrd data into PCR 9
[ 0.000000] Linux version 5.14.0-626.el9.x86_64...
```
Then the PXE installer runs. The Distro Infrastructure output during
this period is:
```
dnsmasq-dhcp: RTR-ADVERT(vlan1033) fc00:33::
dnsmasq-dhcp: RTR-ADVERT(vlan1033) fc00:33::
dnsmasq-dhcp: DHCPSOLICIT(vlan1033) 00:02:00:00:ab:11:ea:34:3d:47:ca:ee:d2:07
dnsmasq-dhcp: DHCPREPLY(vlan1033) 00:02:00:00:ab:11:ea:34:3d:47:ca:ee:d2:07 no addresses available
dnsmasq-dhcp: RTR-ADVERT(vlan1033) fc00:33::
dnsmasq-dhcp: DHCPSOLICIT(vlan1033) 00:01:00:01:2e:30:1a:70:dc:da:4d:fc:ad:2d
dnsmasq-dhcp: DHCPADVERTISE(vlan1033) fc00:33::fb05:50dc:b9f7 00:01:00:01:2e:30:1a:70:dc:da:4d:fc:ad:2d
dnsmasq-dhcp: DHCPREQUEST(vlan1033) 00:01:00:01:2e:30:1a:70:dc:da:4d:fc:ad:2d
dnsmasq-dhcp: DHCPREPLY(vlan1033) fc00:33::fb05:50dc:b9f7 00:01:00:01:2e:30:1a:70:dc:da:4d:fc:ad:2d
dnsmasq-dhcp: RTR-SOLICIT(vlan1033)
dnsmasq-dhcp: RTR-ADVERT(vlan1033) fc00:33::
dnsmasq-dhcp: DHCPSOLICIT(vlan1033) 00:01:00:01:2e:30:1a:70:dc:da:4d:fc:ad:2d
dnsmasq-dhcp: DHCPADVERTISE(vlan1033) fc00:33::fb05:50a2:9696 00:01:00:01:2e:30:1a:70:dc:da:4d:fc:ad:2d
dnsmasq-dhcp: DHCPREQUEST(vlan1033) 00:01:00:01:2e:30:1a:70:dc:da:4d:fc:ad:2d
dnsmasq-dhcp: DHCPREPLY(vlan1033) fc00:33::fb05:50a2:9696 00:01:00:01:2e:30:1a:70:dc:da:4d:fc:ad:2d
dnsmasq-dhcp: DHCPRELEASE(vlan1033) 00:01:00:01:2e:30:1a:70:dc:da:4d:fc:ad:2d
dnsmasq-tftp: error 8 User aborted the transfer received from fc00:33::fb05:50dc:b9f7
dnsmasq-tftp: sent /distro_infra/persistent/dc-da-4d-fc-ad-2d/ipxev6.efi to fc00:33::fb05:50dc:b9f7
dnsmasq-tftp: sent /distro_infra/persistent/dc-da-4d-fc-ad-2d/ipxev6.efi to fc00:33::fb05:50dc:b9f7
dnsmasq-dhcp: DHCPRELEASE(vlan1033) 00:01:00:01:2e:30:1a:70:dc:da:4d:fc:ad:2d
dnsmasq-dhcp: RTR-SOLICIT(vlan1033) dc:da:4d:fc:ad:2d
dnsmasq-dhcp: RTR-ADVERT(vlan1033) fc00:33::
dnsmasq-dhcp: DHCPSOLICIT(vlan1033) 00:04:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
dnsmasq-dhcp: DHCPADVERTISE(vlan1033) fc00:33::fb05:50f5:dfc9 00:04:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
dnsmasq-dhcp: DHCPREQUEST(vlan1033) 00:04:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
dnsmasq-dhcp: DHCPREPLY(vlan1033) fc00:33::fb05:50f5:dfc9 00:04:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
dnsmasq-tftp: sent /distro_infra/persistent/dc-da-4d-fc-ad-2d/ipxev6.efi-serverip to fc00:33::fb05:50f5:dfc9
dnsmasq-tftp: sent /distro_infra/persistent/dc-da-4d-fc-ad-2d/autoexec.ipxe to fc00:33::fb05:50f5:dfc9
dnsmasq-dhcp: RTR-SOLICIT(vlan1033)
dnsmasq-dhcp: RTR-ADVERT(vlan1033) fc00:33::
dnsmasq-dhcp: DHCPSOLICIT(vlan1033) 00:04:62:19:3e:08:1d:5a:56:77:93:71:a4:d7:25:6f:4c:de
dnsmasq-dhcp: DHCPREPLY(vlan1033) fc00:33::fb05:50b2:2cbb 00:04:62:19:3e:08:1d:5a:56:77:93:71:a4:d7:25:6f:4c:de
dnsmasq-tftp: sent /distro_infra/persistent/dc-da-4d-fc-ad-2d/pxeboot_complete to fc00:33::fb05:50f5:dfc9
dnsmasq-dhcp: read /distro_infra/dnsmasq_conf.d/default_ignore
dc-da-4d-fc-ad-2d PXE booted, disabling future PXE boot provisioning
```
The critical line is `dc-da-4d-fc-ad-2d PXE booted, disabling future PXE boot provisioning`,
which indicates that PXE boot has been detected as complete and will
not be offered on future boots.
Subsequent reboots of fboss103 time out when attempting PXE boot and
boot off the NVMe drive instead.
IPv6 works almost identically, except for the download of the additional
autoipv6.ipxe script.
Force-pushed from 3acfe8d to f946e39.
**Pre-submission checklist**
- [x] I've run the linters locally and fixed lint errors related to the files I modified in this PR. You can install the linters by running `pip install -r requirements-dev.txt && pre-commit install`
- [x] `pre-commit run`

Here a second output format, generated by Kiwi-NG from the same template, is added. This "tbz" rootfs tarball is then wrapped in an ONIE installer. The ONIE installer is based on the Demo OS ONIE installer from the ONIE project, with a few modernizations.

In particular, zstd compression is used, which requires including a statically compiled zstd binary. This dramatically improves decompression speed, and therefore install time, compared with xz, the best ONIE-native option. The difference does not show up yet because the FBOSS binaries are not installed, so the image is still small. During internal testing with FBOSS binaries and xz, installation took close to 10 minutes. zstd is expected to perform much better.

Test plan: loaded this on a TH5-based ONIE switch internally and manually verified that it booted correctly.
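ONIE installers are commonly built as self-extracting shell archives: a shell stub followed by a binary payload that the stub extracts at install time. A minimal Python sketch of that layout, assuming a hypothetical marker line; the actual Demo OS installer's stub and marker format may differ. The payload is treated as opaque bytes, for example a rootfs tarball compressed with the bundled zstd binary.

```python
MARKER = b"__PAYLOAD_BELOW__"  # hypothetical; must appear alone on its own line


def build_installer(stub_script: bytes, payload: bytes) -> bytes:
    """Concatenate a shell stub, a marker line, and an opaque payload."""
    if not stub_script.endswith(b"\n"):
        stub_script += b"\n"
    if b"\n" + MARKER + b"\n" in b"\n" + stub_script:
        raise ValueError("stub must not contain a bare marker line")
    return stub_script + MARKER + b"\n" + payload


def extract_payload(installer: bytes) -> bytes:
    """Return everything after the first line consisting solely of MARKER."""
    needle = b"\n" + MARKER + b"\n"
    idx = installer.find(needle)
    if idx < 0:
        raise ValueError("marker line not found")
    return installer[idx + len(needle):]
```

Searching for the marker as a whole line lets the stub mention the marker inside its own extraction command (for example in a `sed` address) without being mistaken for the payload boundary.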
Force-pushed from f946e39 to cb9c221.
travisb-nexthop (Contributor, Author): This is a stacked PR; the real change is in cb9c221.