@travisb-nexthop
Contributor

Pre-submission checklist

  • I've run the linters locally and fixed lint errors related to the files I modified in this PR. You can install the linters by running pip install -r requirements-dev.txt && pre-commit install
  • pre-commit run

Summary

This PR adds a second Kiwi-NG output format, built from the same
template. The resulting "tbz" rootfs tarball is then wrapped in an ONIE
installer.

The ONIE installer is based on the Demo OS ONIE installer from the
ONIE project with a few modernizations.

In particular, the installer uses zstd compression, which requires
bundling a statically compiled zstd binary. Doing this dramatically
improves decompression speed, and therefore install time, compared with
xz, the best compressor ONIE supports natively. The difference does not
show up yet because the FBOSS binaries are not installed, so the image
is still small. During internal testing with FBOSS binaries and xz,
installation time approached 10 minutes; zstd is expected to perform
much better.

Test Plan

Loaded this on a TH5-based ONIE switch internally and manually
verified that it booted correctly.

[Nexthop] Fix linting of Distro Image Shell scripts

The newly introduced shell linter causes the trivial rebasing of the
later Distro Image PRs to fail with many lint errors.

Rather than adding a lot of noise to those PRs, re-lint new shell
scripts separately.

There are two functional fixes in build_image.sh here which seem to
have been broken by previous linting efforts:

1. DOCKER_BUILD_ARGS optionally holds extra arguments for `docker
   build`. It must not be quoted when used on the command line;
   otherwise docker receives the whole string as a single invalid
   option (because it contains literal spaces).

2. DOCKER_ARGS is similarly constructed to factor out the options
   common to the interactive and non-interactive cases. It must also
   not be quoted.
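
Both fixes come down to shell word splitting. The sketch below is
illustrative only: `count_args` and the argument values are hypothetical
and are not taken from build_image.sh. It shows that a quoted string
variable reaches the command as one argument (even when empty), while
the unquoted expansion is split into separate arguments, or disappears
when empty.

```sh
#!/bin/sh
# count_args prints how many arguments it received.
count_args() { echo "$#"; }

# Hypothetical value; the real script builds its string conditionally.
DOCKER_BUILD_ARGS="--build-arg FOO=bar --no-cache"

# Quoted: the whole string arrives as ONE argument, spaces included.
quoted_count=$(count_args "$DOCKER_BUILD_ARGS")

# Unquoted: the shell splits on whitespace, giving three arguments.
# shellcheck disable=SC2086
unquoted_count=$(count_args $DOCKER_BUILD_ARGS)

# When the optional arguments are empty, quoting still passes one
# (empty) argument, while the unquoted form passes none.
EMPTY_ARGS=""
quoted_empty=$(count_args "$EMPTY_ARGS")
# shellcheck disable=SC2086
unquoted_empty=$(count_args $EMPTY_ARGS)

echo "quoted=$quoted_count unquoted=$unquoted_count"
echo "quoted_empty=$quoted_empty unquoted_empty=$unquoted_empty"
```

In bash, an array expanded as `"${args[@]}"` is a common way to keep
quoting and still pass several arguments; whether that fits
build_image.sh is a separate decision.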

All other changes are automatically applied by pre-commit.

[Nexthop] Build Distro Image in CI

**Pre-submission checklist**
- [x] I've run the linters locally and fixed lint errors related to the
files I modified in this PR. You can install the linters by running `pip
install -r requirements-dev.txt && pre-commit install`
- [x] `pre-commit run`

This is the start of the Distro Image CI where the Distro Image is
built. This simply invokes the `fboss-image build` command against the
sample `from_source.json` image manifest.

The resulting artifacts are not preserved in this PR.

As part of this, it is necessary to remove `-it` from the docker command
because GitHub Actions (GHA) jobs don't have a stdin. Further, the
sample image manifest now requires that all components be present, so
this PR adds them.
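
One way to keep a single docker invocation working both at a terminal
and in CI is to add the TTY flags only when stdin is a terminal. This is
a minimal sketch and not what this PR does (the PR simply removes
`-it`); the function name `docker_tty_flags` is hypothetical.

```sh
#!/bin/sh
# docker_tty_flags prints "-it" when stdin is a terminal and nothing
# otherwise, for example in a CI job.
docker_tty_flags() {
    if [ -t 0 ]; then
        printf '%s' '-it'
    fi
}

# Simulate a job without a terminal by redirecting stdin from /dev/null.
flags_without_tty=$(docker_tty_flags </dev/null)
echo "flags without a terminal: '${flags_without_tty}'"

# Hypothetical usage (image name and command are placeholders):
#   docker run $(docker_tty_flags) --rm example-image ./build.sh
```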

Look at the new workflow in this PR.

[Nexthop] Add FBOSS image builder CLI skeleton

**Pre-submission checklist**
- [X] I've run the linters locally and fixed lint errors related to the
files I modified in this PR. You can install the linters by running `pip
install -r requirements-dev.txt && pre-commit install`
- [X] `pre-commit run`

Foundation for manifest-driven FBOSS distro image builder.

This PR adds a CLI tool (`fboss-image`) that provides the framework for
building FBOSS distribution images based on JSON manifests. The
implementation uses only the Python standard library.

**Key components:**
- Basic CLI framework using standard library
- Manifest parser for JSON-based image definitions
- Build orchestration with defined component ordering
- Stub Device commands
- Pre-commit linting

    cd fboss-image/distro_cli
    python3 -m unittest discover -s tests -p '*_test.py'

[Nexthop] Add PXE-boot distro image export

**Pre-submission checklist**
- [X] I've run the linters locally and fixed lint errors related to the
files I modified in this PR. You can install the linters by running `pip
install -r requirements-dev.txt && pre-commit install`
- [X] `pre-commit run`

Kiwi-NG already builds the PXE boot installer; we just need to move it
into place.

As part of this, I removed the final `ls` from
build_image_in_container.sh because it shows file paths that aren't
relevant to users and so might be confusing.

Run `fboss-image build from_source.json` and see that the new file
`fboss-distro-image_pxe.tar` is created successfully.

@meta-cla meta-cla bot added the CLA Signed label Dec 19, 2025

[Nexthop] Distro Infrastructure container PXE-boot MVP

**Pre-submission checklist**
- [x] I've run the linters locally and fixed lint errors related to the
files I modified in this PR. You can install the linters by running `pip
install -r requirements-dev.txt && pre-commit install`
- [x] `pre-commit run`

This PR adds the minimum viable Distro Infrastructure container needed
to support IPv4 and IPv6 PXE boot. IPv4 expects a DHCP server to already
exist on the network to provide IPv4 addresses to the switch. IPv6
defaults to supplying its own DHCPv6 server on the L2 segment, but that
can be disabled.

This is a self-contained, interactive docker container which uses
Proxy DHCP (IPv4) or DHCPv6 (IPv6) to direct PXE-booting devices to
the container's TFTP server and web server.

iPXE is used to support loading the relatively large initrd image over
HTTP instead of TFTP and to support supplying changeable arguments to
the installer initrd. Currently these are hardcoded into autoexec.ipxe,
but future changes might autogenerate this file based on the needs of the
particular PXE installer.

For usage details, see the included README.md. As this is an MVP, those
instructions must be followed to the letter. Future work will
integrate with the fboss-image tool to drive the Distro Infra container
in a more user-friendly way.

Once PXE boot has completed, the MAC is made ineligible for PXE booting
again until reconfigured. This is to support PXE installing, then
booting off the internal drive for every subsequent boot until
PXE-booting is explicitly requested again.

Under IPv4, the boot flow with iPXE is simple because iPXE receives
the next-server IP address. The IPv4 boot flow looks like:

1. BIOS
2. iPXE
3. tftp://next-server/autoexec.ipxe
4. http://next-server/FBOSS-Distro-Image.{kernel,initrd}
5. http://next-server/FBOSS-Distro-Image.xz

Unfortunately IPv6 is more complicated. iPXE does not receive
next-server or anything like it under IPv6, so we cannot follow that
simple flow.

Further, by default iPXE tries to autoconfigure its network interface
with IPv4 first, then IPv6. Thus, if the network were configured to
support both IPv4 and IPv6 PXE boot (the Distro Infrastructure
default), the BIOS would load iPXE over IPv6, but iPXE would then load
the PXE installer over IPv4. Switching protocols mid-boot would not be
a satisfactory test of IPv6 PXE boot.

To resolve these two problems, we separate iPXE into IPv4 and IPv6
versions. The IPv4 version operates as above. The IPv6 version
uses two intermediate scripts to insert the server_ip configuration.
The boot flow for IPv6 is:

1. BIOS
2. iPXEv6
3. Script embedded inside iPXEv6, which forces IPv6 and 'sources' a
   generated script, -serverip
4. -serverip, a generated script, sets the server_ip variable before
   passing control on to tftp://server-ip/autoexec.ipxe, which is
   shared with IPv4
5. tftp://server-ip/autoexec.ipxe
6. http://server-ip/FBOSS-Distro-Image.{kernel,initrd}
7. http://server-ip/FBOSS-Distro-Image.xz

To support both paths with a common autoexec.ipxe, next-server is used
as server_ip when executing under IPv4.
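
As an illustration of step 4 in the IPv6 flow, the generated script
could look roughly like the output of the sketch below. The function
name `gen_serverip_script` is hypothetical and the real generator in
this PR may differ; only the iPXE `set` and `chain` commands and the
shared autoexec.ipxe name are taken as given. The sketch assumes that
IPv6 literals are wrapped in brackets when used in a URI, as RFC 3986
requires.

```sh
#!/bin/sh
# gen_serverip_script prints an iPXE script that sets server_ip and then
# chains to the autoexec.ipxe shared by the IPv4 and IPv6 flows.
gen_serverip_script() {
    addr=$1
    case $addr in
        *:*) host="[$addr]" ;;  # IPv6 literal: bracket it for use in a URI
        *)   host=$addr ;;      # IPv4 address or hostname: use as-is
    esac
    printf '%s\n' '#!ipxe'
    printf 'set server_ip %s\n' "$host"
    # Single quotes keep ${server_ip} literal so iPXE expands it, not sh.
    printf '%s\n' 'chain tftp://${server_ip}/autoexec.ipxe'
}

gen_serverip_script fc00:33::89
```

Keeping the address in one variable is what lets the later steps of
both flows share a single autoexec.ipxe.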

Only the happy path has been tested, and only manually.

This was tested manually against fboss103. Under IPv4, the test
output is:
```
ds103:#s-image/distro_infra $ ./build.sh && ./distro_infra.sh --intf vlan1033 --persist-dir data
...
 => exporting to image                                                                                                                                                                                                              0.7s
 => => exporting layers                                                                                                                                                                                                             0.6s
 => => writing image sha256:27dec285715ddfc30a692a4fee1cb34f79a02e581df34801a8a0330e256cf0c9                                                                                                                                        0.0s
 => => naming to docker.io/library/fboss_distro_infra                                                                                                                                                                               0.0s
Listening on vlan1033 - 10.250.33.194 & fc00:33::89
dnsmasq: started, version 2.85 DNS disabled
dnsmasq: compile time options: IPv6 GNU-getopt DBus no-UBus no-i18n IDN2 DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth cryptohash DNSSEC loop-detect inotify dumpfile
dnsmasq-dhcp: DHCP, proxy on subnet 10.250.33.194
dnsmasq-dhcp: DHCPv6, IP range ::fb05:5000:1 -- ::fb05:50ff:ffff, lease time 5m, template for vlan1033
dnsmasq-dhcp: router advertisement on vlan1033
dnsmasq-dhcp: DHCPv6, IP range fc00:33::fb05:5000:1 -- fc00:33::fb05:50ff:ffff, lease time 5m, constructed for vlan1033
dnsmasq-dhcp: router advertisement on fc00:33::, constructed for vlan1033
dnsmasq-dhcp: RTR-ADVERT(vlan1033) fc00:33::
dnsmasq-dhcp: IPv6 router advertisement enabled
dnsmasq-dhcp: DHCP, sockets bound exclusively to interface vlan1033
dnsmasq-tftp: TFTP root is /distro_infra/persistent secure mode
dnsmasq-dhcp: read /distro_infra/dnsmasq_conf.d/default_ignore
dnsmasq-dhcp: RTR-ADVERT(vlan1033) fc00:33::
dnsmasq-dhcp: RTR-ADVERT(vlan1033) fc00:33::
dnsmasq-dhcp: RTR-ADVERT(vlan1033) fc00:33::
Enter MAC address (blank to exit): dc-da-4d-fc-ad-2d
dnsmasq: inotify, new or changed file /distro_infra/dnsmasq_conf.d/dc-da-4d-fc-ad-2d
dnsmasq-dhcp: read /distro_infra/dnsmasq_conf.d/dc-da-4d-fc-ad-2d
Enter MAC address (blank to exit):
```

At this point fboss103 is rebooted. Its console shows:

```
>>Checking Media Presence......
>>Media Present......
>>Start PXE over IPv4 on MAC: DC-DA-4D-FC-AD-2D. Press ESC key to abort PXE boot.
  Station IP address is 10.250.33.2

  Server IP address is 10.250.33.194
  NBP filename is ipxe.efi
  NBP filesize is 1052160 Bytes

>>Checking Media Presence......
>>Media Present......
 Downloading NBP file...

  NBP file downloaded successfully.
iPXE initialising devices...
autoexec.ipxe... ok

iPXE 1.21.1+ (gfcfa0) -- Open Source Network Boot Firmware -- https://ipxe.org
Features: DNS HTTP iSCSI TFTP VLAN SRP AoE EFI Menu
Configuring (net0 dc:da:4d:fc:ad:2d)...... ok
http://10.250.33.194:6969/dc-da-4d-fc-ad-2d/pxeboot.initrd... ok
tftp://10.250.33.194/pxeboot.kernel... ok
EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path
EFI stub: Measured initrd data into PCR 9
[    0.000000] Linux version 5.14.0-626.el9.x86_64...
```

Then the PXE installer runs. The Distro Infrastructure output during
this period is:

```
dnsmasq-dhcp: RTR-ADVERT(vlan1033) fc00:33::
dnsmasq-dhcp: RTR-ADVERT(vlan1033) fc00:33::
dnsmasq-dhcp: DHCPSOLICIT(vlan1033) 00:02:00:00:ab:11:ea:34:3d:47:ca:ee:d2:07
dnsmasq-dhcp: DHCPREPLY(vlan1033) 00:02:00:00:ab:11:ea:34:3d:47:ca:ee:d2:07 no addresses available
dnsmasq-dhcp: RTR-ADVERT(vlan1033) fc00:33::
dnsmasq-dhcp: DHCPSOLICIT(vlan1033) 00:01:00:01:2e:30:1a:70:dc:da:4d:fc:ad:2d
dnsmasq-dhcp: DHCPADVERTISE(vlan1033) fc00:33::fb05:50dc:b9f7 00:01:00:01:2e:30:1a:70:dc:da:4d:fc:ad:2d
dnsmasq-dhcp: DHCPREQUEST(vlan1033) 00:01:00:01:2e:30:1a:70:dc:da:4d:fc:ad:2d
dnsmasq-dhcp: DHCPREPLY(vlan1033) fc00:33::fb05:50dc:b9f7 00:01:00:01:2e:30:1a:70:dc:da:4d:fc:ad:2d
dnsmasq-dhcp: RTR-SOLICIT(vlan1033)
dnsmasq-dhcp: RTR-ADVERT(vlan1033) fc00:33::
dnsmasq-dhcp: DHCPSOLICIT(vlan1033) 00:01:00:01:2e:30:1a:70:dc:da:4d:fc:ad:2d
dnsmasq-dhcp: DHCPADVERTISE(vlan1033) fc00:33::fb05:50a2:9696 00:01:00:01:2e:30:1a:70:dc:da:4d:fc:ad:2d
dnsmasq-dhcp: DHCPREQUEST(vlan1033) 00:01:00:01:2e:30:1a:70:dc:da:4d:fc:ad:2d
dnsmasq-dhcp: DHCPREPLY(vlan1033) fc00:33::fb05:50a2:9696 00:01:00:01:2e:30:1a:70:dc:da:4d:fc:ad:2d
dnsmasq-dhcp: DHCPRELEASE(vlan1033) 00:01:00:01:2e:30:1a:70:dc:da:4d:fc:ad:2d
dnsmasq-tftp: error 8 User aborted the transfer received from fc00:33::fb05:50dc:b9f7
dnsmasq-tftp: sent /distro_infra/persistent/dc-da-4d-fc-ad-2d/ipxev6.efi to fc00:33::fb05:50dc:b9f7
dnsmasq-tftp: sent /distro_infra/persistent/dc-da-4d-fc-ad-2d/ipxev6.efi to fc00:33::fb05:50dc:b9f7
dnsmasq-dhcp: DHCPRELEASE(vlan1033) 00:01:00:01:2e:30:1a:70:dc:da:4d:fc:ad:2d
dnsmasq-dhcp: RTR-SOLICIT(vlan1033) dc:da:4d:fc:ad:2d
dnsmasq-dhcp: RTR-ADVERT(vlan1033) fc00:33::
dnsmasq-dhcp: DHCPSOLICIT(vlan1033) 00:04:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
dnsmasq-dhcp: DHCPADVERTISE(vlan1033) fc00:33::fb05:50f5:dfc9 00:04:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
dnsmasq-dhcp: DHCPREQUEST(vlan1033) 00:04:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
dnsmasq-dhcp: DHCPREPLY(vlan1033) fc00:33::fb05:50f5:dfc9 00:04:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
dnsmasq-tftp: sent /distro_infra/persistent/dc-da-4d-fc-ad-2d/ipxev6.efi-serverip to fc00:33::fb05:50f5:dfc9
dnsmasq-tftp: sent /distro_infra/persistent/dc-da-4d-fc-ad-2d/autoexec.ipxe to fc00:33::fb05:50f5:dfc9
dnsmasq-dhcp: RTR-SOLICIT(vlan1033)
dnsmasq-dhcp: RTR-ADVERT(vlan1033) fc00:33::
dnsmasq-dhcp: DHCPSOLICIT(vlan1033) 00:04:62:19:3e:08:1d:5a:56:77:93:71:a4:d7:25:6f:4c:de
dnsmasq-dhcp: DHCPREPLY(vlan1033) fc00:33::fb05:50b2:2cbb 00:04:62:19:3e:08:1d:5a:56:77:93:71:a4:d7:25:6f:4c:de
dnsmasq-tftp: sent /distro_infra/persistent/dc-da-4d-fc-ad-2d/pxeboot_complete to fc00:33::fb05:50f5:dfc9
dnsmasq-dhcp: read /distro_infra/dnsmasq_conf.d/default_ignore
dc-da-4d-fc-ad-2d PXE booted, disabling future PXE boot provisioning
```

Critical is the line `dc-da-4d-fc-ad-2d PXE booted, disabling future PXE
boot provisioning`, which indicates that PXE boot has been detected as
complete and will not be offered to future boots.
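
The log above suggests that completion is signalled when the installer
fetches a per-MAC `pxeboot_complete` file over TFTP. A hedged sketch of
how such a log line could be recognised follows; the function name
`completed_mac` and the idea of parsing the dnsmasq log are assumptions
for illustration, and the container may detect completion differently.

```sh
#!/bin/sh
# completed_mac prints the MAC (in aa-bb-cc-dd-ee-ff form) when the given
# dnsmasq-tftp log line reports that pxeboot_complete was sent, and
# prints nothing for any other line.
completed_mac() {
    printf '%s\n' "$1" |
        sed -n 's|^dnsmasq-tftp: sent .*/\([0-9a-f-]\{17\}\)/pxeboot_complete to .*$|\1|p'
}

line='dnsmasq-tftp: sent /distro_infra/persistent/dc-da-4d-fc-ad-2d/pxeboot_complete to fc00:33::fb05:50f5:dfc9'
mac=$(completed_mac "$line")
if [ -n "$mac" ]; then
    echo "$mac PXE booted, disabling future PXE boot provisioning"
fi
```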

Subsequent reboots of fboss103 time out when attempting PXE boot and
boot off the NVMe drive instead.

IPv6 works almost identically, except for downloads of the additional
autoipv6.ipxe script.

@travisb-nexthop travisb-nexthop marked this pull request as ready for review December 19, 2025 17:13
@travisb-nexthop
Contributor Author

This is a stacked PR; the real change is in cb9c221.
