Because we are going to build a bunch of autotools based packages, I am going to use two simple helper functions to make things a lot easier.
For simplicity, I added a script called util.sh that already contains those
functions. You can simply source it in your shell using . <path>/util.sh.
The first helper function runs the configure script for us with some
default options:
run_configure() {
"$srcdir/configure" --host="$TARGET" --prefix="" --sbindir=/bin \
--includedir=/usr/include --datarootdir=/usr/share\
--libexecdir=/lib/libexec --disable-static \
--enable-shared $@
}
The host-touple is set to our target machine touple, the prefix is left empty,
which means install everything into / of our target filesystem.
In case the package wants to install programs into /sbin, we explicitly tell
it that sbin-programs go into /bin.
However, despite having / as our prefix, we want headers to go
into /usr/include and any possible data files into /usr/share.
If a package wants to install helper programs that are regular executables, but
not intended to be used by a user on the command line, those are usually
installed into a libexec sub directory. We explicitly tell the configure
script to install those in the historic /lib/libexec location instead of
clobbering the filesystem root with an extra directory.
The last two switches --disable-static and --enable-shared tell any libtool based packages to prefer building shared libraries over static ones.
If a package doesn't use libtool and maybe doesn't even install libraries, it will simply issue a warning that it doesn't know those switches, but will otherwise ignore them and compile just fine.
The $@ at the end basically paste any arguments passed to this function, so we
can still pass along package specific configure switches.
The second function encapsulates the entire dance for building a package that we already did several times:
auto_build() {
local pkgname="$1"
shift
mkdir -p "$BUILDROOT/build/$pkgname"
cd "$BUILDROOT/build/$pkgname"
srcdir="$BUILDROOT/src/$pkgname"
run_configure $@
make -j `nproc`
make DESTDIR="$SYSROOT" install
cd "$BUILDROOT"
}
The package name is specified as first argument, the remaining arguments are
passed to run_configure, again using $@. The shift command removes the
first argument, so we can do this without passing the package name along.
Another noticable difference is the usage of nproc to determine the number of
available CPU cores and pass it to make -j to speed up the build.
We will build a bunch of packages that either provide libraries, or depend on libraries provided by other packages. This also means that the programs that require libraries need a way to locate them, i.e. find out what compiler flags to add in order to find the headers and what linker flags to add in order to actually link against a library. Especially since a library may itself have dependencies that the program needs to link against.
The pkg-config program tries to
provide a unified solution for this problem. Packages that provide a library
can install a configuration file at a special location (in our
case $SYSROOT/lib/pkgconfig) and packages that need a library can
use pkg-config to query if the library is present, and what compiler/linker
flags are required to use it.
With most autotools based packages, this luckily isn't that much of an issue.
Most of them use standard macros for querying pkg-config that automagically
generate configure flags and variables that can be used to override the
results from pkg-config.
We are basically going to use the pkg-config version installed on our build
system and just need to set a few environment variables to instruct it to look
at the right place instead of the system defaults. The util.sh script also
sets those:
export PKG_CONFIG_SYSROOT_DIR="$SYSROOT"
export PKG_CONFIG_LIBDIR="$SYSROOT/lib/pkgconfig"
export PKG_CONFIG_PATH="$SYSROOT/lib/pkgconfig"
The later two are actually both paths where pkg-config looks for its input
files, the LIBDIR has nothing to do with shared libraries. Setting
the PKG_CONFIG_SYSROOT_DIR variable instructs pkg-config to re-target any
paths it generates to point to the correct root filesystem directory.
The GNU libtool is a wrapper script for building shared libraries in an
autotools project and takes care of providing reasonable fallbacks on ancient
Unix systems where this works either qurky or not at all.
At the same time, it also tries to serve a similar purpose as pkg-config and
install libtool archives (with an .la file extension) to the /lib
directory. However, the way it does this is horribly broken and when using
make install with a DESTDIR argument, it will always store full path in
the .la file, which breaks the linking step of any other libtool based
package that happens to pick it up.
We will go with the easy solution and simply delete those files when they are
installed. Since we have pkg-config and a flat /lib directory, most
packages will find their libraries easily (except for ncurses, where,
historical reason, some programs stubbornly try their own way for).
In order to build Bash, we first build two support libraries: ncurses
and readline.
The ncurses library is itself an extension/reimplementation of the pcurses
and curses libraries dating back to the System V and BSD era. At the time
that they were around, the landscape of terminals was much more diverse,
compared to nowadays, where everyone uses a different variant of Frankensteins
DEC VTxxx emulator.
As a result curses internally used the termcap library for handling terminal
escape codes, which was later replaced with terminfo. Both of those libraries
have their own configuration formats for storing terminal capabilities.
The ncurses library can work with both termcap and terminfo files and
thankfully provides the later for a huge number of terminals.
We can build ncurses as follows:
auto_build "ncurses-6.2" --disable-stripping --with-shared --without-debug \
--enable-pc-files --with-pkg-config-libdir=/lib/pkgconfig \
--enable-widec --with-termlib
A few configure clutches are required, because it is one of those packages where the maintainer tried to be clever and work around autotools semantics, so we have to explicitly tell it to not strip debug symbols when installing the binaries and explicitly tell it to generate shared libraries. We also need to tell it to generate pkg-config files and where to put them.
If the --disable-stripping flag wasn't set, the make install would fail
later, because it would try to strip the debug information using the host
systems strip program, which will choke on the ARM binaries.
The --enable-widec flag instructs the build system to generate an ncurses
version with multi byte, wide character unicode support. The --with-termlib
switch instructs it to generate build the terminfo library as a separate
library instead of having it built into ncurses.
In addition to a bunch of libraries and header files, ncurses also installs
a few programs for handling terminfo files, such as the terminfo
compiler tic or the tset program for querying the database
(e.g. tput reset fully resets your terminal back into a sane state, in a way
that is supported by your terminal).
It also installs the terminal database files into $SYSROOT/usr/share/terminfo
and $SYSROOT/usr/share/tabset, as well as a hand full of man pages
to $SYSROOT/usr/share/man. For historical/backwards compatibillity reasons,
a symlink to the terminfo directory is also added to $SYSROOT/lib.
Because we installed the version with wide charater and unicode support, the
libraries that ncurses installs all have a w suffix at the end (as well as
the pkg-config files it installs), so it's particularly hard for other
programs to find the libraries.
So we add a bunch of symlinks for the programs that don't bother to check both:
ln -s "$SYSROOT/lib/pkgconfig/formw.pc" "$SYSROOT/lib/pkgconfig/form.pc"
ln -s "$SYSROOT/lib/pkgconfig/menuw.pc" "$SYSROOT/lib/pkgconfig/menu.pc"
ln -s "$SYSROOT/lib/pkgconfig/ncursesw.pc" "$SYSROOT/lib/pkgconfig/ncurses.pc"
ln -s "$SYSROOT/lib/pkgconfig/panelw.pc" "$SYSROOT/lib/pkgconfig/panel.pc"
ln -s "$SYSROOT/lib/pkgconfig/tinfow.pc" "$SYSROOT/lib/pkgconfig/tinfo.pc"
In addition to pkg-config scripts, the ncurses package also provides a
shell script that tries to implement the same functionality.
Some packages (like util-linux) try to run that first and may accidentally
pick up a version you have installed on your host system.
So we copy that over from the $SYSROOT/bin directory to $TCDIR/bin:
cp "$SYSROOT/bin/ncursesw6-config" "$TCDIR/bin/ncursesw6-config"
cp "$SYSROOT/bin/ncursesw6-config" "$TCDIR/bin/$TARGET-ncursesw6-config"
Having built the ncurses library, we can now build the readline library,
which implements a highly configurable command prompt with a search-able
history, auto completion and Emacs style key bindings.
Unlike ncurses that it is based on, it does not requires any special
configure flags:
auto_build "readline-8.1"
It only installs the library (libreadline and libhistory) and headers
for the libraries, but no programs.
A bunch of documentation files are installed to $SYSROOT/usr/share, including
not only man pages, but info pages, plain text documentation in a doc sub
directory.
For Bash itself, I used only two extra configure flags:
auto_build "bash-5.1" --without-bash-malloc --with-installed-readline
The --without-bash-malloc flag tells it to use the standard libc malloc
instead of its own implementation and the --with-installed-readline flag
tells it to use the readline library that we just installed, instead of an
outdated, internal stub implementation.
The make install step installs bash to $SYSROOT/bin/bash, but also adds an
additional script $SYSROOT/bin/bashbug which is intended to report bugs you
may encounter back to the bash developers.
Bash has a plugin style support for builtin commands, which in can load as
libraries from a special directory (in our case $SYSROOT/lib/bash). So it will
install a ton of builtins there by default, as well as development headers
in $SYSROOT/usr/include that for third party packages that implement their own
builtins.
Just like readline, Bash brings along a bunch of documentation that it installs
to $SYSROOT/usr/share. Namely HTML documentation in doc, more info pages
and man pages.
Bash is also the first program we build that installs localized text messages
in the $SYSROOT/usr/share/locale directory.
Of course we are no where near done with Bash yet, as we still have some plumbing to do regarding the bash start-up script, but we will get back to that later when we put everything together.
The standard comamnd line programs, such as ls, cat, rm and many more are
provided by the GNU Core Utilities
package.
It is fairly simple to build and install:
auto_build "coreutils-8.32" --enable-single-binary=symlinks
The --enable-single-binary configure switch is a relatively recent feature that instructs coreutils to build a single, monolithic binary in the same style as BusyBox and install a bunch of symlinks to it, instead of installing dozens of separate programs.
A few additional, very useful programs are provided by the GNU diffutils and GNU findutils.
Namely, the diffutils provide diff, patch and cmp, while the findutils
provide find, locate, updatedb and xargs.
The GNU implementation of the grep program, as well as the less pager are
packaged separately.
All of those are relatively easy to build and install:
auto_build "diffutils-3.7"
auto_build "findutils-4.8.0"
auto_build "grep-3.6"
auto_build "less-563"
Again those packages also install a bunch of documentation in the /usr/share
directory in addition to the program binaries.
Coreutils installs a helper library in $SYSROOT/lib/libexec/coretuils and
findutils installs a libexec helper program as well (frcode).
The findutils package also creates an empty $SYSROOT/var directory, because
this is where locate expects to find a filesystem index database that can be
generated using updatedb.
We have already used sed, the stream editor for simple file substitution
in the previous sections for building the kernel and BusyBox.
AWK is a much more advanced stream editing program that actually implements a powerful scripting language for the purpose.
For both programs, we will use the GNU implementations (the GNU AWK
implementation is called gawk).
Both are fairly straight forward to build for our system:
auto_build "sed-4.8"
auto_build "gawk-5.1.0"
Again, we not only get the programs, but also a plethora of documentation and extra files.
In fact, gawk will install an entire AWK library in $SYSROOT/usr/share/awk as
well as a number of plugin libraries in $SYSROOT/lib/gawk, some helper
programs in $SYSROOT/lib/libexec/awk and necessary development headers to
build external plugins.
As the name suggests, the procps package supplies a few handy command line
programs for managing processes. More precisely, we install the following
list of programs:
freewatchwvmstatuptimetoptloadsysctlslabtoppwdxpspmappkillpidofpgrepvmstat
And their common helper library libprocps.so plus accompanying documentation.
The package would also supply the kill program, but we don't install that
here, sine it is also provided by the util-linux package and the later has a
few extra features.
Sadly, we don't get a propper release tarball for procps, but only a dump from
the git repository. Because the configure script and Makefile templates are
generated by autoconf and automake, they are not checked into the repository.
Many autotools based projects have a autogen.sh scrip that checks for the
required tools and takes care of generating the actual build system from the
configuration.
So the frist thing we do, is goto into the procps source tree and generate
the build system ourselves:
cd "$BUILDROOT/src/procps-v3.3.16"
./autogen.sh
cd "$BUILDROOT"
Now, we can build procps:
export ac_cv_func_malloc_0_nonnull=yes
export ac_cv_func_realloc_0_nonnull=yes
auto_build "procps-v3.3.16" --enable-watch8bit --disable-kill --with-gnu-ld
As you can see, some clutches are required here. First of all, the configure
script attempts to find out if malloc(0) and realloc(0) return NULL or
something else.
Technically, when trying to allocate 0 bytes of memory, the C standard permits
the standard library to return a NULL pointer instead of a valid pointer to
some place in memory. Some libraries like the widely used glibc opted to
instead return a valid point. Many programs pass a programatically generated
size to malloc and assume that NULL means "out of memory" or similar
dramatic failure, especially if glibc behaviour encourages this.
Instead of fixing this, some programs instead decided to add a compile time
check and add a wrappers for malloc, realloc and free if it
returns NULL. Since we are cross compiling, this check cannot be run. So
we set the result variables manually, the configure script "sees" that and
skips the check.
The --enable-watch8bit flag enables propper UTF-8 support for the watch
program, for which it requires ncursesw instead of regular ncurses (but
ironically this is one of the packages that fails without the symlink
of ncurses to ncursesw).
The other flag --disable-kill compiles the package without the kill program
for the reasons stated above and the final flag --with-gnu-ld tells it that
the linker is the GNU version of ld which it, by default, assumes to not be
the case.
The psmisc package contains a hand full of extra programs for process
management that aren't already in procps, namely it contains fuser,
killall, peekfd, prtstat and pstree.
Similar to procps, we need to generate the build system ourselves, but we will
also take the opertunity to apply some changes using sed:
cd "$BUILDROOT/src/psmisc-v22.21"
sed -i 's/ncurses/ncursesw/g' configure.ac
sed -i 's/tinfo/tinfow/g' configure.ac
sed -i "s#./configure \"\$@\"##" autogen.sh
./autogen.sh
cd "$BUILDROOT"
The frist two sed lines patch the configure script to try ncursesw
and tinfow instead of ncruses and tinfo respectively.
The third sed line makes sure that the autogen.sh does not run the
configure script once it is done.
With that done, we can now build psmisc with largely the similar fixes:
export ac_cv_func_malloc_0_nonnull=yes
export ac_cv_func_realloc_0_nonnull=yes
export CFLAGS="-O2 -include limits.h -include sys/sysmacros.h"
auto_build "psmisc-v22.21"
The first two are explained in the procps build, the final export CFLAGS
line passes some additional flags to the C compiler. The underlying problem
is that this release of psmisc uses the PATH_MAX macro from limits.h
in some places and the makedev macro from sys/sysmacros.h, but includes
neither of those headers. This works for them, because glibc includes those
from other headers that they include, but musl doesn't. Using the -incldue
option, we force gcc to include those headers before processign any C file.
When you are don with this, don't forget to
unset CFLAGS
Lastly, we get rid of the libtool archive installed by procps:
rm "$SYSROOT/lib/libprocps.la"
GNU nano is an ncurses based text editor that is fairly user friendly and
installed by default on many Debian based distributions. Being a GNU program,
it knows how to use autotools and is fairly simple to build and install:
auto_build "nano-5.6.1"
Along with the program itself it also installs a lot of scripts for syntax
highlighting. There is not much more to say here, other than maybe that it
has an easter egg, that we could disable through a configure flag.
On Unix-like systems tar, the tape archive program, is
basically the standard archival program.
The tar format itself is dead simple. Files are simply glued together with
a 512 byte header in front of every file and null bytes at the end to round
it up to a multiple of 512 bytes. Directories and the like simply use a single
header with no content following. This is also the reason why you can't create
an empty tar file: it would simply be an empty file. Also, tarballs have no
index. You cannot do random access and in order to unpack a single file,
the tar program has to scan across the entire file.
Tar itself doesn't do compression. When you see a compressed tarball, such
as a *.tar.gz file, it has been fed through a compression program like gzip
and must be uncompressed before the tar program can handle it. Programs
like gzip only do compression of individual files and have no idea what they
process. As a result, gzip will compress across tar headers and you really
need to unpack and scan the entire thing when you want to unpack a single file.
The tar program can actually work with compressed tar archives, but what
it does (or at least what GNU tar does) internally is, checking at run time if
the compressor program is available and starting it as a child process that it
through which it feeds the data.
Compression formats typically used together with tar are xz, gzip
and bzip2. Nowadays Zstd is also slowly gaining adoption.
The xz-uilities, GNU gzip and GNU tar are fairly simple to build, since they all use the GNU build system:
auto_build "xz-5.2.5"
auto_build "gzip-1.10"
auto_build "tar-1.34"
Of course, they will also install development headers, libraries, a bunch of
wrapper shell script that allow using grep, less or diff on compressed
files, and a lot of documentation.
The xz package installs a libtool archive that we simply remove:
rm "$SYSROOT/lib/liblzma.la"
The GNU tar package also installs a libexec helper called rmt,
the remote tape drive server.
Of course, the bzip2 program is a bit more involved, since it uses a custom
Makefile.
We first setup our build directory in the usual way:
mkdir -p "$BUILDROOT/build/bzip2-1.0.8"
cd "$BUILDROOT/build/bzip2-1.0.8"
srcdir="$BUILDROOT/src/bzip2-1.0.8"
Then we copy over the source files:
cp "$srcdir"/*.c .
cp "$srcdir"/*.h .
cp "$srcdir"/words* .
cp "$srcdir/Makefile" .
We manually compile the Makefile targets that we are interested in:
make CFLAGS="-Wall -Winline -O2 -D_FILE_OFFSET_BITS=64 -O2 -Os" \
CC=${TARGET}-gcc AR=${TARGET}-ar \
RANLIB=${TARGET}-ranlib libbz2.a bzip2 bzip2recover
The compiler, archive tool ar for building static libraries, and ranlib
(indexing tool for static libraries) are manually specified on the command
line.
We copy the programs, library and header over manually:
cp bzip2 "$SYSROOT/bin"
cp bzip2recover "$SYSROOT/bin"
cp libbz2.a "$SYSROOT/lib"
cp bzlib.h "$SYSROOT/usr/include"
ln -s bzip2 "$SYSROOT/bin/bunzip2"
ln -s bzip2 "$SYSROOT/bin/bzcat"
The symlinks bunzip2 and bzcat both point to the bzip2 binary which
when run deduces from the path that it should act as a decompression tool
instead.
Bzip2 also provides a bunch of wrapper scripts like gzip and xz:
cp "$srcdir/bzdiff" "$SYSROOT/bin"
cp "$srcdir/bzdiff" "$SYSROOT/bin"
cp "$srcdir/bzmore" "$SYSROOT/bin"
cp "$srcdir/bzgrep" "$SYSROOT/bin"
ln -s bzgrep "$SYSROOT/bin/bzegrep"
ln -s bzgrep "$SYSROOT/bin/bzfgrep"
ln -s bzmore "$SYSROOT/bin/bzless"
ln -s bzdiff "$SYSROOT/bin/bzcmp"
cd "$BUILDROOT"
Zlib is a library that implements the deflate compression algorithm that is used
for data compression in formats like gzip or zip (and thanks to zlib also
in a bunch of other formats).
In case you are wondering why we install zlib this after gzip if it uses the
same base compression algorithm: the later has it's own implmentation and won't
benefit from an installed version of zlib.
Because zlib also rolls it's own configure script, we do the same dance again
with copying the required stuff over into our build directory:
mkdir -p "$BUILDROOT/build/zlib-1.2.11"
cd "$BUILDROOT/build/zlib-1.2.11"
srcdir="$BUILDROOT/src/zlib-1.2.11"
cp "$srcdir"/*.c "$srcdir"/*.h "$srcdir"/zlib.pc.in .
cp "$srcdir"/configure "$srcdir"/Makefile* .
We can then proceed to cross compile the static libz.a library:
CROSS_PREFIX="${TARGET}-" prefix="/usr" ./configure
make libz.a
The target is named explicitly here, because by default the Makefile would
try to compile a couple test programs as well.
With everything compiled, we can then copy the result over into our sysroot directory and go back to the build root:
cp libz.a "$SYSROOT/usr/lib/"
cp zlib.h "$SYSROOT/usr/include/"
cp zconf.h "$SYSROOT/usr/include/"
cp zlib.pc "$SYSROOT/usr/lib/pkgconfig/"
cd "$BUILDROOT"
The file command line program can magically identify and describe tons of
file types. The core functionallity is actually implemented in a library
called libmagic that comes with a data base of magic numbers.
There is one little quirk tough, in order to cross compile file, we need
the same version of file already installed, so it can build the magic data
base.
So first, we manually compile file and install it in our toolchain directory:
srcdir="$BUILDROOT/src/file-5.40"
mkdir -p "$BUILDROOT/build/file-host"
cd "$BUILDROOT/build/file-host"
unset PKG_CONFIG_SYSROOT_DIR
unset PKG_CONFIG_LIBDIR
unset PKG_CONFIG_PATH
$srcdir/configure --prefix="$TCDIR" --build="$HOST" --host="$HOST"
make
make install
At this point, it should be fairly straight forward to understand what this
does. The 3 unset lines revert the pkg-config paths so that we can propperly
link it against our host libraries.
After that, we of course need to reset the pkg-config exports:
export PKG_CONFIG_SYSROOT_DIR="$SYSROOT"
export PKG_CONFIG_LIBDIR="$SYSROOT/lib/pkgconfig"
export PKG_CONFIG_PATH="$SYSROOT/lib/pkgconfig"
After that, we can simply install file, libmagic and the data base:
auto_build "file-5.40"
Of course, libmagic installs a libtool archive, so we delete that:
rm "$SYSROOT/lib/libmagic.la"
The reason for building this after all the other tools is that file can make
use of the compressor libraries we installed previously to peek into compressed
files, so for instance, it can identify a gzip compressed tarball as such
instead of telling you it's a gzip compressed whatever.
The util-linux pacakge contains a large collection of command line tools that
implement Linux specific functionallity that is missing from more generic
collections like the GNU core utilities. Among them are essentials like
the mount program, loop back device control or rfkill. I won't list all of
them, since we individually enable them throught he configure script, so a you
can see a detailed list below.
The package can easily be built with our autotools helper:
auto_build "util-linux-2.36" --without-systemd --without-udev \
--disable-all-programs --disable-bash-completion --disable-pylibmount \
--disable-makeinstall-setuid --disable-makeinstall-chown \
--enable-mount --enable-losetup --enable-fsck --enable-mountpoint \
--enable-fallocate --enable-unshare --enable-nsenter \
--enable-hardlink --enable-eject --enable-agetty --enable-wdctl \
--enable-cal --enable-switch_root --enable-pivot_root \
--enable-lsmem --enable-ipcrm --enable-ipcs --enable-irqtop \
--enable-lsirq --enable-rfkill --enable-kill --enable-last \
--enable-mesg --enable-raw --enable-rename --enable-ul --enable-more \
--enable-setterm --enable-libmount --enable-libblkid --enable-libuuid \
--enable-libsmartcols --enable-libfdisk
As mentioned before, the lengthy list of --enable-<foo> is only there because
of the --disable-all-programs switch that turns everything off that we don't
excplicitly enable.
Among the things we skip are programs like login or su that handle user
authentication. However, the agetty program which implements the typical
terminal login promt is explicitly enabled. We will get back to that later
on, when setting up an init system and shadow-utils for the user
authentication.
After the build is done, some cleanup steps need to be taken care of:
mv "$SYSROOT"/sbin/* "$SYSROOT/bin"
rm "$SYSROOT/lib"/*.la
rmdir "$SYSROOT/sbin"
Even when configuring util-linux to install into /bin instead of /sbin,
it will still stubbornly install rfkill into /sbin.
The *.la files that are removed are from the helper libraries libblkid.so,
libsmartcols.so, libuuid.so andlibfdisk.so.
A signifficant portion of the compiled binaries consists of debugging symbols that a can easily be removed using a strip command.
The following command line uses the find command to locate all regular
files -type f, i.e. it won't print symlinks or directories and for each of
them runs the file command to identify their file type. The list is fed
through grep ELF to filter out the ones identified as ELF files (because
there are also some shell scripts in there, which we obviously can't strip)
and then fed through'sed to remove the : ELF... description.
The resulting list of excutable files is then passed on to the ${TARGET}-strip
program from our toolchain using xargs:
find "$SYSROOT/bin" -type f -exec file '{}' \; | \
grep ELF | sed 's/: ELF.*$//' | xargs ${TARGET}-strip -xs
On my Fedora system, this drastically cuts the size of the /bin directory
from ~67.5 MiB (28,595,781 bytes) down to ~7 MiB (7,346,105 bytes).
We do the same thing a second time for the /lib directory. Please note the
extra argument ! -path '*/lib/modules/*' for the find command to make sure
we skip the kernel modules.
find "$SYSROOT/lib" -type f ! -path '*/lib/modules/*' -exec file '{}' \; |\
grep ELF | sed 's/: ELF.*$//' | xargs ${TARGET}-strip -xs
On my system, this reduces the size of the /lib directory
from ~67 MiB (70,574,158 bytes) to ~60 MiB (63,604,818 bytes).
If you are the kind of person who loves to ramble about "those pesky shared libraries" and insist on statically linking everything, you should take a look of what uses up that much space:
The largest chunk of the /lib directory, is kernel modules. On my system
those make up ~54.5 MiB (57,114,381 bytes) of the two numbers above. So if you
are bent on cutting the size down, you should start by tossing out modules you
don't need.
SquashFS is a highly compressed, read-only filesystem that is packed offline into an archive that can than be mounted by Linux.
Besides the high compression, which drastically reduces the on-disk memory
footprint, being immutable and read-only has a number of advantages in itself.
Our system will stay in a well defined state and the lack of write operations
reduces wear on the SD card. Writable directories (e.g. for temporary files
in /tmp or for things like log files that you actually want to write) are
typically achieved by mounting another filesystem to a directory on the
SquashFS root (e.g. from another SD card partition, or simply mounting
a tmpfs).
This can also be combined with an overlayfs, where a directory on the
SquashFS can be merged with a directory from a writable filesystem. Any changes
to existing files are implemented by transparentyl copying the file to the
writable filesystem first and editing it there. Erasing the writable directory
essentially causes a "factory reset" to the initial content.
We will revisit this topic later on, for now we are just interested in packing the filesystem and testing it out.
For packing a SquashFS image, we use squashfs-tools-ng.
The reason for using squashfs-tools-ng is that it contains a handy tool
called gensquashfs that takes an input listing similar to gen_init_cpio.
On some systems, you can just install it from the package repository. But be aware that I'm going to use a few features that were introduced in version 1.1, which currently isn't packaged on some systems.
If you are building the package yourself, you need the devlopment packages for at least one of the compressors that SquashFS supports (e.g. xz-utils or Zstd).
Because we compile a host tool again, we need to unset the pkg-config path
variables first:
unset PKG_CONFIG_SYSROOT_DIR
unset PKG_CONFIG_LIBDIR
unset PKG_CONFIG_PATH
We build the package the same way as other host tools and install it into the toolchain directory:
srcdir="$BUILDROOT/src/squashfs-tools-ng-1.1.0"
mkdir -p "$BUILDROOT/build/squashfs-tools-ng-1.1.0"
cd "$BUILDROOT/build/squashfs-tools-ng-1.1.0"
$srcdir/configure --prefix=$TCDIR --host=$HOST --build=$HOST
make
make install
cd "$BUILDROOT"
The listing file for the SquashFS archive is a little bit longer than the one for the initital ramfs. I included a prepared version.
If you examine the list, you will find that many files of the sysroot aren't packed. Specifically the header files, man pages (plus other documentation) and static libraries. The development files are omitted, because without development tools (e.g. gcc, ...) they are useless on the target system. Likewise, we don't have any tools yet to actually view the documentation files. Omitting those safes us some space.
Using the listing, we can pack the root filesystem using gensquashfs:
gensquashfs --pack-dir "$SYSROOT" --pack-file list.txt -f rootfs.sqfs
On my system, the resulting archive is ~18.3 MiB in size (19,238,912 bytes).
For comparison, I also tried the unstripped binaries, resuling in a SquashFS archive of ~26.7 MiB (27,971,584 bytes) and packing without any kernel modules, resulting in only ~4.4 MiB (4,583,424 bytes).
First of, we will revise the /init script of our initial ram filesystem as
follows:
cd "$BUILDROOT/build/initramfs"
cat > init <<_EOF
#!/bin/sh
PATH=/bin
/bin/busybox --install
/bin/busybox mount -t proc none /proc
/bin/busybox mount -t sysfs none /sys
/bin/busybox mount -t devtmpfs none /dev
boot_part="mmcblk0p1"
root_sfs="rootfs.sqfs"
while [ ! -e "/dev/$boot_part" ]; do
echo "Waiting for device $boot_part"
busybox sleep 1
done
mount "/dev/$boot_part" "/boot"
if [ ! -e "/boot/${root_sfs}" ]; then
echo "${root_sfs} not found!"
exec /bin/busybox sh
exit 1
fi
mount -t squashfs /boot/${root_sfs} /newroot
umount -l /boot
umount -l /dev
umount /sys
umount /proc
unset -v root_sfs boot_part
exec /bin/busybox switch_root /newroot /bin/bash
_EOF
This new init script starts out pretty much the same way, but instead of
dropping directly into a busybox shell, we first mount the primrary
partition of the SD card (in my case /dev/mmcblk0p1) to /boot.
As the device node may not be present yet in the /dev filesystem, we wait
in a loop for it to pop up.
From the SD card, we then mount the rootfs.sqfs that we just generated,
to /newroot. There is a bit of trickery involved here, because traditional,
Unix-like operating systems can only mount devices directly. The mount point
has a filesystem driver associated with it, and an underlying device number.
In order to mount an archive from a file, Linux has a loop back block device,
which works like a regular block device, but reflects read/write access back
to an existing file in the filesystem. The mount command transparently takes
care of setting up the loop device for us, and then actually.
After that comes a bit of a mind screw. We cleanup after ourselves, i.e. unset
the environment variables, but also unmount everything, including the /boot
directory, from which the SquashFS archive was mounted.
Note the -l parameter for the mount, which means lazy. The kernel detaches
the filesystem from the hierarchy, but keeps it open until the last reference
is removed (in our case, held by the loop back block device).
The final switch_root works somewhat similar to a chroot, except that it
actually does change the underlying mountpoints and also gets rid of the
initial ram filesystem for us.
After extending the init script, we can rebuild the initramfs:
./gen_init_cpio initramfs.files | xz --check=crc32 > initramfs.xz
cp initramfs.xz "$SYSROOT/boot"
cd "$BUILDROOT"
We, again, copy everything over to the SD card (don't forget the rootfs.sqfs) and boot up the Raspberry Pi.
This should now drop you directly into a bash shell on the SquashFS image.
If you try to run certain commands like mount, keep in mind that /proc
and /sys aren't mounted, causing the resulting error messages. But if you
manually mount them again, everything should be fine again.