
Zimage fix#8

Open
oohal wants to merge 7 commits into antonblanchard:master from oohal:zimage-fix


@oohal

oohal commented Mar 9, 2018

Fixes the entry point calculation so you can kexec into a zImage directly

oohal added 3 commits March 9, 2018 16:25
Add an explicit check for the .kernel:<filename> section that
contains the compressed kernel binary for a zImage. If it's not
present then we're probably trying to kexec an ordinary binary
and we emit a warning to indicate the user might be making poor
life decisions.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
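A minimal sketch of that check, assuming the section names have already been read from the ELF section header string table (for example with libelf's `elf_strptr()`). `looks_like_zimage()` is a hypothetical helper, not kexec-lite's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if any section name starts with
 * ".kernel:", the prefix the zImage wrapper uses for the section that
 * holds the compressed kernel binary. */
static bool looks_like_zimage(const char *const *section_names, size_t count)
{
    static const char prefix[] = ".kernel:";

    assert(section_names != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        /* sizeof(prefix) - 1 skips the terminating NUL. */
        if (strncmp(section_names[i], prefix, sizeof(prefix) - 1) == 0)
            return true;
    }
    return false; /* no .kernel: section: probably an ordinary binary */
}
```

A bare `.kernel` name without the colon does not match, so only the wrapper's naming scheme triggers the zImage path.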
Add the automake dependency and add Fedora package names. Spending
more than half a second thinking about this stuff is too much.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Currently for ABI v2 binaries we jump to the entry point in the
ELF header. For a vmlinux this works because the entry point is
0xc000000000000000 and the upper bits of the address are ignored
in real mode. For a zImage the entry point is 0x20000000 and as
a result kexec will jump to 0x20000000 + load_address which
typically results in dying in a fire. Fix this by turning the
entry point into an offset into the PT_LOAD segment and jumping
to that instead.

This patch allows us to kexec into a zImage directly, which
rules.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
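The arithmetic, sketched with the `<elf.h>` types and a hypothetical helper `kexec_entry_addr()`: the entry point becomes an offset into the PT_LOAD segment and is rebased onto the address the segment was actually loaded at. For a zImage with an entry point of 0x20000000 (assuming its PT_LOAD segment also starts there) loaded at 0xc60000, the address seen in the console log later in this thread, this yields 0xc60000 instead of 0x20c60000:

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: turn e_entry into an offset within the PT_LOAD
 * segment that contains it, then rebase that offset onto load_addr, the
 * physical address the segment was copied to.
 * Returns false if e_entry does not fall inside the segment. */
static bool kexec_entry_addr(const Elf64_Ehdr *ehdr, const Elf64_Phdr *load,
                             uint64_t load_addr, uint64_t *entry)
{
    assert(load->p_type == PT_LOAD);

    /* Written as a subtraction so p_vaddr + p_memsz cannot overflow. */
    if (ehdr->e_entry < load->p_vaddr ||
        ehdr->e_entry - load->p_vaddr >= load->p_memsz)
        return false;

    /* The old behavior was load_addr + e_entry, which only works when the
     * upper address bits are ignored (vmlinux linked at 0xc000...). */
    *entry = load_addr + (ehdr->e_entry - load->p_vaddr);
    return true;
}
```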
@antonblanchard (Owner)

Thanks Oliver, this looks good.

I tried kexec'ing a zImage and a separate initrd, and the initrd got corrupted. Unfortunately, the zImage code ignores the reserve map entries in the FDT:

        /* FIXME: we should process reserve entries */

        simple_alloc_init(_end, ima_size - (unsigned long)_end, 32, 64);

And kexec-lite places the initrd right after the kernel:

add_kexec_segment kernel      ... dest 0x1a60000, memsize 0x006b0000
add_kexec_segment initrd      ... dest 0x2110000, memsize 0x035e0000
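The addresses above show the problem: 0x1a60000 + 0x6b0000 = 0x2110000, so the initrd begins exactly where the kernel segment ends, and anything the wrapper's allocator hands out past `_end` lands in the initrd. A small sketch of that overlap test, approximating `_end` by the end of the kernel segment and treating the heap size as an assumed bound (`struct segment` and `heap_hits_segment()` are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct segment {
    uint64_t dest;  /* physical load address */
    uint64_t memsz; /* size occupied in memory */
};

/* Exclusive end address of a segment. */
static uint64_t seg_end(const struct segment *s)
{
    return s->dest + s->memsz;
}

/* The wrapper allocates upwards from _end, approximated here as the end of
 * the kernel segment. heap_size is an assumed upper bound on its usage. */
static bool heap_hits_segment(const struct segment *kernel,
                              const struct segment *next, uint64_t heap_size)
{
    uint64_t heap_start = seg_end(kernel);
    uint64_t heap_end = heap_start + heap_size;

    assert(heap_end >= heap_start); /* no wraparound */
    /* Half-open intervals [heap_start, heap_end) and [dest, end) overlap? */
    return next->dest < heap_end && seg_end(next) > heap_start;
}
```

Moving the initrd up by the 16MB of padding mentioned below clears any heap smaller than 16MB.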

I added 16MB of padding between the kernel and initrd, and the corruption went away. It would be good to fix the zImage wrapper to understand the reserved entries, but here are a few ideas for handling existing kernels:

  1. Detect a zImage and add padding after it. If so, how much?
  2. Detect a zImage and flip things around, placing the initrd before the zImage. The zImage should then be able to grow until it reaches the FDT up at 2GB.

I also note that kexec-lite is setting r7 (ima_size) to 0, which I presume means we initialize the memory allocator with a size of (-_end).

@oohal (Author)

oohal commented Mar 19, 2018

The wrapper always starts allocating from _end up until ima_size. Passing a zero IMA size results in the heap size calculation underflowing to something large so everything works out just fine. Cool...
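The wraparound is plain unsigned arithmetic. A sketch of the size expression passed to `simple_alloc_init()` in the snippet quoted above, using `uint64_t` as a stand-in for the 64-bit `unsigned long` on ppc64 (`wrapper_heap_size()` is a hypothetical name):

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors ima_size - (unsigned long)_end from the wrapper. Unsigned
 * subtraction is modulo 2^64, so ima_size == 0 yields 2^64 - _end,
 * a huge heap size rather than a negative one. */
static uint64_t wrapper_heap_size(uint64_t ima_size, uint64_t end)
{
    return ima_size - end;
}
```

With the `_end` from the log below (0x1071710), the result is 2^64 - 0x1071710, which is why the allocator never runs out.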

Anyway the zImage does emit some warnings due to passing an ima_size of zero, but the OPAL console backend has been broken for a while so I never saw them. Fixing that results in:

[   24.472082329,5] OPAL: Switch to big-endian OS
WARNING: Image loaded outside IMA! (_end=0000000001071710, ima_size=0x0)
WARNING: Device tree address is outside IMA!(fdt_addr=0x7fff0000, ima_size=0x0)
WARNING: Device tree extends outside IMA! (fdt_addr=0x7fff0000, size=0x282c, ima_size=0x0

zImage starting: loaded at 0x0000000000c60000 (sp: 0x0000000001070ee8)

These warnings only really matter on actual epapr platforms where the IMA is a real thing, but we can squash them easily enough by passing mem_top from the kexec memory map as the IMA size.
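For reference, the three warnings correspond to simple comparisons against `ima_size`. The sketch below is inferred from the warning text rather than copied from the wrapper; `ima_warnings()` and the flag names are hypothetical. With an `ima_size` of 0 all three fire, while any IMA size covering the FDT (for example an assumed 4GB `mem_top`) silences them:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flags, one per warning in the log above. */
enum {
    WARN_IMAGE_OUTSIDE_IMA = 1 << 0,
    WARN_FDT_ADDR_OUTSIDE_IMA = 1 << 1,
    WARN_FDT_EXTENDS_OUTSIDE_IMA = 1 << 2,
};

/* Returns a bitmask of the warnings that would be printed. */
static unsigned ima_warnings(uint64_t end, uint64_t fdt_addr,
                             uint64_t fdt_size, uint64_t ima_size)
{
    unsigned w = 0;

    if (end > ima_size)
        w |= WARN_IMAGE_OUTSIDE_IMA;
    if (fdt_addr > ima_size)
        w |= WARN_FDT_ADDR_OUTSIDE_IMA;
    if (fdt_addr + fdt_size > ima_size)
        w |= WARN_FDT_EXTENDS_OUTSIDE_IMA;
    return w;
}
```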

Anyway, for fixing the corruption I think placing the initrd before the zImage is probably safer. The only time the wrapper does any large (multi-MB) allocations is when it detects that the uncompressed vmlinux will overlap with the initrd in which case it allocates a new buffer and moves the initrd into it.

I'm a little surprised we haven't run into the problem of the uncompressed vmlinux overlapping with the zImage text, which is a fatal error. At a guess it's because kexec-lite starts allocating space for the loaded segments at linux,kernel-end and the new vmlinux is similarly sized to the old one. I'm not sure what the right fix is here since we fundamentally don't know how big the new kernel is unless we decompress it inside of kexec.
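The fatal case reduces to one comparison, since the wrapper extracts vmlinux to address zero: the decompressed kernel occupies [0, size) and must stay below the start of the zImage. A sketch with a hypothetical helper; as noted above, kexec only knows `vmlinux_size` if it decompresses the image itself:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* vmlinux is extracted to 0, so it occupies [0, vmlinux_size). The zImage
 * text starts at zimage_start; any overlap aborts the boot. */
static bool vmlinux_overlaps_zimage(uint64_t vmlinux_size, uint64_t zimage_start)
{
    return vmlinux_size > zimage_start;
}
```

For the zImage loaded at 0xc60000 in the log above, any vmlinux larger than about 12.4MB would overlap.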

In skiboot we just load the zImage at 0x20000000 and assume that's enough for the new kernel. In that situation we don't need to deal with potentially large initrds though so that's probably not a good idea here.

oohal added 4 commits May 7, 2018 12:04
The ePAPR entry ABI expects to be passed the amount of accessible memory
(aka the initial mapped area) in r7. The IMA concept only really makes sense
on BookE parts which always have the MMU enabled, but don't necessarily
have all of memory mapped. On PowerNV (where kexec-lite is mainly used)
we can always access all of memory in real mode.

The zImage will emit warnings if the dtb or initrd are outside the IMA so
we should pass a sane value for the IMA size from the trampoline. This
patch makes the trampoline pass the mem_top that was calculated when
building the kexec memory map as the IMA size.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
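A sketch of the value being passed, assuming `mem_top` is simply the highest end address seen while walking the memory ranges that make up the kexec memory map. `struct mem_range` and `mem_top()` are hypothetical, not kexec-lite's types:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mem_range {
    uint64_t start;
    uint64_t size;
};

/* Highest (exclusive) end address across the memory map. On PowerNV all
 * of memory is reachable in real mode, so this is a sane IMA size for r7. */
static uint64_t mem_top(const struct mem_range *ranges, size_t n)
{
    uint64_t top = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t end = ranges[i].start + ranges[i].size;

        assert(end >= ranges[i].start); /* range must not wrap */
        if (end > top)
            top = end;
    }
    return top;
}
```

Because the result is a maximum, the order of ranges in the map does not matter.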
Currently we reserve the memory from 0...linux,kernel-end in the kexec
memory map which prevents the current kernel's text from being
overwritten when we assemble the kexec segment buffers at kexec time.

This reservation is only really needed when loading a crashkernel since
it wants to preserve the existing kernel text to create a crashdump. For
a generic vmlinux the reservation is mostly pointless since most
kernels will copy themselves down to zero in early boot anyway.

When a zImage is loaded we do need to keep the start of memory reserved
because the zImage will extract the vmlinux to zero on ePAPR platforms.
By luck reserving up until linux,kernel-end will usually reserve enough
space for the new kernel's text. This does however break down when the
new kernel is significantly larger than the previous (e.g. it contains a
large builtin initramfs).

This patch replaces the 0...linux,kernel-end reservation with a fixed
256MB reservation, which ought to be enough for anybody.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
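A sketch of the placement rule after this change, assuming segments are otherwise placed from some starting hint upwards with a power-of-two alignment. `KERNEL_RESERVE`, `align_up()` and `first_segment_addr()` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Fixed reservation at 0 for the new kernel, replacing the old
 * 0...linux,kernel-end range. */
#define KERNEL_RESERVE (256ULL << 20)

/* Round x up to a power-of-two alignment. */
static uint64_t align_up(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* Lowest address a kexec segment may use: never inside [0, 256MB). */
static uint64_t first_segment_addr(uint64_t hint, uint64_t align)
{
    uint64_t addr = hint < KERNEL_RESERVE ? KERNEL_RESERVE : hint;

    return align_up(addr, align);
}
```

A hint of 0x1a60000 (roughly where the segments landed before) is pushed up to 0x10000000, so a new vmlinux of up to 256MB can be extracted at zero without reaching the zImage.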
This property is generated by Linux at boot time and shouldn't be
included in the dtb.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
The zImage uses the memory immediately after it as a heap area. This can
result in data corruption if we load another kexec segment after the
zImage (e.g. an initrd), so add some padding to the kernel to prevent this
situation.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
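A sketch of the padding, assuming it is added to the kernel segment's memsz only when the image was detected as a zImage. The 4MB figure comes from the ~400KB heap usage measured on a barreleye later in this thread; the helper name is hypothetical. If segments are still packed back to back, the initrd from the earlier log would start at 0x2510000 instead of 0x2110000:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Heap space left after a zImage for the wrapper's allocator. */
#define ZIMAGE_HEAP_PAD (4ULL << 20)

/* Memory to reserve for the kernel segment: its own memsz, plus heap
 * padding past _end when the image is a zImage. */
static uint64_t kernel_segment_memsz(uint64_t memsz, bool is_zimage)
{
    return is_zimage ? memsz + ZIMAGE_HEAP_PAD : memsz;
}
```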
@oohal (Author)

oohal commented May 7, 2018

Fixed the IMA problem and added some padding to the kernel image so the zImage will have a bit of heap space if it needs it. I tested it on a barreleye and it only allocated ~400KB, so a 4MB heap should be plenty.

I found another problem with how we use linux,kernel-end too. Currently we reserve from 0...linux,kernel-end in the kexec memory map, but there's no real reason to do so for a vmlinux since it will copy itself over the previous kernel anyway.

When loading a zImage we do need to reserve space at zero because that's where it will decompress the image to. However, if the new vmlinux happens to be bigger than the old one, reserving up to linux,kernel-end isn't enough and the new kernel will overlap with the zImage, which causes it to abort. I've fixed this by just reserving the first 256MB for the new kernel; hopefully that'll be enough.
