feat(handler): support NSIS Installers #1255
jcrussell wants to merge 1 commit into onekey-sec:main from …
@jcrussell you should also create integration tests to check that the handler works as expected. You have to create the following directories:
I would put the following in the input directory:
To generate the output directory content, run the following:
find unblob/tests/integration/executable/pe/__input__ -type f -exec unblob -f -k -e unblob/tests/integration/executable/pe/__output__ {} \;
@jcrussell any update on this? Do you need assistance?
@qkaiser: I believe the code is close to final. Do you mind adding the integration test data? It is easier for me to release code than data. Here's what I have been testing with:
Thanks in advance!
        return ValidChunk(
            start_offset=start_offset,
            end_offset=start_offset + binary.original_size,
original_size is the file size on disk, not the actual size of the PE, so samples with a suffix are carved with the suffix included, which is incorrect. I'm looking into it.
@jcrussell did you get the opportunity to look into this?
@jcrussell I had to figure out how to handle LFS on forks; it looks okay now. I made some adjustments to keep pyright happy, given that LIEF can return completely different types for the same object. We still need to fix how the end offset is calculated; it will probably be based on the section sizes and the header size. Without that, unblob considers everything after the PE to be part of the PE chunk.
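A minimal sketch of the "section sizes plus header size" idea, written with `struct` rather than LIEF so it stands alone: the PE ends at the larger of `SizeOfHeaders` and the furthest `PointerToRawData + SizeOfRawData` across the section table. `pe_size` is a hypothetical helper, it does no bounds checking beyond what `struct` raises, and it deliberately ignores the Authenticode certificate table, which usually sits after the last section and would be treated as trailing data here.

```python
import struct


def pe_size(data: bytes, start: int = 0) -> int:
    """Return the on-disk size of the PE image that begins at `start`.

    The size is the larger of SizeOfHeaders and the end of the furthest
    section's raw data. Anything past that point (overlay, appended
    installer payload, certificate table) is not counted.
    """
    # e_lfanew (offset of the "PE\0\0" signature) is stored at 0x3C in the DOS header.
    (e_lfanew,) = struct.unpack_from("<I", data, start + 0x3C)
    signature = start + e_lfanew
    if data[signature : signature + 4] != b"PE\x00\x00":
        raise ValueError("no PE signature at e_lfanew")

    # COFF file header (20 bytes): Machine, NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics.
    coff = signature + 4
    _, num_sections, _, _, _, optional_size, _ = struct.unpack_from("<HHIIIHH", data, coff)

    # SizeOfHeaders is at offset 60 of the optional header for both PE32 and PE32+.
    optional = coff + 20
    (size_of_headers,) = struct.unpack_from("<I", data, optional + 60)

    end = size_of_headers
    section_table = optional + optional_size
    for index in range(num_sections):
        # Each section header is 40 bytes; SizeOfRawData is at +16, PointerToRawData at +20.
        raw_size, raw_pointer = struct.unpack_from("<II", data, section_table + index * 40 + 16)
        if raw_size:
            end = max(end, raw_pointer + raw_size)
    return end
```

A handler could then return something like `ValidChunk(start_offset=start_offset, end_offset=start_offset + pe_size(...))`. With LIEF, the same numbers should be reachable from the parsed optional header and section objects, but the exact attribute names are not checked here.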
Thanks for moving this along!
I started looking into this and found a script that dumps a lot of information; I'm going to take a more complete look at all the parts tomorrow. https://github.com/lief-project/LIEF/blob/main/api/python/examples/pe_reader.py
This works for (some) non-NSIS PEs, but it trims off the data NSIS appends after the PE, which contains what we actually want to extract. The trimmed data is not recognized by any handler. It seems like we need to detect whether it's an NSIS installer in
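One way to keep the NSIS payload inside the chunk, sketched under assumptions: after working out where the PE itself ends, look for the NSIS first header near the start of the overlay and, if it validates, extend the chunk to cover its total length. The 28-byte layout matches the `"II12sII"` unpack used later in this PR; the field meanings (and that the length includes the header and the trailing CRC) come from our reading of NSIS's `firstheader` definition, and `nsis_chunk_end` and the 4096-byte search window are hypothetical.

```python
import struct

NSIS_MAGIC = b"NullsoftInst"
NSIS_SIGINFO = 0xDEADBEEF
FIRSTHEADER_SIZE = 28  # struct.calcsize("<II12sII")


def nsis_chunk_end(data: bytes, pe_end: int, search_limit: int = 4096) -> int:
    """Return the chunk end: past the NSIS payload if one follows the PE, else pe_end."""
    magic = data.find(NSIS_MAGIC, pe_end, pe_end + search_limit)
    if magic == -1:
        return pe_end
    # flags (4 bytes) and siginfo (4 bytes) precede the 12-byte magic.
    header_start = magic - 8
    if header_start < pe_end or header_start + FIRSTHEADER_SIZE > len(data):
        return pe_end
    _, siginfo, _, _, total_length = struct.unpack_from("<II12sII", data, header_start)
    if siginfo != NSIS_SIGINFO:
        return pe_end
    # Assumption: length_of_all_following_data counts this header and the trailing CRC.
    return min(header_start + total_length, len(data))
```

Here `pe_end` is wherever the PE image itself ends; clamping to `len(data)` is a design choice, and a stricter handler might instead reject a truncated payload.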
@qkaiser: sorry for the delay, I think this is finally fixed:
$ ls tests/integration/executable/pe/__output__/nsis-3.11-setup.exe.padded_extract/
0-16.padding 1565007-1565023.padding 16-1565007.pe 16-1565007.pe_extract
$ xxd tests/integration/executable/pe/__output__/nsis-3.11-setup.exe.padded_extract/0-16.padding
00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................
$ xxd tests/integration/executable/pe/__output__/nsis-3.11-setup.exe.padded_extract/1565007-1565023.padding
00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................
$ sha1sum tests/integration/executable/pe/__output__/nsis-3.11-setup.exe.padded_extract/16-1565007.pe
a64bbad73d4638d668ffdbd0887be7d6528d6a9d tests/integration/executable/pe/__output__/nsis-3.11-setup.exe.padded_extract/16-1565007.pe
$ sha1sum tests/integration/executable/pe/__input__/nsis-3.11-setup.exe
a64bbad73d4638d668ffdbd0887be7d6528d6a9d tests/integration/executable/pe/__input__/nsis-3.11-setup.exe
…table
Add support for PE files by relying on LIEF to parse them once matched
on the 'MZ' or 'PE' signature.
If the file is a self-extracting NSIS executable
("Nullsoft.NSIS.exehead" present in the manifest), we extract it with 7-Zip.
Note: the DLL files within the MSI extraction directory are no longer
extracted, since the PE handler takes care of them. This is an
improvement over the previous behavior, where a RAR false positive
was found in the DLL.
Co-authored-by: Quentin Kaiser <quentin.kaiser@onekey.com>
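The 7-Zip step from the commit message amounts to running `7z x` on the carved installer. The sketch below uses plain `subprocess`; unblob handlers normally declare their extraction command through the framework instead, so `sevenzip_command` and `extract_nsis` are hypothetical names, not the PR's code.

```python
import subprocess
from pathlib import Path


def sevenzip_command(inpath: Path, outdir: Path) -> list[str]:
    """Build a non-interactive 7-Zip extraction command line."""
    # x: extract with full paths; -y: answer yes to all prompts;
    # -o<dir>: output directory (no space between -o and the path).
    return ["7z", "x", "-y", str(inpath), f"-o{outdir}"]


def extract_nsis(inpath: Path, outdir: Path) -> None:
    """Run 7-Zip on an NSIS installer; raises CalledProcessError on failure."""
    outdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(sevenzip_command(inpath, outdir), check=True, capture_output=True)
```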
@jcrussell sorry about the delay; the plan is to finalize this next week.
        HexString(
            """
            // MZ header
            4d 5a
            """
        ),
I'm worried this will lead to many false-positive matches being fed to LIEF, which will have a significant performance impact. One way we limit the impact of such short patterns is this:
Suggested change:
-        HexString(
-            """
-            // MZ header
-            4d 5a
-            """
-        ),
+        Regex("^\x4d\x5a"),
This forces the matched pattern to be at offset 0 of the file.
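To see why anchoring matters, the sketch below counts candidate offsets for `MZ` with and without `^`. It uses Python's `re` purely for illustration (as far as we know, unblob compiles its patterns with Hyperscan, whose semantics differ in details); `candidate_offsets` is a hypothetical helper.

```python
from __future__ import annotations

import re

UNANCHORED = re.compile(rb"MZ")
ANCHORED = re.compile(rb"^MZ")  # without re.MULTILINE, ^ only matches at offset 0


def candidate_offsets(pattern: re.Pattern[bytes], data: bytes) -> list[int]:
    """Offsets at which `pattern` matches, i.e. candidates that would reach the handler."""
    return [match.start() for match in pattern.finditer(data)]
```

Every stray `MZ` in the unanchored case is a candidate that would otherwise be parsed by LIEF, which is the performance concern raised above.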
            // PE header
            50 45 00 00
Are there fields other than the magic that we can do some decent pattern matching on?
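Beyond the two magics, the DOS and COFF headers offer a few fields that are cheap to sanity-check before handing a match to LIEF: `e_lfanew` should point into a plausible window, the COFF `Machine` should be a known value, `NumberOfSections` should be non-zero and small, and the optional-header magic should be 0x10B (PE32) or 0x20B (PE32+). The thresholds and machine list below are assumptions chosen for illustration; hand-crafted tiny PEs (for example, with `e_lfanew` pointing inside the DOS header) would be rejected, and recent Windows loaders accept more than 96 sections.

```python
import struct

# A few common IMAGE_FILE_MACHINE_* values: i386, ARM Thumb-2, AMD64, ARM64.
KNOWN_MACHINES = {0x014C, 0x01C4, 0x8664, 0xAA64}
OPTIONAL_MAGICS = {0x10B, 0x20B}  # PE32, PE32+


def looks_like_pe(data: bytes, start: int = 0, max_lfanew: int = 0x1000) -> bool:
    """Cheap sanity checks on the DOS/PE headers before paying for a full LIEF parse."""
    if data[start : start + 2] != b"MZ" or len(data) < start + 0x40:
        return False
    (e_lfanew,) = struct.unpack_from("<I", data, start + 0x3C)
    pe = start + e_lfanew
    # Need the PE signature (4) + COFF header (20) + optional header magic (2).
    if e_lfanew < 0x40 or e_lfanew > max_lfanew or len(data) < pe + 26:
        return False
    if data[pe : pe + 4] != b"PE\x00\x00":
        return False
    machine, num_sections = struct.unpack_from("<HH", data, pe + 4)
    (optional_size,) = struct.unpack_from("<H", data, pe + 20)
    (optional_magic,) = struct.unpack_from("<H", data, pe + 24)
    return (
        machine in KNOWN_MACHINES
        and 0 < num_sections <= 96
        and optional_size >= 0x60
        and optional_magic in OPTIONAL_MAGICS
    )
```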
        )
    def calculate_chunk(self, file: File, start_offset: int) -> Optional[ValidChunk]:
        file.seek(start_offset, io.SEEK_SET)
Not needed.
Suggested change:
-        file.seek(start_offset, io.SEEK_SET)
        _, _, _, _, archive_size = struct.unpack(
            "II12sII", overlay[header_start : header_start + 28]
        )
What are those fields exactly? If you turn this handler into a StructHandler, you can define C_DEFINITIONS and a HEADER_STRUCT, and use the parse_header function to get a dissect.cstruct Structure with named fields.
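For illustration, the five fields unpacked with `"II12sII"` line up with NSIS's `firstheader` structure as we understand it from the NSIS sources. The sketch names them with a stdlib `NamedTuple`; the commented C definition shows roughly what could go into `C_DEFINITIONS` for a `StructHandler`, though the unblob/dissect.cstruct wiring is not shown and all names here are hypothetical.

```python
import struct
from typing import NamedTuple

# Roughly the equivalent C definition (hypothetical), as it might appear in C_DEFINITIONS:
#   typedef struct nsis_firstheader {
#       uint32 flags;
#       uint32 siginfo;
#       char   magic[12];
#       uint32 length_of_header;
#       uint32 length_of_all_following_data;
#   } nsis_firstheader_t;
FIRSTHEADER = struct.Struct("<II12sII")  # 28 bytes, little-endian


class NsisFirstHeader(NamedTuple):
    flags: int  # FH_FLAGS_* bits
    siginfo: int  # expected to be 0xDEADBEEF
    magic: bytes  # expected to be b"NullsoftInst"
    length_of_header: int  # size of the installer header inside the data block
    length_of_all_following_data: int  # whole payload, including this header and the CRC


def parse_firstheader(buffer: bytes, offset: int = 0) -> NsisFirstHeader:
    """Unpack and validate an NSIS first header at `offset`."""
    header = NsisFirstHeader._make(FIRSTHEADER.unpack_from(buffer, offset))
    if header.siginfo != 0xDEADBEEF or header.magic != b"NullsoftInst":
        raise ValueError("not an NSIS first header")
    return header
```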
        overlay = bytes(binary.overlay)
        magic_offset = overlay.find(b"NullsoftInst")
How big is the overlay? Is there a way to derive its start offset? If so, you could call find on file, which is mmap'ed; it should be faster. Also, if you know the magic value is within the first X bytes of the overlay, I would set an upper bound on the content that is searched.
So I would do something like this:
Suggested change:
-        overlay = bytes(binary.overlay)
-        magic_offset = overlay.find(b"NullsoftInst")
+        magic_offset = file[binary.overlay_offset:binary.overlay_offset+SOME_UPPER_BOUND].find(b"NullsoftInst")
Bonus points if the upper bound is less than a page.
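A sketch of the bounded search on a memory map: `mmap.find(sub, start, end)` scans in place, so unlike `bytes(binary.overlay)` nothing is copied, and the end bound limits the scan to at most one page past the overlay start. The function name and the one-page default are assumptions, and a plain `mmap.mmap` stands in for unblob's `File`.

```python
import mmap

NSIS_MAGIC = b"NullsoftInst"


def find_nsis_magic(mapped: mmap.mmap, overlay_offset: int, upper_bound: int = mmap.PAGESIZE) -> int:
    """Return the absolute offset of the NSIS magic in the first `upper_bound` overlay bytes, or -1."""
    # mmap.find(sub, start, end) searches the mapping in place: no copy of the overlay is made,
    # and the match must lie entirely within [overlay_offset, overlay_offset + upper_bound).
    return mapped.find(NSIS_MAGIC, overlay_offset, overlay_offset + upper_bound)
```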
        if "Nullsoft.NSIS.exehead" in manifest:
            return True
Suggested change:
-        if "Nullsoft.NSIS.exehead" in manifest:
-            return True
+        return "Nullsoft.NSIS.exehead" in manifest
@jcrussell did you get the opportunity to look into this?
@qkaiser: not yet, sorry. Hoping to get to it by the end of the month.
Searches for "Nullsoft" in the manifest to avoid false positives. This may be too strict.
Fixes #1249
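If the manifest check turns out to be too strict, one hedged option is to accept either signal: the `Nullsoft.NSIS.exehead` marker in the manifest, or the `NullsoftInst` magic near the start of the overlay. `is_nsis` and its parameters are hypothetical, and how the manifest string and the overlay bytes are obtained from LIEF is left out.

```python
from typing import Optional

NSIS_MANIFEST_MARKER = "Nullsoft.NSIS.exehead"
NSIS_MAGIC = b"NullsoftInst"


def is_nsis(manifest: Optional[str], overlay_head: bytes) -> bool:
    """Treat the PE as NSIS if the manifest names the NSIS stub or the overlay has the NSIS magic."""
    if manifest and NSIS_MANIFEST_MARKER in manifest:
        return True
    return NSIS_MAGIC in overlay_head
```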