
feat(handler): support NSIS Installers #1255

Open
jcrussell wants to merge 1 commit into onekey-sec:main from jcrussell:nsis

Conversation

@jcrussell
Contributor

Searches for "Nullsoft" in the manifest to avoid false positives. Possibly too strict.

Fixes #1249

@qkaiser qkaiser self-assigned this Sep 4, 2025
@qkaiser qkaiser self-requested a review September 4, 2025 07:45
@qkaiser qkaiser added the enhancement, format:executable, and python labels Sep 4, 2025
@qkaiser
Contributor

qkaiser commented Sep 4, 2025

@jcrussell you should also create integration tests to check that the handler works as expected.

You have to create the following directories:

  • unblob/tests/integration/executable/pe/__input__
  • unblob/tests/integration/executable/pe/__output__

I would put the following in the input directory:

  • a normal PE file
  • a normal PE file with prefix and suffix padding
  • a nullsoft PE file
  • a nullsoft PE file with prefix and suffix padding

To generate the output directory content, run the following:

find unblob/tests/integration/executable/pe/__input__ -type f -exec unblob -f -k -e unblob/tests/integration/executable/pe/__output__ {} \;

@qkaiser
Contributor

qkaiser commented Sep 29, 2025

@jcrussell any update on this? Do you need assistance?

@jcrussell
Contributor Author

@jcrussell any update on this? Do you need assistance?

@qkaiser: I believe the code is close to final. Do you mind adding the integration test data? It is easier for me to release code than data. Here's what I have been testing with:

Thanks in advance!


return ValidChunk(
start_offset=start_offset,
end_offset=start_offset + binary.original_size,
Contributor

original_size is the file size on disk, not the actual PE file size. Samples with suffix are carved with the suffix, which is incorrect. I'm looking into it.

Contributor

@jcrussell did you get the opportunity to look into this?

Contributor

Maybe #1255 (comment)?

@qkaiser
Contributor

qkaiser commented Oct 15, 2025

@jcrussell I had to figure out how to handle LFS on forks; looks like it's okay now. Made some adjustments to keep pyright happy, given LIEF's ability to return completely different types for the same object.

We need to fix the way the end offset is calculated; it will probably be based on section sizes and header size. Without that, unblob considers everything after the PE as part of the PE chunk.

@jcrussell
Contributor Author

Thanks for moving this along!

We need to fix the way the end offset is calculated; it will probably be based on section sizes and header size. Without that, unblob considers everything after the PE as part of the PE chunk.

I started looking into this:

>>> pe = lief.PE.parse("tests/integration/executable/pe/__input__/nsis-3.11-setup.exe")
>>> pe.original_size
1564991
>>> pe.sizeof_headers
1024
>>> sum([s.sizeof_raw_data for s in pe.sections])
52224
>>> sum([v.size for v in pe.data_directories])
19456

Found this script that dumps a bunch of info; going to take a more complete look at all the parts tomorrow.

https://github.com/lief-project/LIEF/blob/main/api/python/examples/pe_reader.py

@jcrussell
Contributor Author

This works for (some) non-NSIS PEs but trims off the data that NSIS adds after the PE, which contains what we actually want to extract. The "trimmed" data is not recognized by any handler. It seems like we need to detect whether it's an NSIS installer in calculate_chunk to see if we need to increase the size for the NSIS data.

        size = sum([s.sizeof_raw_data for s in binary.sections]) + binary.sizeof_headers

        return ValidChunk(
            start_offset=start_offset,
            end_offset=start_offset + size
        )   
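The overlay-aware sizing described above can be sketched as a pure function. This is a sketch only: pe_chunk_size is a hypothetical helper, and the values that LIEF exposes as binary.sizeof_headers, section.sizeof_raw_data, and binary.overlay are passed in as plain integers so the arithmetic is easy to check against the REPL session above.

```python
def pe_chunk_size(
    sizeof_headers: int,
    section_raw_sizes: list[int],
    overlay_size: int = 0,
    is_nsis: bool = False,
) -> int:
    """Size of a PE chunk starting at its MZ header.

    The bare PE image is headers plus raw section data; an NSIS
    installer appends its compressed archive after the last section,
    so for NSIS the chunk must also cover the overlay.
    """
    size = sizeof_headers + sum(section_raw_sizes)
    if is_nsis:
        size += overlay_size
    return size

# Numbers from the REPL session above (nsis-3.11-setup.exe):
# 1024 + 52224 = 53248 for the bare PE image; adding the overlay
# (1564991 - 53248 = 1511743) recovers the full installer size.
pe_only = pe_chunk_size(1024, [52224])
full = pe_chunk_size(1024, [52224], overlay_size=1511743, is_nsis=True)
```

With those inputs, pe_only is 53248 and full is 1564991, matching original_size for the sample.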

@qkaiser qkaiser marked this pull request as draft November 25, 2025 12:44
@qkaiser qkaiser added this to the Internship 2026 milestone Nov 25, 2025
@jcrussell
Contributor Author

@qkaiser: sorry for the delay, I think this is finally fixed:

$ ls tests/integration/executable/pe/__output__/nsis-3.11-setup.exe.padded_extract/
0-16.padding  1565007-1565023.padding  16-1565007.pe  16-1565007.pe_extract
$ xxd tests/integration/executable/pe/__output__/nsis-3.11-setup.exe.padded_extract/0-16.padding 
00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
$ xxd tests/integration/executable/pe/__output__/nsis-3.11-setup.exe.padded_extract/1565007-1565023.padding 
00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
$ sha1sum tests/integration/executable/pe/__output__/nsis-3.11-setup.exe.padded_extract/16-1565007.pe
a64bbad73d4638d668ffdbd0887be7d6528d6a9d  tests/integration/executable/pe/__output__/nsis-3.11-setup.exe.padded_extract/16-1565007.pe
$ sha1sum tests/integration/executable/pe/__input__/nsis-3.11-setup.exe 
a64bbad73d4638d668ffdbd0887be7d6528d6a9d  tests/integration/executable/pe/__input__/nsis-3.11-setup.exe
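The same carved-chunk-equals-input check can be done in Python with the stdlib hashlib module (sha1_of is a hypothetical helper, not part of the test suite):

```python
import hashlib

def sha1_of(path: str) -> str:
    """Hex SHA-1 of a file, read in 64 KiB blocks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 16), b""):
            h.update(block)
    return h.hexdigest()
```

Used as, e.g., sha1_of(carved_pe_path) == sha1_of(input_path) to assert the carved 16-1565007.pe chunk is byte-identical to the original input.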

@jcrussell jcrussell marked this pull request as ready for review December 4, 2025 01:01
…table

Add support for PE file by relying on LIEF to parse PE file once matched
on 'MZ' or 'PE' signature.

If the file is a self-extractable NSIS executable
("Nullsoft.NSIS.exehead" present in manifest) we extract it with 7zip.

Note: the DLL files within MSI extraction directory are no longer
extracted since the PE handler takes care of them. This is an
improvement over the RAR false positive being found in the DLL.

Co-authored-by: Quentin Kaiser <quentin.kaiser@onekey.com>
@qkaiser
Contributor

qkaiser commented Jan 25, 2026

@jcrussell sorry about the delay, the plan is to finalize this next week.

Comment on lines +61 to +66
HexString(
"""
// MZ header
4d 5a
"""
),
Contributor

I'm worried this will lead to many false positive matches being fed to lief, which will have a significant performance impact. One way we limit impact with those small patterns is this:

Suggested change
HexString(
"""
// MZ header
4d 5a
"""
),
Regex("^\x4d\x5a"),

Forces the matched pattern to be at offset 0 of the file.
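A minimal illustration of that anchoring effect, using Python's re module on bytes (assuming unblob's Regex pattern class treats the ^ anchor the same way, as a match only at the start of the scanned data):

```python
import re

# "^" restricts matches to offset 0; a bare two-byte pattern would
# fire on every 4d 5a pair anywhere in the scanned data.
MZ_AT_START = re.compile(rb"^\x4d\x5a")

assert MZ_AT_START.search(b"MZ\x90\x00rest-of-pe") is not None
assert MZ_AT_START.search(b"some junk before MZ") is None
```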

Comment on lines +69 to +70
// PE header
50 45 00 00
Contributor

Are there fields other than the magic we can do some decent pattern matching on?

)

def calculate_chunk(self, file: File, start_offset: int) -> Optional[ValidChunk]:
file.seek(start_offset, io.SEEK_SET)
Contributor

Not needed.

Suggested change
file.seek(start_offset, io.SEEK_SET)

Comment on lines +114 to +116
_, _, _, _, archive_size = struct.unpack(
"II12sII", overlay[header_start : header_start + 28]
)
Contributor

What are those fields exactly? If you turn this Handler into a StructHandler you can define C_DEFINITIONS, a HEADER_STRUCT, and use the parse_header function to get a dissect.cstruct Structure with named fields.
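Short of the StructHandler rewrite, the unpack can at least get named fields. A sketch with the stdlib struct module, assuming the 28 bytes follow the NSIS "first header" layout (flags, signature, magic, header size, archive size) in little-endian order; the field names are an interpretation to be checked against the NSIS sources, not something confirmed in this PR:

```python
import struct
from typing import NamedTuple

class FirstHeader(NamedTuple):
    flags: int
    siginfo: int       # commonly documented as 0xDEADBEEF
    magic: bytes       # commonly documented as b"NullsoftInst"
    header_size: int
    archive_size: int  # size of all following data

# Same 28-byte layout as the "II12sII" unpack above, with explicit
# little-endian byte order.
FIRST_HEADER = struct.Struct("<II12sII")

def parse_first_header(buf: bytes) -> FirstHeader:
    return FirstHeader(*FIRST_HEADER.unpack_from(buf))

# Round-trip on a synthetic header to show the field mapping:
sample = struct.pack("<II12sII", 0, 0xDEADBEEF, b"NullsoftInst", 0x1234, 0x999)
hdr = parse_first_header(sample)
```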

Comment on lines +105 to +107
overlay = bytes(binary.overlay)

magic_offset = overlay.find(b"NullsoftInst")
Contributor

How big is the overlay? Is there a way you can derive the start offset of that overlay? If so, you could call find on file, which is mmap'ed. It should be faster. Also, if you know the magic value is within the first X bytes of the overlay, I would set an upper bound on the content that is searched through.

Contributor

So I would do something like this:

Suggested change
overlay = bytes(binary.overlay)
magic_offset = overlay.find(b"NullsoftInst")
magic_offset = file[binary.overlay_offset:binary.overlay_offset+SOME_UPPER_BOUND].find(b"NullsoftInst")

Bonus points if the upper bound is less than a page.
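The bounded search can be sketched as below; find_nsis_magic and the 4096-byte bound are illustrative choices, not taken from the handler:

```python
UPPER_BOUND = 4096  # one page; magic is expected near the overlay start

def find_nsis_magic(data: bytes, overlay_offset: int) -> int:
    """Absolute offset of the NSIS magic inside the bounded window,
    or -1 if it is not found there."""
    window = data[overlay_offset : overlay_offset + UPPER_BOUND]
    rel = window.find(b"NullsoftInst")
    return overlay_offset + rel if rel != -1 else -1

# 100 bytes of "PE image", then the magic 4 bytes into the overlay:
blob = b"\x00" * 100 + b"....NullsoftInst...."
```

Slicing before find() caps the worst-case scan at UPPER_BOUND bytes instead of the whole overlay; with a real mmap'ed file object the same slice-then-find pattern applies.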

Comment on lines +52 to +53
if "Nullsoft.NSIS.exehead" in manifest:
return True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if "Nullsoft.NSIS.exehead" in manifest:
return True
return "Nullsoft.NSIS.exehead" in manifest


return ValidChunk(
start_offset=start_offset,
end_offset=start_offset + binary.original_size,
Contributor

@jcrussell did you get the opportunity to look into this?

@jcrussell
Contributor Author

@qkaiser: not yet, sorry. Hoping to get to it by the end of the month.


Labels

enhancement, format:executable, python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support NSIS Installers

2 participants