Mistake in computing number of exported functions #8

@laam-egg

We'll be talking about the file src/thrember/features.py.

First, notice in ImportsInfo::process_raw_features:

# ...
# Number of libraries/imports
lengths = [len(imports), len(libraries)]

# Two separate elements: libraries (alone) and fully-qualified names of imported functions
return np.hstack([lengths, libraries_hashed, imports_hashed]).astype(np.float32)

As you can see, the feature vector starts with the number of imported functions and the number of imported libraries.

Now, go to ExportsInfo::process_raw_features:

# ...
exports_hashed = FeatureHasher(128, input_type="string").transform([raw_obj]).toarray()[0]
return np.hstack([np.array([len(exports_hashed)]), exports_hashed.astype(np.float32)])

So the feature vector contains len(exports_hashed), which is always 128 (the output width of the hasher), instead of the number of exported functions, which is what I would expect.
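
The constant can be checked in isolation. The snippet below is a minimal sketch that assumes only scikit-learn's FeatureHasher, configured as in the excerpt above. The export names are made up for illustration and are not taken from the dataset.

```python
from sklearn.feature_extraction import FeatureHasher

hasher = FeatureHasher(128, input_type="string")

# Hypothetical export lists of different sizes (illustrative names only).
samples = [
    [],
    ["DllMain"],
    ["DllMain", "DllRegisterServer", "DllUnregisterServer"],
]

hashed_lengths = []
for raw_obj in samples:
    # Same call chain as in the excerpt: one sample in, one dense row out.
    exports_hashed = hasher.transform([raw_obj]).toarray()[0]
    hashed_lengths.append(len(exports_hashed))

# The length reflects the hasher width, not the number of exports.
print(hashed_lengths)
```

Each entry of hashed_lengths equals the hasher width regardless of how many names were passed in, so the first element of the exports feature vector carries no information.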

Proposed remedy:

return np.hstack([np.array([len(raw_obj)]), exports_hashed.astype(np.float32)])

where len(raw_obj) is the number of exported functions (for the reason, see ExportsInfo::raw_features).
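
To make the remedy concrete, here is a rough standalone sketch. The function name process_exports is hypothetical and stands in for ExportsInfo::process_raw_features; it assumes raw_obj is a plain list of exported function names. Unlike the one-line remedy above, it casts the whole vector to float32, mirroring what ImportsInfo does, which is a choice of this sketch rather than something the repository confirms.

```python
import numpy as np
from sklearn.feature_extraction import FeatureHasher


def process_exports(raw_obj):
    """Hypothetical stand-in for ExportsInfo.process_raw_features with the proposed fix.

    Assumes raw_obj is a list of exported function names (strings).
    """
    exports_hashed = FeatureHasher(128, input_type="string").transform([raw_obj]).toarray()[0]
    # First element: the number of exported functions, not the hash width.
    # Casting the whole vector to float32 is an assumption of this sketch.
    return np.hstack([np.array([len(raw_obj)]), exports_hashed]).astype(np.float32)


# Illustrative export names only.
vec = process_exports(["DllMain", "DllRegisterServer"])
empty_vec = process_exports([])
print(vec.shape, vec[0], empty_vec[0])
```

With this change the first element varies with the input (2 for the two-name list, 0 for the empty list), while the overall vector length stays at 129.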
