Open
Description
We'll be talking about the file src/thrember/features.py.
First, notice in ImportsInfo::process_raw_features:
# ...
# Number of libraries/imports
lengths = [len(imports), len(libraries)]
# Two separate elements: libraries (alone) and fully-qualified names of imported functions
return np.hstack([lengths, libraries_hashed, imports_hashed]).astype(np.float32)
As you can see, the feature vector would contain the number of imported functions and the number of imported libraries.
Now, go to ExportsInfo::process_raw_features:
# ...
exports_hashed = FeatureHasher(128, input_type="string").transform([raw_obj]).toarray()[0]
return np.hstack([np.array([len(exports_hashed)]), exports_hashed.astype(np.float32)])
So the feature vector contains len(exports_hashed), which is always 128 (the output size of the hasher), instead of the number of exported functions, which is what I would expect.
Proposed remedy:
return np.hstack([np.array([len(raw_obj)]), exports_hashed.astype(np.float32)])
where len(raw_obj) is the number of exported functions (for the reason, see ExportsInfo::raw_features).
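To make the difference concrete, here is a minimal standalone sketch that compares the two return statements. It assumes that raw_obj is a plain list of export-name strings, as described above; the function names exports_vector_current and exports_vector_proposed and the sample export names are hypothetical and do not exist in features.py.

```python
import numpy as np
from sklearn.feature_extraction import FeatureHasher


def _hash_exports(raw_obj):
    # Same hashing call as in the issue: 128 buckets, string input.
    return FeatureHasher(128, input_type="string").transform([raw_obj]).toarray()[0]


def exports_vector_current(raw_obj):
    # Hypothetical wrapper around the current return statement:
    # the first element is the length of the hashed vector.
    exports_hashed = _hash_exports(raw_obj)
    return np.hstack([np.array([len(exports_hashed)]), exports_hashed.astype(np.float32)])


def exports_vector_proposed(raw_obj):
    # Hypothetical wrapper around the proposed return statement:
    # the first element is the number of exported functions.
    exports_hashed = _hash_exports(raw_obj)
    return np.hstack([np.array([len(raw_obj)]), exports_hashed.astype(np.float32)])


# Made-up export lists of different sizes (assumption: raw_obj is a list of names).
few_exports = ["CreateWidget", "DestroyWidget"]
many_exports = [f"Export{i}" for i in range(500)]

for label, exports in [("few", few_exports), ("many", many_exports)]:
    current = exports_vector_current(exports)
    proposed = exports_vector_proposed(exports)
    print(label, "current first element:", current[0], "proposed first element:", proposed[0])
```

With the current code the first element stays the same regardless of how many functions are exported, so it carries no information; with the proposed change it tracks the export count, in line with what ImportsInfo does for imports and libraries.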