Open
Description
I get a RuntimeError: internal error in regular expression engine when evaluating some adversarial files.
I have tried modifying features.py to use the regex library and to truncate long search strings, but that yields predictions that differ from those of the original implementation.
Traceback (most recent call last):
  File "main.py", line 274, in <module>
    main()
  File "models/thrember.py", line 962, in evaluate_for_threshold
    score = predict_sample(lgbm_model, file_data)
  File "models/thrember.py", line 475, in predict_sample
    features = np.array(extractor.feature_vector(file_data), dtype=np.float32)
  File "models/features.py", line 1163, in feature_vector
    return self.process_raw_features(self.raw_features(bytez))
  File "models/features.py", line 1155, in raw_features
    features.update({fe.name: fe.raw_features(bytez, pe) for fe in self.features})
  File "models/features.py", line 1155, in <dictcomp>
    features.update({fe.name: fe.raw_features(bytez, pe) for fe in self.features})
  File "models/features.py", line 356, in raw_features
    if re.search(r, s):
  File "/usr/lib/python3.10/re.py", line 200, in search
    return _compile(pattern, flags).search(string)
RuntimeError: internal error in regular expression engine
I tried to determine whether the error is tied to specific files, but it does not appear to be.
The fixes I have tried also make prediction very slow and create bottlenecks.
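One way to narrow this down without changing the result for inputs that do not trigger the error might be to wrap the search call and record which pattern and input length fail. This is only a minimal sketch, assuming the patterns are precompiled with re.compile; safe_search and the log list are hypothetical names and are not part of features.py.

```python
import re


def safe_search(pattern, text, log=None):
    """Return True/False for a match, or None if the re engine raises.

    `pattern` is assumed to be a compiled pattern (re.compile(...)).
    `log` is an optional list that collects details about failures.
    """
    try:
        return pattern.search(text) is not None
    except RuntimeError as exc:
        # Keep enough context to reproduce the failure later,
        # without aborting the whole evaluation run.
        if log is not None:
            log.append(
                {
                    "pattern": pattern.pattern,
                    "length": len(text),
                    "error": str(exc),
                }
            )
        return None
```

A caller could treat None as "unknown" and decide separately how to handle those samples, so that inputs which search successfully still go through the standard re module unchanged.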