RuntimeError: internal error in regular expression engine #10

@mashalzainab

I get a RuntimeError: internal error in regular expression engine when evaluating some adversarial files.

I have tried modifying features.py to use the regex library and truncating long search strings, but that leads to incorrect predictions compared to the original implementation.

Traceback (most recent call last):
  File "main.py", line 274, in <module>
    main()
  File "models/thrember.py", line 962, in evaluate_for_threshold
    score = predict_sample(lgbm_model, file_data)
  File "models/thrember.py", line 475, in predict_sample
    features = np.array(extractor.feature_vector(file_data), dtype=np.float32)
  File "models/features.py", line 1163, in feature_vector
    return self.process_raw_features(self.raw_features(bytez))
  File "models/features.py", line 1155, in raw_features
    features.update({fe.name: fe.raw_features(bytez, pe) for fe in self.features})
  File "models/features.py", line 1155, in <dictcomp>
    features.update({fe.name: fe.raw_features(bytez, pe) for fe in self.features})
  File "models/features.py", line 356, in raw_features
    if re.search(r, s):
  File "/usr/lib/python3.10/re.py", line 200, in search
    return _compile(pattern, flags).search(string)
RuntimeError: internal error in regular expression engine

I tried to check whether the error is tied to specific files, but it does not appear to be.
Every fix I have tried makes prediction very slow and creates a bottleneck.
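
To narrow down which pattern and input trigger the error, one option is to wrap the `re.search` call shown in the traceback so that failures are recorded instead of aborting the run. This is only a diagnostic sketch: `safe_search`, `failures`, and the `search` parameter are hypothetical names that are not part of features.py, and treating a failed search as "no match" is an assumption that changes the feature value for that pattern.

```python
import re


def safe_search(pattern, data, failures, search=re.search):
    """Return True if pattern matches data, recording engine failures.

    Hypothetical helper, not part of features.py. `search` is injectable
    so the failure path can be exercised without a crashing input.
    """
    try:
        return search(pattern, data) is not None
    except RuntimeError as exc:
        # Keep enough context to reproduce the failure later.
        failures.append((pattern, len(data), str(exc)))
        # Assumption: treat a failed search as "no match".
        return False


failures = []
print(safe_search(rb"abc", b"xxabcxx", failures))  # normal path, no failure recorded
```

After a full evaluation run, the contents of `failures` show whether the error follows particular patterns or input lengths rather than particular files.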
