RuntimeError: internal error in regular expression engine

I get a `RuntimeError: internal error in regular expression engine` when evaluating some adversarial files.

I have tried modifying `features.py` to use the `regex` library and to truncate long search strings, but this produces incorrect predictions compared to the original implementation.
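Predictions can stay identical for every input the standard `re` module already handles if the engine is switched only when `re` actually fails, rather than replaced everywhere. Here is a minimal sketch, assuming the patterns are available as compiled `re.Pattern` objects. `search_with_fallback` and its `fallback` hook are hypothetical names, not part of `features.py`:

```python
import re
from typing import Callable, Optional, Union

Text = Union[str, bytes]


def search_with_fallback(
    pattern: "re.Pattern",
    s: Text,
    fallback: Optional[Callable[[Text, Text], bool]] = None,
) -> bool:
    """Return True if `pattern` matches anywhere in `s`.

    The stdlib engine is always tried first, so the result is unchanged for
    every input it can handle. Only when it raises RuntimeError is the
    hypothetical `fallback(pattern_source, s)` hook consulted.
    """
    try:
        return pattern.search(s) is not None
    except RuntimeError:
        if fallback is None:
            raise  # no fallback given: keep the original failure visible
        return fallback(pattern.pattern, s)
```

If the third-party `regex` module is installed, one possible fallback is `lambda p, s: regex.search(p, s) is not None`. This sketch does not forward the pattern's compile flags, and it has not been checked whether `regex` avoids the error on these particular inputs. Only the inputs that reach the fallback can score differently from the original. All other inputs still go through the same `re` matching as before.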

```
Traceback (most recent call last):
  File "main.py", line 274, in <module>
    main()
  File "models/thrember.py", line 962, in evaluate_for_threshold
    score = predict_sample(lgbm_model, file_data)
  File "models/thrember.py", line 475, in predict_sample
    features = np.array(extractor.feature_vector(file_data), dtype=np.float32)
  File "models/features.py", line 1163, in feature_vector
    return self.process_raw_features(self.raw_features(bytez))
  File "models/features.py", line 1155, in raw_features
    features.update({fe.name: fe.raw_features(bytez, pe) for fe in self.features})
  File "models/features.py", line 1155, in <dictcomp>
    features.update({fe.name: fe.raw_features(bytez, pe) for fe in self.features})
  File "models/features.py", line 356, in raw_features
    if re.search(r, s):
  File "/usr/lib/python3.10/re.py", line 200, in search
    return _compile(pattern, flags).search(string)
RuntimeError: internal error in regular expression engine
```
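One way to narrow down which call to `re.search(r, s)` at `features.py` line 356 raises is to run every pattern against every string from a failing file and record the pairs that crash. This is a minimal sketch. `find_failing_pairs` is a hypothetical helper, and it assumes the patterns and extracted strings can be collected into lists before the loop runs:

```python
import re
from typing import Iterable, List, Sequence, Tuple, Union

Text = Union[str, bytes]


def find_failing_pairs(
    patterns: Iterable, strings: Sequence[Text]
) -> List[Tuple[Text, int, int]]:
    """Return (pattern source, string index, string length) for every pair
    where the regex engine raises RuntimeError instead of returning a result."""
    failures = []
    for raw in patterns:
        # Accept raw pattern sources or already-compiled pattern objects.
        pat = re.compile(raw) if isinstance(raw, (str, bytes)) else raw
        for i, s in enumerate(strings):
            try:
                pat.search(s)
            except RuntimeError:
                failures.append((pat.pattern, i, len(s)))
    return failures
```

Printing the returned tuples for one failing file would show whether the error is tied to a single pattern or to unusually long strings.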

I tried to determine whether the error only occurs on specific files, but it appears to be independent of the input file.
Every fix I have tried makes prediction very slow and creates a bottleneck.
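Part of the slowdown may come from calling `re.search(r, s)` with an uncompiled pattern on every iteration. The `re` module looks each pattern up in a bounded internal cache, and if there are more distinct patterns than that cache holds, patterns may be recompiled on every call. The sketch below assumes the pattern list is fixed for the whole run, compiles it once, and reuses it for every file. `PatternCounter`, and its choice to count matching strings per pattern, are hypothetical. The real feature in `features.py` may compute something else:

```python
import re
from typing import Iterable, List, Sequence, Union

Text = Union[str, bytes]


class PatternCounter:
    """Hypothetical helper: compile a fixed pattern list once, reuse it per file."""

    def __init__(self, raw_patterns: Iterable[Text]) -> None:
        # Compile up front so the per-file loop never goes through re's cache.
        self.patterns: List["re.Pattern"] = [re.compile(p) for p in raw_patterns]

    def counts(self, strings: Sequence[Text]) -> List[int]:
        # For each pattern, count how many strings contain at least one match.
        return [sum(1 for s in strings if pat.search(s)) for pat in self.patterns]


# Build once at start-up, then call counts() for every file.
counter = PatternCounter([rb"https?://", rb"\d{3,}"])
print(counter.counts([b"http://example.com", b"build 12345", b"no match"]))  # [1, 1]
```

Precompiling does not change what each pattern matches. It only removes the per-call lookup (and any recompilation) from the hot loop.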

RuntimeError: internal error in regular expression engine #10

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

RuntimeError: internal error in regular expression engine #10

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions