Release v1.0.0-stable · Expl0dingCat/safehere

safehere v1.0.0-stable

Runtime tool-output scanning middleware for Cohere AI agents. Detects and blocks prompt injection attacks hiding in tool results before they reach the model.
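
As a rough illustration of the idea (scan each tool result and withhold it if it looks like an injection attempt), here is a minimal sketch. It is not safehere's actual API: the names `SUSPICIOUS_PATTERNS`, `scan_tool_output`, and `guarded_tool_call` are hypothetical, and the three regexes are stand-ins chosen for the example, covering only the simplest of the detection layers (pattern matching).

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only; not safehere's real rule set.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"you are now\b", re.IGNORECASE),
    re.compile(r"reveal (the |your )?system prompt", re.IGNORECASE),
]


@dataclass
class ScanResult:
    blocked: bool   # True if at least one pattern matched
    matches: list   # source strings of the patterns that matched


def scan_tool_output(text: str) -> ScanResult:
    """Check a tool result against the illustrative pattern list."""
    matches = [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(text)]
    return ScanResult(blocked=bool(matches), matches=matches)


def guarded_tool_call(tool, *args, **kwargs) -> str:
    """Run a tool, then scan its output before it is handed to the model."""
    raw = str(tool(*args, **kwargs))
    if scan_tool_output(raw).blocked:
        # Withhold the suspicious payload instead of forwarding it.
        return "[tool output withheld: possible prompt injection]"
    return raw
```

The point of wrapping the call is that the model only ever sees the return value of `guarded_tool_call`, so a flagged result never enters the conversation context.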

Highlights

5 detection layers: pattern matching, schema drift, statistical anomaly, heuristic instruction classification, TF-IDF semantic classifier
1,028-sample evaluation corpus across 50+ attack categories
Pre-trained model bundled -- pip install safehere[ml] works out of the box
Regex timeout protection prevents ReDoS denial-of-service
0.5% false positive rate (FPR) on 405 benign samples; 97.6% true positive rate (TPR) on 623 adversarial samples
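
For readers who want to relate the reported rates to sample counts, the sketch below shows how FPR and TPR are conventionally computed. The sample totals come from the notes above; the raw counts of 2 false positives and 608 true positives are not stated in the release and are inferred here as the only integer counts that round to the reported percentages.

```python
def rate(count: int, total: int) -> float:
    """Return count / total as a percentage."""
    return 100.0 * count / total


BENIGN_TOTAL = 405        # from the release notes
ADVERSARIAL_TOTAL = 623   # from the release notes

# Assumption: inferred from the rounded percentages, not published counts.
FALSE_POSITIVES = 2       # benign samples wrongly flagged
TRUE_POSITIVES = 608      # adversarial samples correctly flagged

fpr = rate(FALSE_POSITIVES, BENIGN_TOTAL)
tpr = rate(TRUE_POSITIVES, ADVERSARIAL_TOTAL)

print(f"FPR = {fpr:.1f}%")
print(f"TPR = {tpr:.1f}%")
print(f"corpus size = {BENIGN_TOTAL + ADVERSARIAL_TOTAL}")
```

The two subsets together account for the full 1,028-sample corpus.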

Install

pip install safehere                # core (4 rule-based scanners)
pip install "safehere[ml]"          # + TF-IDF semantic scanner
pip install "safehere[cohere]"      # + Cohere managed loop
pip install "safehere[all]"         # everything (quotes keep shells such as zsh from globbing the brackets)
