Skip to content

v1.0.0-stable

Latest

Choose a tag to compare

@Expl0dingCat Expl0dingCat released this 21 Mar 02:37

safehere v1.0.0-stable

Runtime tool-output scanning middleware for Cohere AI agents. Detects and blocks prompt injection attacks hiding in tool results before they reach the model.

Highlights

  • 5 detection layers: pattern matching, schema drift, statistical anomaly, heuristic instruction classification, TF-IDF semantic classifier
  • 1,028-sample evaluation corpus across 50+ attack categories
  • Pre-trained model bundled -- pip install safehere[ml] works out of the box
  • Regex timeout protection prevents ReDoS denial-of-service
  • 0.5% FPR on 405 benign samples, 97.6% TPR on 623 adversarial samples

Install

pip install safehere              # core (4 rule-based scanners)
pip install safehere[ml]          # + TF-IDF semantic scanner
pip install safehere[cohere]      # + Cohere managed loop
pip install safehere[all]         # everything