Skip to content

Releases: Expl0dingCat/safehere

v1.0.0-stable

21 Mar 02:37

Choose a tag to compare

safehere v1.0.0-stable

Runtime tool-output scanning middleware for Cohere AI agents. Detects and blocks prompt injection attacks hiding in tool results before they reach the model.

Highlights

  • 5 detection layers: pattern matching, schema drift, statistical anomaly, heuristic instruction classification, TF-IDF semantic classifier
  • 1,028-sample evaluation corpus across 50+ attack categories
  • Pre-trained model bundled -- pip install safehere[ml] works out of the box
  • Regex timeout protection prevents ReDoS denial-of-service
  • 0.5% FPR on 405 benign samples, 97.6% TPR on 623 adversarial samples

Install

pip install safehere              # core (4 rule-based scanners)
pip install safehere[ml]          # + TF-IDF semantic scanner
pip install safehere[cohere]      # + Cohere managed loop
pip install safehere[all]         # everything

v1.0.0-beta.1

21 Mar 00:30

Choose a tag to compare

v1.0.0-beta.1 Pre-release
Pre-release

safehere v1.0.0-beta.1

Runtime tool-output scanning middleware for Cohere AI agents. Detects and blocks prompt injection attacks hiding in tool results before they reach the model.

Highlights

  • 5 detection layers: pattern matching, schema drift, statistical anomaly, heuristic instruction classification, and TF-IDF semantic classifier
  • 1,028-sample evaluation corpus across 50+ attack categories -- narrative injection, roleplay hijacking, fake compliance requests, translation-based attacks, persona splitting, encoding evasion, and more
  • Regex timeout protection (50ms per pattern) prevents ReDoS denial-of-service
  • Security hardened via red-team audit: recursive encoding decoder, RTL override reversal, Unicode tag character decoding, homoglyph expansion, schema recursion depth guard, anomaly cold-start hardening

Benchmarks

Metric Result
Detection (623 adversarial) 97.6% TPR
False positives (405 benign) 0.5% FPR
Semantic classifier (held-out 20%) 0.96 F1
CyberArk-style live attacks 10/10 blocked
Latency (with semantic scanner) ~12ms P50

Install

pip install safehere              # core (4 rule-based scanners)
pip install safehere[ml]          # + TF-IDF semantic scanner
pip install safehere[cohere]      # + Cohere managed loop (run/arun)
pip install safehere[all]         # everything

Known limitations

  • Narrative/analogy attacks with zero injection vocabulary can evade all layers
  • Low signal density (<5% payload in long documents) evades density-based filtering
  • Payload splitting across multiple tool outputs is not detected
  • Metrics are self-evaluated, not independently audited
  • Semantic model must be trained locally (python -m safehere.scanners.semantic --train)

Breaking changes from v0.x

  • cohere is now an optional dependency (pip install safehere[cohere])
  • Python 3.8 is no longer supported (minimum 3.9)
  • Scoring weights redistributed to accommodate the semantic scanner
  • SemanticScanner added to the default pipeline (degrades gracefully if scikit-learn is not installed)

v0.3.0-alpha

20 Mar 21:08

Choose a tag to compare

v0.3.0-alpha Pre-release
Pre-release

v0.1.1-alpha

19 Mar 17:55

Choose a tag to compare

v0.1.1-alpha Pre-release
Pre-release

Initial pre-release. Pattern matching, schema drift detection, statistical anomaly detection, and heuristic instruction classification for scanning tool outputs before they reach the model. See README for usage.