Releases · Expl0dingCat/safehere

21 Mar 02:37

v1.0.0-stable

ab4af78

v1.0.0-stable Latest

Latest

safehere v1.0.0-stable

Runtime tool-output scanning middleware for Cohere AI agents. Detects and blocks prompt injection attacks hiding in tool results before they reach the model.

Highlights

5 detection layers: pattern matching, schema drift, statistical anomaly, heuristic instruction classification, TF-IDF semantic classifier
1,028-sample evaluation corpus across 50+ attack categories
Pre-trained model bundled -- pip install safehere[ml] works out of the box
Regex timeout protection prevents ReDoS denial-of-service
0.5% FPR on 405 benign samples, 97.6% TPR on 623 adversarial samples

Install

pip install safehere              # core (4 rule-based scanners)
pip install safehere[ml]          # + TF-IDF semantic scanner
pip install safehere[cohere]      # + Cohere managed loop
pip install safehere[all]         # everything

Assets 2

21 Mar 00:30

Expl0dingCat

v1.0.0-beta.1

e8bd846

v1.0.0-beta.1 Pre-release

Pre-release

safehere v1.0.0-beta.1

Runtime tool-output scanning middleware for Cohere AI agents. Detects and blocks prompt injection attacks hiding in tool results before they reach the model.

Highlights

5 detection layers: pattern matching, schema drift, statistical anomaly, heuristic instruction classification, and TF-IDF semantic classifier
1,028-sample evaluation corpus across 50+ attack categories -- narrative injection, roleplay hijacking, fake compliance requests, translation-based attacks, persona splitting, encoding evasion, and more
Regex timeout protection (50ms per pattern) prevents ReDoS denial-of-service
Security hardened via red-team audit: recursive encoding decoder, RTL override reversal, Unicode tag character decoding, homoglyph expansion, schema recursion depth guard, anomaly cold-start hardening

Benchmarks

Metric	Result
Detection (623 adversarial)	97.6% TPR
False positives (405 benign)	0.5% FPR
Semantic classifier (held-out 20%)	0.96 F1
CyberArk-style live attacks	10/10 blocked
Latency (with semantic scanner)	~12ms P50

Install

pip install safehere              # core (4 rule-based scanners)
pip install safehere[ml]          # + TF-IDF semantic scanner
pip install safehere[cohere]      # + Cohere managed loop (run/arun)
pip install safehere[all]         # everything

Known limitations

Narrative/analogy attacks with zero injection vocabulary can evade all layers
Low signal density (<5% payload in long documents) evades density-based filtering
Payload splitting across multiple tool outputs is not detected
Metrics are self-evaluated, not independently audited
Semantic model must be trained locally (python -m safehere.scanners.semantic --train)

Breaking changes from v0.x

cohere is now an optional dependency (pip install safehere[cohere])
Python 3.8 is no longer supported (minimum 3.9)
Scoring weights redistributed to accommodate the semantic scanner
SemanticScanner added to the default pipeline (degrades gracefully if scikit-learn is not installed)

Assets 2

20 Mar 21:08

Expl0dingCat

v0.3.0-alpha

d17cb24

v0.3.0-alpha Pre-release

Pre-release

Full Changelog: v0.1.1-alpha...v0.3.0-alpha

Assets 2

19 Mar 17:55

Expl0dingCat

v0.1.1-alpha

bc94a72

v0.1.1-alpha Pre-release

Pre-release

Initial pre-release. Pattern matching, schema drift detection, statistical anomaly detection, and heuristic instruction classification for scanning tool outputs before they reach the model. See README for usage.

Assets 2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

safehere v1.0.0-stable

Highlights

Install

Uh oh!

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

safehere v1.0.0-beta.1

Highlights

Benchmarks

Install

Known limitations

Breaking changes from v0.x

Uh oh!

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

Uh oh!

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

Uh oh!

Releases: Expl0dingCat/safehere

v1.0.0-stable

safehere v1.0.0-stable

Highlights

Install

Uh oh!

v1.0.0-beta.1

safehere v1.0.0-beta.1

Highlights

Benchmarks

Install

Known limitations

Breaking changes from v0.x

Uh oh!

v0.3.0-alpha

Uh oh!

v0.1.1-alpha

Uh oh!