safehere v1.0.0-stable
Runtime tool-output scanning middleware for Cohere AI agents. Detects and blocks prompt injection attacks hiding in tool results before they reach the model.
Highlights
- 5 detection layers: pattern matching, schema drift, statistical anomaly, heuristic instruction classification, TF-IDF semantic classifier
- 1,028-sample evaluation corpus across 50+ attack categories
- Pre-trained model bundled --
pip install safehere[ml]works out of the box - Regex timeout protection prevents ReDoS denial-of-service
- 0.5% FPR on 405 benign samples, 97.6% TPR on 623 adversarial samples
Install
pip install safehere # core (4 rule-based scanners)
pip install safehere[ml] # + TF-IDF semantic scanner
pip install safehere[cohere] # + Cohere managed loop
pip install safehere[all] # everything