This is an open-source version of the representation engineering framework for stopping harmful outputs or hallucinations at the level of model activations. 100% free, self-hosted, and open-source.
Updated Feb 11, 2026 · Python
[ICLR 2025] Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs"
Agentic Safety Framework
An exploration of issues in international social development policy and its operationalization
Safe and fearless lossy compression using safeguards