Guardrails for LLMs: detect and block hallucinated tool calls to improve safety and reliability.
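The entry above describes blocking hallucinated tool calls, i.e. calls to tools the model invented rather than ones that were registered. A minimal, generic Go sketch of that idea, assuming a hypothetical guardToolCall helper and tool registry (these names are illustrative and are not the API of the repository listed here): the proposed tool name is checked against the set of registered tools before anything executes.

```go
package main

import (
	"errors"
	"fmt"
)

// toolRegistry lists the tools the agent is actually allowed to call.
// Hypothetical example; real guardrail libraries typically also validate
// argument schemas, not just tool names.
var toolRegistry = map[string]bool{
	"search_docs": true,
	"get_weather": true,
}

// guardToolCall rejects tool calls whose name is not registered,
// the simplest form of hallucinated-tool-call detection.
func guardToolCall(name string, args map[string]string) error {
	if !toolRegistry[name] {
		return errors.New("blocked: model requested unknown tool " + name)
	}
	return nil
}

func main() {
	// A hallucinated call: "delete_database" was never registered.
	if err := guardToolCall("delete_database", nil); err != nil {
		fmt.Println(err)
	}
	// A registered call passes the gate and may be executed.
	if err := guardToolCall("search_docs", map[string]string{"query": "agent safety"}); err == nil {
		fmt.Println("allowed: search_docs")
	}
}
```

The key design choice is to fail closed: anything not explicitly registered is blocked before it reaches an executor.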
Human-in-the-loop execution for LLM agents
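The entry above names the human-in-the-loop pattern: a proposed agent action is held until a person confirms it. A minimal sketch in Go, assuming a hypothetical approve prompt on stdin (not any listed repository's API); anything other than an explicit "y" rejects the action.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// approve pauses a proposed agent action and asks a human to confirm it.
// Generic illustration of the human-in-the-loop pattern only.
func approve(action string) bool {
	fmt.Printf("Agent wants to run: %q  approve? [y/N] ", action)
	reader := bufio.NewReader(os.Stdin)
	line, err := reader.ReadString('\n')
	if err != nil {
		return false // fail closed: no input means no execution
	}
	return strings.TrimSpace(strings.ToLower(line)) == "y"
}

func main() {
	if approve("rm -rf ./build") {
		fmt.Println("executing action")
	} else {
		fmt.Println("action rejected by human reviewer")
	}
}
```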
🛡️ Safe AI Agents through Action Classifier
Runtime detector for reward hacking and misalignment in LLM agents (89.7% F1 on 5,391 trajectories).
Safety-first agentic toolkit: 10 packages for collapse detection, governance, and reproducible runs.
An open-source engineering blueprint for defining and designing the core capabilities, boundaries, and ethics of any AI agent.
A2A version of Agent Action Guard: Safe AI Agents through Action Classifier
Energy-based legality-gating SDK for AI reasoning. Predicts, repairs, and audits collapse before it happens; reduces hallucinations and provides numeric audit logs.
PULSE: deterministic release gates for AI safety