ClampAI — OWASP LLM Top 10 (2025) Compliance Mapping

This document maps each risk in the OWASP Top 10 for Large Language Model Applications (2025) to the specific ClampAI theorems and components that mitigate it, together with the residual risk and recommended configuration.

The ClampAI eight theorems are: T1 (Budget Safety), T2 (Bounded Termination), T3 (Invariant Preservation), T4 (Monotone Resource Consumption), T5 (Atomic Execution), T6 (Tamper-Evident Trace), T7 (Exact Rollback), T8 (Emergency Halt).

LLM01: Prompt Injection

Risk: Malicious content in user input or retrieved documents tricks the LLM into taking unintended actions (e.g., exfiltrating data, bypassing authorization, executing arbitrary commands).

ClampAI mitigation:

Component	How it helps
T3 Invariants	Block any action whose projected next state would violate a predicate — regardless of what the LLM claimed as justification. The kernel has no language model; it cannot be persuaded by text.
AttestationGate (`clampai/hardening.py`)	Requires cryptographic attestation from designated attestors before high-risk actions execute. An injected instruction cannot produce a valid attestation.
ReferenceMonitor (`clampai/reference_monitor.py`)	IFC (Information Flow Control) labels prevent data exfiltration paths from being approved even when the LLM believes the action is safe.
T6 Trace	Every approved action is logged with a hash-chained fingerprint; forensic analysis can identify the injection point after the fact.

Recommended invariants:

from clampai import no_sensitive_substring_invariant, human_approval_gate_invariant

kernel = SafetyKernel(
    budget=100.0,
    invariants=[
        # Block actions containing credential patterns
        no_sensitive_substring_invariant("output_text", ["password", "api_key", "Bearer "]),
        # Require human approval before any network egress
        human_approval_gate_invariant("network_egress_approved"),
    ],
)

Residual risk: ClampAI cannot prevent the LLM from reasoning about injected content. It only blocks the resulting action if it violates a declared invariant. Invariants must cover all sensitive action classes.

LLM02: Sensitive Information Disclosure

Risk: The LLM inadvertently reveals PII, credentials, proprietary data, or system internals in its responses or tool outputs.

ClampAI mitigation:

Component	How it helps
IFC Labels (`SecurityLevel`, `DataLabel`)	Data items carry secrecy labels (PUBLIC, CONFIDENTIAL, SECRET, TOP_SECRET). The ReferenceMonitor enforces a no-read-up / no-write-down lattice; high-label data cannot flow to low-label outputs.
T3 Invariants	`no_sensitive_substring_invariant` blocks actions whose output fields contain forbidden patterns (SSNs, card numbers, key material).
SaliencyEngine	Identifies which state variables most influenced a proposed action, enabling post-hoc audits of potential disclosure paths.

Recommended configuration:

from clampai import no_sensitive_substring_invariant
from clampai.reference_monitor import SecurityLevel, DataLabel, LabelledData

# Label sensitive data at ingestion time
pii_record = LabelledData(
    data={"name": "Alice", "ssn": "123-45-6789"},
    label=DataLabel(level=SecurityLevel.CONFIDENTIAL),
)

# Invariant: output must not contain SSN-pattern text
kernel = SafetyKernel(
    budget=100.0,
    invariants=[
        no_sensitive_substring_invariant("agent_response", [r"\d{3}-\d{2}-\d{4}"]),
    ],
)

Residual risk: Pattern-based matching has false negatives (encoded, paraphrased, or translated content). Combine with output-layer classifiers for defense in depth.

LLM03: Supply Chain Vulnerabilities

Risk: Poisoned models, malicious plugins, or compromised training data introduce backdoors or unsafe behaviors.

ClampAI mitigation:

Component	How it helps
AttestationGate	Cryptographically verifies the integrity of each action before execution. An adversarially fine-tuned model that proposes unsafe actions still cannot execute them if invariants are violated.
T3 Invariants	The safety predicate layer is independent of the model weights. Even a fully compromised model cannot override a blocking invariant.
T6 Hash-chained Trace	Provides a tamper-evident audit log. Anomalous action sequences (e.g., sudden shifts toward sensitive operations) are detectable via the trace.

Residual risk: ClampAI cannot verify model provenance or detect weight-level backdoors. Supply-chain mitigations must include model signing (e.g., MLflow model registry with hash verification) at the ML-ops layer.

LLM04: Data and Model Poisoning

Risk: Training or fine-tuning data is manipulated to alter model behavior, inject biases, or create exploitable patterns at inference time.

ClampAI mitigation:

Component	How it helps
Bayesian beliefs (`BeliefState`)	Cross-checks the LLM's proposed action against its prior success rate. A poisoned model producing anomalous action sequences will trigger low-confidence flags. Part of the Reasoning Engine (Layer 1), not a formal theorem.
T3 Invariants	Regardless of how the model was trained, actions violating safety predicates are blocked at runtime.
CausalGraph	Models dependencies between actions. Actions inconsistent with the declared causal graph are rejected before reaching the kernel.

Residual risk: Subtle poisoning that produces plausible-looking actions within invariant bounds cannot be detected by ClampAI alone. Runtime monitoring (T6 trace anomaly detection) supplements but does not replace training-time integrity checks.

LLM05: Improper Output Handling

Risk: LLM outputs are passed unsanitized to downstream systems (SQL interpreters, OS shells, JavaScript contexts), enabling injection attacks.

ClampAI mitigation:

Component	How it helps
T3 Invariants	`no_regex_match_invariant` and `no_sensitive_substring_invariant` can reject actions whose output fields contain injection patterns (SQL keywords, shell metacharacters).
ActionSpec effects model	Actions declare typed effects on named state variables. The kernel enforces that only declared effects are applied — an LLM cannot inject arbitrary state mutations by embedding them in output text.
T5 Atomicity	State changes happen only through the effects model, never through raw output interpretation.

Recommended pattern:

from clampai import no_regex_match_invariant

kernel = SafetyKernel(
    budget=100.0,
    invariants=[
        # Block SQL injection patterns in generated queries
        no_regex_match_invariant("generated_sql", r"(DROP|DELETE|TRUNCATE)\s+TABLE"),
        # Block shell injection in generated commands
        no_regex_match_invariant("shell_cmd", r"[;&|`$]"),
    ],
)

Residual risk: Invariants must explicitly enumerate the downstream interpreters and their injection patterns. Unknown downstream consumers require additional sandboxing at the execution layer.

LLM06: Excessive Agency

Risk: LLM agents are granted more permissions, capabilities, or autonomy than necessary for the task, amplifying the blast radius of any error or compromise.

ClampAI mitigation — primary coverage:

Component	How it helps
T1 Budget Safety	Hard upper bound on total resource expenditure. No action sequence can spend more than the declared budget, regardless of how many steps the agent takes.
T2 Bounded Termination	The agent halts in ≤ ⌊budget / min_action_cost⌋ steps — a provable upper bound on autonomy.
T3 Invariants	Capability restrictions encoded as invariants: `no_delete_invariant`, `read_only_keys_invariant`, `allowed_values_invariant`, etc. The agent cannot exceed declared permissions.
T8 Emergency Halt	Emergency actions bypass cost checks and immediately terminate the session — a guaranteed exit ramp even when budget is exhausted.
`@clampai_safe` decorator	Minimum viable clampaint: wraps any function with budget + invariants in one decorator. Suitable for capability-bounded tool calls.

This is ClampAI's core strength. The framework was designed specifically to address LLM06 through provable bounds rather than heuristic filtering.

# Minimal LLM06 mitigation — wrap every agent tool
@clampai_safe(
    budget=50.0,           # T1: $50 total, then halts
    cost_per_call=5.0,     # T2: max 10 calls
    invariants=[
        no_delete_invariant("records"),         # T3: no deletion
        no_delete_invariant("config"),          # T3: config must not be removed
    ],
)
def agent_tool(input: str) -> str:
    ...

Residual risk: None within the declared invariant scope. Invariants that are too permissive (e.g., lambda s: True) provide no protection.

LLM07: System Prompt Leakage

Risk: The LLM's system prompt (containing instructions, credentials, or proprietary context) is extracted by adversarial users.

ClampAI mitigation:

Component	How it helps
T3 Invariants	Block any action that would write system prompt content to user-visible output fields.
IFC Labels	Label the system prompt as CONFIDENTIAL; IFC prevents it from flowing into PUBLIC-labelled response fields.

Note: ClampAI operates at the action/state level, not at the token/prompt level. Mitigating prompt leakage requires invariants that explicitly model the system prompt as a protected state variable.

Residual risk: If the system prompt is not modelled as a state variable (which is the common case for most deployments), ClampAI provides no protection. This risk requires prompt-level defenses (separate system/user context isolation, output classifiers).

LLM08: Vector and Embedding Weaknesses

Risk: Vector store poisoning, adversarial embeddings, or retrieval manipulation cause the LLM to retrieve and act on malicious content.

ClampAI mitigation:

Component	How it helps
T3 Invariants	Retrieved content passes through invariant checks before being acted upon. A retrieved malicious instruction that produces a dangerous proposed action is blocked at the kernel.
Bayesian beliefs (`BeliefState`)	Bayesian priors on action success rates flag anomalous action proposals that may result from retrieval poisoning. Part of the Reasoning Engine, not a formal theorem.

Residual risk: ClampAI does not inspect vector store contents or embedding distances. Poisoned retrievals that produce plausible-looking actions within invariant bounds pass through. Mitigate at the retrieval layer with content hashing, source attribution, and anomaly scoring.

LLM09: Misinformation

Risk: The LLM generates confidently stated but factually incorrect content that is subsequently acted upon.

ClampAI mitigation:

Component	How it helps
Bayesian beliefs (`BeliefState`)	Maintains calibrated uncertainty estimates per action. Actions proposed with low confidence are flagged before execution. Part of the Reasoning Engine, not a formal theorem.
T3 Invariants	Domain-specific invariants can enforce factual clampaints: `value_range_invariant` ensures numeric outputs (e.g., dosages, financial figures) remain in plausible ranges.
Human approval gates	`human_approval_gate_invariant` requires explicit human sign-off before high-stakes actions derived from LLM-generated facts are executed.

Residual risk: ClampAI cannot verify factual accuracy as a general capability. Mitigations are limited to structural clampaints on outputs. Factual verification requires retrieval-augmented generation with cited sources and human review workflows.

LLM10: Unbounded Consumption

Risk: Denial-of-service through resource exhaustion: the LLM is induced to generate excessively long outputs, make unlimited API calls, or consume disproportionate compute.

ClampAI mitigation — strong coverage:

Component	How it helps
T1 Budget Safety	Hard cap on cumulative cost. No sequence of actions can exhaust more than the declared budget, regardless of adversarial intent.
T2 Bounded Termination	Provable maximum step count = ⌊budget / min_action_cost⌋. The agent cannot loop indefinitely.
`rate_limit_invariant`	Explicit per-window call rate limits, complementing the T1/T2 resource bounds.
`resource_ceiling_invariant`	Hard ceiling on named resource metrics (tokens generated, API calls, memory usage).
T8 Emergency Halt	If a runaway agent exceeds its step budget, emergency actions guarantee a clean exit.

from clampai import rate_limit_invariant, resource_ceiling_invariant

kernel = SafetyKernel(
    budget=10.0,           # T1: hard cost cap
    min_action_cost=0.1,   # T2: max 100 steps (10.0/0.1)
    invariants=[
        rate_limit_invariant("api_calls", max_count=20),
        resource_ceiling_invariant("tokens_generated", ceiling=100_000),
    ],
)

Residual risk: Budget is denominated in abstract "cost units." Token-level rate limiting requires mapping token counts to cost units in the ActionSpec effects, which is the integrator's responsibility.

Coverage Summary

OWASP Risk	ClampAI Coverage	Strength
LLM01 Prompt Injection	T3, AttestationGate, IFC, T6	Partial (invariants must be declared)
LLM02 Sensitive Disclosure	IFC, T3, SaliencyEngine	Partial (pattern-based)
LLM03 Supply Chain	AttestationGate, T3, T6	Partial (runtime only)
LLM04 Data Poisoning	Bayesian beliefs, T3, CausalGraph	Partial (runtime anomaly detection)
LLM05 Output Handling	T3, T5, effects model	Strong (typed effects prevent raw injection)
LLM06 Excessive Agency	T1, T2, T3, T8	Strong — primary design goal
LLM07 System Prompt Leakage	T3, IFC	Weak (requires explicit modelling)
LLM08 Vector Weaknesses	T3, Bayesian beliefs	Partial (upstream retrieval not covered)
LLM09 Misinformation	Bayesian beliefs, T3, human gates	Partial (structural clampaints only)
LLM10 Unbounded Consumption	T1, T2, rate_limit, ceiling	Strong — full provable bounds

ClampAI is strongest on LLM06 (Excessive Agency) and LLM10 (Unbounded Consumption) — the two risks most amenable to formal, provable bounds.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ClampAI — OWASP LLM Top 10 (2025) Compliance Mapping

LLM01: Prompt Injection

LLM02: Sensitive Information Disclosure

LLM03: Supply Chain Vulnerabilities

LLM04: Data and Model Poisoning

LLM05: Improper Output Handling

LLM06: Excessive Agency

LLM07: System Prompt Leakage

LLM08: Vector and Embedding Weaknesses

LLM09: Misinformation

LLM10: Unbounded Consumption

Coverage Summary

See Also

FilesExpand file tree

OWASP_MAPPING.md

Latest commit

History

OWASP_MAPPING.md

File metadata and controls

ClampAI — OWASP LLM Top 10 (2025) Compliance Mapping

LLM01: Prompt Injection

LLM02: Sensitive Information Disclosure

LLM03: Supply Chain Vulnerabilities

LLM04: Data and Model Poisoning

LLM05: Improper Output Handling

LLM06: Excessive Agency

LLM07: System Prompt Leakage

LLM08: Vector and Embedding Weaknesses

LLM09: Misinformation

LLM10: Unbounded Consumption

Coverage Summary

See Also