# TrustCost Bench

A quality inspection pipeline for AI outputs. Makes AI check itself before responding — catching lazy answers, missing sources, and filler before they reach the user.

## TL;DR — What is this and what does it do?

Every AI gives you answers. But how do you know if you should trust them?

TrustCost Bench installs a pre-flight checklist into your AI. Before every response, the AI automatically audits itself:

```
You ask: "What new products did Luckin launch this week?"

WITHOUT TrustCost:
  "Luckin Coffee continues to maintain strong innovation momentum..."
  → Filler. Zero information. Sounds confident.

WITH TrustCost (internal pre-flight):
  [Draft] "My knowledge cutoff is..."
  [Gate 0 ❌] Wait — user asked for products, I'm talking about my limitations. Goal mismatch.
  [Gate 1 ❌] I have web search. Why am I not using it?
  → Redo.
  [Searching...]
  [Final] "This week Luckin launched 3 items: Longjing Tea Latte, Bitter Melon Veggie Tea...
           (Source: Sina Finance, 2026-03-19)"
```

Three skills, install and go:

| Skill | What it does | How the user uses it |
|---|---|---|
| `/trustcost-preflight` | AI self-checks before every response | Install once, forget about it — the AI gets better automatically |
| `/trustcost-eval` | Score any AI output (0-100) | Paste any AI response → get a claim-by-claim breakdown |
| `/trustcost-rewrite` | Rewrite AI output to be more verifiable | Paste a bad response → get an optimized version with sources |

What it is NOT:

  • ❌ Not making AI "smarter" — doesn't increase knowledge or reasoning
  • ❌ Not checking accuracy — doesn't tell you if the answer is correct
  • ✅ Making AI outputs verifiable — so you can quickly decide whether to trust them
  • ✅ Making AI not lazy — use your tools, cite your sources, admit what you don't know

The bigger picture:

```
Now:   Users install skills → their AI outputs improve (individual effect)
Next:  Evaluation dataset → model leaderboard (which model is most "trustworthy"?)
Goal:  Model vendors adopt TCI as training signal → all AI ships with verifiability built-in
```

Like code linters: today you install them yourself. Tomorrow they're built into the compiler.


## The Theory Behind It

Verification cost is the largest hidden tax on AI adoption. It can't be eliminated — but it can be compressed from O(N×M) independent verifications to O(K) centralized ones plus a small per-user residual.

### The Hidden Tax

Every time a human receives AI output, they face a silent question: "Do I trust this?"

The time spent answering that question — searching for sources, cross-checking facts, re-doing the work "just to be sure" — is verification cost. It's invisible in every AI benchmark, yet it dominates real-world adoption decisions.

Current AI evaluation asks: "Is the answer correct?" (accuracy)

We ask a different question:

> "How long does it take a human to decide whether to trust the answer?"

These are fundamentally different. An answer can be correct but unverifiable. An answer can be wrong but look trustworthy. No existing benchmark measures this.

### Three Laws of Verification Cost

**Law 1: Conservation** — verification cost cannot be eliminated, only transferred

```
Want zero verification cost → must fully trust AI → but full trust = no verification → impossible
Want perfect trust metrics  → need massive human labeling → labeling IS verification cost → contradiction
```

Trust is not free. Like energy, verification cost is conserved — it can be moved between parties and stages, but never destroyed.

**Law 2: The Seller's Paradox** — AI cannot verify itself

> "You cannot trust a trustworthiness score generated by the system you're trying to trust."

If an AI says "I'm 90% confident," that claim itself needs verification — creating infinite regress. This is why confidence scores, self-evaluation, and AI-generated trust labels are fundamentally broken.

**Law 3: Compression** — verification cost can be amortized across users

This is where the opportunity lives:

```
Status quo:
  Every user × every output, verified independently
  Total cost = N users × M outputs × T (time per verification)

TrustCost approach:
  K annotators verify once → structural rules → all users benefit
  Total cost = K × T + N × M × t, where t ≪ T (drastically reduced)
```

The same logic as drug approval: FDA spends enormous cost on clinical trials (centralized verification) so that ordinary people don't have to run their own experiments (distributed verification cost → near zero).
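To make the compression concrete, here is back-of-envelope arithmetic for the two cost models above. Every number is an illustrative assumption, not a measurement:

```python
# Illustrative cost model — all figures are assumptions chosen for the arithmetic.
N = 10_000  # users
M = 50      # AI outputs each user consumes
T = 5.0     # minutes to verify one output independently
K = 20      # centralized annotators
t = 0.2     # residual minutes per output once structural rules exist

status_quo = N * M * T         # every user verifies every output themselves
trustcost = K * T + N * M * t  # verify once centrally, small residual per user

print(f"status quo:  {status_quo:,.0f} minutes")
print(f"trustcost:   {trustcost:,.0f} minutes")
print(f"compression: {status_quo / trustcost:.1f}x")
```

Even with a residual cost per output, the total drops by more than an order of magnitude, because the expensive full verification happens K times instead of N×M times.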

TrustCost is a compression algorithm for trust.

## TrustCost Index v2 (TCI)

A composite score from 0-100. Higher = lower verification cost = better.

### Gate Metrics (pass/fail — must pass or TCI = 0)

Discovered through a real failure: an AI with web search scored 72.9 on TCI v1 by eloquently saying "I don't know." v2 adds gates to catch this.

| Gate | What it catches | Example failure |
|---|---|---|
| Goal Alignment | AI answers a different question than asked | User asks for data; AI explains methodology instead |
| Capability Utilization | AI has tools but doesn't use them | AI says "my knowledge cutoff..." when it has web search |

If either gate fails, TCI = 0. A beautifully structured non-answer is still a non-answer.

### Score Metrics (quantitative, when gates pass)

| Metric | What it measures | How |
|---|---|---|
| Anchor Density | % of claims that are independently verifiable | Verifiable facts / total claims |
| Source Traceability | How many clicks to verify a claim? | 0 = direct link, 1 = searchable, 2 = needs expertise, 3 = unverifiable |
| Uncertainty Honesty | Does the AI hedge when it should? | % of unsourced claims properly marked as uncertain |
| Signal Ratio | How much content directly answers the question? | On-topic tokens / total tokens |

```
if gates_failed:
    TCI = 0
else:
    TCI = 25 × anchor_density
        + 25 × (1 - avg_verify_cost / 3)
        + 25 × uncertainty_honesty
        + 25 × signal_ratio
```

Important caveat: TCI measures whether verification paths exist, not whether the content at the end of those paths is correct. A model could fabricate a citation with perfect structure and score high. This is a known limitation — TCI is the first compression layer, not the complete solution. See Three-Layer Architecture for the full design.
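The scoring rule above can be sketched as a plain Python function. This is a minimal sketch of the published formula only; the repo's actual implementation may differ in details:

```python
def tci(gates_failed: bool,
        anchor_density: float,       # verifiable claims / total claims, in [0, 1]
        avg_verify_cost: float,      # mean clicks-to-verify, in [0, 3]
        uncertainty_honesty: float,  # hedged unsourced / unsourced, in [0, 1]
        signal_ratio: float) -> float:  # on-topic tokens / total tokens
    """TrustCost Index v2: gates zero the score; otherwise four equal 25-point parts."""
    if gates_failed:
        return 0.0
    return (25 * anchor_density
            + 25 * (1 - avg_verify_cost / 3)
            + 25 * uncertainty_honesty
            + 25 * signal_ratio)

# A goal-mismatched answer scores 0 no matter how well-structured it is:
assert tci(True, 1.0, 0.0, 1.0, 1.0) == 0.0
```

Note that the four weights are equal by design: no single dimension can carry a response that fails the others.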

## Quick Example

Same question: "Long-term effects of coffee on cardiovascular health?"

Output A (typical AI):

"Moderate coffee consumption is generally beneficial for cardiovascular health. Studies show that 3-4 cups per day can reduce cardiovascular disease risk..."

Output B (trust-optimized):

"A meta-analysis of 1.2M subjects (Ding et al., 2014, Circulation) found 3-5 cups/day associated with 15% lower cardiovascular mortality. Note: observational studies only — cannot prove causation. The 400mg/day upper limit comes from EFSA 2015 assessment."

| | Output A | Output B |
|---|---|---|
| Sources provided | 0 | 3 (with author, year, journal) |
| Uncertainty acknowledged | No | Yes |
| Filler content | ~30% | ~5% |
| TCI score | 38.5 | 90.4 |

Both may be equally accurate. But Output B is far more verifiable.

## Cross-Domain Results

| Domain | Typical TCI | Optimized TCI | Improvement |
|---|---|---|---|
| Competitive Intel | 29.6 | 60.9 | +31.3 |
| Finance | 16.3 | 83.7 | +67.4 |
| Medical | 38.5 | 90.4 | +51.9 |
| Coding | 43.3 | 97.2 | +53.9 |
| **Average** | **31.9** | **83.0** | **+51.1** |

Key finding: coding achieves the highest optimized TCI because runnable code is the ultimate verification anchor — verification cost drops to near zero when you can execute the code and see the result yourself.

## Three-Layer Architecture

TCI alone is not enough. The full system has three layers:

| Layer | Who pays the cost | How often | Purpose |
|---|---|---|---|
| L1: Structural Metrics (TCI) | Framework designers (us) | Once | Free, instant, catches obvious problems |
| L2: Human Verification Data | Annotators (centralized) | Ongoing but concentrated | Trains better metrics, detects gaming |
| L3: End-User Verification | Every user | Every time, but compressed | Residual cost, reduced from minutes to seconds |

  • L1 is open-sourced — everyone can use it immediately
  • L2 is the moat — real human behavior data on "what did the user do after seeing AI output?"
  • L3 never reaches zero — but L1 and L2 compress it dramatically

This mirrors how the internet evolved: early PageRank (structural signal, gameable) → click behavior data (human signal, hard to fake) → modern search ranking (hybrid).

## Roadmap

  • Phase 1 (now): Core metrics definition + cross-domain comparison examples
  • Phase 2: Evaluation dataset — AI outputs annotated with actual human verification time
  • Phase 3: Leaderboard — benchmark major models on TCI
  • Phase 4: Training signal — TCI as RLHF reward, so models learn to produce verifiable outputs

## Quick Start: Skills (for Claude Code users)

Three ready-to-use skills:

| Command | What it does |
|---|---|
| `/trustcost-preflight` | Pre-flight check — embed in system prompt so the AI self-audits every response |
| `/trustcost-eval` | Evaluate any AI output — claim-by-claim TCI breakdown |
| `/trustcost-rewrite` | Rewrite any AI output to minimize verification cost |

```bash
# Install
mkdir -p .claude/skills
cp skill/trustcost-preflight.md .claude/skills/   # Auto self-check on every response
cp skill/trustcost-eval.md .claude/skills/        # On-demand evaluation
cp skill/trustcost-rewrite.md .claude/skills/     # On-demand rewrite
```

Then in Claude Code: `/trustcost-eval [paste any AI output]`

These skill prompts also work as system prompts for any LLM (GPT, Gemini, etc.) — the evaluation logic is model-agnostic.

## Python API

```python
from metrics import TrustCostScore, AnchorPoint, VerifyCost

score = TrustCostScore(
    total_claims=4,
    verifiable_claims=3,
    anchor_points=[
        AnchorPoint("claim text", source_provided=True, verify_cost=VerifyCost.LOW),
    ],
    unsourced_claims=1,
    hedged_unsourced_claims=1,
    total_tokens=100,
    on_topic_tokens=85,
)

print(score.trust_cost_index)  # 0-100, higher is better
print(score.summary())
```
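If you want to experiment before cloning the repo, a minimal stand-in for the `metrics` module can be sketched as follows. The field names mirror the example above; everything else (enum member values, the `gates_failed` default, the `summary` format) is an assumption, not the repo's actual code:

```python
from dataclasses import dataclass
from enum import IntEnum

class VerifyCost(IntEnum):
    # Assumed mapping onto the Source Traceability scale (0-3 clicks).
    LOW = 0           # direct link
    MEDIUM = 1        # searchable
    HIGH = 2          # needs expertise
    UNVERIFIABLE = 3  # no verification path

@dataclass
class AnchorPoint:
    claim: str
    source_provided: bool
    verify_cost: VerifyCost

@dataclass
class TrustCostScore:
    total_claims: int
    verifiable_claims: int
    anchor_points: list
    unsourced_claims: int
    hedged_unsourced_claims: int
    total_tokens: int
    on_topic_tokens: int
    gates_failed: bool = False  # assumed field; gates zero the score

    @property
    def trust_cost_index(self) -> float:
        if self.gates_failed:
            return 0.0
        anchor_density = self.verifiable_claims / self.total_claims
        avg_cost = (sum(a.verify_cost for a in self.anchor_points) / len(self.anchor_points)
                    if self.anchor_points else 3.0)  # no anchors → worst traceability
        honesty = (self.hedged_unsourced_claims / self.unsourced_claims
                   if self.unsourced_claims else 1.0)
        signal = self.on_topic_tokens / self.total_tokens
        return (25 * anchor_density + 25 * (1 - avg_cost / 3)
                + 25 * honesty + 25 * signal)

    def summary(self) -> str:
        return f"TCI = {self.trust_cost_index:.1f} / 100"
```

With the numbers from the example above (4 claims, 3 verifiable, one directly linked anchor, the single unsourced claim hedged, 85% on-topic tokens), this sketch scores 90.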

## Why This Matters

Every AI lab is racing to make models smarter. Almost nobody is working on making outputs more verifiable.

But in enterprise adoption, trust is the bottleneck — not capability. A slightly less accurate answer that a human can verify in 5 seconds is more valuable than a perfect answer that takes 10 minutes to validate.

Verification cost is the hidden tax on every AI interaction. TrustCost doesn't eliminate it — that's impossible. But it compresses it, the same way JPEG compresses images: lossy, imperfect, but transformatively useful.

## Contributing

This project is at an early stage. Contributions welcome:

  • Real-world AI output examples for the evaluation dataset
  • Domain-specific verification cost patterns (medical, legal, financial, etc.)
  • Human verification time studies
  • Integration with existing eval frameworks

## License

MIT
