Citation Verifier

Verify legal citations against CourtListener. Catches hallucinated case citations from AI tools by checking whether a citation actually exists and belongs to the case it claims to.

Try It

Verify and Retrieve -- paste citations, verify them against CourtListener, and download the opinion text or PDFs. No installation needed.

How It Works

Input: "Smith v. Jones, 2018 WL 301424 (S.D.N.Y. Mar. 5, 2018)"
  |
  +-- Step 1: Citation Lookup API (fast, precise)
  |     Found + name matches? -> VERIFIED + CourtListener link
  |     Found but different case? -> NOT FOUND ("citation belongs to ...")
  |
  +-- Step 1b: Adjacent Page Fallback
  |     Try pages +/-1 and +/-2 for off-by-one starting pages
  |
  +-- Step 2: Opinion Search (fuzzy fallback)
  |     Search by case name + court + date range
  |     Retries without court filter if no results
  |     Score results by name/court/date/citation similarity
  |
  +-- Step 3: RECAP Search (docket/PACER fallback)
        Search by docket number first (if available),
        then by case name (with/without court filter).
        Query docket-entries API for documents near the cited date
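
The cascade above boils down to a fallback chain: try each step in order and stop at the first hit. A minimal sketch of that control flow (function names and the result shape are illustrative, not the library's actual internals):

```python
def verify(citation, steps):
    """Run fallback steps in order; return the first non-None result.

    `steps` is a list of callables mirroring the cascade above
    (citation lookup, adjacent-page fallback, opinion search, RECAP search).
    """
    for step in steps:
        result = step(citation)
        if result is not None:
            return result
    return {"status": "NOT_FOUND"}

# Stub steps for demonstration: the first two miss, the third hits.
miss = lambda citation: None
hit = lambda citation: {"status": "VERIFIED", "citation": citation}
```

Keeping each step as a plain callable makes it easy to add or reorder fallbacks without touching the driver loop.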

Statuses

Status           Meaning
VERIFIED         Citation lookup found the exact case
LIKELY_REAL      Strong fuzzy match (>= 85% confidence)
POSSIBLE_MATCH   Partial match found (>= 40% confidence)
NOT_FOUND        No match found in any search step
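
The thresholds above map a fuzzy-match confidence onto a status; a sketch of that mapping (cutoffs from the table, the function name is ours):

```python
def status_from_confidence(confidence: float, exact_lookup: bool = False) -> str:
    """Map a fuzzy-match confidence (0.0-1.0) to a verification status."""
    if exact_lookup:
        return "VERIFIED"        # Step 1 citation lookup found the exact case
    if confidence >= 0.85:
        return "LIKELY_REAL"     # strong fuzzy match
    if confidence >= 0.40:
        return "POSSIBLE_MATCH"  # partial match
    return "NOT_FOUND"
```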

Diagnostics

When a citation isn't fully verified, the tool explains why:

  • "Citation exists but belongs to a different case" -- the reporter citation is real but for a different case (common AI hallucination pattern)
  • "Name mismatch" / "Name differs" -- case name similarity issues
  • "Court mismatch" -- cited court doesn't match the found case
  • "Date mismatch" / "Date close" -- year discrepancies
  • "Reporter citation could not be confirmed" -- CourtListener doesn't have the citation on file
  • "Found in RECAP" -- case found in PACER docket data, not the opinions database
  • "We found a possible docket match" -- docket found but no specific document verified
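
In the Python API (below), each diagnostic carries a `.category` and a `.message`. A minimal sketch of that shape (the category string "wrong_case" is a made-up example value, not necessarily what the library emits):

```python
from dataclasses import dataclass

@dataclass
class Diagnostic:
    """Sketch of the diagnostic shape the library exposes:
    a machine-readable category plus a human-readable message."""
    category: str
    message: str

d = Diagnostic(category="wrong_case",
               message="Citation exists but belongs to a different case")
```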

Installation

pip install -e .          # core library + CLI
pip install -e ".[web]"   # adds web app (FastAPI, uvicorn)

Configuration

Create a .env file in the project root:

COURTLISTENER_API_TOKEN=your_token_here

Get a free API token at https://www.courtlistener.com/ (Profile > API keys).

The token is required for the Citation Lookup API (Step 1). The Search API (Steps 2-3) works without a token but is rate-limited.
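
In code, the token presumably ends up in the request headers via the environment; a minimal sketch (the helper name is ours; CourtListener's REST API uses `Authorization: Token <key>` header auth):

```python
import os

def auth_headers() -> dict:
    """Build request headers from COURTLISTENER_API_TOKEN.

    Without a token, requests proceed anonymously under stricter rate limits.
    """
    token = os.environ.get("COURTLISTENER_API_TOKEN")
    return {"Authorization": f"Token {token}"} if token else {}
```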

Usage

Web App

The quickest way to use the tool -- no installation required for end users.

pip install -e ".[web]"
python web/app.py
# Open http://localhost:8000

The app has three pages:

  • Retrieve (/) -- Verify citations and download the matched opinion text or PDFs from CourtListener. Quick search + deep search workflow.
  • QC (/qc) -- Review verification batches and assign QC status (internal use).
  • Debug (/debug) -- Detailed verification with confidence scores, diagnostics, CSV export, and FLP flagging.

Results stream via SSE as each citation completes. Batches capped at 50 citations.

Public mode: Set MODE=public to expose only the Retrieve page (used for the hosted Replit deployment). Debug and QC routes return 404.

Command Line

# Single citation
python -m citation_verifier "Obergefell v. Hodges, 576 U.S. 644 (2015)"

# Multiple citations
python -m citation_verifier "Case One, 576 U.S. 644 (2015)" "Case Two, 999 F.3d 1 (2021)"

# From a file (one citation per line)
python -m citation_verifier --file citations.txt

# JSON output
python -m citation_verifier --json "Obergefell v. Hodges, 576 U.S. 644 (2015)"

Exit code is 1 if any citation is NOT_FOUND (useful for scripting/CI).

Python API

from citation_verifier import CitationVerifier

verifier = CitationVerifier()
result = verifier.verify("Obergefell v. Hodges, 576 U.S. 644 (2015)")

print(result.status)          # VerificationStatus.VERIFIED
print(result.matched_url)     # https://www.courtlistener.com/opinion/2812209/...
print(result.diagnostics)     # [] (List[Diagnostic], each with .category and .message)

Pre-parsed citations (batch pipelines)

When processing PDFs with eyecite, you can pass pre-parsed citations directly to avoid the lossy string round-trip that drops court, month, and day metadata:

from eyecite import get_citations
from eyecite.models import FullCaseCitation
from citation_verifier import CitationVerifier
from citation_verifier.parser import parsed_citation_from_eyecite

text = "Obergefell v. Hodges, 576 U.S. 644 (2015)"
cite = next(c for c in get_citations(text) if isinstance(c, FullCaseCitation))
parsed = parsed_citation_from_eyecite(cite, raw_text=text)

verifier = CitationVerifier()
result = verifier.verify(text, parsed=parsed)

Supported Citation Formats

  • Standard reporters: 576 U.S. 644, 999 F.3d 1, 584 S.W.2d 716
  • WestLaw: 2018 WL 301424
  • California style: (2022) 76 Cal.App.5th 685
  • Docket numbers: Case No. 24-cv-9429, No. 17-cv-12676, shorthand C15-1228-JCC
  • Federal parentheticals: (S.D.N.Y. 2018), (M.D. Ala. July 6, 2018)
  • Reversed date/court: (Feb. 5, 2026 SDNY)
  • Complex party names: Macy's Texas, Inc. v. D.A. Adams & Co.
  • Abbreviations auto-expanded: Cnty. -> County, Dep't -> Department, Corp. -> Corporation, etc.
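
Abbreviation expansion can be sketched as a simple substitution table (an illustrative subset; the real map in parser.py is larger):

```python
# Illustrative subset of the abbreviation table
ABBREVIATIONS = {
    "Cnty.": "County",
    "Dep't": "Department",
    "Corp.": "Corporation",
}

def expand_abbreviations(name: str) -> str:
    """Expand known legal abbreviations in a case name before matching."""
    for abbr, full in ABBREVIATIONS.items():
        name = name.replace(abbr, full)
    return name
```

Expanding before comparison means "Macy's Cnty. Corp." and "Macy's County Corporation" score as identical names.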

Testing

# Unit tests (mocked, no API calls)
pytest tests/test_verifier.py -v

# False negative regression (hits real API, needs token)
pytest tests/test_false_negatives.py -v

# Parser diagnostics (eyecite vs our parser comparison)
pytest tests/test_parser_diagnostics.py -v

# CourtListener API limitation workarounds
pytest tests/test_cl_api_issues.py -v

test_verifier.py has 103 unit tests covering the full pipeline: citation lookup, name matching, adjacent page fallback, opinion search, RECAP search, court corroboration, scoring and weight redistribution, docket number normalization, abbreviation expansion, surname matching, the eyecite factory function, and the pre-parsed citation path. All API calls are mocked. test_async_verifier.py has 30 tests verifying sync/async behavior parity.

test_false_negatives.py runs against the real CourtListener API using the corpus in tests/data/known_real_citations.json (5 cases). tests/data/known_fake_citations.json contains 8 confirmed hallucinations for reference.

Project Structure

src/citation_verifier/
  models.py          -- Data structures (statuses, parsed citations, results)
  court_map.py       -- Court abbreviation -> CourtListener ID mapping (federal courts)
  state_reporter_map.py -- Regional reporter -> state court mapping
  name_matcher.py    -- Multi-factor case name similarity scoring
  text_cleaner.py    -- Contamination phrase removal from extracted names
  parser.py          -- Citation parsing (eyecite + regex fallbacks + eyecite factory)
  client.py          -- CourtListener API wrapper with rate limiting
  cache.py           -- File-based verification result cache
  verifier.py        -- Core three-step verification pipeline
  __main__.py        -- CLI interface

web/
  app.py             -- FastAPI application (SSE streaming, public mode)
  static/get.html    -- Retrieve page (homepage, vanilla HTML/CSS/JS)
  static/index.html  -- Debug page (detailed verification)
  static/qc.html     -- QC review page

tests/
  test_verifier.py             -- Unit tests (mocked API)
  test_false_negatives.py      -- Regression tests (live API)
  test_parser_diagnostics.py   -- Parser comparison diagnostics
  test_cl_api_issues.py        -- CL API limitation tests
  extract_citations_batch.py   -- Batch PDF citation extraction
  verify_sample_citations.py   -- Sample and verify extracted citations
  data/                        -- Test fixtures and results

scratch/                       -- Working notes and utility scripts (not part of the tool)

Acknowledgments

This project is built on:

  • CourtListener from the Free Law Project -- the legal research platform and API we verify citations against
  • eyecite -- the citation extraction library that powers our parser
  • CaseStrainer by Jonathan Franklin -- our case name matching algorithm (name_matcher.py), contamination phrase removal (text_cleaner.py), and state reporter mapping (state_reporter_map.py) are adapted from CaseStrainer's approach to detecting hallucinated legal citations

Scoring

Fuzzy match confidence is a weighted score:

Component    Weight   Notes
Case name    50%      SequenceMatcher similarity; compares defendants for "State v." style cases
Court        20%      Exact match on CourtListener court ID
Date         20%      Full credit for exact date; partial credit for same month, same year, or +/- 1 year
Docket no.   5%       Normalized comparison (strips division prefix and judge suffix, expands shorthand)
Citation     5%       Reporter citation or WL number found in the CourtListener record

RECAP docket-only matches (no specific document found) receive a 40% score discount.
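
The table translates to a weighted sum; a sketch with the weights and discount above (helper names are ours, and the real pipeline also redistributes weight when a component is unavailable rather than scoring it zero):

```python
from difflib import SequenceMatcher

WEIGHTS = {"name": 0.50, "court": 0.20, "date": 0.20, "docket": 0.05, "citation": 0.05}

def name_score(a: str, b: str) -> float:
    """Case-name similarity via SequenceMatcher, as in the table above."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def confidence(scores: dict, recap_docket_only: bool = False) -> float:
    """Combine per-component scores (each 0.0-1.0) into one weighted confidence."""
    total = sum(w * scores.get(k, 0.0) for k, w in WEIGHTS.items())
    if recap_docket_only:
        total *= 0.60  # 40% discount for docket-only RECAP matches
    return total
```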
