Vectorless, reasoning‑based RAG for landmark U.S. Supreme Court cases
Quick Search · Deep Search · Setup · References
A Next.js 14 application for searching 13 landmark U.S. Supreme Court cases with two parallel retrieval systems:
| Mode | How it works |
|---|---|
| Quick Search | Pinecone + Voyage AI legal embeddings → fast semantic retrieval |
| Deep Search | PageIndex‑inspired tree index + NVIDIA Llama 3.1 70B → structured legal reasoning |
Important
This repository does not use the hosted PageIndex product directly. Instead it implements the core PageIndex idea locally: convert each long legal PDF into a structured, table-of-contents-like tree, then let the model reason over that structure instead of relying only on vector similarity.
- What This Project Does
- Why PageIndex‑Style Retrieval Beats Vector Embeddings for Legal Docs
- Architecture
- How the Tree Is Built
- Retrieval Pipelines
- Legal Cases Corpus
- Technology Stack
- Screenshots
- Setup
- Running the App
- API Endpoints
- Project Structure
- When To Use Each Mode
- Implementation Notes
- References
- License
The repo contains two distinct ways to search the same legal corpus:
-
Quick mode uses a classic vector RAG stack. PDFs from
docs/are chunked with LangChain, embedded with Voyage AI'svoyage-law-2model, and stored in Pinecone. Queries are embedded and retrieved with Max Marginal Relevance (k=20). -
Deep mode uses a tree index instead of Pinecone. Python scripts extract PDF text with PyMuPDF, split long documents into page groups, ask NVIDIA's model to identify logical sections, merge those sections into a hierarchical JSON tree, and store the result under
docs/trees/. At query time, deep search reads those tree files and asks NVIDIA to reason over them.
That second path — tree‑based, reasoning‑driven retrieval — is the main idea behind this project.
For long legal opinions, semantic similarity is often not enough. Legal answers are buried inside structured sections — syllabus, procedural history, majority opinion, concurrence, dissent, doctrinal analysis — and vector search is only good at finding text that sounds similar to the question. Legal reasoning needs more.
PageIndex identifies five core limitations of conventional vector‑based RAG that are especially painful for legal documents:
| # | Limitation | What goes wrong |
|---|---|---|
| 1 | Query–Knowledge space mismatch | Vector retrieval assumes the most semantically similar text is the most relevant — but queries express intent, not content |
| 2 | Semantic similarity ≠ true relevance | In legal text, many passages share near‑identical semantics but differ critically in relevance |
| 3 | Hard chunking breaks semantic integrity | Fixed‑size chunks (e.g. 1000 chars) cut through sentences, paragraphs, and opinion sections |
| 4 | Cannot integrate chat history | Each query is treated independently — the retriever has no memory of prior context |
| 5 | Poor handling of in‑document references | References like "see Appendix G" don't share semantic similarity with the referenced content |
| Aspect | Vector embeddings + Pinecone | PageIndex‑style tree retrieval |
|---|---|---|
| Retrieval signal | Semantic similarity | Structural reasoning over sections and pages |
| Preprocessing shape | Fixed‑size chunks | Logical sections / table‑of‑contents nodes |
| Explainability | Approximate: "this chunk scored highly" | Clear: "this section/page was selected" |
| Cross‑references | Often weak unless wording matches | Better when the model can follow document structure |
| Legal opinions | Can split holdings across chunk boundaries | Keeps arguments closer to opinion structure |
| Multi‑step reasoning | Needs extra re‑ranking or orchestration | Native fit for iterative reasoning |
| Infrastructure | Embeddings + vector DB + query embeddings | Tree generation + model reasoning |
| Best use case | Fast broad semantic lookup | Deep legal analysis over long, structured documents |
Landmark court opinions are not flat text. They are long, sectioned arguments where the answer depends on:
- Where in the opinion the issue is discussed
- How one section references another
- Whether the relevant language appears in the majority, concurrence, or dissent
- How much surrounding context is needed to interpret the rule correctly
A PageIndex‑style approach is stronger than embeddings here. It helps the model navigate the opinion more like a legal researcher and less like a nearest‑neighbor lookup engine.
Browser
│
├── / ─────────────────────────────────→ Landing page
│ │
│ ├── POST /api/bootstrap
│ │ │
│ │ └── POST /api/ingest
│ │ → load PDFs from docs/
│ │ → split into 1000-char chunks
│ │ → embed with Voyage AI
│ │ → upsert into Pinecone
│ │
│ └── route to /search?q=...&mode=...
│
└── /search ───────────────────────────→ Search page
│
├── quick mode → POST /api/search
│ → query embedding
│ → Pinecone MMR retrieval
│
└── deep mode → POST /api/deep-search
→ spawn scripts/deep_search.py
→ load docs/trees/*.json
→ send structured context to NVIDIA
→ stream answer back via SSE
Tree generation lives in scripts/generate_trees.py.
For each document listed in docs/db.json, the script:
- Extracts page text from the PDF with PyMuPDF
- Splits long documents into page groups of ~8 000 characters
- Uses 1 overlapping page between neighboring groups for cross‑boundary context
- Sends each page group to NVIDIA's chat completions API (Llama 3.1 70B)
- Falls back to OpenAI (
gpt-4o-mini) ifOPENAI_API_KEYis set and NVIDIA fails - Asks the model to identify logical sections — syllabus, facts, majority opinion, concurrence, dissent, etc.
- Merges and deduplicates chunk outputs by page index
- Writes one
*_tree.jsonfile per case underdocs/trees/
Key design decision: chunking is used only to make tree generation feasible under model token limits. The output is not a vector index — it is a logical section tree.
Each case gets a JSON structure shaped like a compact legal table of contents:
{
"doc_id": "nvidia-mapp_vs_ohio",
"filename": "mapp_vs_ohio.pdf",
"title": "Mapp v. Ohio",
"tree": [
{
"title": "Mapp v. Ohio",
"node_id": "0000",
"page_index": 1,
"text": "<root synopsis>",
"nodes": [
{
"title": "Majority Opinion",
"node_id": "0001",
"page_index": 1,
"text": "<section text>"
}
]
}
]
}- The root node acts like a case‑level overview
- Child nodes act like section entries in a table of contents
- Each node keeps page information and raw section text
Quick search starts from POST /api/search.
- Validate
PINECONE_API_KEY,PINECONE_INDEX, andVOYAGE_API_KEY - Embed the user query with Voyage AI using
voyage-law-2 - Open the Pinecone index through LangChain's
PineconeStore - Run
maxMarginalRelevanceSearch(query, { k: 20 }) - Deduplicate results by document id
- Return normalized result objects for the UI
This is the fast path — useful for quick semantic lookup across the corpus.
Deep search starts from POST /api/deep-search.
- Confirm that
docs/trees/contains generated tree files - Spawn
scripts/deep_search.py - Load the local tree JSON files
- Flatten root and child nodes into a bounded legal context window (60 000 chars max)
- Send that structured context plus the user query to NVIDIA Llama 3.1 70B
- Stream the answer to the browser as Server‑Sent Events
Important: Deep mode does not query Pinecone, does not use embeddings at runtime, and depends entirely on the prebuilt tree index.
The corpus contains 13 landmark Supreme Court cases spanning 1803–2008:
| Case | Year | Constitutional Topic |
|---|---|---|
| Marbury v. Madison | 1803 | Judicial Review |
| Gibbons v. Ogden | 1824 | Interstate Commerce |
| Mapp v. Ohio | 1961 | Fourth Amendment — Search & Seizure |
| Baker v. Carr | 1962 | Equal Protection — Redistricting |
| Gideon v. Wainwright | 1963 | Right to Counsel |
| Miranda v. Arizona | 1966 | Right Against Self‑Incrimination |
| Tinker v. Des Moines | 1969 | Student Free Speech |
| NY Times v. United States | 1971 | Freedom of the Press |
| United States v. Nixon | 1974 | Executive Privilege |
| United States v. Lopez | 1995 | Commerce Clause Limits |
| Bush v. Gore | 2000 | Election Law — Equal Protection |
| Roper v. Simmons | 2005 | Eighth Amendment — Juvenile Death Penalty |
| District of Columbia v. Heller | 2008 | Second Amendment — Individual Right |
| Layer | Technology | Purpose |
|---|---|---|
| Framework | Next.js 14 (App Router) | UI and API routes |
| UI | React 18, Tailwind CSS 3 | Search interface and streaming results |
| Vector embeddings | Voyage AI voyage-law-2 |
Quick‑mode legal embeddings (1024‑dim) |
| Vector DB | Pinecone (serverless) | Quick‑mode retrieval store |
| PDF parsing | LangChain PDFLoader, PyMuPDF |
Ingestion and tree generation |
| Deep reasoning | NVIDIA Llama 3.1 70B | Tree generation and deep search |
| Fallback LLM | OpenAI gpt-4o-mini |
Tree‑generation fallback only |
| Icons | Lucide React, Heroicons | UI iconography |
Landing page — search mode selection
Quick search results with document cards
- Node.js 18+
- npm
- Python 3.x
- Pinecone API key — for quick search
- Voyage AI API key — for quick search
- NVIDIA API key — for tree generation and deep search
- OpenAI API key (optional) — tree‑generation fallback
# JavaScript
npm install --legacy-peer-deps
# Python
pip install -r requirements.txtCreate a root .env file (see .env.example):
PINECONE_API_KEY=""
PINECONE_INDEX=""
VOYAGE_API_KEY=""
NVIDIA_API_KEY=""
OPENAI_API_KEY=""
NVIDIA_API_URL="https://integrate.api.nvidia.com/v1/chat/completions"
OPENAI_API_URL="https://api.openai.com/v1/chat/completions"
NVIDIA_TREE_MODEL="meta/llama-3.1-70b-instruct"
NVIDIA_DEEP_SEARCH_MODEL="meta/llama-3.1-70b-instruct"
OPENAI_TREE_MODEL="gpt-4o-mini"
NEXT_PUBLIC_APP_URL=""
PORT="3000"
VERCEL_URL=""
| Variable | Purpose | Required |
|---|---|---|
PINECONE_API_KEY |
Pinecone authentication | Yes (quick mode) |
PINECONE_INDEX |
Pinecone index name | Yes (quick mode) |
VOYAGE_API_KEY |
Voyage legal embeddings | Yes (quick mode) |
NVIDIA_API_KEY |
Tree generation and deep search | Yes (deep mode) |
OPENAI_API_KEY |
Fallback provider during tree generation | No |
NVIDIA_API_URL |
NVIDIA chat completions endpoint | Yes (deep mode) |
OPENAI_API_URL |
OpenAI chat completions endpoint | If using OpenAI fallback |
NVIDIA_TREE_MODEL |
Tree‑generation model name | Yes (deep mode) |
NVIDIA_DEEP_SEARCH_MODEL |
Deep‑search model name | Yes (deep mode) |
OPENAI_TREE_MODEL |
OpenAI fallback model name | If using OpenAI fallback |
NEXT_PUBLIC_APP_URL |
Health/keep‑alive base URL | No |
PORT |
Local bootstrap self‑call target | Recommended locally |
VERCEL_URL |
Used for deployed bootstrap calls | No |
npm run devOpen http://localhost:3000.
The app bootstraps Pinecone automatically on load. You can also trigger it manually:
curl -X POST http://localhost:3000/api/bootstrapThis workflow:
- Loads PDFs from
docs/ - Splits them into 1 000‑character chunks with 200 overlap
- Embeds them with Voyage AI (
voyage-law-2) - Upserts vectors into Pinecone
Deep mode needs tree files under docs/trees/.
From Python:
python scripts/generate_trees.pyFrom the API:
curl -X POST http://localhost:3000/api/generate-treesIf tree files don't exist, deep search returns
503. The generator skips trees that already exist — delete adocs/trees/*_tree.jsonfile to rebuild a specific case.
| Route | Method | Purpose |
|---|---|---|
/api/bootstrap |
POST | Starts quick‑search bootstrap |
/api/ingest |
POST | Loads PDFs, chunks, embeds, writes to Pinecone |
/api/search |
POST | Quick‑mode semantic retrieval |
/api/deep-search |
POST | Deep‑mode streamed tree reasoning |
/api/generate-trees |
POST | Builds local tree files |
/api/health |
GET | Health endpoint |
/api/init |
GET | Starts keep‑alive service in production |
.
├── docs/
│ ├── *.pdf # 13 Supreme Court case PDFs
│ ├── db.json # Case metadata (title, parties, date, topic)
│ └── trees/
│ └── *_tree.json # Generated tree index files
├── img/ # Screenshots
├── scripts/
│ ├── deep_search.py # NVIDIA‑powered tree reasoning
│ └── generate_trees.py # Chunked tree generation (NVIDIA + OpenAI fallback)
├── src/
│ ├── app/
│ │ ├── api/
│ │ │ ├── bootstrap/route.ts # Quick‑search bootstrap trigger
│ │ │ ├── deep-search/route.ts # Deep‑search SSE streaming
│ │ │ ├── generate-trees/route.ts
│ │ │ ├── health/route.ts
│ │ │ ├── ingest/route.ts # Pinecone ingestion worker
│ │ │ ├── init/route.ts # Keep‑alive initializer
│ │ │ └── search/route.ts # Quick‑mode vector search
│ │ ├── search/page.tsx # Results route (both modes)
│ │ ├── services/
│ │ │ ├── bootstrap.ts # PDF → chunk → embed → upsert
│ │ │ ├── keepAlive.ts # Production health pinger
│ │ │ └── pinecone.ts # Index creation and vector checks
│ │ ├── types/ # TypeScript interfaces
│ │ ├── globals.css
│ │ ├── layout.tsx
│ │ └── page.tsx # Landing page
│ ├── components/
│ │ ├── CitationGenerator.tsx # Bluebook citation with clipboard
│ │ ├── DeepSearchResults.tsx # Deep‑mode streaming UI
│ │ ├── DocumentView.tsx # Quick‑mode document modal
│ │ ├── Header.tsx
│ │ ├── LoadingState.tsx # Bootstrap spinner overlay
│ │ ├── SearchForm.tsx # Query input + mode toggle
│ │ └── SearchResults.tsx # Quick‑mode result cards
│ └── lib/
│ ├── search-client.ts # Client‑side search utilities
│ └── utils.ts # Shared helpers
├── .env # Environment variables (git‑ignored)
├── package.json
├── requirements.txt # Python: requests, pymupdf, python‑dotenv
├── tailwind.config.ts
└── tsconfig.json
| Quick Search | Deep Search | |
|---|---|---|
| Speed | Fast (< 2s) | Slower (5–15s) |
| Best for | Broad semantic lookup | Structured legal analysis |
| Retrieval | Vector similarity | Model reasoning over document tree |
| Infrastructure | Requires Pinecone + Voyage | Requires NVIDIA API + tree files |
| Output | Document cards with excerpts | Streamed reasoning with citations |
This app avoids vector search in deep mode, but is not yet a full one‑to‑one implementation of PageIndex's ideal retrieval loop. Today, the deep‑search runtime:
- Loads local
*_tree.jsonfiles - Flattens root and child nodes into a bounded prompt context
- Asks NVIDIA's model to answer using that structured case context
The system is best described as:
- Vector RAG for quick mode
- PageIndex‑inspired structured retrieval for deep mode
It captures the key design choice: tree‑based indexing instead of vector‑only chunk retrieval.
The most accurate label:
PageIndex‑inspired indexing and retrieval, implemented locally with chunked tree construction and NVIDIA/OpenAI model calls
- PageIndex intro: vectorless, reasoning‑based RAG
- PageIndex developer docs
- PageIndex GitHub repository
- PageIndex case study: 98.7% accuracy on FinanceBench
This project is licensed under the MIT License.