An AI-powered investigative platform for the Epstein Files.
Parrhesepstein is a full-stack Flask application that enables deep, systematic analysis of the Jeffrey Epstein document corpus released by the U.S. Department of Justice. It combines multi-agent AI investigation, semantic search (RAG), network graph analysis, and structured data persistence to surface connections, financial flows, and patterns across thousands of declassified documents.
The name joins *parrhesia* (Greek: fearless speech, speaking truth to power) with *Epstein*; it was suggested by Boni Castellane.
Author: The Pirate
*Screenshots: Home, Investigation Crew, Report, Smart Archive (RAG), People Registry, Flight Logs, Network Graph, Relationship Map, Influence Schema, Influence Graph*
- Architecture Overview
- Tech Stack
- Project Structure
- Installation
- Configuration
- Running
- Core Features
- API Reference
- Agent System
- Database Schema
- Data Pipeline
- License
```
                      +------------------+
                      |   Flask (5001)   |
                      +---------+--------+
                                |
          +---------------------+---------------------+
          |                     |                     |
 +--------v--------+   +--------v--------+   +--------v--------+
 |    18 Route     |   |     9 Agent     |   |   13 Service    |
 |   Blueprints    |   |     Modules     |   |     Modules     |
 | (71 endpoints)  |   |  (AI workers)   |   |  (data layer)   |
 +--------+--------+   +--------+--------+   +--------+--------+
          |                     |                     |
    +-----+-----+      +--------v--------+    +-------+---------+
    |           |      |   Claude API    |    |       |         |
+---v---+   +---v---+  |   (Anthropic)   | +--v--+ +--v---+ +---v----+
| HTML  |   | JSON  |  +-----------------+ |Mongo| |Chroma| |justice |
| Pages |   | APIs  |                      | DB  | |  DB  | |  .gov  |
+-------+   +-------+                      +-----+ +------+ +--------+
                                |
                      +---------v--------+
                      |    Background    |
                      |   Thread Pool    |
                      |  (daemon jobs)   |
                      +------------------+
```
All long-running operations (investigations, network generation, PDF downloads, influence analysis) run as background daemon threads with polling status endpoints, keeping the UI responsive.
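A minimal sketch of that pattern, assuming an in-memory registry; `JOBS`, `run_job`, and `do_expensive_work` are illustrative names (in the app, job state lives in `app/services/jobs.py`):

```python
import threading
import uuid

from flask import Blueprint, jsonify, request

bp = Blueprint("example", __name__)

# Illustrative in-memory registry; the real app tracks jobs in app/services/jobs.py
JOBS: dict[str, dict] = {}

def do_expensive_work(query: str) -> dict:
    # Stand-in for a real investigation / download / analysis task
    return {"query": query, "documents": []}

def run_job(job_id: str, query: str) -> None:
    """Executed off the request thread so the UI stays responsive."""
    JOBS[job_id]["status"] = "running"
    try:
        JOBS[job_id].update(status="completed", result=do_expensive_work(query))
    except Exception as exc:
        JOBS[job_id].update(status="error", progress=str(exc))

@bp.route("/api/example", methods=["POST"])
def start_job():
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "pending", "progress": "", "result": None}
    # daemon=True: the thread dies with the process instead of blocking shutdown
    threading.Thread(target=run_job, args=(job_id, request.json["query"]),
                     daemon=True).start()
    return jsonify({"job_id": job_id, "status": "started"})

@bp.route("/api/example/status/<job_id>")
def job_status(job_id: str):
    return jsonify(JOBS.get(job_id, {"status": "error", "progress": "unknown job"}))
```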
| Layer | Technology |
|---|---|
| Backend | Flask 3.1.2, Python 3.11+ |
| AI Engine | Claude (Anthropic SDK), ThreadPoolExecutor (parallel batch analysis) |
| Vector DB | ChromaDB 0.4.18 (semantic search / RAG) |
| Document DB | MongoDB (PyMongo 4.6) |
| Graph Analysis | NetworkX 3.4.2 |
| PDF Processing | PyPDF2, PyMuPDF, Tesseract OCR, Claude Vision |
| Data Analysis | Pandas 2.3.3 |
| Frontend | Vanilla JS, Vis.js (network graphs), Leaflet.js (maps) |
| Data Source | U.S. DOJ Epstein Files (justice.gov), Epstein email dataset |
```
parrhesepstein/
├── app/
│ ├── __init__.py # Flask app factory
│ ├── config.py # Centralized configuration
│ ├── extensions.py # Shared state (MongoDB, email DF, OCR flags)
│ ├── run.py # Entry point
│ │
│ ├── routes/ # 18 Blueprint modules, 71+ endpoints
│ │ ├── pages.py # 20 HTML page routes
│ │ ├── search.py # Search API (justice.gov, emails, semantic)
│ │ ├── documents.py # Document management + RAG Q&A
│ │ ├── investigate.py # Single-person investigation
│ │ ├── investigation_crew.py # Multi-agent crew system
│ │ ├── network.py # Network graph generation
│ │ ├── influence.py # Influence network analysis
│ │ ├── relationships.py # Email/document relationship extraction
│ │ ├── merge.py # Investigation merging + deep-dive
│ │ ├── synthesis.py # Report synthesis
│ │ ├── analyze.py # Batch document analysis
│ │ ├── flights.py # Flight data API
│ │ ├── people.py # People database
│ │ ├── indexing.py # ChromaDB indexing
│ │ ├── ocr.py # OCR + image extraction
│ │ ├── settings_routes.py # App settings
│ │ └── status.py # System health + dashboard stats
│ │
│ ├── agents/ # 9 AI agent modules
│ │ ├── vectordb.py # ChromaDB ops, entity extraction, graph building
│ │ ├── investigator.py # Person dossier generator
│ │ ├── network_agent.py # Relationship graph mapper
│ │ ├── investigation_crew.py # 6-agent orchestrated investigation
│ │ ├── influence_analyzer.py # International org influence mapper
│ │ ├── meta_investigator.py # Cross-investigation comparator
│ │ ├── context_provider.py # RAG + MongoDB context retrieval
│ │ └── orchestrator.py # Agent coordination helpers
│ │
│ ├── services/ # 13 service modules
│ │ ├── claude.py # Anthropic client + retry logic
│ │ ├── justice_gov.py # Justice.gov search API
│ │ ├── pdf.py # PDF download, extraction, auto-indexing
│ │ ├── emails.py # Email dataset search (parquet)
│ │ ├── entities.py # NER + keyword extraction
│ │ ├── people.py # People collection CRUD
│ │ ├── documents.py # Local document management
│ │ ├── fact_checker.py # EFTA citation verification
│ │ ├── settings.py # Settings cache (60s TTL)
│ │ ├── network_builder.py # Network data construction
│ │ ├── merge_logic.py # Investigation merge logic
│ │ └── jobs.py # Background job management
│ │
│ ├── templates/ # 20 HTML pages (~18k lines)
│ └── static/ # sidebar.js, sidebar.css, icon.png
│
├── chroma_db/ # ChromaDB persistent storage
├── documents/ # Downloaded PDFs + extracted text
├── saved_analyses/ # Exported influence analysis JSON
├── epstein_flights_data.json # Epstein flight records
├── epstein_emails.parquet # Email dataset
└── requirements.txt # Python dependencies
```
Codebase: ~8,300 lines Python + ~18,000 lines HTML/JS
- Python 3.11+
- MongoDB (running on `localhost:27017`)
- Tesseract OCR (optional, for scanned PDFs)
- Poppler (optional, for PDF-to-image conversion)
- An Anthropic API key (Claude)
```bash
# Clone the repository
git clone https://github.com/Pinperepette/parrhesepstein.git
cd parrhesepstein

# Create virtual environment
python -m venv venv
source venv/bin/activate   # Linux/macOS
# venv\Scripts\activate    # Windows

# Install Python dependencies
pip install -r requirements.txt

# Install system dependencies (macOS)
brew install tesseract poppler

# Install system dependencies (Ubuntu/Debian)
# sudo apt-get install tesseract-ocr poppler-utils

# Ensure MongoDB is running
mongosh --eval "db.runCommand({ping: 1})"
```

The application expects two data files in the project root:
| File | Description | Source |
|---|---|---|
| `epstein_flights_data.json` | Flight passenger records | Included in repo |
| `epstein_emails.parquet` | Email dataset (~4,300 emails) | Hugging Face |
Both files are optional. The app functions without them, but the Flights and JMail pages will be empty.
All configuration lives in `app/config.py`:
```python
MONGO_URI        = "mongodb://localhost:27017/"
DB_EPSTEIN_NAME  = "EpsteinAnalyses"          # Main database
DB_SETTINGS_NAME = "SnareSetting"             # API key storage

DOCUMENTS_DIR = "<project_root>/documents"    # Downloaded PDFs + text
CHROMA_PATH   = "<project_root>/chroma_db"    # Vector database
ANALYSES_DIR  = "<project_root>/saved_analyses"

VALID_MODELS = [
    "claude-sonnet-4-20250514",
    "claude-opus-4-20250514",
    "claude-haiku-4-5-20251001",
]

VALID_LANGUAGES = [
    "Italiano", "English", "Español",
    "Français", "Deutsch", "Português",
]
```

The Claude API key is configured through the Settings page (`/settings`) and stored in MongoDB, not in environment variables or config files.
```bash
python app/run.py
```

The application starts on `http://localhost:5001` with debug mode and threading enabled.
On first launch:
- Navigate to `/settings`
- Enter your Anthropic API key
- Select your preferred Claude model and output language
- Start investigating from the dashboard (`/`)
Endpoints: `/api/search`, `/api/search-multi`, `/api/semantic-search`

Searches the U.S. DOJ Epstein Files database at `justice.gov/d9/2024-06/multimedia-search`. Supports:
- Single query — searches justice.gov with pagination (up to 10 pages)
- Multi-source — combines justice.gov + local email dataset results
- Semantic (RAG) — vector similarity search across all indexed documents in ChromaDB
Every search result that contains a PDF link is automatically downloaded, text-extracted, and indexed into ChromaDB in a background thread. The PDF extraction pipeline has triple fallback:
```
PyPDF2 (fast, text-based PDFs)
  → Tesseract OCR (scanned documents)
    → Claude Vision API (last resort)
```
Downloaded documents are persisted to `documents/` as both `.pdf` and `.txt` files.
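A condensed sketch of the fallback chain, not the app's exact implementation; the 100-character threshold is an assumed heuristic, and the Claude Vision step is only stubbed:

```python
from PyPDF2 import PdfReader

def extract_text(pdf_path: str) -> str:
    """Triple fallback: embedded text layer → OCR → (stub) Claude Vision."""
    # 1. Fast path: born-digital PDFs with an embedded text layer
    reader = PdfReader(pdf_path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    if len(text.strip()) > 100:   # "looks like real text" heuristic (assumption)
        return text

    # 2. Scanned documents: rasterize with Poppler, then OCR with Tesseract
    try:
        import pytesseract
        from pdf2image import convert_from_path
        pages = convert_from_path(pdf_path)
        ocr_text = "\n".join(pytesseract.image_to_string(p) for p in pages)
        if ocr_text.strip():
            return ocr_text
    except Exception:
        pass  # Tesseract/Poppler not installed, or OCR failed

    # 3. Last resort: base64-encode page images into a Claude Vision
    #    messages.create() call (omitted here for brevity)
    return text
```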
Endpoint: `POST /api/investigate`

Generates a comprehensive dossier on a single person using the `InvestigatorAgent`:
- Searches justice.gov for all documents mentioning the person
- Downloads and extracts full PDF text
- Fetches Wikipedia background via `wikipediaapi`
- Identifies connected people, financial amounts, dates, red flags
- Generates an AI analysis narrative with Claude
Output structure:
```json
{
  "name": "Leon Black",
  "wikipedia": { "title": "...", "summary": "...", "url": "..." },
  "documents_found": 42,
  "mentions": [{ "document": "...", "url": "...", "context": "..." }],
  "connections": ["Jeffrey Epstein", "Apollo Global"],
  "timeline": ["2012-03-15", "2013-07-22"],
  "financial": ["$158 million", "$40M+"],
  "red_flags": ["..."],
  "ai_analysis": "..."
}
```

Results are saved to the `people` collection in MongoDB and are accessible from the People page.
Endpoint: `POST /api/investigation`
The most advanced investigation mode. Orchestrates a team of 6 specialized AI agents:
| Agent | Role |
|---|---|
| Director | Plans search strategy — identifies terms, people, patterns to search |
| Researcher | Executes searches on justice.gov, downloads all documents |
| Analyst | Extracts key facts, people, connections, timeline from documents |
| Banking Specialist | Identifies financial transactions, wire transfers, shell companies |
| Cipher Specialist | Decodes patterns, aliases, codenames, indirect references |
| Synthesizer | Generates the final comprehensive report |
The pipeline runs sequentially: Director plans → Researcher searches → Analyst extracts → Banking analyzes → Cipher decodes → Synthesizer reports. Document analysis within each stage uses parallel batch processing via ThreadPoolExecutor for throughput.
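The parallel batch step might look roughly like this; `analyze_document` stands in for a single Claude extraction call, and the worker count is an assumption:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def analyze_document(doc: dict) -> dict:
    # Stand-in for one Claude call that extracts facts from a single document
    return {"doc_id": doc["id"], "facts": []}

def analyze_batch(documents: list[dict], max_workers: int = 5) -> list[dict]:
    """Analyze documents in parallel; results arrive in completion order."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(analyze_document, d): d for d in documents}
        for future in as_completed(futures):
            results.append(future.result())
    return results
```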
Continuation support: `POST /api/investigation/<id>/continue` allows extending an existing investigation with a new objective, building on previous findings.

Meta-investigation: `POST /api/meta-investigation` compares multiple investigations, finds contradictions, and generates a unified verdict.

Citation verification: Every generated report is run through the fact-checker, which extracts all EFTA document codes and verifies them against ChromaDB and justice.gov. The UI shows a verification badge (green for >80% of citations verified, yellow for 50-80%, red for below 50%).
Endpoint: `POST /api/network`
Builds a relationship graph from document co-occurrences:
- Searches justice.gov for query terms
- Extracts named entities from each document using sliding-window NER with false-positive filtering (email headers, locations, organizations, and common verbs are excluded)
- Builds a `NetworkX` graph where edges represent co-occurrence in the same document
- Converts to `Vis.js` format for interactive frontend visualization
- Identifies clusters (connected components) and hub nodes (highest degree)
The entity extraction handles 3-word names (e.g., "Sultan Bin Sulayem") and deduplicates partial matches.
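A sketch of the co-occurrence step, assuming entities have already been extracted per document; the function and field names are illustrative:

```python
from itertools import combinations

import networkx as nx

def build_cooccurrence_graph(entities_per_doc: dict[str, list[str]]) -> dict:
    """Edges connect entities that appear in the same document."""
    G = nx.Graph()
    for doc_id, entities in entities_per_doc.items():
        for a, b in combinations(set(entities), 2):
            if G.has_edge(a, b):
                G[a][b]["weight"] += 1
            else:
                G.add_edge(a, b, weight=1)

    # Convert to the node/edge lists Vis.js expects on the frontend
    nodes = [{"id": n, "label": n, "value": G.degree(n)} for n in G.nodes]
    edges = [{"from": a, "to": b, "value": d["weight"]}
             for a, b, d in G.edges(data=True)]
    clusters = [sorted(c) for c in nx.connected_components(G)]
    return {"nodes": nodes, "edges": edges, "clusters": clusters}
```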
Endpoint: `POST /api/influence-network`
Maps how Epstein's private network influenced international organizations:
Target organizations:
- WHO (World Health Organization)
- ICRC (International Committee of the Red Cross)
- World Bank
- Gates Foundation
- United Nations
- GAVI (Vaccine Alliance)
- IPI (International Peace Institute)
Tracked intermediaries: Jeffrey Epstein, Leon Black, Bill Gates, Boris Nikolic, Terje Rod-Larsen, Larry Summers, and others.
Three depth levels:
| Level | Pages/search | Max docs | Use case |
|---|---|---|---|
| `small` | 2 | 30 | Quick scan |
| `medium` | 5 | 100 | Standard analysis |
| `full` | 10 | 300 | Comprehensive mapping |
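For reference, the three levels correspond to a simple configuration mapping; this is an illustration of the table above, not the analyzer's actual code:

```python
# Illustrative mapping of the three depth levels; the real constants live in
# the influence analyzer, not necessarily under this identifier.
DEPTH_LEVELS = {
    "small":  {"pages_per_search": 2,  "max_docs": 30},   # quick scan
    "medium": {"pages_per_search": 5,  "max_docs": 100},  # standard analysis
    "full":   {"pages_per_search": 10, "max_docs": 300},  # comprehensive mapping
}
```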
Results include connection maps, financial flows, key documents, and exportable Markdown reports. Supports document deep-dives (`POST /api/influence-network/deep-analysis`) for drilling into specific findings.
Endpoints: `POST /api/investigations/merge`, `POST /api/meta-investigation`
Merge: Combines multiple crew investigations into a unified analysis. Aggregates documents, connections, people, and timelines, then re-synthesizes a combined report.
Deep-dive: `POST /api/investigations/deep-dive` performs targeted analysis on a single document within the context of a merged investigation.
Meta-investigation: Compares investigations for contradictions, corroborations, and gaps. Three-phase workflow:
- Analyze — compare all investigations
- Resolve — search for documents that address contradictions
- Verdict — generate unified conclusion
Endpoint: `POST /api/sintesi/generate`
Aggregates multiple influence analyses and deep-dives into a single structured report. Extracts and deduplicates:
- Key people and their roles
- Organizations and their connections
- Financial flows and amounts
- Document evidence chains
- Unified timeline
Output is a structured Markdown report stored in MongoDB.
Endpoint: `POST /api/search-emails`

Searches the Epstein email dataset (~4,300 emails loaded from Parquet into a Pandas DataFrame). Searchable fields: `subject`, `from_address`, `to_address`, `message_html`, `other_recipients`.
Accessible via the JMail page (`/jmail`).
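A sketch of how such a search might be implemented over the Parquet-backed DataFrame; `search_emails` and the coercion to `str` are illustrative choices:

```python
import pandas as pd

EMAILS = pd.read_parquet("epstein_emails.parquet")
SEARCH_FIELDS = ["subject", "from_address", "to_address",
                 "message_html", "other_recipients"]

def search_emails(query: str) -> pd.DataFrame:
    """Case-insensitive substring match across all searchable fields."""
    mask = pd.Series(False, index=EMAILS.index)
    for field in SEARCH_FIELDS:
        mask |= (EMAILS[field].astype(str)
                 .str.contains(query, case=False, regex=False))
    return EMAILS[mask]
```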
Endpoints: `GET /api/flights`, `GET /api/flights/passengers`

Serves Epstein flight records from JSON. Supports filtering by passenger name via URL parameter (`/flights?passenger=NAME`). The People page shows a flight icon for individuals found in the flight data, linking directly to their filtered flight records.
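A minimal sketch of the passenger filter, assuming each record in `epstein_flights_data.json` carries a `passengers` list (the actual schema may differ):

```python
import json

from flask import Blueprint, jsonify, request

bp = Blueprint("flights_example", __name__)

with open("epstein_flights_data.json") as fh:
    FLIGHTS = json.load(fh)  # assumed: a list of records with a "passengers" list

@bp.route("/api/flights")
def flights():
    passenger = request.args.get("passenger", "").lower()
    if not passenger:
        return jsonify(FLIGHTS)
    # Substring match so "clinton" also finds "Bill Clinton"
    return jsonify([f for f in FLIGHTS
                    if any(passenger in p.lower()
                           for p in f.get("passengers", []))])
```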
Endpoint: `POST /api/archive/ask`

Ask natural-language questions against all indexed documents. Uses ChromaDB semantic search to find relevant chunks, then sends them as context to Claude for a grounded answer. Accessible via the Archive page (`/archive`).
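The retrieve-then-answer loop could look like this sketch; the prompt wording, chunk count, and model literal are assumptions, and the API key would come from MongoDB rather than a hard-coded string:

```python
import anthropic
import chromadb

chroma = chromadb.PersistentClient(path="chroma_db")
collection = chroma.get_or_create_collection("epstein_documents")
claude = anthropic.Anthropic(api_key="...")  # the real app reads this from MongoDB

def ask_archive(question: str, n_chunks: int = 8) -> str:
    """Retrieve the most relevant chunks, then answer grounded in them."""
    hits = collection.query(query_texts=[question], n_results=n_chunks)
    context = "\n---\n".join(hits["documents"][0])
    prompt = (f"Answer using only these document excerpts:\n{context}\n\n"
              f"Question: {question}")
    response = claude.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```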
Module: `app/services/fact_checker.py`

Runs automatically after every crew investigation report is generated. Extracts all EFTA document codes (regex: `EFTA\d{8,}`) and verifies each against:
- ChromaDB (is the document indexed locally?)
- Justice.gov search (does the document exist in the DOJ database?)
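A minimal sketch of the ChromaDB half of that check; the justice.gov lookup and the surrounding plumbing are omitted, and `verify_citations` is an illustrative name:

```python
import re

EFTA_RE = re.compile(r"EFTA\d{8,}")

def verify_citations(report: str, collection) -> dict:
    """Check every EFTA code cited in a report against the local index.

    `collection` is the ChromaDB collection; the justice.gov fallback
    check is omitted here for brevity.
    """
    details = []
    for code in sorted(set(EFTA_RE.findall(report))):
        hits = collection.get(where={"doc_id": code}, limit=1)
        found = bool(hits["ids"])
        details.append({"doc_id": code,
                        "status": "verified" if found else "unverified",
                        "source": "chromadb" if found else None})
    verified = sum(1 for d in details if d["status"] == "verified")
    return {"total_citations": len(details), "verified": verified,
            "unverified": len(details) - verified, "details": details}
```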
Results are stored alongside the investigation:
```json
{
  "citation_verification": {
    "total_citations": 12,
    "verified": 10,
    "unverified": 2,
    "details": [
      { "doc_id": "EFTA01234567", "status": "verified", "source": "chromadb" },
      { "doc_id": "EFTA99999999", "status": "unverified", "source": null }
    ]
  }
}
```

All long-running operations follow the async job pattern:
```text
# Start a job
POST /api/<resource>
→ { "job_id": "uuid", "status": "started" }

# Poll for status
GET /api/<resource>/status/<job_id>
→ {
    "job_id": "uuid",
    "status": "pending | running | completed | error",
    "progress": "Downloading PDF 3/20...",
    "result": { ... }            // when completed
  }
```

| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/search` | Search justice.gov |
| `POST` | `/api/search-multi` | Combined justice.gov + email search |
| `POST` | `/api/search-emails` | Search email dataset |
| `POST` | `/api/semantic-search` | RAG search in ChromaDB |
| `POST` | `/api/download-pdf` | Download and extract PDF text |
| `GET` | `/api/documents` | List local documents |
| `GET` | `/api/documents/<id>/text` | Get document text |
| `GET` | `/api/documents/<id>/pdf` | Serve PDF file |
| `GET` | `/api/vectordb/stats` | ChromaDB statistics |
| `POST` | `/api/archive/ask` | RAG Q&A |
| `POST` | `/api/investigate` | Start person investigation |
| `GET` | `/api/investigate/status/<id>` | Poll investigation status |
| `POST` | `/api/investigation` | Start crew investigation |
| `GET` | `/api/investigation/status/<id>` | Poll crew status |
| `GET` | `/api/investigation/list` | List all investigations |
| `GET` | `/api/investigation/<id>` | Get investigation details |
| `POST` | `/api/investigation/<id>/continue` | Continue investigation |
| `POST` | `/api/meta-investigation` | Compare investigations |
| `POST` | `/api/network` | Generate network graph |
| `GET` | `/api/network/status/<id>` | Poll network status |
| `POST` | `/api/influence-network` | Start influence analysis |
| `GET` | `/api/influence-network/status/<id>` | Poll influence status |
| `POST` | `/api/influence-network/deep-analysis` | Deep-dive into document |
| `POST` | `/api/influence-network/export` | Export to Markdown |
| `GET` | `/api/relationships/emails` | Extract email relationships |
| `GET` | `/api/relationships/documents` | Extract co-occurrences |
| `POST` | `/api/investigations/merge` | Merge investigations |
| `POST` | `/api/investigations/deep-dive` | Document deep-dive |
| `POST` | `/api/sintesi/generate` | Generate synthesis report |
| `GET` | `/api/flights` | Flight data |
| `GET` | `/api/flights/passengers` | Unique passenger list |
| `GET` | `/api/people` | People database |
| `POST` | `/api/index-document` | Index document to ChromaDB |
| `POST` | `/api/vectordb/index-all-local` | Batch index all local files |
| `POST` | `/api/pdf-text` | Extract PDF text (with OCR) |
| `GET` | `/api/settings` | Get settings |
| `POST` | `/api/settings` | Update settings + API key |
| `GET` | `/api/status` | Health check |
| `GET` | `/api/dashboard/stats` | Dashboard statistics |
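Any of the job-producing endpoints above can be driven from a client with a small poll loop. A sketch, assuming the default port and the response shape shown earlier:

```python
import time

import requests

BASE = "http://localhost:5001"

def run_and_wait(resource: str, payload: dict, interval: float = 2.0) -> dict:
    """Start a background job and poll its status endpoint until it settles."""
    job = requests.post(f"{BASE}/api/{resource}", json=payload).json()
    job_id = job["job_id"]
    while True:
        status = requests.get(f"{BASE}/api/{resource}/status/{job_id}").json()
        if status["status"] in ("completed", "error"):
            return status
        print(status.get("progress", ""))
        time.sleep(interval)

# e.g. result = run_and_wait("network", {"query": "Leon Black"})
```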
```
┌─────────────────────┐
│  InvestigatorAgent  │   Single-person dossier
└─────────────────────┘

┌─────────────────────┐
│    NetworkAgent     │   Relationship graph
└─────────────────────┘

┌─────────────────────┐
│  InvestigationCrew  │   6-agent orchestrated team
│   ├─ Director       │
│   ├─ Researcher     │
│   ├─ Analyst        │
│   ├─ Banking        │
│   ├─ Cipher         │
│   └─ Synthesizer    │
└─────────────────────┘

┌─────────────────────┐
│  InfluenceAnalyzer  │   Org influence mapping
└─────────────────────┘

┌─────────────────────┐
│  MetaInvestigator   │   Cross-investigation comparison
└─────────────────────┘

┌─────────────────────┐
│   ContextProvider   │   RAG + MongoDB context
└─────────────────────┘
```
All agents use Claude via the Anthropic SDK with 3-retry logic on server errors. The model and language are configurable at runtime via `/settings`.
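A sketch of what that retry logic might look like against the Anthropic SDK; the exponential backoff schedule and the `call_claude` name are assumptions, not the app's exact code:

```python
import time

import anthropic

client = anthropic.Anthropic(api_key="...")  # loaded from MongoDB in the real app

def call_claude(prompt: str, model: str = "claude-sonnet-4-20250514",
                retries: int = 3) -> str:
    """Retry on server-side (5xx) errors with a short exponential backoff."""
    for attempt in range(retries):
        try:
            response = client.messages.create(
                model=model,
                max_tokens=4096,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.content[0].text
        except anthropic.APIStatusError as exc:
            # Re-raise client errors immediately; retry only on 5xx
            if exc.status_code < 500 or attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)  # 1s, then 2s before the next attempt
```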
| Collection | Purpose | Key Fields |
|---|---|---|
| `crew_investigations` | Multi-agent investigation results | `objective`, `strategy`, `analysis`, `report`, `citation_verification` |
| `people` | Person profiles and dossiers | `name`, `roles`, `relevance`, `dossier`, `connections`, `investigations` |
| `analyses` | Influence network analyses | `target_orgs`, `depth`, `result.connections` |
| `deep_analyses` | Document deep-dives | `doc_id`, `result.key_findings`, `result.red_flags` |
| `syntheses` | Aggregated reports | `analysis_ids`, `persons`, `organizations`, `synthesis` |
| `merged_investigations` | Merged investigation results | `investigation_ids`, `merged_report` |
| `searches` | Saved search results | `query`, `total_results`, `results_sample` |
| `app_settings` | Runtime configuration | `model`, `language` |
| Collection | Purpose |
|---|---|
| `api_keys` | Anthropic API key storage |
Single collection `epstein_documents` with:

- Chunking: 1,000 characters with 200-character overlap
- Metadata: `doc_id`, `title`, `url`, `chunk_index`, `source`
- Embedding: ChromaDB default (all-MiniLM-L6-v2)
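Chunking and indexing with those parameters could look like the sketch below; using `upsert` with deterministic ids keeps re-indexing idempotent, as the Data Pipeline section notes. The `source` value and function name are illustrative:

```python
import chromadb

CHUNK_SIZE, OVERLAP = 1000, 200

client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_or_create_collection("epstein_documents")

def index_document(doc_id: str, title: str, url: str, text: str) -> int:
    """Split text into overlapping chunks and upsert them with metadata."""
    if not text.strip():
        return 0
    chunks = [text[i:i + CHUNK_SIZE]
              for i in range(0, len(text), CHUNK_SIZE - OVERLAP)]
    collection.upsert(
        ids=[f"{doc_id}_{i}" for i in range(len(chunks))],
        documents=chunks,
        metadatas=[{"doc_id": doc_id, "title": title, "url": url,
                    "chunk_index": i, "source": "justice.gov"}
                   for i in range(len(chunks))],
    )
    return len(chunks)
```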
```
justice.gov search
        │
        ▼
  PDF download ────────────────────────┐
        │                              │
        ▼                              ▼
  Text extraction                 Save to disk
  (PyPDF2/OCR/Vision)             documents/*.pdf
        │                         documents/*.txt
        ▼
  ChromaDB indexing
  (1000-char chunks)
        │
        ▼
  Available for:
   ├─ Semantic search   (/api/semantic-search)
   ├─ RAG Q&A           (/api/archive/ask)
   ├─ Agent context     (crew investigations)
   └─ Fact-checking     (citation verification)
```
Every search endpoint triggers background PDF downloads. Documents are automatically persisted and indexed without user intervention. The pipeline is idempotent — re-downloading an already-indexed document is a no-op.
This project is intended for research, journalism, and public accountability purposes. The Epstein Files are public records released by the U.S. Department of Justice.
Built with fearless speech in mind.









