This document is the historical ledger for the Epstein Files corpus.
It preserves:
- historical and deprecated sources
- community reconstruction efforts
- dataset anomalies and structural divergences
- NATIVEs analysis and recovery progress
- verification artifacts and manifests
- checksum lineage
- contributor transparency
- context intentionally excluded from the main README
The main README is concise and forward-facing. This document is archival and explanatory.
Nothing documented here is removed once recorded (append-only by design).
If this corpus must be understood years later, this is the record.
This repository distinguishes between two structural tiers:
Canonical:
- Preserves original DOJ release structure
- Prioritizes least-redacted verified copies
- Maintains original ZIP hash lineage
- Avoids structural reinterpretation
- Separates source material from derivative normalization
Reference:
- Community composites
- Flattened or normalized structures
- Superseded torrents
- Partial reconstructions
- Aggregated structured builds
Reference sets are preserved for transparency but are not authoritative.
This separation protects provenance integrity.
- Overview
- Canonical vs Reference Classification
- Structured Dataset (Mostly Full) — 2026-02-04
- Data Set 9 (DS09) — Detailed Notes, History, and Reconstruction
- Historical Sources & Magnets (DS09)
- Reconstruction Methodology (DS09)
- NATIVEs Analysis & Recovery Status
- Redaction Divergence Notes
- Timeline of Availability
- Relationship to Main README
- Maintenance Policy
The Epstein Files corpus consists of Data Sets 1–12 originally released under EFTA.
While most datasets are structurally stable, Data Set 9 (DS09) is historically:
- the largest
- the most unstable
- the most fragmented across community sources
- the only dataset requiring major reconstruction
This document preserves how the corpus evolved from aggregation to provenance-focused archival.
Canonical:
- Original DOJ ZIP structure maintained
- Verified original ZIP hashes
- Least-redacted known copies preserved
- No normalization applied
Reference:
- Structured dataset builds
- Community composite torrents
- Deduplicated merges
- Flattened navigational reorganizations
- Superseded magnets
Reference materials remain documented to preserve checksum and magnet lineage.
Contributor: https://github.com/excoffierleonard
Release Title: Epstein Files — Structured Dataset (Mostly Full) (1-12)
Release Date: 2026-02-04
Size: 206.18 GB
magnet:?xt=urn:btih:f5cbe5026b1f86617c520d0a9cd610d6254cbe85&dn=epstein-files-structured-full-20250204.tar.zst&xl=221393230690
SHA256: 29acc987cd7fadfbbf94444ed165750b84d82c85af3703bab74308ea9e91e910
Preserved for transparency and historical reference.
Not designated canonical due to:
- Incomplete DS09
- Inclusion of later redacted file variants
- Structural normalization differing from original DOJ layout
- Use of community composites for DS10 and DS11
This dataset materially assisted early aggregation and reconciliation efforts.
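A published SHA256 such as the one recorded above can be checked against a downloaded archive. The following is a minimal sketch, not the tooling used by contributors; the function names are illustrative, and the comparison ignores letter case because this ledger records digests in both lowercase and uppercase.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the lowercase hex SHA256 of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read in chunks so multi-hundred-GB archives never need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """Compare a file's digest to a published value, ignoring case and padding."""
    return sha256_of(path) == published_hex.strip().lower()
```

Calling `matches_published(local_path, "29acc987...")` with the full digest from this section returns `True` only when the local file hashes to that value; the local path is whatever name the download was saved under.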
This section is the authoritative deep-dive for DS09.
It preserves:
- historical magnets
- reconstruction efforts
- NATIVEs analysis
- verification artifacts
- context excluded from the main README
- ~99.9% reconstructable by file count
- ~25 PDFs unresolved
- 2,327 / 2,542 NATIVEs recovered (~91.5%)
Completeness is measured by file presence, not byte-for-byte parity.
This repository does not claim canonical completeness, only the best verifiable public reconstruction.
PDFs:
- Expected: ~531,307
- Recovered: ~531,282
- Missing: ~25 PDFs
NATIVEs:
- Expected: ~2,542
- Recovered: 2,327
- Recovery rate: ~91.5%
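Because completeness here is measured by file presence, the figures above reduce to a set comparison between expected and recovered identifiers. The sketch below illustrates that idea only; the function name and the shape of the inputs (plain iterables of file names) are assumptions, not the project's actual tooling.

```python
def completeness(expected, recovered):
    """Summarize presence-based completeness for two collections of identifiers."""
    expected_set = set(expected)
    # Only identifiers that were expected count as recovered; extras are ignored.
    present = expected_set & set(recovered)
    missing = sorted(expected_set - present)
    rate = 100.0 * len(present) / len(expected_set) if expected_set else 0.0
    return {
        "expected": len(expected_set),
        "recovered": len(present),
        "missing": missing,
        "rate_percent": rate,
    }


def rate_from_counts(recovered_count, expected_count):
    """Recovery rate in percent when only the two totals are known."""
    return 100.0 * recovered_count / expected_count
```

This says nothing about byte-for-byte parity: a file counts as recovered as soon as its identifier is present.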
Remaining NATIVEs fall into:
- HEAD 200 → GET 404
- Unresolvable Bates numbers
Preserved for archival and forensic context.
magnet:?xt=urn:btih:0a3d4b84a77bd982c9c2761f40944402b94f9c64&dn=DataSet9-incomplete.zip&xl=48995762176
Status: Deprecated. Severely incomplete. No NATIVEs.
magnet:?xt=urn:btih:7ac8f771678d19c75a26ea6c14e7d4c003fbf9b6&dn=dataset9-more-complete.tar.zst&xl=96148724837
Status: Superseded
magnet:?xt=urn:btih:286060d26392042a5e2b5354d09ec7c7c5cee7dc&dn=dataset-09%20%28Incomplete%29&xl=101565025420
Status: Historical reference
magnet:?xt=urn:btih:5b50564ee995a54009fec387c97f9465eb18ba00&dn=dataset-9_by_fuckthissite3.tar&xl=148072017920
Status: Historical reference
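When comparing magnets across this lineage, it can help to pull out the info hash, display name, and declared size. The sketch below is one way to do that with the standard library; the helper names are illustrative. It reads only the `xt`, `dn`, and `xl` parameters that appear in the magnets above, and converts `xl` to binary gigabytes, which appears to be the unit behind the sizes quoted in this ledger.

```python
from urllib.parse import parse_qs


def parse_magnet(uri):
    """Extract the BitTorrent v1 info hash, name, and size from a magnet URI."""
    scheme, _, query = uri.partition("?")
    if scheme != "magnet:":
        raise ValueError("not a magnet URI")
    params = parse_qs(query)  # values are percent-decoded lists
    # A magnet may carry several xt values (for example btih and btmh); keep btih.
    hashes = [
        xt.split(":")[-1]
        for xt in params.get("xt", [])
        if xt.startswith("urn:btih:")
    ]
    return {
        "btih": hashes[0].lower() if hashes else None,
        "name": params.get("dn", [None])[0],
        "size_bytes": int(params["xl"][0]) if "xl" in params else None,
    }


def to_gib(size_bytes):
    """Convert a byte count to binary gigabytes (2**30 bytes)."""
    return size_bytes / 2**30
```

Two magnets with the same `btih` refer to the same torrent regardless of differences in trackers or display name, which is useful when checking whether a later entry duplicates an earlier one.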
(These entries originally appeared in the main README and may duplicate magnets already listed above.)
-
Torrent Magnet:
magnet:?xt=urn:btih:9c1f0a021459938e2446310beea5e43a17509a19&xt=urn:btmh:122076e576d49af5705f53b768621d85232c36452d4777ed23f10c18c72bb9fe109c&dn=dataset9_reconstructed_20260110.tar.zst&xl=181137604550&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce
SHA256: 1472d41d66b069423ed804ceee3d47bc6d307be0daa05d0efe5a38df2d4469e5
Notes:
-
Torrent Magnet:
magnet:?xt=urn:btih:5b50564ee995a54009fec387c97f9465eb18ba00&dn=dataset-9_by_fuckthissite3.tar&xl=148072017920
SHA256: 5ADC043BCF94304024D718E57267C1AA009D782835F6ADBE6AD7FDBB763F15C5
Notes:
- Contains ~2,308 NATIVEs, ~252,169 PDFs
ym's compiled flattened PDFs (VERY SLOW) (94.58 GB / 180 GB)
-
Torrent Magnet:
magnet:?xt=urn:btih:286060d26392042a5e2b5354d09ec7c7c5cee7dc&dn=dataset-09%20%28Incomplete%29&xl=101565025420&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.torrent.eu.org%3A451%2Fannounce&tr=udp%3A%2F%2Fexodus.desync.com%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce
SHA256: See checksums-incomplete.zip
Notes:
- 531,282 PDFs (flattened). No NATIVEs.
- VOL00009.DAT, VOL00009.OPT
- Early seed; bandwidth limited — please be patient and report issues.
Earlier efforts involved:
- merging partial archives
- deduplicating by filename and size
- validating against `.DAT` manifests
- identifying placeholder NATIVEs
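The filename-and-size deduplication listed above can be sketched as follows. This is an illustration of the general technique rather than the scripts contributors actually ran; the function name and the "first copy wins" rule are assumptions.

```python
from pathlib import Path


def dedupe_by_name_and_size(roots):
    """Keep the first file seen for each (filename, size) pair across roots."""
    kept = {}
    duplicates = []
    for root in roots:
        # Sorted traversal keeps the result deterministic between runs.
        for path in sorted(Path(root).rglob("*")):
            if not path.is_file():
                continue
            key = (path.name, path.stat().st_size)
            if key in kept:
                duplicates.append(path)
            else:
                kept[key] = path
    return kept, duplicates
```

A matching name and size does not prove identical content. Two redaction variants of the same document can share a name and differ in size, in which case both are kept; two files with the same name and size but different bytes would be treated as duplicates, so a hash comparison is the safer check where it matters.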
Current methodology prioritizes:
- direct endpoint recovery
- HEAD-before-GET validation
- incremental verification
- manifest reconciliation
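HEAD-before-GET validation means probing an endpoint with a cheap HEAD request and attempting the GET only when HEAD succeeds. The sketch below shows the control flow under stated assumptions: the category labels and function names are illustrative, and no endpoint URL pattern is implied. The status function is injectable so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request


def http_status(url, method, timeout=30):
    """Return the HTTP status code for one request without reading the body."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses arrive as exceptions; report them as codes.
        return error.code


def classify(head_status, get_status=None):
    """Map a HEAD/GET status pair to a coarse recovery category."""
    if head_status != 200:
        return "head_failed"
    if get_status is None:
        return "get_pending"
    if get_status == 200:
        return "recoverable"
    return "head_ok_get_failed"


def probe(url, status_fn=http_status):
    """Issue HEAD first and follow with GET only when HEAD returned 200."""
    head = status_fn(url, "HEAD")
    get = status_fn(url, "GET") if head == 200 else None
    return classify(head, get)
```

An endpoint that answers HEAD with 200 and GET with 404 lands in `head_ok_get_failed`, which corresponds to the "HEAD 200 → GET 404" group of unrecovered NATIVEs recorded in this document. Connection-level failures (`urllib.error.URLError`) are deliberately not caught here and would need their own handling.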
- 4670 bytes
- 2433 bytes
Likely upstream stubs or redactions.
- Office documents often 0-byte stubs
- Two SQLite databases password protected
- Some `.avi` files are sequential frame segments
- Significant jail footage present
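Files matching the stub patterns noted above can be flagged by size alone. The sketch below assumes that the two byte counts listed earlier (4670 and 2433) are recurring placeholder sizes and that 0-byte files are stubs; the constant and function names are illustrative.

```python
from pathlib import Path

# Assumption: sizes taken from the notes above, plus 0 for empty stubs.
SUSPECT_SIZES = frozenset({0, 2433, 4670})


def find_suspect_stubs(root, sizes=SUSPECT_SIZES):
    """List (path, size) for every file under root whose size is suspect."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        size = path.stat().st_size
        if size in sizes:
            hits.append((path, size))
    return hits
```

A size match is only a hint. A genuine file can happen to share one of these sizes, so flagged files still need manual or hash-based inspection before being recorded as placeholders.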
Datasets 1–8 exist in:
- Earlier less-redacted versions
- Later redacted DOJ reuploads
Canonical corpus preserves least-redacted verified variants where available.
Structured or community builds may contain later redacted copies.
This divergence is intentional and documented.
- DOJ publishes Data Sets 1–12
- DS09 unstable / incomplete
- Early partial magnets (~45 GB)
- Deduplicated composites (~89.5 GB)
- Larger community reconstructions (~140 GB)
- Structured dataset aggregation (Leonard)
- Provenance-first canonical corpus established
Main README:
- Clean
- Minimal
- Current recommended downloads
Archival Record:
- Exhaustive
- Historical
- Magnet lineage preserved
- Append-only
- Append-only
- No historical deletions
- Superseded magnets retained
- Corrections documented inline
- Canonical designation explicitly stated
This document ensures the Epstein Files corpus retains a verifiable paper trail.