Deterministic media acquisition for self‑hosted archives
A sudoStacks project
Retreivr is a self‑hosted media acquisition engine built for deterministic archival.
It takes user intent (URLs, search queries, or scheduled playlist syncs), resolves them into concrete media targets, downloads the media, applies canonical structure and metadata rules, and writes clean, reproducible files to disk.
Retreivr is not a media server and does not stream or index content. It focuses exclusively on reliable acquisition, correct metadata application, and predictable filesystem structure.
The system is designed to be:
- Deterministic (no duplicate or unstable outputs)
- Idempotent (safe to re-run)
- Canonical (consistent naming + metadata rules)
- Local-first (runs entirely under your control)
- Deterministic execution (no duplicate downloads)
- Canonical metadata-first architecture (MusicBrainz authority)
- Clean filesystem structure (no source IDs in filenames)
- Idempotent scheduler behavior
- Single-worker design for correctness
- Local-first, Docker-first deployment
Retreivr:
- Resolves search queries into concrete media candidates
- Downloads media using yt-dlp
- Applies canonical naming rules
- Embeds structured metadata into files
- Stores download history in SQLite
- Synchronizes playlists via deterministic snapshot + diff
- Provides a Web UI and REST API
- Sends optional Telegram run summaries
Retreivr does not:
- Stream media
- Replace Plex, Jellyfin, or music players
- Auto-delete owned files
- Circumvent DRM or protected platforms
- Run as a cloud service
- Collect telemetry
- Direct URL (single item)
- Search queries
- Scheduled playlist sync
- MusicBrainz-first canonical resolution
- Spotify fallback only when OAuth + Premium validated
- Unified FIFO job queue
- yt-dlp execution
- Container finalized before metadata embedding
- Atomic move to final destination
- Video: title, identifiers, channel_id, canonical URL embedded
- Music: enriched via MusicBrainz (track, album, ISRC, MBIDs, artwork)
- Files are never renamed after finalization
- Deterministic playlist snapshot hashing
- Reorder does not trigger re-download
- Active-job duplicate prevention
- Crash-safe idempotency
- Single structured run summary
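The snapshot-and-diff behavior listed above can be illustrated with a minimal sketch. It is not Retreivr's actual implementation: the names `Snapshot`, `take_snapshot`, and `diff_snapshots` are hypothetical, and the choice of SHA-256 over sorted item IDs is an assumption made only to show why a reorder does not trigger a re-download.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Order-independent view of a playlist at one point in time (hypothetical type)."""
    item_ids: frozenset[str]
    digest: str


def take_snapshot(item_ids: list[str]) -> Snapshot:
    # Sort and de-duplicate before hashing so that reordering the
    # playlist produces the same digest.
    unique = sorted(set(item_ids))
    digest = hashlib.sha256("\n".join(unique).encode("utf-8")).hexdigest()
    return Snapshot(frozenset(unique), digest)


def diff_snapshots(previous: Snapshot, current: Snapshot) -> list[str]:
    # Identical digests mean identical membership: nothing to enqueue.
    if previous.digest == current.digest:
        return []
    # Only items absent from the previous snapshot count as new work.
    # Items removed upstream are ignored, so nothing on disk is deleted.
    return sorted(current.item_ids - previous.item_ids)
```

Under these assumptions, comparing a snapshot of `["a", "b", "c"]` with one of `["c", "a", "b"]` yields an empty diff, while adding `"d"` to the playlist yields `["d"]` regardless of where it was inserted.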
Music/
  Album Artist/
    Album (Year)/
      Disc 1/
        01 - Track Title.ext
Rules:
- No video IDs in filenames
- No upload dates in filenames
- Zero-padded track numbers
- Unicode-safe normalization
- Filename = sanitized title only
- Collision resolution via " (2)", " (3)"
- Source identifiers stored in metadata + SQLite only
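As a rough illustration of the naming rules above, the following sketch builds a track filename and resolves collisions. The function names, the set of stripped characters, the choice of NFC normalization, and the `untitled` fallback are all assumptions for the example, not Retreivr's documented behavior.

```python
import re
import unicodedata
from pathlib import Path

# Characters treated as unsafe on common filesystems (an assumed set).
_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_title(title: str) -> str:
    # NFC normalization so visually identical titles map to one name.
    text = unicodedata.normalize("NFC", title)
    text = _UNSAFE.sub("", text)
    # Collapse whitespace runs and trim leading/trailing spaces and dots.
    text = re.sub(r"\s+", " ", text).strip(" .")
    # Fallback name is an assumption for titles that sanitize to nothing.
    return text or "untitled"


def track_filename(track_number: int, title: str, ext: str) -> str:
    # Zero-padded track number, then the sanitized title only.
    return f"{track_number:02d} - {sanitize_title(title)}.{ext}"


def resolve_collision(directory: Path, filename: str) -> Path:
    # Append " (2)", " (3)", ... until the name is unused.
    candidate = directory / filename
    stem, suffix = candidate.stem, candidate.suffix
    counter = 2
    while candidate.exists():
        candidate = directory / f"{stem} ({counter}){suffix}"
        counter += 1
    return candidate
```

No video ID or upload date ever enters the name here; in this sketch those identifiers would have to travel separately, which matches the rule that they live in metadata and SQLite only.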
As of v0.9.3:
- Default video container: MKV
- No forced re-encoding
- Metadata embedded after container finalization
MKV provides strong metadata support while preserving original codec fidelity.
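The ordering described here (finalize the container, embed metadata, then move atomically to the final destination) might look roughly like the sketch below. `finalize_atomically` and the `embed_metadata` callback are hypothetical names; the sketch assumes the staged file is already a complete container and that the final rename happens within one filesystem, where `os.replace` is atomic on POSIX systems.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def finalize_atomically(
    staged: Path,
    destination: Path,
    embed_metadata: Callable[[Path], None],
) -> Path:
    # 1. The container at `staged` is assumed to be fully written already.
    # 2. Embed metadata while the file is still in staging.
    embed_metadata(staged)
    destination.parent.mkdir(parents=True, exist_ok=True)
    # 3. Copy next to the destination under a temporary name, so the
    #    final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=destination.parent, suffix=".part")
    os.close(fd)
    try:
        shutil.copyfile(staged, tmp_name)
        # Readers see either no file or the complete file, never a partial one.
        os.replace(tmp_name, destination)
    except BaseException:
        # Leave no partial file behind if the copy or rename fails.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    staged.unlink()
    return destination
```

Because the name is only assigned at the last step, a crash before the rename leaves the destination untouched, and the file never needs renaming afterwards.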
Pull the image:
docker pull ghcr.io/sudoStacks/retreivr:latest
Copy templates and start:
cp docker/docker-compose.yml.example docker/docker-compose.yml
cp .env.example .env
docker compose -f docker/docker-compose.yml up -d
Canonical Docker mount layout:
- /downloads: media output
- /data: runtime DB/temp data
- /config: config.json
- /logs: logs
- /tokens: auth/cookies
Open the Web UI at:
http://YOUR_HOST:8090
Docker deployment:
- Docker Engine or Docker Desktop
- docker compose (v2)
Optional local/source:
- Python 3.11
- ffmpeg on PATH
Primary config file:
/config/config.json (Docker)
data/config/config.json (local/source default)
Key areas:
- Playlist definitions
- Default final_format
- Music mode toggle
- OAuth configuration (optional)
- Scheduler interval
- Telegram notifications (optional)
Spotify integration requires OAuth credentials and a validated Premium account. Without both, Spotify functionality remains disabled.
When music mode is enabled:
- MusicBrainz resolves canonical track + release
- MBIDs embedded
- Artwork optionally embedded
- Tags enriched without renaming files
Spotify metadata is never authoritative.
Common endpoints:
GET /api/status
GET /api/metrics
POST /api/run
GET /api/history
GET /api/music/albums/search
POST /api/music/album/candidates
OpenAPI docs available at /docs.
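A minimal client sketch for the read-only endpoints, using only the Python standard library. The helper names `api_url` and `get_json` are hypothetical, and the sketch assumes these endpoints answer a plain GET with a JSON body; consult /docs for the actual request and response schemas.

```python
import json
import urllib.request
from urllib.parse import urljoin


def api_url(base_url: str, path: str) -> str:
    # Join the Web UI base address with an endpoint path,
    # tolerating a trailing slash on the base or a leading one on the path.
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def get_json(base_url: str, path: str, timeout: float = 10.0):
    # Assumption: the endpoint returns JSON in response to a plain GET.
    with urllib.request.urlopen(api_url(base_url, path), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `get_json("http://YOUR_HOST:8090", "/api/status")` would then fetch and parse the status payload, with YOUR_HOST replaced by the machine running the container.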
To update to the latest image:
docker compose pull
docker compose down
docker compose up -d
Data persists in mounted volumes.
- Stable ingestion engine
- MusicBrainz-first canonical resolution
- Deterministic playlist snapshot behavior
- Idempotent scheduler runs
- MKV default container
- Integration test coverage for core flows
v0.9.3 is a stabilization milestone.
MIT. See LICENSE.
