This directory contains all technical documentation for the Auris Vive pipeline — architecture decisions, stage designs, and the API spec.
If you are a designer or artist looking for the brand kit, you want ../brand/.
Architecture Decision Records capture why we chose what we chose. Read these before touching the pipeline.
| Document | Decision | Status |
|---|---|---|
| ADR-001 | Technology stack — Python, Demucs, Basic Pitch, music21 | ✅ Written |
| ADR-002 | Inference backend — Modal, JobQueue abstraction, singleton model loading | ✅ Written |
| ADR-003 | Drum stem transcription — onset-only v1, DrumTranscriber ABC, ADTLib v2 | ✅ Written |
| ADR-004 | Score quantisation — strict vs expressive | 🔲 Pending |
| ADR-005 | Multi-channel downmix algorithm | 🔲 Pending |
| ADR-006 | Client integration strategy — API-first, thin clients, on-device rationale | ✅ Written |
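ADR-002 mentions singleton model loading. The ADR itself is not reproduced here, so the following is only a minimal sketch of one common way to get that behaviour in Python, using `functools.lru_cache`. `FakeSeparationModel` and `get_model` are hypothetical names for illustration, not identifiers from the codebase.

```python
from functools import lru_cache


class FakeSeparationModel:
    """Hypothetical stand-in for a model that is expensive to load."""

    load_count = 0  # counts how many times a model was actually constructed

    def __init__(self, name: str) -> None:
        FakeSeparationModel.load_count += 1
        self.name = name


@lru_cache(maxsize=None)
def get_model(name: str) -> FakeSeparationModel:
    # The first call for a given name constructs the model;
    # later calls with the same name return the cached instance.
    return FakeSeparationModel(name)


first = get_model("htdemucs")
second = get_model("htdemucs")
assert first is second
```

The cache is per process, so each worker would still load the model once; whether that matches the behaviour ADR-002 settles on is an assumption.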
One document per pipeline stage. Each SDD covers: what the stage does, why it's designed that way, the full implementation, edge cases, error handling, and test requirements.
| Document | Stage | Status |
|---|---|---|
| SDD-001 | Full pipeline overview — open in browser | 🔄 In progress |
| SDD-002 | Ingest — decode, resample, normalise | ✅ Written |
| SDD-003 | Separate — Demucs source separation | 🔲 Pending |
| SDD-004 | Transcribe — Basic Pitch MIDI extraction | ✅ Written |
| SDD-005 | Outputs — FLAC stems, MIDI files, score stub | ✅ Written |
| SDD-006 | API and job queue | 🔲 Pending |
| SDD-007 | Input adapters (file, URL, stream, device) | 🔲 Pending |
| SDD-008 | Analyse — per-stem curve extraction | ✅ Written |
| SDD-009 | Visual prototype — web audio + wave renderer | ✅ Written |
Input adapter → file / URL / stream / device
↓ path: str
Ingest → librosa decode + normalise
↓ np.ndarray (2, N) float32 @ 44,100 Hz
Separate → Demucs htdemucs
↓ dict[str, np.ndarray] (drums / bass / vocals / other)
Transcribe → Basic Pitch per stem
↓ dict[str, PrettyMIDI]
Outputs → FLAC stems / .mid files / MusicXML+PDF
↓ JobResult { stems, midi, score }
API → REST + WebSocket
↓
Clients → Web embed / Mobile SDK
Each handoff is a typed contract. No stage reaches past its immediate neighbour. See SDD-001 for the full picture.
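As a rough illustration of the handoffs above, the first three stages can be sketched as plain functions whose signatures carry the contract. This is a toy under stated assumptions: nested lists stand in for `np.ndarray`, pitch lists stand in for `PrettyMIDI`, and the function names `ingest`, `separate`, `transcribe`, and `run` are hypothetical, not the pipeline's real API. The Outputs, API, and client stages are left out.

```python
from typing import Dict, List

# Toy stand-ins for the real payload types (assumptions for illustration).
Audio = List[List[float]]        # 2 channels x N samples, in place of np.ndarray (2, N)
Stems = Dict[str, Audio]         # stem name -> audio
Notes = Dict[str, List[int]]     # stem name -> MIDI pitches, in place of PrettyMIDI

STEM_NAMES = ("drums", "bass", "vocals", "other")


def ingest(path: str) -> Audio:
    # A real ingest stage would decode and normalise the file at `path`;
    # here we return four samples of stereo silence.
    return [[0.0] * 4, [0.0] * 4]


def separate(audio: Audio) -> Stems:
    # Placeholder: every stem is a copy of the input mix.
    return {name: [channel[:] for channel in audio] for name in STEM_NAMES}


def transcribe(stems: Stems) -> Notes:
    # Placeholder: no notes detected in any stem.
    return {name: [] for name in stems}


def run(path: str) -> Notes:
    # Each stage consumes only the output of its immediate predecessor.
    return transcribe(separate(ingest(path)))


notes = run("example.wav")
```

Because each function accepts exactly what the previous one returns, a type checker can flag a stage that tries to skip a neighbour.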
| Concern | Choice | Reason |
|---|---|---|
| Language | Python | ML research ecosystem — Demucs and Basic Pitch have no equivalent elsewhere |
| Audio I/O | librosa | Decodes any format; returns NumPy — universal internal representation |
| Source separation | Demucs htdemucs | Best quality on MUSDB18; hybrid waveform+spectrogram architecture |
| Transcription | Basic Pitch (Spotify) | Handles polyphony; classical methods don't |
| MIDI representation | pretty_midi | Clean object model; native Basic Pitch output |
| Score generation | music21 | Understands music theory; MusicXML output |
Full rationale in ADR-001.
SDD-001 tracks 18 unresolved decisions. The ones blocking the most work:
| ID | Question | Blocks |
|---|---|---|
| Q-STREAM-1 | Batch vs rolling-window streaming architecture | Stream adapter, pipeline design |
| ✅ Closed | ||
| ✅ Closed | ||
| Q-API-1 | Authentication model | SDD-006 |
python3.11 -m venv .venv-ml
source .venv-ml/bin/activate
pip install -e ".[dev,ml]"
# FFmpeg required for MP3/MP4 decoding
brew install ffmpeg

See ../.github/CONTRIBUTING.md for how to add ADRs and SDDs.
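Since the setup steps list FFmpeg as a requirement for MP3/MP4 decoding, a quick way to confirm it is on your `PATH` from inside the virtualenv is sketched below. `tool_available` is a hypothetical helper, not part of the project; it relies only on the standard library's `shutil.which`.

```python
import shutil


def tool_available(name: str) -> bool:
    # shutil.which returns the executable's path, or None if it is not on PATH.
    return shutil.which(name) is not None


if not tool_available("ffmpeg"):
    print("ffmpeg not found on PATH; MP3/MP4 decoding will likely fail")
```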
Engineering questions → open an issue tagged engineering
Pipeline design questions → open an issue tagged pipeline