
# Engineering

This directory contains all technical documentation for the Auris Vive pipeline — architecture decisions, stage designs, and the API spec.

If you are a designer or artist looking for the brand kit, you want `../brand/`.


## Document map

### Decisions (ADRs)

Architecture Decision Records capture why we chose what we chose. Read these before touching the pipeline.

| Document | Decision | Status |
|----------|----------|--------|
| ADR-001 | Technology stack — Python, Demucs, Basic Pitch, music21 | ✅ Written |
| ADR-002 | Inference backend — Modal, JobQueue abstraction, singleton model loading | ✅ Written |
| ADR-003 | Drum stem transcription — onset-only v1, DrumTranscriber ABC, ADTLib v2 | ✅ Written |
| ADR-004 | Score quantisation — strict vs expressive | 🔲 Pending |
| ADR-005 | Multi-channel downmix algorithm | 🔲 Pending |
| ADR-006 | Client integration strategy — API-first, thin clients, on-device rationale | ✅ Written |

### Stage design documents (SDDs)

One document per pipeline stage. Each SDD covers: what the stage does, why it's designed that way, the full implementation, edge cases, error handling, and test requirements.

| Document | Stage | Status |
|----------|-------|--------|
| SDD-001 | Full pipeline overview | 🔄 In progress |
| SDD-002 | Ingest — decode, resample, normalise | ✅ Written |
| SDD-003 | Separate — Demucs source separation | 🔲 Pending |
| SDD-004 | Transcribe — Basic Pitch MIDI extraction | ✅ Written |
| SDD-005 | Outputs — FLAC stems, MIDI files, score stub | ✅ Written |
| SDD-006 | API and job queue | 🔲 Pending |
| SDD-007 | Input adapters (file, URL, stream, device) | 🔲 Pending |
| SDD-008 | Analyse — per-stem curve extraction | ✅ Written |
| SDD-009 | Visual prototype — web audio + wave renderer | ✅ Written |

## Architecture in brief

```
Input adapter       →  file / URL / stream / device
  ↓ path: str
Ingest              →  librosa decode + normalise
  ↓ np.ndarray (2, N) float32 @ 44,100 Hz
Separate            →  Demucs htdemucs
  ↓ dict[str, np.ndarray]  (drums / bass / vocals / other)
Transcribe          →  Basic Pitch per stem
  ↓ dict[str, PrettyMIDI]
Outputs             →  FLAC stems / .mid files / MusicXML+PDF
  ↓ JobResult { stems, midi, score }
API                 →  REST + WebSocket
  ↓
Clients             →  Web embed / Mobile SDK
```

Each handoff is a typed contract. No stage reaches past its immediate neighbour. See SDD-001 for the full picture.


## Stack

| Concern | Choice | Reason |
|---------|--------|--------|
| Language | Python | ML research ecosystem — Demucs and Basic Pitch have no equivalent elsewhere |
| Audio I/O | librosa | Decodes any format; returns NumPy — universal internal representation |
| Source separation | Demucs htdemucs | Best quality on MUSDB18; hybrid waveform+spectrogram architecture |
| Transcription | Basic Pitch (Spotify) | Handles polyphony; classical methods don't |
| MIDI representation | pretty_midi | Clean object model; native Basic Pitch output |
| Score generation | music21 | Understands music theory; MusicXML output |

Full rationale in ADR-001.
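To make the "NumPy as universal internal representation" point concrete, here is a minimal sketch of the Ingest stage's normalisation step, assuming the contract from the architecture diagram (stereo float32 at 44.1 kHz). In the real pipeline, decoding and resampling are handled by `librosa.load`; the signal here is fabricated so the example stands alone:

```python
import numpy as np

SAMPLE_RATE = 44_100

def normalise(audio: np.ndarray, peak: float = 0.95) -> np.ndarray:
    """Peak-normalise a (2, N) float32 buffer so max |sample| == peak."""
    max_abs = np.abs(audio).max()
    if max_abs == 0.0:
        return audio  # silence: nothing to scale
    return (audio * (peak / max_abs)).astype(np.float32)

# Fabricate one second of a quiet 440 Hz stereo tone in place of a decode.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
mono = 0.25 * np.sin(2 * np.pi * 440 * t)
stereo = np.stack([mono, mono]).astype(np.float32)  # shape (2, 44100)

out = normalise(stereo)
print(out.shape, out.dtype, round(float(np.abs(out).max()), 2))
```

Downstream stages (Separate, Transcribe) can then assume a consistent dtype, channel layout, and level without re-checking the source format.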


## Open questions

18 unresolved decisions are tracked in SDD-001. The most blocking ones:

| ID | Question | Blocks |
|----|----------|--------|
| Q-STREAM-1 | Batch vs rolling-window streaming architecture | Stream adapter, pipeline design |
| Q-TRX-2 | Drum transcription (resolved by ADR-003: onset-only for v1) | ✅ Closed |
| Q-SEP-1 | Model loading (resolved by ADR-002: singleton at worker startup) | ✅ Closed |
| Q-API-1 | Authentication model | SDD-006 |

## Development setup

```bash
python3.11 -m venv .venv-ml
source .venv-ml/bin/activate
pip install -e ".[dev,ml]"

# FFmpeg required for MP3/MP4 decoding (brew on macOS; use your package manager elsewhere)
brew install ffmpeg
```

## Contributing

See ../.github/CONTRIBUTING.md for how to add ADRs and SDDs.

Engineering questions → open an issue tagged `engineering`.
Pipeline design questions → open an issue tagged `pipeline`.