
Engineering

This directory contains all technical documentation for the Auris Vive pipeline — architecture decisions, stage designs, and the API spec.

If you are a designer or artist looking for the brand kit, you want ../brand/.


Document map

Decisions (ADRs)

Architecture Decision Records capture why we chose what we chose. Read these before touching the pipeline.

| Document | Decision | Status |
|---|---|---|
| ADR-001 | Technology stack: Python, Demucs, Basic Pitch, music21 | ✅ Written |
| ADR-002 | Inference backend: Modal, JobQueue abstraction, singleton model loading | ✅ Written |
| ADR-003 | Drum stem transcription: onset-only v1, DrumTranscriber ABC, ADTLib v2 | ✅ Written |
| ADR-004 | Score quantisation: strict vs expressive | 🔲 Pending |
| ADR-005 | Multi-channel downmix algorithm | 🔲 Pending |
| ADR-006 | Client integration strategy: API-first, thin clients, on-device rationale | ✅ Written |
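ADR-003 puts an onset-only v1 behind a `DrumTranscriber` ABC. The sketch below only illustrates that shape. The `transcribe(stem, sr)` signature, the `EnergyOnsetTranscriber` class, and its energy-flux detector are hypothetical stand-ins, not the interface or algorithm the ADR specifies. It needs only NumPy.

```python
from __future__ import annotations

from abc import ABC, abstractmethod

import numpy as np


class DrumTranscriber(ABC):
    """Hypothetical interface: map a drum stem to onset times in seconds."""

    @abstractmethod
    def transcribe(self, stem: np.ndarray, sr: int) -> list[float]:
        """Return onset times (seconds) detected in `stem` sampled at `sr` Hz."""


class EnergyOnsetTranscriber(DrumTranscriber):
    """Onset-only sketch: report frames where normalised energy rises sharply.

    A simple energy-flux detector for illustration; not the ADR-003 method.
    """

    def __init__(self, frame: int = 1024, hop: int = 512, threshold: float = 0.1):
        self.frame = frame          # analysis window length in samples
        self.hop = hop              # step between windows in samples
        self.threshold = threshold  # minimum normalised energy rise that counts

    def transcribe(self, stem: np.ndarray, sr: int) -> list[float]:
        x = np.asarray(stem, dtype=np.float32)
        if x.ndim == 2:  # (channels, N) -> mono by averaging channels
            x = x.mean(axis=0)
        n_frames = 1 + max(0, len(x) - self.frame) // self.hop
        # Short-time energy per frame, scaled so the loudest frame is 1.0.
        energy = np.array(
            [np.sum(x[i * self.hop : i * self.hop + self.frame] ** 2) for i in range(n_frames)]
        )
        if energy.max() > 0:
            energy = energy / energy.max()
        # Frame-to-frame energy change; decays are negative and never pass the threshold.
        flux = np.diff(energy, prepend=0.0)
        onsets = []
        for i in range(n_frames):
            prev = flux[i - 1] if i > 0 else 0.0
            # Count only the first frame of each rise, so one hit yields one onset.
            if flux[i] > self.threshold and prev <= self.threshold:
                onsets.append(i * self.hop / sr)
        return onsets
```

Keeping the detector behind the ABC is what lets a later implementation (for example, the ADTLib-based v2 the ADR mentions) replace it without touching the callers.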

Stage design documents (SDDs)

One document per pipeline stage. Each SDD covers: what the stage does, why it's designed that way, the full implementation, edge cases, error handling, and test requirements.

| Document | Stage | Status |
|---|---|---|
| SDD-001 | Full pipeline overview (open in browser) | 🔄 In progress |
| SDD-002 | Ingest: decode, resample, normalise | ✅ Written |
| SDD-003 | Separate: Demucs source separation | 🔲 Pending |
| SDD-004 | Transcribe: Basic Pitch MIDI extraction | ✅ Written |
| SDD-005 | Outputs: FLAC stems, MIDI files, score stub | ✅ Written |
| SDD-006 | API and job queue | 🔲 Pending |
| SDD-007 | Input adapters (file, URL, stream, device) | 🔲 Pending |
| SDD-008 | Analyse: per-stem curve extraction | ✅ Written |
| SDD-009 | Visual prototype: web audio + wave renderer | ✅ Written |

Architecture in brief

```
Input adapter       →  file / URL / stream / device
  ↓ path: str
Ingest              →  librosa decode + normalise
  ↓ np.ndarray (2, N) float32 @ 44,100 Hz
Separate            →  Demucs htdemucs
  ↓ dict[str, np.ndarray]  (drums / bass / vocals / other)
Transcribe          →  Basic Pitch per stem
  ↓ dict[str, PrettyMIDI]
Outputs             →  FLAC stems / .mid files / MusicXML+PDF
  ↓ JobResult { stems, midi, score }
API                 →  REST + WebSocket
  ↓
Clients             →  Web embed / Mobile SDK
```

Each handoff is a typed contract. No stage reaches past its immediate neighbour. See SDD-001 for the full picture.
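One way to enforce these typed handoffs is to check them at each stage boundary. This is a minimal sketch, not code from SDD-001: the helper names `check_ingest_output` and `check_separate_output` and the `JobResult` field types are hypothetical. It assumes each Demucs stem keeps the same stereo `(2, N) float32` layout as the Ingest output. The 44,100 Hz sample rate cannot be verified from the array alone, so that remains the Ingest stage's responsibility.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

import numpy as np

STEM_NAMES = ("drums", "bass", "vocals", "other")  # htdemucs four-stem output


def check_ingest_output(audio: np.ndarray) -> np.ndarray:
    """Ingest -> Separate: stereo float32 array shaped (2, N)."""
    if audio.ndim != 2 or audio.shape[0] != 2:
        raise ValueError(f"expected shape (2, N), got {audio.shape}")
    if audio.dtype != np.float32:
        raise TypeError(f"expected float32, got {audio.dtype}")
    return audio


def check_separate_output(stems: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
    """Separate -> Transcribe: exactly the four named stems (each assumed (2, N) float32)."""
    if set(stems) != set(STEM_NAMES):
        raise ValueError(f"expected stems {sorted(STEM_NAMES)}, got {sorted(stems)}")
    for audio in stems.values():
        check_ingest_output(audio)
    return stems


@dataclass
class JobResult:
    """Outputs -> API. Field types are hypothetical (e.g. storage paths or URLs)."""

    stems: dict[str, str]        # stem name -> FLAC location
    midi: dict[str, str]         # stem name -> .mid location
    score: Optional[str] = None  # MusicXML/PDF location; None if no score was produced (assumption)
```

Raising at the boundary keeps a malformed array from surfacing as a confusing error two stages later.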


Stack

| Concern | Choice | Reason |
|---|---|---|
| Language | Python | ML research ecosystem; Demucs and Basic Pitch have no equivalent in other languages |
| Audio I/O | librosa | Decodes most audio formats and returns NumPy arrays, our universal internal representation |
| Source separation | Demucs htdemucs | Best quality on the MUSDB18 benchmark; hybrid waveform + spectrogram architecture |
| Transcription | Basic Pitch (Spotify) | Handles polyphony; classical methods do not |
| MIDI representation | pretty_midi | Clean object model; Basic Pitch's native output format |
| Score generation | music21 | Understands music theory; exports MusicXML |

Full rationale in ADR-001.


Open questions

SDD-001 tracks 18 unresolved decisions. The most blocking are listed below; two of them (Q-TRX-2 and Q-SEP-1) have since been closed by ADRs.

| ID | Question | Blocks |
|---|---|---|
| Q-STREAM-1 | Batch vs rolling-window streaming architecture | Stream adapter, pipeline design |
| Q-TRX-2 | Drum transcription: resolved by ADR-003 (onset-only for v1) | ✅ Closed |
| Q-SEP-1 | Model loading: resolved by ADR-002 (singleton at worker startup) | ✅ Closed |
| Q-API-1 | Authentication model | SDD-006 |

Development setup

```bash
python3.11 -m venv .venv-ml
source .venv-ml/bin/activate
pip install -e ".[dev,ml]"
# FFmpeg is required for MP3/MP4 decoding (Homebrew on macOS shown; use your system package manager elsewhere)
brew install ffmpeg
```
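After running the setup above, a quick check can confirm everything is in place. This helper is hypothetical and does not ship with the repo. It assumes the `ml` extra installs the libraries in the Stack table under their usual import names (`librosa`, `demucs`, `basic_pitch`, `pretty_midi`, `music21`) and that the venv should run Python 3.11, matching the interpreter used above.

```python
from __future__ import annotations

import importlib.util
import shutil
import sys

# Import names of the stack libraries (assumed to come from the ".[dev,ml]" extras).
ML_MODULES = ("librosa", "demucs", "basic_pitch", "pretty_midi", "music21")


def check_dev_env() -> dict[str, bool]:
    """Return a pass/fail flag for each setup requirement."""
    checks = {
        "python 3.11": sys.version_info[:2] == (3, 11),
        "ffmpeg on PATH": shutil.which("ffmpeg") is not None,
    }
    for name in ML_MODULES:
        # find_spec locates the package without importing it (ML imports are slow).
        checks[f"import {name}"] = importlib.util.find_spec(name) is not None
    return checks


if __name__ == "__main__":
    for label, ok in check_dev_env().items():
        print(f"{'ok  ' if ok else 'FAIL'} {label}")
```

Run it inside the activated `.venv-ml`. A `FAIL` on `ffmpeg on PATH` means MP3/MP4 decoding will not work.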

Contributing

See ../.github/CONTRIBUTING.md for how to add ADRs and SDDs.

Engineering questions → open an issue tagged engineering

Pipeline design questions → open an issue tagged pipeline