
Architecture

Chris edited this page Jan 29, 2026 · 1 revision

This page explains how the Sendspin SDK components work together to enable synchronized multi-room audio playback.

High-Level Overview

┌─────────────────────────────────────────────────────────────────┐
│                     Your Application                            │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │  SyncCorrectionCalculator  │  Your Resampler/Drop Logic │   │
│  │  (correction decisions)    │  (applies correction)      │   │
│  └─────────────────────────────────────────────────────────┘   │
├─────────────────────────────────────────────────────────────────┤
│  SendspinClientService    │  AudioPipeline    │  IAudioPlayer   │
│  (protocol handling)      │  (orchestration)  │  (your impl)    │
├─────────────────────────────────────────────────────────────────┤
│  SendspinConnection  │  KalmanClockSync  │  TimedAudioBuffer    │
│  (WebSocket)         │  (timing)         │  (reports error)     │
├─────────────────────────────────────────────────────────────────┤
│  OpusDecoder  │  FlacDecoder  │  PcmDecoder                     │
└─────────────────────────────────────────────────────────────────┘

Data Flow

Audio flows through the SDK like this:

Server                    SDK                           Your App
  │                        │                               │
  │  WebSocket binary      │                               │
  │  (audio chunks)        │                               │
  ├───────────────────────>│                               │
  │                        │                               │
  │                   ┌────┴────┐                          │
  │                   │ Decoder │                          │
  │                   │ (Opus/  │                          │
  │                   │  FLAC)  │                          │
  │                   └────┬────┘                          │
  │                        │ PCM samples                   │
  │                   ┌────┴────┐                          │
  │                   │ Timed   │                          │
  │                   │ Audio   │                          │
  │                   │ Buffer  │                          │
  │                   └────┬────┘                          │
  │                        │ samples + sync error          │
  │                   ┌────┴────┐                          │
  │                   │ Sample  │                          │
  │                   │ Source  │                          │
  │                   └────┬────┘                          │
  │                        │                               │
  │                        ├──────────────────────────────>│
  │                        │   IAudioPlayer.Read()         │
  │                        │                         ┌─────┴─────┐
  │                        │                         │ WASAPI /  │
  │                        │                         │ ALSA /    │
  │                        │                         │ CoreAudio │
  │                        │                         └─────┬─────┘
  │                        │                               │
  │                        │                          [Speakers]

Namespaces

| Namespace | Purpose | Key Types |
| --- | --- | --- |
| Sendspin.SDK.Client | Protocol client, capabilities | SendspinClientService, ClientCapabilities |
| Sendspin.SDK.Connection | WebSocket transport | SendspinConnection, ConnectionState |
| Sendspin.SDK.Audio | Audio pipeline, buffer, decoders | AudioPipeline, TimedAudioBuffer, IAudioPlayer |
| Sendspin.SDK.Synchronization | Clock sync | KalmanClockSynchronizer, HighPrecisionTimer |
| Sendspin.SDK.Discovery | mDNS discovery | MdnsServerDiscovery, DiscoveredServer |
| Sendspin.SDK.Protocol | Message types | MessageSerializer, protocol messages |
| Sendspin.SDK.Models | Data models | GroupState, TrackMetadata, AudioFormat |

Key Components

SendspinClientService

The main entry point for connecting to servers. Handles:

  • WebSocket connection lifecycle
  • Protocol handshake (client/hello / server/hello)
  • Clock sync timing (burst of measurements on connect)
  • Message routing (text → handlers, binary → audio pipeline)
  • Command sending (play, pause, volume, etc.)

```csharp
var client = new SendspinClientService(
    logger,
    connection,
    clockSync,
    capabilities,
    audioPipeline);  // Optional - enables automatic audio

// Connect and go!
await client.ConnectAsync(serverUri);
```

AudioPipeline

Orchestrates the audio flow from network to speakers:

States:

  • Idle → Starting → Buffering → Playing → Stopping → Idle
  • Can transition to Error from any state
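
The state machine above could be modeled with a small enum and a transition check. This is an illustrative sketch only; `AudioPipelineState` and `IsValidTransition` are assumed names, not the SDK's actual types:

```csharp
// Hypothetical sketch of the pipeline state machine described above.
public enum AudioPipelineState { Idle, Starting, Buffering, Playing, Stopping, Error }

public static class PipelineTransitions
{
    // Error is reachable from any state; otherwise states advance in order.
    public static bool IsValidTransition(AudioPipelineState from, AudioPipelineState to) =>
        to == AudioPipelineState.Error || (from, to) switch
        {
            (AudioPipelineState.Idle, AudioPipelineState.Starting) => true,
            (AudioPipelineState.Starting, AudioPipelineState.Buffering) => true,
            (AudioPipelineState.Buffering, AudioPipelineState.Playing) => true,
            (AudioPipelineState.Playing, AudioPipelineState.Stopping) => true,
            (AudioPipelineState.Stopping, AudioPipelineState.Idle) => true,
            _ => false,
        };
}
```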

Responsibilities:

  • Create decoder based on audio format
  • Manage timed buffer lifecycle
  • Wait for clock sync convergence
  • Control audio player (play/pause/stop)

```csharp
var pipeline = new AudioPipeline(
    logger,
    decoderFactory,
    clockSync,
    bufferFactory,      // Creates ITimedAudioBuffer
    playerFactory,      // Creates IAudioPlayer
    sourceFactory,      // Creates IAudioSampleSource
    waitForConvergence: true);
```

KalmanClockSynchronizer

Synchronizes local time with server time using a Kalman filter. Key concepts:

The Problem: Server uses monotonic time (starts near 0), client uses Unix epoch. The offset can be billions of microseconds - this is normal.

The Solution: NTP-style 4-timestamp exchange:

  1. Client sends client/time with T1 (send time)
  2. Server responds with T2 (receive time), T3 (reply time)
  3. Client records T4 (receive time)
  4. Kalman filter estimates offset and drift
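
From the four timestamps, the standard NTP estimates that feed the filter are offset θ = ((T2−T1)+(T3−T4))/2 and round-trip delay δ = (T4−T1)−(T3−T2). A minimal sketch of that arithmetic (the helper name is illustrative, not SDK API):

```csharp
// Standard NTP-style offset/delay from one client/time round trip.
// All timestamps are in microseconds.
public static (long OffsetUs, long DelayUs) EstimateSync(long t1, long t2, long t3, long t4)
{
    long offset = ((t2 - t1) + (t3 - t4)) / 2; // how far the server clock is ahead of ours
    long delay = (t4 - t1) - (t3 - t2);        // network round trip, excluding server processing
    return (offset, delay);
}
```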

Key Properties:

  • IsConverged - True after 5+ stable measurements
  • HasMinimalSync - True after 2+ measurements (quick start)
  • StaticDelayMs - Manual sync offset for speaker latency tuning

```csharp
// Convert server timestamp to local time
var localTime = clockSync.ServerToClientTime(serverTimestamp);
```

TimedAudioBuffer

Stores PCM samples with server timestamps. Handles:

  • Converting server timestamps to local playback time
  • Calculating sync error (drift between expected and actual playback)
  • Reporting when buffer is ready for playback

Key Properties:

  • SyncErrorMicroseconds - Raw sync error (positive = behind, negative = ahead)
  • SmoothedSyncErrorMicroseconds - EMA-filtered for stability
  • BufferedMilliseconds - Current buffer depth
  • IsReadyForPlayback - True when target buffer reached

```csharp
// SDK calculates error, you apply correction
var syncError = buffer.SyncErrorMicroseconds;

// After applying drop/insert correction:
buffer.NotifyExternalCorrection(dropped, inserted);
```

IAudioPlayer (You Implement)

The interface between SDK and your audio backend:

```csharp
public interface IAudioPlayer : IAsyncDisposable
{
    // State
    AudioPlayerState State { get; }
    float Volume { get; set; }
    bool IsMuted { get; set; }
    int OutputLatencyMs { get; }

    // Lifecycle
    Task InitializeAsync(AudioFormat format, CancellationToken ct);
    void SetSampleSource(IAudioSampleSource source);
    void Play();
    void Pause();
    void Stop();
    Task SwitchDeviceAsync(string? deviceId, CancellationToken ct);

    // Events
    event EventHandler<AudioPlayerState>? StateChanged;
    event EventHandler<AudioPlayerError>? ErrorOccurred;
}
```
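
A do-nothing implementation can be handy when bringing up the pipeline before a real backend exists. The sketch below is one way to stub the interface; the `AudioPlayerState` member names used here are assumptions:

```csharp
// Minimal no-op IAudioPlayer sketch for bring-up and testing.
// Assumes AudioPlayerState has Stopped/Playing/Paused members.
public sealed class NullAudioPlayer : IAudioPlayer
{
    private IAudioSampleSource? _source;

    public AudioPlayerState State { get; private set; }
    public float Volume { get; set; } = 1.0f;
    public bool IsMuted { get; set; }
    public int OutputLatencyMs => 0; // no real device, no latency

    public event EventHandler<AudioPlayerState>? StateChanged;
    public event EventHandler<AudioPlayerError>? ErrorOccurred;

    public Task InitializeAsync(AudioFormat format, CancellationToken ct) => Task.CompletedTask;
    public void SetSampleSource(IAudioSampleSource source) => _source = source;
    public void Play() => SetState(AudioPlayerState.Playing);
    public void Pause() => SetState(AudioPlayerState.Paused);
    public void Stop() => SetState(AudioPlayerState.Stopped);
    public Task SwitchDeviceAsync(string? deviceId, CancellationToken ct) => Task.CompletedTask;
    public ValueTask DisposeAsync() => ValueTask.CompletedTask;

    private void SetState(AudioPlayerState s)
    {
        State = s;
        StateChanged?.Invoke(this, s);
    }
}
```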

Clock Sync Deep Dive

Why It Matters

For multi-room sync, all players need to agree on "now". A 10ms discrepancy is audible as an echo effect.

The Kalman Filter

The SDK uses a Kalman filter (not simple averaging) because:

  • Handles variable network latency
  • Estimates clock drift rate (how fast clocks diverge)
  • Adapts to changing network conditions
  • Converges quickly (~300-500ms for basic sync)
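
To make the idea concrete, here is a toy two-state (offset + drift) Kalman-style update. The real KalmanClockSynchronizer is more sophisticated; every name and constant below is illustrative:

```csharp
// Toy Kalman-style filter tracking clock offset and drift. Illustrative only.
public sealed class ToyClockFilter
{
    private double _offsetUs;       // estimated clock offset (µs)
    private double _driftUsPerS;    // estimated drift rate (µs per second)
    private double _p = 1e12;       // scalar uncertainty (simplified covariance)
    private const double Q = 100;   // process noise added each update
    private const double R = 1e6;   // measurement noise (network jitter)

    public double OffsetUs => _offsetUs;

    public void Update(double measuredOffsetUs, double dtSeconds)
    {
        // Predict: offset advances by drift; uncertainty grows.
        _offsetUs += _driftUsPerS * dtSeconds;
        _p += Q;

        // Update: blend the prediction with the new NTP-style measurement.
        double k = _p / (_p + R); // Kalman gain in [0, 1]
        double innovation = measuredOffsetUs - _offsetUs;
        _offsetUs += k * innovation;
        if (dtSeconds > 0)
            _driftUsPerS += k * innovation / dtSeconds; // crude drift correction
        _p *= 1 - k;
    }
}
```

Because the gain shrinks as uncertainty falls, early measurements move the estimate a lot and later ones only nudge it, which is why convergence is fast but jitter is rejected afterwards.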

Timing Precision

On Windows, DateTime has only ~15 ms resolution, which is useless for audio sync. The SDK instead uses Stopwatch.GetTimestamp(), which provides roughly 100 ns resolution:

```csharp
// Always use this for timing-critical code:
var timeMicroseconds = HighPrecisionTimer.Shared.GetCurrentTimeMicroseconds();
```
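
A microsecond clock like this can be built directly on Stopwatch. The following is a rough sketch, not the SDK's actual implementation (the split into seconds and remainder avoids 64-bit overflow on long uptimes):

```csharp
using System.Diagnostics;

// Illustrative microsecond clock built on Stopwatch.GetTimestamp().
public static class MicroClock
{
    public static long NowMicroseconds()
    {
        long ticks = Stopwatch.GetTimestamp();
        long seconds = ticks / Stopwatch.Frequency;
        long remainder = ticks % Stopwatch.Frequency;
        return seconds * 1_000_000 + remainder * 1_000_000 / Stopwatch.Frequency;
    }
}
```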

Sync Correction Architecture (v5.0+)

Starting with v5.0, sync correction is external. The SDK reports error; you decide how to correct.

Why External?

Different platforms have different optimal strategies:

  • Windows: WDL resampler (high quality)
  • Browser: Native playbackRate (Web Audio API)
  • Linux: Hardware rate adjustment (ALSA)
  • Embedded: Platform-specific DSP

Correction Flow

SDK (reports only)                    Your App (applies)
─────────────────                    ─────────────────
TimedAudioBuffer                     SyncCorrectionCalculator
├─ SyncErrorMicroseconds        ──>  ├─ UpdateFromSyncError()
├─ SmoothedSyncErrorMicroseconds     ├─ TargetPlaybackRate
├─ ReadRaw()                         ├─ DropEveryNFrames
└─ NotifyExternalCorrection()   <──  └─ InsertEveryNFrames
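
Wired together, one iteration of the loop might look like the sketch below. `IResampler` is a placeholder for whatever rate-adjust mechanism your platform provides; the buffer and calculator members come from the flow above:

```csharp
// One iteration of an external correction loop, per the flow above.
// IResampler is hypothetical; substitute your platform's rate-adjust API.
void ApplyCorrection(ITimedAudioBuffer buffer, SyncCorrectionCalculator calc, IResampler resampler)
{
    // Feed the smoothed error into the calculator's decision logic.
    calc.UpdateFromSyncError(buffer.SmoothedSyncErrorMicroseconds);

    // Apply the requested playback rate (e.g. 1.0005 to catch up slightly).
    resampler.SetRate(calc.TargetPlaybackRate);

    // If frames were dropped or inserted, report what was actually done
    // so the buffer's error tracking stays consistent.
    (int dropped, int inserted) = resampler.ApplyDropInsert(calc.DropEveryNFrames, calc.InsertEveryNFrames);
    buffer.NotifyExternalCorrection(dropped, inserted);
}
```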

Tiered Strategy

| Error | Method | Notes |
| --- | --- | --- |
| < 1 ms | None | Imperceptible |
| 1-15 ms | Rate adjust | Smooth, uses resampling |
| 15-500 ms | Drop/insert | Faster correction |
| > 500 ms | Re-anchor | Clear buffer, restart |
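
The tiers can be selected with a simple threshold ladder. This sketch mirrors the table; the enum and function names are illustrative:

```csharp
public enum CorrectionMethod { None, RateAdjust, DropInsert, ReAnchor }

// Map absolute sync error (µs) to the tiered strategy in the table above.
public static CorrectionMethod ChooseMethod(long syncErrorUs)
{
    long abs = Math.Abs(syncErrorUs);
    return abs switch
    {
        < 1_000 => CorrectionMethod.None,          // < 1 ms: imperceptible
        < 15_000 => CorrectionMethod.RateAdjust,   // 1-15 ms: smooth resampling
        < 500_000 => CorrectionMethod.DropInsert,  // 15-500 ms: faster correction
        _ => CorrectionMethod.ReAnchor,            // > 500 ms: clear buffer, restart
    };
}
```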

Connection Modes

Client-Initiated (Primary)

Your app discovers servers and connects to them:

```csharp
var discovery = new MdnsServerDiscovery(logger);
await discovery.StartAsync();
// Discovers _sendspin-server._tcp services
```

Server-Initiated (Host Mode)

Your app advertises itself; servers connect to you:

```csharp
var hostService = new SendspinHostService(loggerFactory, capabilities, pipeline, clockSync);
await hostService.StartAsync();
// Advertises as _sendspin._tcp, servers connect to us
```

Protocol Messages

Text Messages (JSON)

| Type | Direction | Purpose |
| --- | --- | --- |
| client/hello | C→S | Handshake with capabilities |
| server/hello | S→C | Server identification |
| client/time | C→S | Clock sync request |
| server/time | S→C | Clock sync response (4 timestamps) |
| stream/start | S→C | Audio stream begins |
| stream/end | S→C | Audio stream ends |
| stream/clear | S→C | Clear buffer (seek/track change) |
| group/update | S→C | Playback state, metadata |
| client/command | C→S | Play, pause, next, etc. |

Binary Messages

Format: [1 byte type][8 bytes timestamp][payload]

| Type (first byte) | Purpose |
| --- | --- |
| 4-7 | Audio data (slots 0-3) |
| 8-11 | Artwork (slots 0-3) |
| 16-23 | Visualizer data |
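
A reader for this framing might look like the sketch below. The big-endian byte order of the timestamp is an assumption; verify it against the actual protocol:

```csharp
using System.Buffers.Binary;

// Parsed form of a binary message: [1 byte type][8 bytes timestamp][payload].
public readonly record struct BinaryMessage(byte Type, long Timestamp, ReadOnlyMemory<byte> Payload);

public static BinaryMessage ParseBinary(ReadOnlyMemory<byte> frame)
{
    if (frame.Length < 9)
        throw new ArgumentException("Frame too short for a 9-byte header");

    var span = frame.Span;
    byte type = span[0];
    // Assumed big-endian; check the protocol spec.
    long timestamp = BinaryPrimitives.ReadInt64BigEndian(span.Slice(1, 8));
    return new BinaryMessage(type, timestamp, frame.Slice(9));
}

// Per the table above, types 4-7 carry audio for slots 0-3, so slot = type - 4.
```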

See Also: