Architecture

Overview

The Audio Analysis MCP Server is a FastMCP application that exposes audio analysis capabilities as Model Context Protocol tools over Streamable HTTP. It wraps established ML libraries (Whisper, pyannote.audio, librosa, HuggingFace Transformers) behind a stateless API.

Component Diagram

┌─────────────────────────────────────────────────┐
│              MCP Client (Claude)                 │
└──────────────────────┬──────────────────────────┘
                       │ Streamable HTTP
┌──────────────────────▼──────────────────────────┐
│            FastMCP Server (server.py)            │
│          HTTP /health  ·  /mcp endpoint          │
├─────────────────────────────────────────────────┤
│                  Tools Layer                     │
│  ┌────────────┐  ┌───────────┐  ┌────────────┐  │
│  │orchestrat. │  │transcribe │  │ diarize    │  │
│  │(pipeline)  │  │           │  │            │  │
│  ├────────────┤  ├───────────┤  ├────────────┤  │
│  │  prosody   │  │ patterns  │  │ sentiment  │  │
│  └────────────┘  └───────────┘  └────────────┘  │
├─────────────────────────────────────────────────┤
│               Processors Layer                   │
│   WhisperProc · DiarizationProc · ProsodyProc    │
│      PatternsProc · SentimentProc                │
├─────────────────────────────────────────────────┤
│           Model Management (loader.py)           │
│    GPU detection · VRAM mode · Model caching     │
└─────────────────────────────────────────────────┘

Data Flow

  1. Request — MCP client sends tool call with audio file path
  2. Validation — File existence, format, and duration validated
  3. Routing — Single tool → direct processor; full_analysis → orchestrator pipeline
  4. Inference — Models loaded on-demand (GPU if available, CPU fallback)
  5. Response — Structured Pydantic models serialized to JSON
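
The validation and routing steps above can be sketched in plain Python. This is a minimal illustration, not the server's actual code: `SUPPORTED_SUFFIXES`, `validate_audio_path`, `handle_tool_call`, and the stand-in processors are all hypothetical names, and the duration check is omitted because it would require decoding the audio.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

# Hypothetical values: the formats the real server accepts are not listed here.
SUPPORTED_SUFFIXES = {".wav", ".mp3", ".flac"}


@dataclass
class AnalysisResult:
    tool: str
    payload: dict


def validate_audio_path(path: str) -> Path:
    """Step 2: reject missing files and unsupported formats before any model loads."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"audio file not found: {path}")
    if p.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported audio format: {p.suffix}")
    return p


# Stand-in processors; the real ones would run ML inference.
PROCESSORS: dict[str, Callable[[Path], dict]] = {
    "transcribe": lambda p: {"text": f"<transcript of {p.name}>"},
    "sentiment": lambda p: {"label": "neutral"},
}


def handle_tool_call(tool: str, audio_path: str) -> list[AnalysisResult]:
    """Steps 2-3: validate, then route to one processor or to the full pipeline."""
    p = validate_audio_path(audio_path)
    if tool == "full_analysis":
        # Orchestrator path: run every registered processor in order.
        return [AnalysisResult(name, proc(p)) for name, proc in PROCESSORS.items()]
    if tool not in PROCESSORS:
        raise KeyError(f"unknown tool: {tool}")
    return [AnalysisResult(tool, PROCESSORS[tool](p))]
```

Validating before routing means a bad path fails fast, before any model is loaded.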

Key Design Decisions

| Decision | Rationale |
|---|---|
| Stateless HTTP transport | Eliminates session affinity issues on server restart |
| Feature flags | Deploy with partial capabilities (e.g., transcription-only) |
| Processors ↔ Tools separation | Tools own MCP interface; processors own ML inference |
| Low-VRAM sequential loading | Enables deployment on 4–6 GB GPUs without OOM errors |
| Pydantic models throughout | Runtime validation + automatic JSON schema for MCP tool signatures |
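
To illustrate the low-VRAM sequential loading decision, the sketch below keeps only one model resident at a time by loading it, running its stage, and releasing it before the next stage starts. It is an assumption about how such a loader could be structured, not the contents of `loader.py`; `loaded_model`, `run_sequentially`, and the `events` log are hypothetical.

```python
from contextlib import contextmanager
from typing import Callable, Iterator

# Records load/unload order so the behaviour is observable without a GPU.
events: list[str] = []


@contextmanager
def loaded_model(name: str, loader: Callable[[], object]) -> Iterator[object]:
    """Load one model, yield it, and release it before the next one loads."""
    model = loader()
    events.append(f"load:{name}")
    try:
        yield model
    finally:
        # A real implementation would also free GPU memory here (for PyTorch
        # models, dropping references and calling torch.cuda.empty_cache()).
        del model
        events.append(f"unload:{name}")


def run_sequentially(stages: dict[str, Callable[[], object]], audio: str) -> dict[str, str]:
    """Run each stage with only its own model resident at a time."""
    results = {}
    for name, loader in stages.items():
        with loaded_model(name, loader) as model:
            results[name] = f"{name} ran on {audio} with {type(model).__name__}"
    return results
```

The trade-off is latency: each stage pays its model load cost again, in exchange for a peak memory footprint bounded by the largest single model.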

Configuration

All settings are environment-variable driven via pydantic-settings. See docs/CONFIGURATION.md for the complete reference.