
Interview Audio MCP Server - Implementation Plan

**Created:** 2026-01-30 | **Project Manager:** project-manager agent | **Architecture:** software-architect agent


Project Overview

Repository

Objective

Build an MCP server for interview audio analysis deployed on game-da-god server (192.168.4.140:8420). The server will provide transcription, speaker diarization, tone analysis, and speech pattern detection for post-interview coaching.

Key Capabilities

  • Audio transcription with timestamps (Whisper large-v3)
  • Speaker diarization (pyannote.audio 3.1)
  • Tone, pace, and energy analysis (librosa)
  • Pause detection and filler word identification
  • Sentiment analysis per segment
  • Structured JSON output optimized for Claude Code consumption
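
The structured output above can be sketched as follows. This is only an illustration of the intended shape using stdlib dataclasses; the actual schemas are the Pydantic models to be defined in ticket #2, and every field name here is a hypothetical placeholder.

```python
from dataclasses import dataclass, field, asdict

# Hypothetical shape of one analysis segment; the real schemas are the
# Pydantic models from ticket #2, and field names may differ.
@dataclass
class Segment:
    start: float           # seconds from start of recording
    end: float
    speaker: str           # diarization label, e.g. "SPEAKER_00"
    text: str              # Whisper transcript for this span
    sentiment: str = "neutral"
    filler_words: list[str] = field(default_factory=list)

@dataclass
class InterviewAnalysis:
    audio_path: str
    duration_s: float
    segments: list[Segment] = field(default_factory=list)

    def to_json_dict(self) -> dict:
        """Serialize to the plain JSON-ready dict Claude Code would consume."""
        return asdict(self)

# Example usage with dummy values
analysis = InterviewAnalysis(
    audio_path="interview.wav",
    duration_s=1800.0,
    segments=[Segment(0.0, 4.2, "SPEAKER_00", "Tell me about yourself.")],
)
```

Keeping the output a flat, serializable structure like this is what makes it cheap for Claude Code to consume over MCP.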

Implementation Tickets

All 15 tickets have been created in GitHub Issues under the v1.0 - MVP milestone.

Project Board: https://github.com/users/krisoye/projects/3

Phase 1: Foundation (Issues #1-#3)

Estimated Effort: 8 hours

| Issue | Title | Priority | Effort | Dependencies |
|-------|-------|----------|--------|--------------|
| #1 | Repository Setup and Project Infrastructure | P1-high | 2h | None |
| #2 | Pydantic Schema Definitions for All Data Models | P1-high | 3h | #1 |
| #3 | Audio Loader Module with Format Conversion | P1-high | 3h | #1, #2 |

Critical Path: #1 → #2 → #3


Phase 2: Core Processors (Issues #4-#8)

Estimated Effort: 29 hours

| Issue | Title | Priority | Effort | Dependencies |
|-------|-------|----------|--------|--------------|
| #4 | Whisper Transcription Processor with Word-Level Timestamps | P1-high | 4h | #3, #2 |
| #5 | pyannote Speaker Diarization Processor | P1-high | 8h | #3, #4, #2 |
| #6 | Prosody Analysis Processor for Tone/Energy/Pace | P1-high | 8h | #3, #5, #2 |
| #7 | Speech Pattern Detector (Pauses, Fillers, Overlaps) | P1-high | 6h | #4, #5, #2 |
| #8 | Sentiment Analyzer for Transcript Segments | P2-medium | 3h | #4, #5, #2 |

Critical Path: #4 → #5 → #6 (longest chain)
Parallelization: after #5 completes, #6, #7, and #8 can run in parallel


Phase 3: Server Integration (Issues #9-#10)

Estimated Effort: 8 hours

| Issue | Title | Priority | Effort | Dependencies |
|-------|-------|----------|--------|--------------|
| #9 | FastMCP Server Implementation with Tool Endpoints | P1-high | 4h | #4-#8, #2 |
| #10 | Full Analysis Orchestrator Tool | P1-high | 4h | #9, #4-#8, #2 |

Critical Path: #9 → #10
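
The tool endpoint contract from #9 might look like the sketch below. The `fastmcp` import and decorator reflect my reading of the FastMCP API and are an assumption, as are the tool name and return fields (which will ultimately come from the #2 schemas); the stub is guarded so it runs even without the package installed.

```python
# Sketch of the tool endpoint shape for issue #9. The fastmcp usage is an
# assumed API; the tool's return fields are hypothetical placeholders until
# the Pydantic schemas from issue #2 land.

def full_interview_analysis(audio_path: str) -> dict:
    """Run the whole pipeline (transcribe -> diarize -> analyze) on one file.

    This stub only sketches the contract; the real orchestrator is issue #10.
    """
    return {
        "audio_path": audio_path,
        "status": "ok",
        "segments": [],  # filled in by the real pipeline
    }

try:
    from fastmcp import FastMCP  # not needed to exercise the stub directly

    mcp = FastMCP("interview-audio")
    mcp.tool()(full_interview_analysis)  # equivalent to @mcp.tool()
except ImportError:
    mcp = None  # fastmcp absent: the plain function remains callable
```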


Phase 4: Testing (Issues #11-#12)

Estimated Effort: 10 hours

| Issue | Title | Priority | Effort | Dependencies |
|-------|-------|----------|--------|--------------|
| #11 | Unit Tests for All Processor Modules | P1-high | 6h | #3-#8 |
| #12 | Integration Tests for End-to-End Pipeline | P1-high | 4h | #10, #9, #11 |

Critical Path: #11 (parallel with #10) → #12


Phase 5: Deployment (Issues #13-#15)

Estimated Effort: 8 hours

| Issue | Title | Priority | Effort | Dependencies |
|-------|-------|----------|--------|--------------|
| #13 | Server Deployment Configuration for game-da-god | P1-high | 4h | #9, #12 |
| #14 | MCP Client Configuration for Claude Code | P1-high | 2h | #13 |
| #15 | interview-coach Agent Integration | P2-medium | 2h | #14 |

Critical Path: #13 → #14 → #15


Critical Path Analysis

The absolute critical path (longest dependency chain):

#1 (2h) → #2 (3h) → #3 (3h) → #4 (4h) → #5 (8h) → #6 (8h) →
#9 (4h) → #10 (4h) → #12 (4h) → #13 (4h) → #14 (2h) → #15 (2h)

Critical Path Total: 48 hours

Parallelization Opportunities:

  • After #5: Can run #6, #7, #8 in parallel (saves ~6h)
  • #11 (unit tests) only depends on #3-#8, so it can run in parallel with the #9/#10 integration work (saves ~4h)

Optimized Timeline: ~38-42 working hours (5-6 days of focused development)
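
The 48-hour figure can be checked mechanically from the phase tables. The sketch below transcribes the dependency graph and efforts from those tables and computes the longest finish time via memoized recursion:

```python
from functools import lru_cache

# Dependency graph and effort (hours) transcribed from the phase tables above.
EFFORT = {1: 2, 2: 3, 3: 3, 4: 4, 5: 8, 6: 8, 7: 6, 8: 3,
          9: 4, 10: 4, 11: 6, 12: 4, 13: 4, 14: 2, 15: 2}
DEPS = {1: [], 2: [1], 3: [1, 2], 4: [3, 2], 5: [3, 4, 2], 6: [3, 5, 2],
        7: [4, 5, 2], 8: [4, 5, 2], 9: [4, 5, 6, 7, 8, 2],
        10: [9, 4, 5, 6, 7, 8, 2], 11: [3, 4, 5, 6, 7, 8],
        12: [10, 9, 11], 13: [9, 12], 14: [13], 15: [14]}

@lru_cache(maxsize=None)
def earliest_finish(issue: int) -> int:
    """Longest-path finish time: own effort plus the slowest dependency."""
    return EFFORT[issue] + max((earliest_finish(d) for d in DEPS[issue]), default=0)

print(earliest_finish(15))  # 48 -- matches the critical-path total above
```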


Effort Summary

| Phase | Tickets | Effort | % of Total |
|-------|---------|--------|------------|
| Phase 1: Foundation | 3 | 8h | 13% |
| Phase 2: Core Processors | 5 | 29h | 46% |
| Phase 3: Server Integration | 2 | 8h | 13% |
| Phase 4: Testing | 2 | 10h | 16% |
| Phase 5: Deployment | 3 | 8h | 13% |
| **TOTAL** | **15** | **63h** | **100%** |

Technology Stack

Core Technologies

| Component | Technology | Version | Purpose |
|-----------|------------|---------|---------|
| MCP Framework | FastMCP | 0.4.x | Python MCP server framework |
| Transcription | OpenAI Whisper | large-v3 | Speech-to-text with timestamps |
| Diarization | pyannote.audio | 3.1.x | Speaker identification |
| Audio Processing | librosa | 0.10.x | Feature extraction |
| Audio I/O | pydub + ffmpeg | - | Format conversion |
| Sentiment | transformers | 4.36+ | Text classification |
| Data Validation | Pydantic | 2.x | Schema validation |

ML Models (Total: ~4.5GB)

  • whisper-large-v3 (~3GB)
  • pyannote/speaker-diarization-3.1 (~500MB)
  • pyannote/segmentation-3.0 (~100MB)
  • cardiffnlp/twitter-roberta-base-sentiment-latest (~500MB)
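
Since the schedule plan calls for pre-downloading these models, a pre-flight disk check is a cheap safeguard. The sketch below is stdlib-only; the sizes are rough estimates transcribed from the list above, and the 1.5x headroom factor is an arbitrary assumption:

```python
import shutil

# Approximate model footprints (bytes), roughly matching the list above.
MODEL_SIZES = {
    "whisper-large-v3": 3_000_000_000,
    "pyannote/speaker-diarization-3.1": 500_000_000,
    "pyannote/segmentation-3.0": 100_000_000,
    "cardiffnlp/twitter-roberta-base-sentiment-latest": 500_000_000,
}

def has_room_for_models(cache_dir: str = ".", headroom: float = 1.5) -> bool:
    """Check free disk space before pre-downloading, with a safety factor."""
    needed = sum(MODEL_SIZES.values()) * headroom
    free = shutil.disk_usage(cache_dir).free
    return free >= needed

print(has_room_for_models())
```

With 944GB free on game-da-god this check is a formality, but it protects a re-deploy onto a fuller volume.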

Deployment Architecture

Target Server: game-da-god

  • IP: 192.168.4.140
  • Port: 8420
  • CPU: i7-12700KF (20 cores)
  • RAM: 32GB
  • Storage: 1TB SSD with 944GB free
  • Python: 3.12
  • Mode: CPU-only (no GPU)

Performance Estimates

  • 30-minute interview transcription: ~15-20 minutes
  • Speaker diarization: ~5-10 minutes
  • Full analysis pipeline: ~25-35 minutes total

Network Configuration

  • Firewall: Allow port 8420 from 192.168.4.0/24 (LAN only)
  • No authentication (trusted local network)
  • systemd service for auto-start
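
The auto-start service could be sketched as a unit file like the one below. This is only an illustration: the install path, service user, and entry-point module name are hypothetical placeholders, not decisions made in this plan.

```ini
# /etc/systemd/system/interview-audio-mcp.service -- sketch only; the
# install path, user, and entry point below are placeholders.
[Unit]
Description=Interview Audio MCP Server
After=network-online.target
Wants=network-online.target

[Service]
User=mcp
WorkingDirectory=/opt/interview-audio-mcp
ExecStart=/opt/interview-audio-mcp/.venv/bin/python -m interview_audio_mcp
Restart=on-failure

[Install]
WantedBy=multi-user.target
```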

Integration Points

Claude Code MCP Configuration

Add to ~/.claude.json:

```json
{
  "mcpServers": {
    "interview-audio": {
      "type": "http",
      "url": "http://192.168.4.140:8420/mcp"
    }
  }
}
```

interview-coach Agent

Update ~/.claude/agents/interview-coach.md to use MCP tools:

  • Primary tool: interview-audio:full_interview_analysis
  • Input: Audio file path from interview-prep/ folder
  • Output: Structured analysis with coaching recommendations
  • Integration with existing interview-preparation skill

Risk Mitigation

Technical Risks

| Risk | Impact | Mitigation |
|------|--------|------------|
| CPU-only slow processing | Medium | Set expectations (~30-35 min for a 30-minute interview) |
| HuggingFace model access | High | Verify HF_TOKEN before deployment |
| OOM on long interviews | Medium | Implement graceful degradation; fall back to smaller models |
| Network connectivity | Low | LAN-only, stable connection |
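
The "fall back to smaller models" mitigation could follow the pattern below. Loader callables are injected so the pattern is shown without the real multi-GB models; in practice they would wrap the actual model-loading calls, and treating `OSError` as an OOM signal is an assumption about how allocation failures surface:

```python
# Sketch of the "fallback to smaller models" mitigation from the risk table.
# Loaders are injected so the pattern is testable without real models.

def load_with_fallback(loaders):
    """Try model loaders from largest to smallest, degrading on memory errors.

    `loaders` is an ordered list of (name, zero-arg callable) pairs.
    Returns (name, model) for the first loader that succeeds.
    """
    last_error = None
    for name, loader in loaders:
        try:
            return name, loader()
        except (MemoryError, OSError) as exc:  # assumed OOM signatures
            last_error = exc
    raise RuntimeError("all model sizes failed to load") from last_error

# Example: the large "model" blows up, the medium one succeeds.
def fake_large():
    raise MemoryError("cannot allocate")

name, model = load_with_fallback([
    ("large-v3", fake_large),
    ("medium", lambda: "medium-model"),
])
print(name)  # medium
```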

Schedule Risks

| Risk | Impact | Mitigation |
|------|--------|------------|
| pyannote alignment complexity | High | 8 hours allocated; may need buffer |
| Integration test failures | Medium | Comprehensive unit tests first |
| Model download time | Low | Pre-download during setup phase |

Quality Gates

Definition of Done (per ticket)

  • Code implementation complete
  • Unit tests written and passing
  • Code passes linting (black, ruff, mypy)
  • Documentation updated (docstrings, README if needed)
  • PR created and reviewed
  • Merged to main branch
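
To make the linting gate reproducible, the tool configuration could live in `pyproject.toml`. The snippet below is a sketch only; the line length, rule strictness, and target version are placeholder choices, not project decisions:

```toml
# pyproject.toml lint section -- sketch matching the DoD tooling;
# specific values are placeholders, not project decisions.
[tool.black]
line-length = 100
target-version = ["py312"]

[tool.ruff]
line-length = 100
target-version = "py312"

[tool.mypy]
python_version = "3.12"
strict = true
```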

Milestone Acceptance Criteria (v1.0 MVP)

  • All 15 tickets completed
  • Integration tests passing
  • Server deployed to game-da-god
  • MCP client configured in Claude Code
  • Full pipeline analysis completes successfully on test interview
  • interview-coach agent can invoke tools

Next Steps

Immediate Actions (Human)

  1. Review this implementation plan
  2. Review architecture document: /mnt/c/Users/kriso/OneDrive/Documents/Professional-Income/Job-Search/Resources/Frameworks/INTERVIEW_AUDIO_ANALYSIS_MCP_ARCHITECTURE.md
  3. Approve project scope and timeline
  4. Assign to software-architect for implementation

Implementation Sequence (software-architect)

  1. Start with Issue #1 (repository setup)
  2. Work through critical path in sequence
  3. Parallelize non-blocking tickets where possible
  4. Use workspace manager for all code changes
  5. Create PRs for each ticket (or logical groups)
  6. Notify project-manager when PRs are ready for merge

Post-MVP Enhancements (Phase 2)

  • Real-time analysis with WebSocket streaming (requires GPU)
  • Multi-language support
  • Comparison analysis across multiple interviews
  • Historical trend tracking
  • GPU acceleration for 10x faster processing

References


**Status:** Ready for Implementation
**Estimated Completion:** 5-6 working days (single developer, full-time)
**Budget:** 63 hours (includes a 13-hour buffer over the 50-hour initial estimate)


Generated by project-manager agent | 2026-01-30