Created: 2026-01-30
Project Manager: project-manager agent
Architecture: software-architect agent
- Name: interview-audio-mcp
- URL: https://github.com/krisoye/interview-audio-mcp
- Visibility: Private
- Topics: mcp-server, audio-analysis, whisper, interview-coaching
Build an MCP server for interview audio analysis, deployed on the game-da-god server (192.168.4.140:8420). The server will provide transcription, speaker diarization, tone analysis, and speech pattern detection for post-interview coaching.
- Audio transcription with timestamps (Whisper large-v3)
- Speaker diarization (pyannote.audio 3.1)
- Tone, pace, and energy analysis (librosa)
- Pause detection and filler word identification
- Sentiment analysis per segment
- Structured JSON output optimized for Claude Code consumption
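
As a rough illustration of what the structured JSON output might look like, here is a minimal sketch that uses standard-library dataclasses as a stand-in for the Pydantic models planned in Issue #2. The class names, every field name, and the sample path `interview-prep/sample.wav` are assumptions made for illustration, not the project's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Segment:
    """One diarized transcript segment (hypothetical field names)."""
    speaker: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio
    text: str
    sentiment: str = "neutral"


@dataclass
class AnalysisResult:
    """Top-level container returned to the client (hypothetical)."""
    audio_path: str
    segments: list[Segment] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses, so segments serialize too.
        return json.dumps(asdict(self), indent=2)


result = AnalysisResult(
    audio_path="interview-prep/sample.wav",  # hypothetical sample file
    segments=[Segment("SPEAKER_00", 0.0, 4.2, "Tell me about yourself.")],
)
print(result.to_json())
```

A flat list of time-stamped segments keeps the payload easy to scan from a coaching agent; the real schema may well nest prosody and pattern data differently.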
All 15 tickets have been created in GitHub Issues under the v1.0 - MVP milestone.
Project Board: https://github.com/users/krisoye/projects/3
Phase 1: Foundation (Estimated Effort: 8 hours)
| Issue | Title | Priority | Effort | Dependencies |
|---|---|---|---|---|
| #1 | Repository Setup and Project Infrastructure | P1-high | 2h | None |
| #2 | Pydantic Schema Definitions for All Data Models | P1-high | 3h | #1 |
| #3 | Audio Loader Module with Format Conversion | P1-high | 3h | #1, #2 |
Critical Path: #1 → #2 → #3
Phase 2: Core Processors (Estimated Effort: 29 hours)
| Issue | Title | Priority | Effort | Dependencies |
|---|---|---|---|---|
| #4 | Whisper Transcription Processor with Word-Level Timestamps | P1-high | 4h | #3, #2 |
| #5 | pyannote Speaker Diarization Processor | P1-high | 8h | #3, #4, #2 |
| #6 | Prosody Analysis Processor for Tone/Energy/Pace | P1-high | 8h | #3, #5, #2 |
| #7 | Speech Pattern Detector (Pauses, Fillers, Overlaps) | P1-high | 6h | #4, #5, #2 |
| #8 | Sentiment Analyzer for Transcript Segments | P2-medium | 3h | #4, #5, #2 |
Critical Path: #4 → #5 → #6 (longest chain)
Parallelization: After #5 completes, #6, #7, and #8 can run in parallel
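
To make the scope of Issue #7 more concrete, the following is a minimal sketch of pause and filler detection over word-level timestamps such as those produced by Issue #4. The `Word` type, the filler list, and the 1.0-second pause threshold are all assumptions chosen for illustration; the real detector would need tuning against actual interview audio.

```python
from dataclasses import dataclass

# Hypothetical filler list and pause threshold; not taken from the plan.
FILLER_WORDS = {"um", "uh", "er", "hmm"}
PAUSE_THRESHOLD_S = 1.0


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def find_pauses(words: list[Word], threshold: float = PAUSE_THRESHOLD_S) -> list[tuple[float, float]]:
    """Return (start, end) gaps between consecutive words that last at least `threshold` seconds."""
    pauses = []
    for prev, cur in zip(words, words[1:]):
        if cur.start - prev.end >= threshold:
            pauses.append((prev.end, cur.start))
    return pauses


def find_fillers(words: list[Word]) -> list[Word]:
    """Return words whose lowercased, punctuation-stripped text is a known filler."""
    return [w for w in words if w.text.lower().strip(".,!?") in FILLER_WORDS]


words = [
    Word("Um,", 0.0, 0.3),
    Word("I", 0.4, 0.5),
    Word("led", 2.0, 2.3),
    Word("the", 2.4, 2.5),
    Word("team.", 2.6, 3.0),
]
print(find_pauses(words))
print([w.text for w in find_fillers(words)])
```

Context-dependent fillers such as "like" are deliberately left out of the set, since a plain word match would flag legitimate uses.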
Phase 3: Server Integration (Estimated Effort: 8 hours)
| Issue | Title | Priority | Effort | Dependencies |
|---|---|---|---|---|
| #9 | FastMCP Server Implementation with Tool Endpoints | P1-high | 4h | #4-#8, #2 |
| #10 | Full Analysis Orchestrator Tool | P1-high | 4h | #9, #4-#8, #2 |
Critical Path: #9 → #10
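
One possible shape for the orchestrator in Issue #10 is sketched below: it runs the processors in order and records a failure instead of aborting, so partial results survive. This is an assumption about the design, not the planned implementation; `run_pipeline` and the stub functions are hypothetical stand-ins for the real Whisper, pyannote, and sentiment modules.

```python
from typing import Any, Callable

# A step is a (name, callable) pair; the callable receives the shared context dict.
Step = tuple[str, Callable[[dict[str, Any]], Any]]


def run_pipeline(audio_path: str, steps: list[Step]) -> dict[str, Any]:
    """Run steps in order, recording failures instead of aborting the whole analysis."""
    context: dict[str, Any] = {"audio_path": audio_path, "errors": {}}
    for name, step in steps:
        try:
            context[name] = step(context)
        except Exception as exc:  # keep partial results when one processor fails
            context["errors"][name] = str(exc)
    return context


# Stub processors (hypothetical) standing in for the real modules.
def transcribe(ctx: dict[str, Any]) -> list[dict[str, Any]]:
    return [{"start": 0.0, "end": 1.5, "text": "hello"}]


def diarize(ctx: dict[str, Any]) -> list[dict[str, Any]]:
    raise RuntimeError("diarization model unavailable")


def sentiment(ctx: dict[str, Any]) -> list[str]:
    return ["neutral" for _ in ctx["transcription"]]


result = run_pipeline(
    "interview-prep/sample.wav",  # hypothetical sample file
    [("transcription", transcribe), ("diarization", diarize), ("sentiment", sentiment)],
)
print(result["errors"])
```

In this sketch a later step that reads a missing upstream result would itself fail and be logged under `errors`, which keeps the control flow simple at the cost of less specific error messages.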
Phase 4: Testing (Estimated Effort: 10 hours)
| Issue | Title | Priority | Effort | Dependencies |
|---|---|---|---|---|
| #11 | Unit Tests for All Processor Modules | P1-high | 6h | #3-#8 |
| #12 | Integration Tests for End-to-End Pipeline | P1-high | 4h | #10, #9, #11 |
Critical Path: #11 (parallel with #10) → #12
Phase 5: Deployment (Estimated Effort: 8 hours)
| Issue | Title | Priority | Effort | Dependencies |
|---|---|---|---|---|
| #13 | Server Deployment Configuration for game-da-god | P1-high | 4h | #9, #12 |
| #14 | MCP Client Configuration for Claude Code | P1-high | 2h | #13 |
| #15 | interview-coach Agent Integration | P2-medium | 2h | #14 |
Critical Path: #13 → #14 → #15
The absolute critical path (longest dependency chain):
#1 (2h) → #2 (3h) → #3 (3h) → #4 (4h) → #5 (8h) → #6 (8h) →
#9 (4h) → #10 (4h) → #12 (4h) → #13 (4h) → #14 (2h) → #15 (2h)
Critical Path Total: 48 hours
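Parallelization Opportunities:
- After #5: #6, #7, and #8 can run in parallel; #7 (6h) and #8 (3h) overlap with #6, saving ~9h against serial execution
- Once #3-#8 are complete: #11 (6h) can run in parallel with #9 and #10, saving ~6h against serial execution
Optimized Timeline: ~48 working hours (about 6 days of focused development). These savings are measured against the 63-hour serial total; the 48-hour critical path already assumes this parallelism and is the lower bound.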
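
The critical-path figure can be checked mechanically. The sketch below copies the effort and dependency columns from the phase tables and computes the longest dependency chain; the function name `critical_path` and the data layout are illustrative choices, and the loop relies on the fact that every issue here depends only on lower-numbered issues.

```python
# Effort (hours) and dependencies per issue, copied from the phase tables.
EFFORT = {1: 2, 2: 3, 3: 3, 4: 4, 5: 8, 6: 8, 7: 6, 8: 3,
          9: 4, 10: 4, 11: 6, 12: 4, 13: 4, 14: 2, 15: 2}
DEPS = {
    1: [], 2: [1], 3: [1, 2], 4: [2, 3], 5: [2, 3, 4], 6: [2, 3, 5],
    7: [2, 4, 5], 8: [2, 4, 5], 9: [2, 4, 5, 6, 7, 8],
    10: [2, 4, 5, 6, 7, 8, 9], 11: [3, 4, 5, 6, 7, 8],
    12: [9, 10, 11], 13: [9, 12], 14: [13], 15: [14],
}


def critical_path(effort: dict[int, int], deps: dict[int, list[int]]) -> tuple[list[int], int]:
    """Return the longest dependency chain and its total effort in hours."""
    finish: dict[int, int] = {}   # earliest finish time per issue
    prev: dict[int, int | None] = {}  # latest-finishing dependency per issue
    for issue in sorted(effort):  # issue numbers are already in topological order
        best = max(deps[issue], key=lambda d: finish[d], default=None)
        prev[issue] = best
        finish[issue] = effort[issue] + (finish[best] if best is not None else 0)
    node: int | None = max(finish, key=lambda i: finish[i])
    total = finish[node]
    path = []
    while node is not None:  # walk back from the last issue to the first
        path.append(node)
        node = prev[node]
    return list(reversed(path)), total


path, total = critical_path(EFFORT, DEPS)
print(" -> ".join(f"#{i}" for i in path), f"= {total}h")
print("Serial total:", sum(EFFORT.values()), "h")
```

With the table values as given, the longest chain comes out to 48 hours against a 63-hour serial total.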
| Phase | Tickets | Effort | % of Total |
|---|---|---|---|
| Phase 1: Foundation | 3 | 8h | 13% |
| Phase 2: Core Processors | 5 | 29h | 46% |
| Phase 3: Server Integration | 2 | 8h | 13% |
| Phase 4: Testing | 2 | 10h | 16% |
| Phase 5: Deployment | 3 | 8h | 13% |
| TOTAL | 15 | 63h | 100% |
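
The percentage column can be reproduced with a few lines of arithmetic. This is only a quick check using the hours from the table; the variable names are illustrative.

```python
# Hours per phase, copied from the summary table.
phases = {
    "Phase 1: Foundation": 8,
    "Phase 2: Core Processors": 29,
    "Phase 3: Server Integration": 8,
    "Phase 4: Testing": 10,
    "Phase 5: Deployment": 8,
}

total = sum(phases.values())
# Round each share to the nearest whole percent, as the table does.
shares = {name: round(100 * hours / total) for name, hours in phases.items()}

for name, pct in shares.items():
    print(f"{name}: {phases[name]}h ({pct}%)")
print(f"Total: {total}h, rounded shares sum to {sum(shares.values())}%")
```

Because each share is rounded independently, the rounded percentages add up to 101 rather than exactly 100.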
| Component | Technology | Version | Purpose |
|---|---|---|---|
| MCP Framework | FastMCP | 0.4.x | Python MCP server framework |
| Transcription | OpenAI Whisper | large-v3 | Speech-to-text with timestamps |
| Diarization | pyannote.audio | 3.1.x | Speaker identification |
| Audio Processing | librosa | 0.10.x | Feature extraction |
| Audio I/O | pydub + ffmpeg | - | Format conversion |
| Sentiment | transformers | 4.36+ | Text classification |
| Data Validation | Pydantic | 2.x | Schema validation |
- whisper-large-v3 (~3GB)
- pyannote/speaker-diarization-3.1 (~500MB)
- pyannote/segmentation-3.0 (~100MB)
- cardiffnlp/twitter-roberta-base-sentiment-latest (~500MB)
- IP: 192.168.4.140
- Port: 8420
- CPU: i7-12700KF (12 cores, 20 threads)
- RAM: 32GB
- Storage: 1TB SSD with 944GB free
- Python: 3.12
- Mode: CPU-only (no GPU)
- 30-minute interview transcription: ~15-20 minutes
- Speaker diarization: ~5-10 minutes
- Full analysis pipeline: ~25-35 minutes total
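
For setting expectations on recordings of other lengths, the estimate above can be scaled. The sketch below assumes processing time grows linearly with audio duration, which the plan does not state and which may not hold for diarization on long files; `estimate_processing_minutes` is a hypothetical helper.

```python
# Reference point from the plan: a 30-minute interview takes ~25-35 minutes end to end.
REFERENCE_AUDIO_MIN = 30
REFERENCE_PROCESSING_MIN = (25, 35)


def estimate_processing_minutes(audio_minutes: float) -> tuple[float, float]:
    """Scale the reference estimate linearly with audio length (an assumption)."""
    low, high = REFERENCE_PROCESSING_MIN
    return (
        audio_minutes * low / REFERENCE_AUDIO_MIN,
        audio_minutes * high / REFERENCE_AUDIO_MIN,
    )


for minutes in (30, 45, 60):
    low, high = estimate_processing_minutes(minutes)
    print(f"{minutes}-min interview: ~{low:.0f}-{high:.0f} min of processing")
```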
- Firewall: Allow port 8420 from 192.168.4.0/24 (LAN only)
- No authentication (trusted local network)
- systemd service for auto-start
Add to ~/.claude.json:

    {
      "mcpServers": {
        "interview-audio": {
          "type": "http",
          "url": "http://192.168.4.140:8420/mcp"
        }
      }
    }

Update ~/.claude/agents/interview-coach.md to use MCP tools:
- Primary tool: interview-audio:full_interview_analysis
- Input: Audio file path from interview-prep/ folder
- Output: Structured analysis with coaching recommendations
- Integration with existing interview-preparation skill
| Risk | Impact | Mitigation |
|---|---|---|
| CPU-only slow processing | Medium | Set expectations (25-35 min for a 30-min interview) |
| HuggingFace model access | High | Verify HF_TOKEN before deployment |
| OOM on long interviews | Medium | Implement graceful degradation; fall back to smaller models |
| Network connectivity | Low | LAN-only, stable connection |
| Risk | Impact | Mitigation |
|---|---|---|
| pyannote alignment complexity | High | Allocated 8 hours, may need buffer |
| Integration test failures | Medium | Comprehensive unit tests first |
| Model download time | Low | Pre-download during setup phase |
- Code implementation complete
- Unit tests written and passing
- Code passes linting (black, ruff, mypy)
- Documentation updated (docstrings, README if needed)
- PR created and reviewed
- Merged to main branch
- All 15 tickets completed
- Integration tests passing
- Server deployed to game-da-god
- MCP client configured in Claude Code
- Full pipeline analysis completes successfully on test interview
- interview-coach agent can invoke tools
- Review this implementation plan
- Review architecture document: /mnt/c/Users/kriso/OneDrive/Documents/Professional-Income/Job-Search/Resources/Frameworks/INTERVIEW_AUDIO_ANALYSIS_MCP_ARCHITECTURE.md
- Approve project scope and timeline
- Assign to software-architect for implementation
- Start with Issue #1 (repository setup)
- Work through critical path in sequence
- Parallelize non-blocking tickets where possible
- Use workspace manager for all code changes
- Create PRs for each ticket (or logical groups)
- Notify project-manager when PRs are ready for merge
- Real-time analysis with WebSocket streaming (requires GPU)
- Multi-language support
- Comparison analysis across multiple interviews
- Historical trend tracking
- GPU acceleration for faster processing (estimated at roughly 10x)
- Architecture Document: INTERVIEW_AUDIO_ANALYSIS_MCP_ARCHITECTURE.md (to be added)
- GitHub Repository: https://github.com/krisoye/interview-audio-mcp
- Project Board: https://github.com/users/krisoye/projects/3
- Issues: https://github.com/krisoye/interview-audio-mcp/issues
Status: Ready for Implementation
Estimated Completion: 5-6 working days (assuming single developer, full-time)
Budget: 63 hours (includes 13-hour buffer over 50-hour initial estimate)
Generated by project-manager agent | 2026-01-30