sti0 (Contributor) commented Jan 5, 2026

Description

Overview

Adds configuration options for passing extra arguments to audio players, overriding the TTS provider, and setting the server port via CLI flags. This enables voice notifications to work inside devcontainers and other containerized environments where audio requires special configuration (e.g., PulseAudio socket mounting).

Problem

The voice server previously:

  • Used hardcoded player paths (/usr/bin/mpg123, /snap/bin/mpv) that don't work when players are installed elsewhere
  • Had no way to pass extra arguments to audio players (needed for -o pulse in containers)
  • Required environment variables for all configuration with no CLI override option
  • Called which on every audio playback instead of caching the result

Solution

New CLI Flags

| Flag | Short | Description | Precedence |
|------|-------|-------------|------------|
| --extra-args=<args> | - | Extra arguments appended to audio player commands | CLI > env > none |
| --tts-provider=<provider> | -t | Override TTS provider (google/elevenlabs) | CLI > env > elevenlabs |
| --port=<port> | -p | Override server port | CLI > env > 8888 |

New Environment Variable

| Variable | Description |
|----------|-------------|
| VOICE_SERVER_EXTRA_ARGS | Extra arguments for audio players (CLI takes precedence) |

Dynamic Player Detection (Cached)

Replaced hardcoded paths with which-based detection, cached at startup:

  • detectPlayer() runs once at startup, stores result in DETECTED_PLAYER
  • findPlayer('mpg123') / findPlayer('mpv') - finds player regardless of install location
  • No repeated which calls during audio playback

Changes

Packs/kai-voice-system/src/voice/server.ts

Imports & CLI Parsing:

  • Added parseArgs import from Node.js util module
  • Added execSync import for running which commands
  • Added CLI argument parsing for --extra-args, --tts-provider/-t, --port/-p
  • PORT now uses CLI precedence: cliArgs.values.port || process.env.PAI_VOICE_PORT || "8888"
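
For reviewers, the flag parsing described above can be sketched roughly as follows. This is a minimal illustration, not the code from server.ts: the helper names parseCli and resolvePort are hypothetical, and passing argv and env in as parameters is an assumption made so the precedence is easy to check.

```typescript
import { parseArgs } from "node:util";

// Hypothetical helper: parse the three flags from an argv-style array.
// parseArgs runs in its default strict mode, so unknown flags throw.
function parseCli(argv: string[]) {
  return parseArgs({
    args: argv,
    options: {
      "extra-args": { type: "string" },
      "tts-provider": { type: "string", short: "t" },
      port: { type: "string", short: "p" },
    },
  });
}

// Hypothetical helper: resolve the port with CLI > env > default precedence,
// mirroring the expression quoted in the bullet above.
function resolvePort(
  argv: string[],
  env: Record<string, string | undefined>,
): number {
  const cliArgs = parseCli(argv);
  const raw = String(cliArgs.values.port || env.PAI_VOICE_PORT || "8888");
  return parseInt(raw, 10);
}
```

In the server itself the inputs would presumably be process.argv.slice(2) and process.env.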

Player Detection:

  • Added findPlayer(name) function for dynamic player detection via which
  • Added detectPlayer() function that runs once at startup
  • Added DETECTED_PLAYER constant caching the detected player path and type
  • Updated playAudio() to use cached DETECTED_PLAYER instead of calling findPlayer() repeatedly
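
A rough sketch of the cached detection, for illustration only. The function and constant names come from the bullets above, but the bodies, the mpg123-before-mpv probe order, the DetectedPlayer shape, and the injectable lookup parameter are all assumptions rather than the actual implementation.

```typescript
import { execSync } from "node:child_process";

type PlayerType = "mpg123" | "mpv";

// Assumed shape of the cached result: the player kind plus its resolved path.
interface DetectedPlayer {
  type: PlayerType;
  path: string;
}

// Resolve a player binary via `which`; returns null when it is not on PATH
// (or when `which` itself is unavailable). Names passed here are hardcoded,
// never user input, so interpolating into the shell command is acceptable.
function findPlayer(name: string): string | null {
  try {
    const out = execSync(`which ${name}`, {
      encoding: "utf8",
      stdio: ["ignore", "pipe", "ignore"],
    });
    const path = out.trim();
    return path.length > 0 ? path : null;
  } catch {
    return null;
  }
}

// Probe the candidates in order and return the first hit.
// The lookup parameter is a hypothetical seam that makes this testable.
function detectPlayer(
  lookup: (name: PlayerType) => string | null = findPlayer,
): DetectedPlayer | null {
  const candidates: PlayerType[] = ["mpg123", "mpv"];
  for (const type of candidates) {
    const path = lookup(type);
    if (path) return { type, path };
  }
  return null;
}

// Evaluated once at module load, so playback never shells out to `which`.
const DETECTED_PLAYER: DetectedPlayer | null = detectPlayer();
```

A null DETECTED_PLAYER corresponds to the "no player found" case listed in the test plan.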

Configuration Functions:

  • Added getExtraArgs() function to resolve CLI/env args with precedence
  • Updated TTS_PROVIDER to accept CLI override via --tts-provider/-t
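
The extra-args precedence (CLI > env > none) could look roughly like this. The signature is hypothetical (the real getExtraArgs() may read the CLI and environment values itself), and splitting on whitespace is an assumption; it would not preserve arguments that contain spaces.

```typescript
// Resolve extra player arguments: the CLI value wins over the env value,
// and an empty list is returned when neither is set.
function getExtraArgs(
  cliValue: string | undefined,
  envValue: string | undefined,
): string[] {
  const raw = cliValue || envValue || "";
  // Assumption: arguments are separated by whitespace, e.g. "-o pulse".
  return raw.split(/\s+/).filter((part) => part.length > 0);
}
```

For example, getExtraArgs("-o alsa", "-o pulse") yields the CLI value split into two arguments, which matches the override case in the test plan.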

Logging:

  • Added startup logging for detected audio player path
  • Added startup logging for configured extra args
  • Added runtime logging showing full player command when extra args are used

Packs/kai-voice-system/README.md

  • Added "Audio Player Arguments" documentation section
  • Added "Devcontainer Setup" guide with example configuration
  • Added common use cases table

Usage Examples

Environment Variable:

# In $PAI_DIR/.env
VOICE_SERVER_EXTRA_ARGS="-o pulse"

CLI Flags:

# Extra player args
bun run server.ts --extra-args="-o pulse"

# TTS provider override
bun run server.ts --tts-provider=google
bun run server.ts -t elevenlabs

# Port override
bun run server.ts --port=9000
bun run server.ts -p 9000

# Combined
bun run server.ts --extra-args="-o pulse" -t google -p 9000

Devcontainer Setup:

{
  "mounts": [
    "source=/run/user/1000/pulse,target=/run/user/1000/pulse,type=bind"
  ],
  "containerEnv": {
    "PULSE_SERVER": "unix:/run/user/1000/pulse/native"
  }
}

Common Use Cases

Use Case Configuration
Container with PulseAudio VOICE_SERVER_EXTRA_ARGS="-o pulse"
Specific ALSA device VOICE_SERVER_EXTRA_ARGS="-o alsa -a hw:1,0"
Switch to Google TTS --tts-provider=google or -t google
Custom port --port=9000 or -p 9000

Logging Output

Startup: Shows detected player and configured extra args

🚀 Voice Server running on port 8888
🎙️  TTS Provider: ElevenLabs
🔊 Audio player: /usr/bin/mpg123
🔊 Extra player args: -o pulse

Runtime: Logs full player command when extra args are used

🔊 Playing audio: /usr/bin/mpg123 -q /tmp/voice-1736085437.mp3 -o pulse

Test Plan

  • Start server with VOICE_SERVER_EXTRA_ARGS="-o pulse" - verify startup log shows extra args
  • Start server with --extra-args="-o alsa" - verify CLI overrides env var
  • Start server with --tts-provider=google - verify TTS provider switches
  • Start server with -t elevenlabs - verify short flag works
  • Start server with --port=9000 - verify server starts on port 9000
  • Start server with -p 9000 - verify short flag works for port
  • Verify startup log shows detected audio player path
  • Test audio playback in devcontainer with PulseAudio mount
  • Verify findPlayer() locates mpg123/mpv correctly on Linux
  • Verify graceful fallback when no player is found

Breaking Changes

None. All changes are additive and backward compatible.

Dependencies

This depends partially on #322 where .env loading was enhanced.


Generated with Claude Code

sti0 force-pushed the feature/voice-server-extra-args branch from 97a51fa to 09a326b on January 5, 2026 13:09
Adds three new CLI flags to the voice server:
- --extra-args: Pass additional arguments to audio players (e.g., "-o pulse")
- --tts-provider: Select TTS provider (google/elevenlabs)
- --port: Configure server port

Also includes fix for argument ordering - extra args now correctly
precede the filename so audio players recognize them.
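
A small sketch of the ordering fix, assuming the command is assembled as an argument array. buildPlayerArgs is a hypothetical name, and the -q base flag is taken from the mpg123 log line in the description.

```typescript
// Assemble player arguments so that extra args come before the audio file:
// base flags first, then user-supplied extras, then the filename last.
function buildPlayerArgs(
  baseArgs: string[],
  extraArgs: string[],
  file: string,
): string[] {
  return [...baseArgs, ...extraArgs, file];
}

const exampleArgs = buildPlayerArgs(["-q"], ["-o", "pulse"], "/tmp/voice.mp3");
// exampleArgs is ["-q", "-o", "pulse", "/tmp/voice.mp3"]
```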

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
sti0 force-pushed the feature/voice-server-extra-args branch from 09a326b to 775c937 on January 5, 2026 14:05
The .env parser was keeping literal quote characters when values were
quoted (e.g., VOICE_SERVER_EXTRA_ARGS="-o pulse"). This caused mpg123
to receive broken arguments like '"-o' 'pulse"' instead of '-o' 'pulse'.

Changes:
- Use indexOf('=') instead of split('=') to handle values containing '='
- Strip surrounding double and single quotes from values
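
The two parser changes can be illustrated with a single-line parser along these lines. parseEnvLine is a hypothetical name, and skipping blank lines and # comments is an assumption about the surrounding loader, not something stated in this commit.

```typescript
// Parse one .env line into [key, value], or null if the line carries no
// assignment. Splits on the FIRST '=' only, so values may contain '='.
function parseEnvLine(line: string): [string, string] | null {
  const trimmed = line.trim();
  if (trimmed === "" || trimmed.startsWith("#")) return null;

  const eq = trimmed.indexOf("=");
  if (eq <= 0) return null; // no '=' at all, or an empty key

  const key = trimmed.slice(0, eq).trim();
  let value = trimmed.slice(eq + 1).trim();

  // Strip one pair of matching surrounding quotes (double or single).
  const first = value[0];
  const last = value[value.length - 1];
  if (value.length >= 2 && first === last && (first === '"' || first === "'")) {
    value = value.slice(1, -1);
  }
  return [key, value];
}
```

With this, VOICE_SERVER_EXTRA_ARGS="-o pulse" produces the value -o pulse without the quotes, so the player receives '-o' and 'pulse' as intended.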

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ecielam commented Jan 5, 2026

suggest adding --host / PAI_VOICE_HOST overrides too. It's on my list but you already did all this in here! I'll add it later on if you don't want to though.

sti0 (Contributor, Author) commented Jan 5, 2026

Hi @ecielam,
sounds like a new feature, right? There is no PAI_VOICE_HOST atm. If I interpret this right, you'd like to add an ENV to let the server run on another host. But then the ENV must be added to the hooks, not to the server.

Correct me if I'm wrong. If so, feel free to contribute this feature in a different branch. There should be no blocking involved.

BR

ecielam commented Jan 5, 2026

yeah, you're right it's technically a new feature ... I'll contribute later once your stuff is merged in and I get around to doing it.

danielmiessler (Owner) commented

Thank you @sti0 for these voice CLI improvements! 🙏

With PAI v2.1, all packs moved to pai-* naming. Your voice system enhancements are valuable - feel free to re-apply against the new structure!

See the release: https://github.com/danielmiessler/PAI/releases/tag/v2.1.0
