@andkirby commented Nov 8, 2025

Summary

  • Added complete command-line interface for Kitten TTS
  • Implemented stdin pipeline support for text input
  • Added executable kitten-tts binary script
  • Created comprehensive CLI with all major features
  • No changes to core files

Key Features Added

  • Command-line interface with comprehensive argument parsing
  • Stdin pipeline support - can read text from pipes (e.g., echo "text" | kitten-tts)
  • Executable binary script for direct usage
  • Audio fade-out with customizable duration (default: 0.2s)
  • Text preprocessing - automatically appends a dots suffix to the
    input text to prevent abrupt audio cutoff
  • Multi-format audio output (WAV, FLAC, OGG support)
  • Voice selection for all 8 available voices (expr-voice-2-m and
    expr-voice-2-f through expr-voice-5-m and expr-voice-5-f)
  • Speech speed control with float values
  • System audio playback across platforms (macOS, Linux, Windows)
  • Comprehensive help documentation with usage examples
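
The fade-out and dots-suffix features above could look roughly like the sketch below. It is an illustration only, not the code in kittentts/cli.py: the function names, the linear ramp shape, and the use of plain Python lists (rather than whatever array type the CLI actually uses) are all assumptions. Only the 0.2s default comes from the description.

```python
from __future__ import annotations


def add_trailing_dots(text: str, suffix: str = "...") -> str:
    """Append a dots suffix unless the text already ends with it (hypothetical helper)."""
    stripped = text.rstrip()
    if not stripped or stripped.endswith(suffix):
        return stripped
    return stripped + suffix


def apply_fade_out(samples: list[float], sample_rate: int, duration: float = 0.2) -> list[float]:
    """Linearly ramp the last `duration` seconds of audio down to silence."""
    fade_len = min(len(samples), int(sample_rate * duration))
    if fade_len <= 0:
        return list(samples)
    out = list(samples)
    start = len(out) - fade_len
    for i in range(fade_len):
        # Gain steps down evenly and reaches 0.0 on the final sample.
        gain = 1.0 - (i + 1) / fade_len
        out[start + i] *= gain
    return out
```

A linear ramp is the simplest choice; a cosine or exponential curve would also work if the tail still sounds abrupt.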

Files Changed

  • kitten-tts - New executable wrapper script
  • kittentts/cli.py - Complete CLI implementation with pipeline
    support. Installing the package places the binary at venv/kitten-tts

Usage Examples

# Basic usage with argument
kitten-tts "Hello world"

# Pipeline usage (stdin)
echo "Hello world" | ./kitten-tts
cat file.txt | kitten-tts --output audio.wav

# With specific voice and fade-out
kitten-tts "Hello world" --voice expr-voice-2-f --fade-out 0.3

# Save to file with custom speed
kitten-tts "Hello world" --output hello.wav --speed 1.2

# List available voices
kitten-tts --list-voices
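
For reviewers, the argument-or-stdin behaviour shown in these examples can be sketched with argparse as follows. This is a minimal sketch, not the parser in kittentts/cli.py: the flag names are taken from the examples above, while the helper names, the default voice, and the default speed are assumptions.

```python
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="kitten-tts")
    parser.add_argument("text", nargs="?", help="text to speak; read from stdin when omitted")
    parser.add_argument("--voice", default="expr-voice-2-f")  # default voice is an assumption
    parser.add_argument("--output", help="write audio to this file instead of playing it")
    parser.add_argument("--speed", type=float, default=1.0)  # default speed is an assumption
    parser.add_argument("--fade-out", type=float, default=0.2)
    parser.add_argument("--list-voices", action="store_true")
    return parser


def resolve_text(args: argparse.Namespace, stdin=sys.stdin) -> str:
    """Prefer the positional argument; otherwise read piped stdin."""
    if args.text:
        return args.text
    if not stdin.isatty():  # something was piped in
        text = stdin.read().strip()
        if text:
            return text
    raise SystemExit("error: no text given as an argument or on stdin")
```

Checking the positional argument first keeps argument-based invocations working unchanged when the command happens to run inside a pipeline.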

iamgroot42 and others added 13 commits August 5, 2025 14:11
* Remove the `misaki` dependency, but directly depend on `phonemizer-fork` instead.
* Do the side-effect phonemizer initialization call by hand
- Add executable kitten-tts wrapper script
- Add kittentts/cli.py with full command-line interface
- Configure console script entry point in pyproject.toml
- Implement audio fade-out with customizable duration (default: 0.2s)
- Add automatic dots suffix to prevent audio cutoff
- Support all available voices, speed control, and audio formats
- Add joblib dependency for proper package installation
- Include comprehensive help documentation and examples

Features:
- Text-to-speech synthesis via command line
- Multiple voice options (expr-voice-2/m/f through expr-voice-5/m/f)
- Adjustable speech speed and fade-out duration
- Audio file output (WAV, FLAC, OGG) or direct playback
- Automatic text preprocessing to prevent abrupt cutoffs
- Implemented pipeline/stdin reading functionality
- Added support for piping text to kitten-tts command
- Updated help documentation with pipeline usage examples
- Enhanced error handling for stdin operations
- Maintained backward compatibility with argument-based input

Usage examples:
  echo "hello world" | ./kitten-tts
  cat text_file.txt | ./kitten-tts --output audio.wav
- Added comprehensive CLI usage section
- Documented installation and setup steps for CLI
- Listed all CLI features and available voices
- Added examples for both argument and stdin/pipeline usage
- Organized Python API and CLI sections separately
- Updated features list to highlight CLI functionality
- Organized CLI documentation in a collapsible details section
- Added structured subsections (Installation, Basic Usage, Advanced Options)
- Improved readability with better organization
- Maintained all CLI features and examples
- Made README more concise while preserving comprehensive information
- Changed 'Click to expand CLI usage instructions' to 'CLI Usage Instructions'
- More concise and cleaner collapsible section header

Major improvements:

🚀 CLI Performance:
- Implement lazy imports for instant help display (0.04s vs 2.2s)
- Add optimized entry point that only loads heavy dependencies when needed
- Refactor CLI into separate entry and processing modules

🎵 Audio System Enhancements:
- Add direct audio streaming with sounddevice library
- Implement fallback to system temp directory for temp files
- Fix permission issues when running from root directory
- Add proper temp file cleanup and error handling
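
The temp-file fallback and cleanup could be sketched as a context manager like the one below. The helper name and the choice to try the current directory first are assumptions; the commit message only states that the system temp directory is used as a fallback.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temp_audio_path(suffix=".wav", preferred_dir=None):
    """Yield a temp file path, falling back to the system temp dir, and delete it afterwards."""
    candidates = [preferred_dir or os.getcwd(), tempfile.gettempdir()]
    path = None
    for directory in candidates:
        try:
            fd, path = tempfile.mkstemp(suffix=suffix, dir=directory)
            os.close(fd)  # only the path is needed; the writer reopens it
            break
        except OSError:
            # Missing or unwritable directory (e.g. running from /): try the next one.
            continue
    if path is None:
        raise RuntimeError("no writable directory for temporary audio")
    try:
        yield path
    finally:
        try:
            os.remove(path)
        except FileNotFoundError:
            pass
```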

📦 Package Structure:
- Update pyproject.toml to use optimized entry point
- Make package imports lazy to improve startup performance
- Add sounddevice as optional streaming dependency

💡 User Experience:
- Help commands now appear instantly
- Audio works from any directory including root
- Graceful fallback when sounddevice unavailable
- Maintains full CLI functionality with all existing features
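
The graceful fallback when sounddevice is unavailable might be structured like this sketch. `load_optional` and `choose_player` are hypothetical names, and the returned strings are placeholders for whatever playback paths the CLI really has. Catching OSError as well as ImportError is a defensive assumption, since an audio binding can fail at import time when its native library is missing.

```python
import importlib


def load_optional(name: str):
    """Return the imported module, or None if it cannot be loaded."""
    try:
        return importlib.import_module(name)
    except (ImportError, OSError):
        return None


def choose_player(streamer_name: str = "sounddevice") -> str:
    """Pick direct streaming when the optional module loads, else the system player."""
    streamer = load_optional(streamer_name)
    return "stream" if streamer is not None else "system"
```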
@andkirby andkirby marked this pull request as draft November 8, 2025 21:40
@andkirby andkirby marked this pull request as ready for review November 8, 2025 21:42
- Combined formatting improvements from both branches
- Kept comprehensive CLI documentation
- Maintained proper spacing and structure
- Keep CLI script entry point from main
- Use Hatchling version configuration from fix-pkg
- Remove requirements.txt in favor of pyproject.toml dependencies
- Remove misaki and huggingface_hub dependencies
- Add phonemizer-fork dependency
- Clean up duplicate packaging files

4 participants