1 | 1 | <p align="center"> |
2 | | - <img src="assets/banner.jpeg" alt="speak - High performance CLI tool your agent can use to generate life like speech, real time on Apple Silicon" width="100%"> |
| 2 | + <img src="assets/banner.jpeg" alt="speak - Talk to your Claude" width="100%"> |
3 | 3 | </p> |
4 | 4 | |
5 | | -A fast CLI tool for AI agents to convert their text output to speech using Chatterbox TTS on Apple Silicon. |
| 5 | +``` |
| 6 | + ┌─────────────────────────────────────────────────────────────┐ |
| 7 | + │ │ |
| 8 | + │ ███████╗██████╗ ███████╗ █████╗ ██╗ ██╗ │ |
| 9 | + │ ██╔════╝██╔══██╗██╔════╝██╔══██╗██║ ██╔╝ │ |
| 10 | + │ ███████╗██████╔╝█████╗ ███████║█████╔╝ │ |
| 11 | + │ ╚════██║██╔═══╝ ██╔══╝ ██╔══██║██╔═██╗ │ |
| 12 | + │ ███████║██║ ███████╗██║ ██║██║ ██╗ │ |
| 13 | + │ ╚══════╝╚═╝ ╚══════╝╚═╝ ╚═╝╚═╝ ╚═╝ │ |
| 14 | + │ │ |
| 15 | + │ Talk to your Claude. │ |
| 16 | + │ │ |
| 17 | + └─────────────────────────────────────────────────────────────┘ |
| 18 | +``` |
6 | 19 | |
7 | | -## Install as Agent Skill |
| 20 | +<p align="center"> |
| 21 | + <strong>Voice cloning. Long documents. Audiobook quality. Local & private.</strong> |
| 22 | +</p> |
8 | 23 | |
9 | | -Add this skill to Claude Code, Cursor, Windsurf, and other AI agents: |
| 24 | +<p align="center"> |
| 25 | + <code>speak article.md --stream</code> → Audio starts in seconds |
| 26 | +</p> |
10 | 27 | |
11 | | -```bash |
12 | | -npx skills add EmZod/speak |
13 | | -``` |
| 28 | +--- |
14 | 29 | |
15 | | -## Quick Start |
| 30 | +## Install |
16 | 31 | |
| 32 | +**For AI Agents** (Claude Code, Cursor, Windsurf): |
17 | 33 | ```bash |
18 | | -git clone https://github.com/EmZod/speak.git |
19 | | -cd speak |
20 | | -bun install |
21 | | - |
22 | | -# First run auto-installs Python dependencies |
23 | | -bun run src/index.ts "Hello, world!" --play |
| 34 | +npx skills add EmZod/speak |
24 | 35 | ``` |
25 | 36 | |
26 | | -Create an alias for easier access: |
| 37 | +**CLI:** |
27 | 38 | ```bash |
| 39 | +git clone https://github.com/EmZod/speak.git |
| 40 | +cd speak && bun install |
28 | 41 | alias speak="bun run $(pwd)/src/index.ts" |
29 | 42 | ``` |
30 | 43 | |
31 | | -## Requirements |
| 44 | +**Requirements:** macOS Apple Silicon · Bun · Python 3.10+ · sox (`brew install sox`) |
32 | 45 | |
33 | | -- macOS with Apple Silicon (M Series) |
34 | | -- [Bun](https://bun.sh) |
35 | | -- Python 3.10+ |
36 | | -- sox (for long documents): `brew install sox` |
| 46 | +--- |
37 | 47 | |
38 | | -## Basic Usage |
| 48 | +## Usage |
39 | 49 | |
40 | 50 | ```bash |
41 | 51 | speak "Hello, world!" --play # Generate and play |
42 | | -speak article.md --stream # Stream long content |
43 | | -speak --clipboard --play # Read from clipboard |
| 52 | +speak article.md --stream # Stream long content |
44 | 53 | speak document.md --output out.wav # Save to file |
| 54 | +speak --clipboard --play # Read from clipboard |
45 | 55 | ``` |
46 | 56 | |
47 | | -## Key Features |
| 57 | +--- |
48 | 58 | |
49 | | -```bash |
50 | | -# Long documents - auto-chunk for reliability |
51 | | -speak book.md --auto-chunk --output book.wav |
| 59 | +## Voice Cloning |
52 | 60 | |
53 | | -# Resume interrupted generation |
54 | | -speak --resume manifest.json |
| 61 | +Clone any voice from a 10-30 second sample: |
55 | 62 | |
56 | | -# Batch processing |
57 | | -speak *.md --output-dir ~/Audio/ |
| 63 | +```bash |
| 64 | +# Use your cloned voice |
| 65 | +speak "Hello" --voice ~/.chatter/voices/morgan_freeman.wav --play |
| 66 | +``` |
58 | 67 | |
59 | | -# Estimate duration before generating |
60 | | -speak --estimate document.md |
| 68 | +--- |
61 | 69 | |
62 | | -# Concatenate audio files |
63 | | -speak concat part1.wav part2.wav --out combined.wav |
| 70 | +## Long Documents |
| 71 | + |
| 72 | +```bash |
| 73 | +speak book.md --auto-chunk --output book.wav # Auto-chunk for reliability |
| 74 | +speak --resume manifest.json # Resume interrupted generation |
| 75 | +speak *.md --output-dir ~/Audio/ # Batch processing |
| 76 | +speak --estimate document.md # Estimate duration first |
64 | 77 | ``` |
65 | 78 | |
66 | | -## Commands |
| 79 | +--- |
67 | 80 | |
68 | | -| Command | Description | |
69 | | -|---------|-------------| |
70 | | -| `speak <text\|file>` | Generate speech | |
71 | | -| `speak health` | Check system status | |
72 | | -| `speak models` | List available models | |
73 | | -| `speak concat <files>` | Combine audio files | |
74 | | -| `speak daemon kill` | Stop TTS server | |
75 | | - |
76 | | -## Common Options |
77 | | - |
78 | | -| Option | Description | |
79 | | -|--------|-------------| |
80 | | -| `--play` | Play after generation | |
81 | | -| `--stream` | Stream as it generates | |
82 | | -| `--output <path>` | Output file or directory | |
83 | | -| `--auto-chunk` | Chunk long documents | |
84 | | -| `--estimate` | Show duration estimate | |
85 | | -| `--dry-run` | Preview without generating | |
| 81 | +## Commands |
86 | 82 | |
87 | | -## Documentation |
| 83 | +``` |
| 84 | +speak <text|file> Generate speech |
| 85 | +speak health Check system status |
| 86 | +speak models List available models |
| 87 | +speak concat <files> Combine audio files |
| 88 | +speak daemon kill Stop TTS server |
| 89 | +``` |
88 | 90 | |
89 | | -- **[docs/usage.md](docs/usage.md)** - Complete usage guide |
90 | | -- **[docs/configuration.md](docs/configuration.md)** - Config file, environment variables, shell setup |
91 | | -- **[docs/troubleshooting.md](docs/troubleshooting.md)** - Common issues and fixes |
92 | | -- **[SKILL.md](SKILL.md)** - Agent-optimized reference |
93 | | -- **[CHANGELOG.md](CHANGELOG.md)** - Version history |
94 | | -- **[.agentic/](.agentic/)** - Agentic engineering artifacts (optimization reports, focus group tests) |
| 91 | +--- |
95 | 92 | |
96 | | -## Development |
| 93 | +## Options |
97 | 94 | |
98 | | -```bash |
99 | | -bun install # Install dependencies |
100 | | -bun test # Run tests |
101 | | -bun run typecheck # Type check |
| 95 | +``` |
| 96 | +--play Play after generation |
| 97 | +--stream Stream as it generates |
| 98 | +--output Output file or directory |
| 99 | +--voice Custom voice file (WAV) |
| 100 | +--auto-chunk Chunk long documents |
| 101 | +--estimate Show duration estimate |
| 102 | +--dry-run Preview without generating |
102 | 103 | ``` |
103 | 104 | |
104 | | -## For AI Agents |
| 105 | +--- |
105 | 106 | |
106 | | -**Recommended:** Install via the skills registry: |
107 | | -```bash |
108 | | -npx skills add EmZod/speak |
109 | | -``` |
| 107 | +## Performance |
110 | 108 | |
111 | | -Or manually copy [SKILL.md](SKILL.md) to your agent's skills directory: |
112 | | -```bash |
113 | | -cp SKILL.md ~/.claude/skills/speak-tts/SKILL.md |
114 | 109 | ``` |
| 110 | +Long documents ████████████████████ Streaming, auto-chunk |
| 111 | +Voice cloning ████████████████████ Any voice from sample |
| 112 | +Emotion tags ████████████████████ [laugh], [sigh], etc. |
| 113 | +Quality ████████████████████ Audiobook grade |
| 114 | +``` |
| 115 | + |
| 116 | +--- |
| 117 | + |
| 118 | +## See Also |
| 119 | + |
| 120 | +Need instant audio (~90ms)? Try [**speakturbo**](https://github.com/EmZod/Speak-Turbo). |
115 | 121 | |
116 | | -See [AGENTS.md](AGENTS.md) for setup details. |
| 122 | +--- |
117 | 123 | |
118 | | -## License |
| 124 | +## Documentation |
| 125 | + |
| 126 | +| File | Content | |
| 127 | +|------|---------| |
| 128 | +| [SKILL.md](SKILL.md) | Full usage guide for agents | |
| 129 | +| [docs/usage.md](docs/usage.md) | Complete CLI reference | |
| 130 | +| [docs/troubleshooting.md](docs/troubleshooting.md) | Common issues & fixes | |
| 131 | +| [AGENTS.md](AGENTS.md) | Architecture & development | |
| 132 | + |
| 133 | +--- |
119 | 134 | |
120 | | -MIT |
| 135 | +<p align="center"> |
| 136 | + <sub>MIT License · Built on <a href="https://github.com/resemble-ai/chatterbox">Chatterbox TTS</a></sub> |
| 137 | +</p> |
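The new README's `alias speak="bun run $(pwd)/src/index.ts"` only exists inside the shell that defined it; aliases are not inherited by child processes, so an agent or script that launches speak as a subprocess has to call `bun run <repo>/src/index.ts` itself. A minimal TypeScript sketch of assembling the argument list from the flags in the new Options section follows. The `SpeakOptions` interface and `buildSpeakArgs` helper are hypothetical illustrations, not part of speak, and the sketch assumes flags may follow the input as they do in the README's examples.

```typescript
// Hypothetical helper (not part of speak): builds the argument vector for a
// single speak invocation using only flags listed in the README's Options section.
interface SpeakOptions {
  play?: boolean;      // --play: play after generation
  stream?: boolean;    // --stream: stream as it generates
  output?: string;     // --output: output file or directory
  voice?: string;      // --voice: custom voice file (WAV)
  autoChunk?: boolean; // --auto-chunk: chunk long documents
  dryRun?: boolean;    // --dry-run: preview without generating
}

function buildSpeakArgs(input: string, opts: SpeakOptions = {}): string[] {
  // The text or file path travels as one argv entry, so no shell quoting is needed.
  const args: string[] = [input];
  if (opts.play) args.push("--play");
  if (opts.stream) args.push("--stream");
  if (opts.output !== undefined) args.push("--output", opts.output);
  if (opts.voice !== undefined) args.push("--voice", opts.voice);
  if (opts.autoChunk) args.push("--auto-chunk");
  if (opts.dryRun) args.push("--dry-run");
  return args;
}
```

The returned array can be appended to `["bun", "run", "<repo>/src/index.ts"]` and handed to an argv-based spawner such as `Bun.spawn` or Node's `child_process.spawn`, where `<repo>` stands for wherever the repository was cloned. Passing an array rather than a single command string keeps quotes and spaces in the spoken text from being reinterpreted by a shell.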