diff --git a/README.md b/README.md
index 4dcacf6..c8c084f 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,5 @@
-
- md2audio
-
+# md2audio
+
@@ -25,603 +24,281 @@
-Convert markdown H2 sections to individual audio files using multiple TTS (Text-to-Speech) providers including macOS `say`, Linux `espeak-ng`, and ElevenLabs API.
-
-## Features
-
-- **Cross-Platform TTS Providers**: macOS `say`, Linux `espeak-ng`, and ElevenLabs API
-- **Automatic Platform Detection**: Uses the best provider for your OS automatically
-- **Process files or directories** recursively with structure mirroring
-- **Target duration control**: Adjust timing with annotations like `(8s)`
-- **Multiple formats**: AIFF, M4A, and MP3 output
-- **Voice caching**: Fast lookups with SQLite WAL mode
-- **Developer-friendly**: Debug mode, dry-run preview, progress indicators
-
-## Prerequisites
-
-### For macOS say Provider (Default on macOS)
-
-- macOS (uses built-in `say` command)
-- Go 1.25 or later (to build the tool)
-
-### For Linux espeak Provider (Default on Linux)
-
-- Linux (Ubuntu, Debian, Fedora, Arch, etc.)
-- Go 1.25 or later (to build the tool)
-- `espeak-ng` or `espeak` installed:
-
- ```bash
- # Ubuntu/Debian
- sudo apt install espeak-ng ffmpeg
+Convert Markdown H2 sections to individual audio files using multiple TTS providers.
- # Fedora/RHEL
- sudo dnf install espeak-ng ffmpeg
+> [!WARNING]
+> This project is under active development. You may encounter bugs or incomplete features. Please report any issues on the [GitHub issue tracker](https://github.com/indaco/md2audio/issues).
- # Arch Linux
- sudo pacman -S espeak-ng ffmpeg
- ```
-
-- `ffmpeg` for audio format conversion (MP3, M4A support)
-
-### For ElevenLabs Provider (Works on all platforms)
+## Features
-- Any OS (Windows, macOS, Linux)
-- Go 1.25 or later (to build the tool)
-- ElevenLabs API key ([Get one here](https://elevenlabs.io/))
-- Set `ELEVENLABS_API_KEY` environment variable or create `.env` file
+- **Multiple TTS Providers**: Choose from macOS say, Linux espeak, Google Cloud TTS, or ElevenLabs
+- **Cross-Platform**: Works on macOS, Linux, and Windows (with cloud providers)
+- **Automatic Platform Detection**: Uses the best provider for your OS by default
+- **Timing Control**: Specify target durations with annotations like `(8s)`
+- **Batch Processing**: Process files or entire directories recursively
+- **Voice Caching**: Fast voice lookups with SQLite-based caching
+- **Multiple Formats**: AIFF, M4A, MP3, WAV, and OGG output
+- **Developer Tools**: Debug mode, dry-run preview, progress indicators
## Installation
-### Using go install
+### 1. Global Install (via go install)
```bash
go install github.com/indaco/md2audio/cmd/md2audio@latest
```
-### Building from source
+### 2. Prebuilt binaries
+
+Download a prebuilt binary from the [releases page](https://github.com/indaco/md2audio/releases) and move it to a folder in your system's PATH.
+
+### 3. Build from Source
```bash
git clone https://github.com/indaco/md2audio.git
cd md2audio
-go build -o md2audio ./cmd/md2audio
+go build -o md2audio ./cmd/md2audio # move the binary to a folder in your system's PATH
```
-The binary will be created in the current directory. You can move it to a location in your PATH:
+Or, with [just](https://just.systems/man/en/):
```bash
-sudo mv md2audio /usr/local/bin/
+just install
```
-## TTS Providers
+## Basic Usage
-md2audio supports multiple Text-to-Speech providers. The best provider for your platform is selected automatically:
+```bash
+# Process a markdown file (uses default provider for your OS)
+./md2audio -f script.md -p british-female
-### macOS say (Default on macOS)
+# Process entire directory
+./md2audio -d ./docs -p british-female -o ./audio
-- **Platform**: macOS only
-- **Cost**: Free (built-in)
-- **Setup**: No configuration needed
-- **Quality**: Good for local development and testing
-- **Formats**: AIFF, M4A
-- **Voices**: ~70 voices in various languages
+# List available voices
+./md2audio -list-voices
+```
-### Linux espeak-ng (Default on Linux)
+## TTS Providers
-- **Platform**: Linux only
-- **Cost**: Free (open-source)
-- **Setup**: Install `espeak-ng` and `ffmpeg`
-- **Quality**: Good for local development and testing
-- **Formats**: WAV, MP3, M4A, AIFF (via ffmpeg)
-- **Voices**: 50+ voices in various languages
-- **Voice Mapping**: Automatically maps macOS voice names (e.g., "Kate" → en-gb)
+md2audio supports multiple text-to-speech providers. Choose the one that best fits your needs:
-### ElevenLabs
+| Provider | Platform | Cost | Quality | Best For |
+| ------------------------------------------------ | -------- | ---- | ------- | ------------------------------ |
+| **[say](docs/providers/say.md)** | macOS | Free | Good | Local dev/testing |
+| **[espeak](docs/providers/espeak.md)** | Linux | Free | Basic | Linux dev/testing |
+| **[Google Cloud TTS](docs/providers/google.md)** | All | Paid | Premium | Enterprise, multi-language |
+| **[ElevenLabs](docs/providers/elevenlabs.md)** | All | Paid | Premium | Production content, audiobooks |
-- **Platform**: Cross-platform (works on any OS)
-- **Cost**: Paid API ([Pricing](https://elevenlabs.io/pricing))
-- **Setup**: Requires API key
-- **Quality**: Premium, highly realistic voices
-- **Formats**: MP3
-- **Voices**: Multiple professional voices with emotional control
+**[Compare Providers](docs/provider-comparison.md)** - Detailed comparison to help you choose
-#### Setting up ElevenLabs
+### Quick Provider Examples
-1. Get your API key from [ElevenLabs](https://elevenlabs.io/)
+```bash
+# macOS say (default on macOS)
+./md2audio -f script.md -p british-female
-2. Set the environment variable:
+# Linux espeak (default on Linux)
+./md2audio -f script.md -provider espeak -v en-gb
- ```bash
- export ELEVENLABS_API_KEY='your-api-key'
- ```
+# Google Cloud TTS
+./md2audio -provider google -google-voice en-US-Neural2-F -f script.md
-3. Or create a `.env` file in your project directory:
+# ElevenLabs
+./md2audio -provider elevenlabs -elevenlabs-voice-id VOICE_ID -f script.md
+```
- ```bash
- # Copy the example file
- cp .env.example .env
- # Then edit .env and add your API key
- ```
+## Markdown Format
- Or create it directly:
+Use H2 headers (`##`) to denote sections. Add optional timing annotations:
- ```bash
- echo 'ELEVENLABS_API_KEY=your-api-key' > .env
- ```
+```markdown
+## Introduction (8s)
-4. (Optional) Configure voice settings in `.env`:
+This section will be adjusted to approximately 8 seconds.
- ```bash
- # Voice quality settings (all optional, with sensible defaults)
- ELEVENLABS_STABILITY=0.5 # Voice consistency (0.0-1.0, default: 0.5)
- ELEVENLABS_SIMILARITY_BOOST=0.5 # Voice similarity (0.0-1.0, default: 0.5)
- ELEVENLABS_STYLE=0.0 # Voice style/emotion (0.0-1.0, default: 0.0)
- ELEVENLABS_USE_SPEAKER_BOOST=true # Boost similarity (true/false, default: true)
- ELEVENLABS_SPEED=1.0 # Default speed for non-timed sections (0.7-1.2, default: 1.0)
- ```
+## Main Content (5-10s)
- **Note:**
- - `ELEVENLABS_SPEED` only applies to sections WITHOUT timing annotations
- - Sections with `(5s)` timing will calculate speed automatically
- - Higher stability = more consistent but less expressive
- - Higher similarity_boost = closer to original voice characteristics
- - Style adds emotional range (0 = disabled, higher = more expressive)
+This targets 10 seconds (end time is used).
-5. List available voices:
+## Conclusion
- ```bash
- ./md2audio -provider elevenlabs -list-voices
- ```
+No timing specified - uses default speaking rate.
+```
-## Usage
+**Supported timing formats**: `(8s)`, `(10.5s)`, `(0-8s)`, `(15 seconds)`
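
The timing formats listed above can be recognised with a single regular expression. The following is a minimal sketch, not md2audio's actual parser: the names `timingRe` and `parseTargetSeconds` are hypothetical, and it assumes the annotation sits at the end of the H2 title and that a range resolves to its end time, as described in the example above.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// timingRe matches a trailing annotation such as "(8s)", "(10.5s)",
// "(0-8s)" or "(15 seconds)". Group 1 captures the last number before
// the unit, which is the end time when a range is given.
// Hypothetical pattern, written for illustration only.
var timingRe = regexp.MustCompile(`\((?:\d+(?:\.\d+)?\s*-\s*)?(\d+(?:\.\d+)?)\s*(?:s|seconds?)\)\s*$`)

// parseTargetSeconds is a hypothetical helper. It returns the target
// duration in seconds and true when the title carries a timing annotation.
func parseTargetSeconds(title string) (float64, bool) {
	m := timingRe.FindStringSubmatch(title)
	if m == nil {
		return 0, false
	}
	secs, err := strconv.ParseFloat(m[1], 64)
	if err != nil {
		return 0, false
	}
	return secs, true
}

func main() {
	titles := []string{
		"Introduction (8s)",
		"Main Content (5-10s)",
		"Demo (10.5s)",
		"Outro (15 seconds)",
		"Conclusion",
	}
	for _, t := range titles {
		secs, ok := parseTargetSeconds(t)
		fmt.Printf("%q timed=%v seconds=%v\n", t, ok, secs)
	}
}
```

Titles without an annotation return `false`, which corresponds to the "uses default speaking rate" case.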
-### Basic Examples
+## Command Line Options
-#### Using Default Provider (say on macOS, espeak on Linux)
+### General Options
-```bash
-# Check version
-./md2audio -version
+| Flag | Description | Default |
+| -------------- | ------------------------------------------------------ | ------------------------------ |
+| `-f` | Input markdown file | - |
+| `-d` | Input directory (recursive) | - |
+| `-o` | Output directory | `./audio_sections` |
+| `-provider` | TTS provider (`say`, `espeak`, `elevenlabs`, `google`) | Auto-detect |
+| `-format` | Output format (`aiff`, `m4a`, `mp3`, `wav`, `ogg`) | `aiff` (macOS) / `wav` (Linux) |
+| `-prefix` | Filename prefix | `section` |
+| `-list-voices` | List available voices | - |
+| `-version` | Print version | - |
+| `-debug` | Enable debug logging | `false` |
+| `-dry-run` | Preview without generating files | `false` |
-# List available voices (automatically uses the best provider for your OS)
-./md2audio -list-voices
+### Provider-Specific Options
-# Process a single markdown file with voice preset
-# Works on both macOS (say) and Linux (espeak) automatically!
-./md2audio -f script.md -p british-female
+Each provider has its own configuration options. See the provider guides for details:
-# Process entire directory recursively
-./md2audio -d ./docs -p british-female
+- **say/espeak**: `-p` (voice preset), `-v` (voice name), `-r` (speaking rate)
+- **Google Cloud**: `-google-voice`, `-google-language`, `-google-credentials`, `-google-speed`, `-google-pitch`, `-google-volume`
+- **ElevenLabs**: `-elevenlabs-voice-id`, `-elevenlabs-model`, `-elevenlabs-api-key` (voice settings via env vars)
-# Use specific voice with slower rate for clarity
-# On macOS: uses "Kate" voice directly
-# On Linux: maps "Kate" to "en-gb" voice automatically
-./md2audio -f script.md -v Kate -r 170
+**[Provider Documentation](docs/providers/)** for complete option lists
-# Generate M4A files instead of default format
-# macOS default: AIFF, Linux default: WAV
-./md2audio -d ./content -p british-female -format m4a
+## Examples
-# Custom output directory and prefix
-./md2audio -f script.md -o ./voiceovers -prefix demo
+### Basic Examples
-# Preview what would be generated (dry-run mode)
+```bash
+# Preview what would be generated
./md2audio -f script.md -p british-female -dry-run
-# Enable debug logging to troubleshoot issues
-./md2audio -f script.md -p british-female -debug
-
-# Combine dry-run with debug for detailed preview
-./md2audio -d ./docs -p british-female -dry-run -debug
+# Generate M4A files instead of the platform default (AIFF on macOS, WAV on Linux)
+./md2audio -f script.md -p british-female -format m4a
-# Explicitly use espeak provider (on any Linux system)
-./md2audio -f script.md -provider espeak -v en-gb
+# Process directory with custom output location
+./md2audio -d ./content -p us-female -o ./voiceovers
-# Explicitly use say provider (on macOS)
-./md2audio -f script.md -provider say -v Kate
+# Enable debug logging
+./md2audio -f script.md -debug
```
-#### Using ElevenLabs Provider
+### Provider-Specific Examples
```bash
-# List available ElevenLabs voices (cached for faster access)
-./md2audio -provider elevenlabs -list-voices
-
-# Refresh voice cache (when new voices are available)
-./md2audio -provider elevenlabs -list-voices -refresh-cache
-
-# Export voices to JSON for reference
-./md2audio -provider elevenlabs -export-voices elevenlabs_voices.json
-
-# Process a single file with ElevenLabs
-./md2audio -provider elevenlabs \
- -elevenlabs-voice-id 21m00Tcm4TlvDq8ikWAM \
- -f script.md
-
-# Process entire directory with ElevenLabs
+# Google Cloud TTS with Neural2 voice
+export GOOGLE_APPLICATION_CREDENTIALS="/path/to/creds.json"
+./md2audio -provider google \
+ -google-voice en-US-Neural2-F \
+ -format mp3 \
+ -d ./docs
+
+# ElevenLabs with custom settings
+export ELEVENLABS_API_KEY='your-key'
./md2audio -provider elevenlabs \
-elevenlabs-voice-id 21m00Tcm4TlvDq8ikWAM \
- -d ./docs \
- -o ./audio_output
-
-# Use specific ElevenLabs model
-./md2audio -provider elevenlabs \
- -elevenlabs-voice-id YOUR_VOICE_ID \
-elevenlabs-model eleven_multilingual_v2 \
-f script.md
-```
-
-### Debug Mode
-
-Enable debug logging to troubleshoot issues or understand what's happening under the hood:
-
-```bash
-# Enable debug logging
-./md2audio -f script.md -p british-female -debug
-```
-
-**Debug mode shows:**
-
-- Cache hits/misses for voice lookups
-- API request details (ElevenLabs)
-- File processing progress
-- Internal operation details
-
-**When to use debug mode:**
-
-- Troubleshooting API issues with ElevenLabs
-- Understanding cache behavior
-- Investigating performance problems
-- Reporting bugs with detailed logs
-
-### Dry-Run Mode
-
-Preview what would be generated without creating any audio files:
-
-```bash
-# Dry-run mode - shows what would be generated
-./md2audio -f script.md -p british-female -dry-run
-
-# Combine with debug for maximum visibility
-./md2audio -d ./docs -provider elevenlabs -elevenlabs-voice-id YOUR_ID -dry-run -debug
-```
-
-**Dry-run mode shows:**
-
-- Which sections would be processed
-- Output file paths that would be created
-- Timing information for timed sections
-- Preview of text content
-
-**When to use dry-run mode:**
-
-- Testing markdown format before generation
-- Verifying output paths and filenames
-- Checking section count and structure
-- Planning batch processing jobs
-
-**Example output:**
-
-```
-💡 DRY-RUN MODE: No files will be created
-ℹ Section 1/3:
- - title: Introduction
- 💡 Target duration: 8.0 seconds
- 💡 Text: Welcome to this demonstration...
- Would create: ./audio_sections/section_01_introduction.aiff
-
-ℹ Section 2/3:
- - title: Main Content
- 💡 Text: Here is the main content...
- Would create: ./audio_sections/section_02_main_content.aiff
-
-✔ Would generate 3 audio files
+# espeak on Linux with MP3 output
+./md2audio -provider espeak \
+ -v en-gb \
+ -format mp3 \
+ -d ./docs
```
-### Voice Caching
+## Voice Caching
-To improve performance, md2audio caches voice lists from providers. This is especially useful for ElevenLabs to avoid repeated API calls:
+md2audio caches voice lists locally for faster access:
```bash
-# First call - fetches from API and caches (slower)
-./md2audio -provider elevenlabs -list-voices
+# First call - fetches from provider and caches
+./md2audio -provider google -list-voices
# Subsequent calls - uses cache (instant)
-./md2audio -provider elevenlabs -list-voices
+./md2audio -provider google -list-voices
-# Force refresh when new voices are available
-./md2audio -provider elevenlabs -list-voices -refresh-cache
+# Force refresh when new voices are available
+./md2audio -provider google -list-voices -refresh-cache
-# Export cached voices to JSON file for reference
-./md2audio -provider elevenlabs -export-voices elevenlabs_voices.json
-./md2audio -provider say -export-voices say_voices.json
+# Export to JSON for reference
+./md2audio -provider google -export-voices voices.json
```
-**Cache Details:**
-
-- **Location**: `~/.md2audio/voice_cache.db` (SQLite database)
-- **Duration**: 30 days (voices don't change frequently)
-- **Benefits**: Instant voice listing, reduced API calls, offline access to voice list
-- **Refresh**: Use `-refresh-cache` flag when you know new voices are available
-
-### Command Line Options
-
-#### General Options
-
-| Flag | Description | Default |
-| ---------------- | --------------------------------------------------- | ----------------------- |
-| `-f` | Input markdown file (use `-f` or `-d`) | - |
-| `-d` | Input directory (recursive, use `-f` or `-d`) | - |
-| `-o` | Output directory | `./audio_sections` |
-| `-format` | Output format | `aiff` |
-| `-prefix` | Filename prefix | `section` |
-| `-list-voices` | List all available voices (uses cache if available) | - |
-| `-refresh-cache` | Force refresh of voice cache | `false` |
-| `-export-voices` | Export cached voices to JSON file | - |
-| `-provider` | TTS provider (`say`, `espeak`, or `elevenlabs`) | Auto-detect by platform |
-| `-version` | Print version and exit | - |
-| `-debug` | Enable debug logging | `false` |
-| `-dry-run` | Show what would be generated without creating files | `false` |
-
-#### say/espeak Provider Options
-
-These options work for both `say` (macOS) and `espeak` (Linux) providers:
-
-| Flag | Description | Default |
-| ---- | -------------------------------------- | ------------------- |
-| `-p` | Voice preset (see Voice Presets below) | `Kate` (if not set) |
-| `-v` | Specific voice name (overrides `-p`) | - |
-| `-r` | Speaking rate (lower = slower) | `180` |
-
-**Note:** Voice names are automatically mapped between platforms. For example, "Kate" uses the Kate voice on macOS and en-gb on Linux.
+- **Cache Location**: `~/.md2audio/voice_cache.db`
+- **Cache Duration**: 30 days
+- **Supported Providers**: All providers
-#### ElevenLabs Provider Options
+## Output Structure
-| Flag | Description | Default |
-| ---------------------- | ----------------------------------- | ------------------------ |
-| `-elevenlabs-voice-id` | ElevenLabs voice ID (required) | - |
-| `-elevenlabs-model` | ElevenLabs model ID | `eleven_multilingual_v2` |
-| `-elevenlabs-api-key` | ElevenLabs API key (prefer env var) | `ELEVENLABS_API_KEY` env |
-
-### Voice Presets
-
-These presets work on both macOS and Linux (automatically mapped):
-
-| Preset | macOS Voice | Linux Voice |
-| ------------------- | ----------- | ----------- |
-| `british-female` | Kate | en-gb |
-| `british-male` | Daniel | en-gb |
-| `us-female` | Samantha | en-us |
-| `us-male` | Alex | en-us |
-| `australian-female` | Karen | en-au |
-| `indian-female` | Veena | en-in |
-
-**Cross-Platform Usage:**
+### Single File
```bash
-# Same command works on both macOS and Linux!
./md2audio -f script.md -p british-female
-
-# Or use specific voices (automatically mapped)
-./md2audio -f script.md -v Kate # macOS: Kate, Linux: en-gb
```
-### ElevenLabs Voice Settings
-
-ElevenLabs voice quality can be fine-tuned using environment variables. All settings are optional and have sensible defaults:
-
-| Setting | Range | Default | Description |
-| ------------------------------ | ---------- | ------- | ---------------------------------------------------------------------- |
-| `ELEVENLABS_STABILITY` | 0.0-1.0 | 0.5 | Voice consistency. Higher = more consistent but less expressive |
-| `ELEVENLABS_SIMILARITY_BOOST` | 0.0-1.0 | 0.5 | Voice similarity to original. Higher = closer to voice characteristics |
-| `ELEVENLABS_STYLE` | 0.0-1.0 | 0.0 | Emotional range. 0 = disabled, higher = more expressive |
-| `ELEVENLABS_USE_SPEAKER_BOOST` | true/false | true | Boost similarity of synthesized speech |
-| `ELEVENLABS_SPEED` | 0.7-1.2 | 1.0 | Default speaking speed (only for sections without timing annotations) |
-
-**Speed Behavior:**
-
-- Sections **with** timing annotations like `## Scene 1 (5s)` → Speed is calculated automatically to fit duration
-- Sections **without** timing annotations → Uses `ELEVENLABS_SPEED` setting (default: 1.0)
+Output:
-**Example `.env` configuration:**
-
-```bash
-ELEVENLABS_API_KEY=your-api-key
-ELEVENLABS_STABILITY=0.7 # More consistent voice
-ELEVENLABS_SIMILARITY_BOOST=0.8 # Closer to original voice
-ELEVENLABS_STYLE=0.3 # Slight emotional variation
-ELEVENLABS_SPEED=1.1 # 10% faster for non-timed sections
```
-
-## Markdown Format
-
-The script expects H2 headers (`##`) to denote sections. You can optionally specify target duration for each section:
-
-```markdown
-## Scene 1: Introduction (8s)
-
-This is the content for scene 1. It will be converted to audio that lasts exactly 8 seconds.
-
-## Scene 2: Main Demo (12s)
-
-This is the content for scene 2. The speaking rate will be automatically adjusted to fit 12 seconds.
-
-## Scene 3: Conclusion
-
-This section has no timing specified, so it will use the default speaking rate (-r flag).
+audio_sections/
+├── section_01_introduction.aiff
+├── section_02_main_content.aiff
+└── section_03_conclusion.aiff
```
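
The file names in the listing follow a prefix, a two-digit section number, a sanitized title, and the format extension. Here is a minimal sketch of that naming scheme, assuming the title has already had its timing annotation removed. `sectionFilename` and `nonAlnum` are hypothetical names, and the exact sanitization rules md2audio applies may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nonAlnum matches runs of characters that are not lowercase ASCII
// letters or digits. Hypothetical rule, chosen to reproduce the
// file names shown in the listing.
var nonAlnum = regexp.MustCompile(`[^a-z0-9]+`)

// sectionFilename is a hypothetical helper that builds
// "<prefix>_<NN>_<sanitized title>.<format>".
func sectionFilename(prefix string, index int, title, format string) string {
	slug := nonAlnum.ReplaceAllString(strings.ToLower(title), "_")
	slug = strings.Trim(slug, "_")
	return fmt.Sprintf("%s_%02d_%s.%s", prefix, index, slug, format)
}

func main() {
	fmt.Println(sectionFilename("section", 1, "Introduction", "aiff"))
	fmt.Println(sectionFilename("section", 2, "Main Content", "aiff"))
	fmt.Println(sectionFilename("section", 3, "Conclusion", "aiff"))
}
```

The `prefix` argument corresponds to the `-prefix` flag and `format` to `-format`.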
-### Timing Formats Supported
-
-- `(8s)` - Target duration of 8 seconds
-- `(10.5s)` - Target duration of 10.5 seconds
-- `(0-8s)` - Range format, uses end time (8 seconds)
-- `(15 seconds)` - Also works with "seconds" spelled out
-
-**How it works (macOS say provider only):**
-
-- The script counts the words in your text
-- Calculates the required words-per-minute (WPM) to fit the target duration
-- Automatically adjusts the speaking rate for that section
-- Shows you the actual duration vs target after generation
-
-**Important Notes:**
-
-- **Timing is supported with both providers**, but with different accuracy:
- - **macOS say provider**: Uses `-r` (rate) parameter for speed control
- - Very wide range of speaking rates (90-360 wpm)
- - Actual duration may differ from target (typically within 1-3 seconds)
- - Applies 0.95 adjustment factor for better accuracy
-
- - **ElevenLabs provider**: Uses `speed` parameter (NEW!)
- - Limited range: 0.7x (slower) to 1.2x (faster) of natural pace
- - More accurate natural-sounding speech
- - If target duration requires speed outside this range, audio will be clamped
- - Example: 5s target → 5.75s actual (within 15% for typical content)
-
-- **Timing accuracy tip**: Test with your content and adjust timing annotations as needed. For very tight timing requirements, consider the say provider's wider speed range.
-
-## Directory Processing
-
-Process entire directory trees recursively with the `-d` flag:
+### Directory Processing
```bash
-./md2audio -d ./docs -p british-female -o ./audio_output
-```
-
-**Input structure:**
-
-```
-docs/
-├── intro.md
-├── chapter1/
-│ ├── part1.md
-│ └── part2.md
-└── chapter2/
- └── overview.md
+./md2audio -d ./docs -p british-female
```
-**Output structure (mirrors input):**
+Input structure is mirrored in output:
```
-audio_output/
-├── intro/
-│ ├── section_01_welcome.aiff
-│ └── section_02_overview.aiff
-├── chapter1/
-│ ├── part1/
-│ │ ├── section_01_title.aiff
-│ │ └── section_02_title.aiff
-│ └── part2/
-│ └── section_01_title.aiff
-└── chapter2/
- └── overview/
- └── section_01_title.aiff
+docs/ audio_sections/
+├── intro.md → ├── intro/
+├── chapter1/ → ├── chapter1/
+│ ├── part1.md → │ ├── part1/
+│ └── part2.md → │ └── part2/
+└── chapter2/ → └── chapter2/
+ └── overview.md → └── overview/
```
-**Key features:**
-
-- Processes all `.md` files recursively
-- Creates mirror directory structure
-- Each markdown file gets its own subdirectory
-- Preserves folder hierarchy from input
-- Continues processing even if individual files fail
+## Troubleshooting
-**Example with examples folder:**
+### Voice Not Found
```bash
-# Process the included examples
-./md2audio -d ./examples -p british-female -format m4a
-
-# Results in organized audio files matching the examples structure
-```
-
-## Output
-
-Files are named using the pattern:
+# List all available voices for your provider
+./md2audio -list-voices
+# Use exact voice name from the list
+./md2audio -f script.md -v "Samantha"
```
-{prefix}_{number}_{sanitized_title}.{format}
-```
-
-Example outputs:
-
-- `section_01_scene_1_introduction.aiff`
-- `section_02_scene_2_main_demo.aiff`
-
-## Tips for Video Editing
-
-1. Generate separate files per section (this is automatic)
-2. Add timing to your markdown headers to match your screen recording
-3. Import all audio files into your video editing software
-4. Place each audio clip on the timeline where needed
-5. The audio will match your specified durations automatically
-### Timing Tips
+### Provider Setup Issues
-- **Be realistic**: Very short durations with lots of text will sound rushed
-- **Test first**: Generate one section to verify the pacing feels natural
-- **Adjust if needed**: If timing is off, adjust the duration in your markdown and regenerate
-- **Word count matters**: ~2-3 words per second is natural speech
-- **Override if needed**: The `-r` flag still works for sections without timing
+See the provider-specific guide for detailed setup instructions:
-## Troubleshooting
-
-**Voice not found:**
-
-- Run `./md2audio -list-voices` to see available voices
-- Use the exact voice name with `-v` flag
-
-**No sections found:**
-
-- Ensure your markdown uses `##` for headers (H2)
-- Check there's content after each header
+- [say Setup](docs/providers/say.md#setup) - No setup needed
+- [espeak Setup](docs/providers/espeak.md#installation) - Install espeak-ng
+- [Google Cloud Setup](docs/providers/google.md#setup) - GCP credentials
+- [ElevenLabs Setup](docs/providers/elevenlabs.md#setup) - API key
-**Audio quality:**
-
-- AIFF format is higher quality but larger
-- M4A format is compressed and smaller
-- Adjust rate with `-r` flag for clarity
+### Debug Mode
-## Example Workflow
+Enable debug logging to troubleshoot issues:
```bash
-# 1. Check your markdown format
-cat examples/demo_script.md
-
-# 2. List available voices
-./md2audio -list-voices
-
-# 3. Generate audio files
-./md2audio -f examples/demo_script.md -p british-female -r 175 -format m4a
-
-# 4. Import the files from ./audio_sections into your video editor
+./md2audio -f script.md -p british-female -debug
```
-## Notes
-
-- The script automatically cleans markdown formatting (links, bold, italic)
-- Empty sections are skipped
-- Section titles are sanitized for safe filenames
-- Speaking rate default is 180 (macOS default is 200)
-
-## For Developers
-
-Interested in contributing or understanding the codebase?
+Shows:
-See the [Contributing Guide](CONTRIBUTING.md) for detailed information about:
-
-- Project architecture and package organization
-- Development tools and workflow
-- Code quality standards
-- Setting up your development environment
+- Cache hits/misses
+- API request details
+- File processing progress
+- Internal operations
## Contributing
-Contributions are welcome!
+Contributions are welcome! See the [Contributing Guide](CONTRIBUTING.md) for:
-See the [Contributing Guide](/CONTRIBUTING.md) for setup instructions.
+- Project architecture
+- Development setup
+- Code quality standards
+- Provider implementation guide
## License
-This project is licensed under the MIT License – see the [LICENSE](./LICENSE) file for details.
+MIT License - see [LICENSE](LICENSE) for details
diff --git a/docs/provider-comparison.md b/docs/provider-comparison.md
new file mode 100644
index 0000000..6195599
--- /dev/null
+++ b/docs/provider-comparison.md
@@ -0,0 +1,111 @@
+# TTS Provider Comparison
+
+This guide helps you choose the best text-to-speech provider for your needs.
+
+## Quick Comparison Table
+
+| Feature | [say](providers/say.md) | [espeak](providers/espeak.md) | [ElevenLabs](providers/elevenlabs.md) | [Google Cloud](providers/google.md) |
+| --------------- | ----------------------- | ----------------------------- | ------------------------------------- | ----------------------------------- |
+| **Platform** | macOS only | Linux only | All platforms | All platforms |
+| **Cost** | Free | Free | Paid ($5-99/mo) | Paid ($4-16/M chars) |
+| **Quality** | Good | Basic | Premium | Premium |
+| **Voices** | ~70 voices | ~50 voices | 20+ premium | 400+ voices |
+| **Languages** | 30+ | 50+ | 30+ | 50+ |
+| **Offline** | Yes | Yes | ❌ No | ❌ No |
+| **Speed Range** | 90-360 WPM | Variable | 0.7x-1.2x | 0.25x-4.0x |
+| **Formats**     | AIFF, M4A               | WAV, MP3, M4A, AIFF           | MP3                                   | MP3, WAV, OGG                       |
+| **Setup** | None | Install espeak | API key | GCP credentials |
+| **Best For** | macOS dev/test | Linux dev/test | Premium quality | Enterprise/Scale |
+
+## Detailed Comparison
+
+### Voice Quality
+
+#### say (macOS)
+
+- **Pros**: Reasonably natural-sounding, good for local testing
+- **Cons**: Not neural TTS; can sound somewhat robotic next to the cloud providers
+- **Use Case**: Development, testing, local projects
+
+#### espeak (Linux)
+
+- **Pros**: Lightweight, fast, open source
+- **Cons**: Robotic voice, limited expressiveness
+- **Use Case**: Development, testing, scripting
+
+#### ElevenLabs
+
+- **Pros**: Highly realistic, emotional control, voice cloning
+- **Cons**: Requires paid subscription, limited speed range
+- **Use Case**: Production content, audiobooks, podcasts
+
+#### Google Cloud TTS
+
+- **Pros**: Neural2/WaveNet voices, massive voice library, enterprise SLA
+- **Cons**: Requires GCP setup, costs scale with usage
+- **Use Case**: Enterprise, multi-language, high-volume
+
+### Speed Control Comparison
+
+| Provider | Speed Range | Timing Accuracy | Notes |
+| ------------ | ----------- | --------------- | ------------------------------- |
+| say | 90-360 WPM | ±1-3 seconds | Wide range, good flexibility |
+| espeak | Variable | ±2-4 seconds | Adjusts rate parameter |
+| ElevenLabs | 0.7x-1.2x | ±15% | Limited range, natural quality |
+| Google Cloud | 0.25x-4.0x | ±10% | **Widest range**, best accuracy |
+
+- **For precise timing control**: Google Cloud TTS (0.25x-4.0x range)
+- **For natural quality**: ElevenLabs (limited but realistic)
+- **For flexibility**: say (wide WPM range)
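
To see why the multiplier range matters for timed sections, the sketch below computes the speed needed to fit a section into its target duration and clamps it to the provider's range from the table above. It rests on assumptions: `speedFor` and `ranges` are hypothetical names, and the duration at 1.0x speed is taken as a given input, since how md2audio estimates it is not covered here.

```go
package main

import "fmt"

// speedRange holds the multiplier limits for a provider.
type speedRange struct{ min, max float64 }

// ranges copies the multiplier limits from the speed control table.
// say and espeak are omitted because they are rate-based, not multiplier-based.
var ranges = map[string]speedRange{
	"elevenlabs": {min: 0.7, max: 1.2},
	"google":     {min: 0.25, max: 4.0},
}

// speedFor is a hypothetical helper. It returns the multiplier that would fit
// naturalSecs of speech (measured at 1.0x) into targetSecs, clamped to the
// provider's range, and whether clamping was needed.
func speedFor(provider string, naturalSecs, targetSecs float64) (float64, bool, error) {
	r, ok := ranges[provider]
	if !ok {
		return 0, false, fmt.Errorf("unknown provider %q", provider)
	}
	if naturalSecs <= 0 || targetSecs <= 0 {
		return 0, false, fmt.Errorf("durations must be positive")
	}
	speed := naturalSecs / targetSecs
	if speed < r.min {
		return r.min, true, nil
	}
	if speed > r.max {
		return r.max, true, nil
	}
	return speed, false, nil
}

func main() {
	// 10 seconds of natural speech squeezed into a 5-second target.
	for _, p := range []string{"elevenlabs", "google"} {
		speed, clamped, err := speedFor(p, 10, 5)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: speed=%.2fx clamped=%v\n", p, speed, clamped)
	}
}
```

When the result is clamped, the generated audio will miss the target duration, which is one reason a narrower range tends to give looser timing accuracy.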
+
+### Voice Selection
+
+#### say (macOS)
+
+- ~70 voices across 30+ languages
+- Organized by language/region
+- Good variety, standard quality
+- List with: `./md2audio -list-voices`
+
+#### espeak (Linux)
+
+- ~50 voices across 50+ languages
+- Simple language codes (en-us, en-gb, etc.)
+- Open source voice synthesis
+- List with: `./md2audio -provider espeak -list-voices`
+
+#### ElevenLabs
+
+- 20+ professional voices
+- Highly distinctive personalities
+- Voice cloning available (paid tiers)
+- Emotional range control
+- List with: `./md2audio -provider elevenlabs -list-voices`
+
+#### Google Cloud TTS
+
+- **400+ voices** across 50+ languages
+- Multiple quality tiers:
+ - Standard (basic quality)
+ - WaveNet (high quality)
+ - Neural2 (best quality)
+ - Studio (premium, highest fidelity)
+ - Polyglot (multi-language)
+- List with: `./md2audio -provider google -list-voices`
+
+### Output Formats
+
+| Provider | Formats | Notes |
+| ------------ | ------------------- | ------------------------------ |
+| say | AIFF, M4A | AIFF default, converts to M4A |
+| espeak | WAV, MP3, M4A, AIFF | WAV default, ffmpeg for others |
+| ElevenLabs | MP3 | MP3 only from API |
+| Google Cloud | MP3, WAV, OGG | Multiple formats supported |
+
+## Next Steps
+
+- [say Provider Guide](providers/say.md)
+- [espeak Provider Guide](providers/espeak.md)
+- [ElevenLabs Provider Guide](providers/elevenlabs.md)
+- [Google Cloud TTS Provider Guide](providers/google.md)
+- [Timing Control Guide](timing-guide.md)
diff --git a/docs/providers/elevenlabs.md b/docs/providers/elevenlabs.md
new file mode 100644
index 0000000..6b15f2a
--- /dev/null
+++ b/docs/providers/elevenlabs.md
@@ -0,0 +1,296 @@
+# ElevenLabs Provider
+
+The ElevenLabs provider uses the premium ElevenLabs AI text-to-speech API for highly realistic voice synthesis.
+
+## Platform
+
+- **Cross-platform** (Works on macOS, Linux, and Windows)
+- Cloud-based API service
+
+## Features
+
+- Premium, highly realistic voices
+- Emotional voice control
+- Voice cloning capabilities
+- Professional-grade quality
+- Multilingual support
+- Fine-grained voice settings
+- Speed control (0.7x - 1.2x)
+- Timing control support
+
+## Prerequisites
+
+- Any operating system (macOS, Linux, Windows)
+- ElevenLabs API key ([Sign up here](https://elevenlabs.io/))
+- Internet connection (cloud-based service)
+
+## Setup
+
+### 1. Get API Key
+
+1. Sign up at [ElevenLabs](https://elevenlabs.io/)
+2. Navigate to your profile settings
+3. Copy your API key
+
+### 2. Configure API Key
+
+**Option A: Environment Variable**
+
+```bash
+export ELEVENLABS_API_KEY='your-api-key-here'
+```
+
+**Option B: .env File**
+
+```bash
+# Create .env file in your project directory
+echo 'ELEVENLABS_API_KEY=your-api-key-here' > .env
+```
+
+**Option C: Command Line Flag**
+
+```bash
+./md2audio -provider elevenlabs \
+ -elevenlabs-api-key your-api-key-here \
+ -elevenlabs-voice-id 21m00Tcm4TlvDq8ikWAM \
+ -f script.md
+```
+
+### 3. (Optional) Configure Voice Settings
+
+Fine-tune voice quality in `.env`:
+
+```bash
+ELEVENLABS_API_KEY=your-api-key-here
+ELEVENLABS_STABILITY=0.5 # 0.0-1.0 (default: 0.5)
+ELEVENLABS_SIMILARITY_BOOST=0.5 # 0.0-1.0 (default: 0.5)
+ELEVENLABS_STYLE=0.0 # 0.0-1.0 (default: 0.0)
+ELEVENLABS_USE_SPEAKER_BOOST=true # true/false (default: true)
+ELEVENLABS_SPEED=1.0 # 0.7-1.2 (default: 1.0)
+```
+
+## Usage
+
+### List Available Voices
+
+```bash
+# List all ElevenLabs voices (cached)
+./md2audio -provider elevenlabs -list-voices
+
+# Refresh voice cache
+./md2audio -provider elevenlabs -list-voices -refresh-cache
+
+# Export voices to JSON
+./md2audio -provider elevenlabs -export-voices voices.json
+```
+
+### Basic Generation
+
+```bash
+# Generate audio from markdown
+./md2audio -provider elevenlabs \
+ -elevenlabs-voice-id 21m00Tcm4TlvDq8ikWAM \
+ -f script.md
+
+# Process entire directory
+./md2audio -provider elevenlabs \
+ -elevenlabs-voice-id 21m00Tcm4TlvDq8ikWAM \
+ -d ./docs \
+ -o ./audio_output
+```
+
+### Using Specific Models
+
+```bash
+# Use multilingual model (default)
+./md2audio -provider elevenlabs \
+ -elevenlabs-voice-id YOUR_VOICE_ID \
+ -elevenlabs-model eleven_multilingual_v2 \
+ -f script.md
+
+# Use English-only model (lower latency)
+./md2audio -provider elevenlabs \
+ -elevenlabs-voice-id YOUR_VOICE_ID \
+ -elevenlabs-model eleven_monolingual_v1 \
+ -f script.md
+```
+
+## Voice Settings
+
+### Stability (0.0-1.0)
+
+Controls voice consistency:
+
+- **Low (0.0-0.3)**: More expressive, variable
+- **Medium (0.4-0.6)**: Balanced (default: 0.5)
+- **High (0.7-1.0)**: Very consistent, less expressive
+
+```bash
+ELEVENLABS_STABILITY=0.7 # More consistent voice
+```
+
+### Similarity Boost (0.0-1.0)
+
+Controls how closely the voice matches the original:
+
+- **Low (0.0-0.3)**: More creative interpretation
+- **Medium (0.4-0.6)**: Balanced (default: 0.5)
+- **High (0.7-1.0)**: Closer to original voice characteristics
+
+```bash
+ELEVENLABS_SIMILARITY_BOOST=0.8 # Closer to original voice
+```
+
+### Style (0.0-1.0)
+
+Controls emotional expression:
+
+- **0.0**: No style/emotion (default, most stable)
+- **0.1-0.5**: Subtle emotional variation
+- **0.6-1.0**: High emotional expression
+
+```bash
+ELEVENLABS_STYLE=0.3 # Slight emotional variation
+```
+
+### Speaker Boost (true/false)
+
+Enhances voice similarity:
+
+- **true**: Better voice matching (default)
+- **false**: Standard voice synthesis
+
+```bash
+ELEVENLABS_USE_SPEAKER_BOOST=true
+```
+
+### Speed (0.7-1.2)
+
+Default speaking speed for non-timed sections:
+
+- **0.7**: 30% slower
+- **1.0**: Natural speed (default)
+- **1.2**: 20% faster
+
+```bash
+ELEVENLABS_SPEED=1.1 # 10% faster for non-timed sections
+```
+
+**Note**: Sections with timing annotations (e.g., `## Intro (5s)`) automatically calculate speed to fit the duration.
+
+## Timing Control
+
+ElevenLabs supports timing annotations with automatic speed adjustment:
+
+```markdown
+## Introduction (8s)
+This section will be adjusted to fit 8 seconds using speed control.
+
+## Main Demo (5-10s)
+Targets 10 seconds (end time is used).
+
+## Conclusion
+No timing specified - uses ELEVENLABS_SPEED setting (default: 1.0).
+```
+
+**Speed Range**: 0.7x - 1.2x
+
+- **Accuracy**: Typically within 15% of target duration
+- **Quality**: Natural-sounding speech maintained
+- **Limitation**: If target requires speed outside range, it will be clamped with a warning
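+
+The clamping behaviour can be sketched in shell. This is only an illustration: `estimate_speed` is a hypothetical helper, not part of md2audio, and it assumes the required speed is simply the natural duration divided by the target duration.
+
+```bash
+# Hypothetical helper, not part of md2audio.
+# Assumes speed = natural duration / target duration, clamped to 0.7-1.2.
+estimate_speed() {
+  awk -v natural="$1" -v target="$2" 'BEGIN {
+    speed = natural / target
+    if (speed < 0.7) speed = 0.7   # lower bound of the supported range
+    if (speed > 1.2) speed = 1.2   # upper bound of the supported range
+    printf "%.2f\n", speed
+  }'
+}
+
+estimate_speed 11 10   # 11s of natural speech into a 10s slot
+estimate_speed 15 10   # would need 1.5x, so the result is clamped
+```
+
+A clamped result means the section is unlikely to fit: lengthen the annotation or trim the text.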
+
+## Output Format
+
+- **MP3 only** - ElevenLabs API returns MP3 audio
+- High quality compression
+- Suitable for all platforms
+
+## Common Voice IDs
+
+Popular ElevenLabs voices:
+
+| Voice ID (2024) | Name | Description |
+|---------------------------|-----------|----------------------|
+| 21m00Tcm4TlvDq8ikWAM | Rachel | Calm, professional |
+| AZnzlk1XvdvUeBnXmlld | Domi | Strong, confident |
+| EXAVITQu4vr4xnSDxMaL | Bella | Soft, friendly |
+| ErXwobaYiN019PkySvjV | Antoni | Well-rounded, male |
+| MF3mGyEYCl7XYWbV9V6O | Elli | Emotional, young |
+| TxGEqnHWrfWFTfGW9XjX | Josh | Deep, professional |
+| VR6AewLTigWG4xSOukaG | Arnold | Mature, authoritative|
+| pNInz6obpgDQGcFmaJgB | Adam | Deep, narrative |
+| yoZ06aMxZJJ28mfd3POQ | Sam | Dynamic, energetic |
+
+Run `-list-voices` to see your account's available voices.
+
+## Pricing
+
+See [ElevenLabs Pricing](https://elevenlabs.io/pricing) for current details.
+
+## Tips
+
+1. **Start Simple**: Begin with default settings, then fine-tune
+2. **Test First**: Generate one section to verify voice and settings
+3. **Cache Voices**: First `-list-voices` call caches for 30 days
+4. **Timing**: For tight timing needs, test and adjust markdown annotations
+5. **Quality vs. Cost**: Higher quality settings may use more characters
+
+## Troubleshooting
+
+### API Key Errors
+
+```bash
+# Verify API key is set
+echo "$ELEVENLABS_API_KEY"
+
+# Or check .env file
+grep ELEVENLABS_API_KEY .env
+```
+
+### Voice Not Found
+
+```bash
+# List your available voices
+./md2audio -provider elevenlabs -list-voices
+
+# Copy the exact voice ID from the list
+```
+
+### Timing Issues
+
+```bash
+# Check calculated speed in output
+# If speed is clamped (0.7 or 1.2), adjust target duration
+
+# Example: If "Warning: Required speed (1.5) exceeds maximum"
+# Increase target duration: (5s) → (7s)
+```
+
+## Performance
+
+- **Latency**: Cloud-based, requires internet
+- **Quality**: Premium, highly realistic
+- **Rate Limits**: Depends on plan
+- **Caching**: Voice list cached locally for 30 days
+- **Retry Logic**: Automatic retry on transient failures
+
+## Limitations
+
+- MP3 format only (no WAV/AIFF)
+- Requires internet connection
+- API rate limits apply
+- Speed range limited to 0.7x - 1.2x
+- Costs scale with usage
+
+## Best Practices
+
+1. **Use .env**: Keep API keys out of scripts
+2. **Cache Voices**: Run `-list-voices` once, then use cached list
+3. **Batch Processing**: Process multiple files in one run
+4. **Monitor Usage**: Check ElevenLabs dashboard regularly
+5. **Test Settings**: Find optimal stability/similarity for your use case
+
+## Next Steps
+
+- Try [Google Cloud TTS](google.md) for an even wider speed range
+- Check [Provider Comparison](../provider-comparison.md) for a detailed comparison
diff --git a/docs/providers/espeak.md b/docs/providers/espeak.md
new file mode 100644
index 0000000..afe785b
--- /dev/null
+++ b/docs/providers/espeak.md
@@ -0,0 +1,241 @@
+# Linux espeak Provider
+
+The `espeak` provider uses the open-source eSpeak NG text-to-speech synthesizer for Linux systems.
+
+## Platform
+
+- **Linux only** (Ubuntu, Debian, Fedora, Arch, etc.)
+- Open source and free
+
+## Features
+
+- Free and open source
+- 50+ voices in various languages
+- Multiple output formats (WAV, MP3, M4A, AIFF via ffmpeg)
+- Timing control support
+- Offline operation
+- Cross-platform voice mapping (macOS voice names work automatically)
+
+## Prerequisites
+
+- Linux operating system
+- `espeak-ng` or `espeak` installed
+- `ffmpeg` for format conversion (MP3, M4A, AIFF)
+
+## Installation
+
+### Ubuntu/Debian
+
+```bash
+sudo apt install espeak-ng ffmpeg
+```
+
+### Fedora/RHEL
+
+```bash
+sudo dnf install espeak-ng ffmpeg
+```
+
+### Arch Linux
+
+```bash
+sudo pacman -S espeak-ng ffmpeg
+```
+
+## Setup
+
+After installation, verify espeak is available:
+
+```bash
+# Check espeak-ng is installed
+which espeak-ng
+
+# Or check for espeak (older version)
+which espeak
+
+# Test voice
+espeak-ng "Hello, this is a test"
+```
+
+## Usage
+
+### Basic Usage
+
+```bash
+# List available voices
+./md2audio -provider espeak -list-voices
+
+# Generate audio with default voice
+./md2audio -f script.md
+
+# Use voice preset (automatically mapped)
+./md2audio -f script.md -p british-female
+
+# Use specific voice
+./md2audio -f script.md -v en-gb
+```
+
+### Voice Presets (Cross-Platform)
+
+These presets work the same on Linux and macOS:
+
+| Preset | macOS Voice | Linux Voice (espeak) |
+|---------------------|-------------|----------------------|
+| `british-female` | Kate | en-gb |
+| `british-male` | Daniel | en-gb |
+| `us-female` | Samantha | en-us |
+| `us-male` | Alex | en-us |
+| `australian-female` | Karen | en-au |
+| `indian-female` | Veena | en-in |
+
+**Cross-platform example:**
+
+```bash
+# Same command works on both macOS and Linux!
+./md2audio -f script.md -p british-female
+
+# macOS voice names are automatically mapped
+./md2audio -f script.md -v Kate # Becomes en-gb on Linux
+```
+
+### Advanced Options
+
+```bash
+# Adjust speaking rate (lower = slower)
+./md2audio -f script.md -v en-gb -r 170
+
+# Generate MP3 instead of WAV
+./md2audio -f script.md -v en-gb -format mp3
+
+# Generate M4A
+./md2audio -f script.md -v en-gb -format m4a
+
+# Process entire directory
+./md2audio -d ./docs -p british-female -o ./audio
+```
+
+## Output Formats
+
+- **WAV** (default) - Uncompressed, high quality
+- **MP3** - Compressed, good quality (requires ffmpeg)
+- **M4A** - Compressed, compatible with Apple devices (requires ffmpeg)
+- **AIFF** - Uncompressed, Apple format (requires ffmpeg)
+
+## Timing Control
+
+The espeak provider supports timing annotations in H2 headers:
+
+```markdown
+## Introduction (8s)
+This section will be adjusted to approximately 8 seconds.
+
+## Main Content (5-10s)
+This will target 10 seconds (uses the end time).
+```
+
+**How it works:**
+
+- Similar to the macOS say provider
+- Adjusts speaking rate to fit target duration
+- Uses words-per-minute calculation
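+
+As a rough sketch of the words-per-minute idea, the rate for a section can be estimated from its word count. The `section_wpm` helper is hypothetical, and the provider's actual formula may include adjustments not shown here.
+
+```bash
+# Hypothetical helper, not part of md2audio.
+# Counts whitespace-separated words and converts them to a words-per-minute rate.
+section_wpm() {
+  # $1 = section text, $2 = target duration in seconds
+  printf '%s\n' "$1" | awk -v target="$2" '
+    { words += NF }                               # NF = number of words on this line
+    END { printf "%d\n", words * 60 / target }    # truncate to a whole number
+  '
+}
+
+section_wpm "This section will be adjusted to approximately 8 seconds." 8
+```
+
+A result far from normal speaking rates suggests the target duration does not suit the amount of text.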
+
+## Common Voice Languages
+
+Available voices include:
+
+- **English**: US (en-us), UK (en-gb), Australian (en-au), etc.
+- **Spanish**: es, es-la (Latin America)
+- **French**: fr, fr-be (Belgian)
+- **German**: de
+- **Italian**: it
+- **Portuguese**: pt, pt-br (Brazilian)
+- **Russian**: ru
+- **Chinese**: zh (Mandarin)
+- **Japanese**: ja
+- And many more...
+
+Run `./md2audio -provider espeak -list-voices` to see all available voices.
+
+## Voice Mapping
+
+When you use macOS voice names on Linux, they're automatically mapped:
+
+| macOS Voice | Linux espeak Voice |
+|-------------|-------------------|
+| Kate | en-gb |
+| Daniel | en-gb |
+| Samantha | en-us |
+| Alex | en-us |
+| Karen | en-au |
+| Veena | en-in |
+
+This allows scripts and commands to work across both platforms without modification.
+
+## Tips
+
+1. **Quality**: WAV provides lossless quality; MP3 is more portable
+2. **ffmpeg**: Required for MP3, M4A, and AIFF output formats
+3. **Testing**: Use dry-run mode to preview: `-dry-run`
+4. **Caching**: Voice list is cached for 30 days for faster lookups
+5. **Cross-platform**: Use voice presets for portable scripts
+
+## Troubleshooting
+
+### espeak-ng not found
+
+```bash
+# Ubuntu/Debian
+sudo apt install espeak-ng
+
+# Fedora
+sudo dnf install espeak-ng
+
+# Arch
+sudo pacman -S espeak-ng
+```
+
+### Format conversion fails
+
+```bash
+# Install ffmpeg for MP3/M4A/AIFF support
+sudo apt install ffmpeg # Ubuntu/Debian
+sudo dnf install ffmpeg # Fedora
+sudo pacman -S ffmpeg # Arch
+```
+
+### Voice not found
+
+```bash
+# List all available voices
+./md2audio -provider espeak -list-voices
+
+# Use espeak voice code
+./md2audio -f script.md -v en-gb
+```
+
+### Audio quality issues
+
+- espeak-ng generally has better quality than legacy espeak
+- For higher quality, consider ElevenLabs or Google Cloud TTS
+- Adjust rate for clearer speech: `-r 170`
+
+## Performance
+
+- Fast generation (local processing)
+- No API rate limits
+- Works offline
+- Voice cache updates instantly
+- Lightweight resource usage
+
+## Limitations
+
+- Linux only (not available on macOS or Windows)
+- Robotic voice quality (not neural TTS)
+- Limited voice customization
+- Timing accuracy varies
+
+## Next Steps
+
+- [ElevenLabs](elevenlabs.md) - Cloud-based, premium quality
+- [Google Cloud TTS](google.md) - Enterprise features, Neural2 voices
+- Check [Provider Comparison](../provider-comparison.md) for a detailed comparison
diff --git a/docs/providers/google.md b/docs/providers/google.md
new file mode 100644
index 0000000..4159cdf
--- /dev/null
+++ b/docs/providers/google.md
@@ -0,0 +1,196 @@
+# Google Cloud TTS Provider
+
+This guide shows how to use Google Cloud Text-to-Speech with md2audio.
+
+## Setup
+
+### 1. Create a Google Cloud Project
+
+1. Go to [Google Cloud Console](https://console.cloud.google.com/)
+2. Create a new project or select an existing one
+3. Enable the Cloud Text-to-Speech API
+4. Create a service account:
+ - Go to IAM & Admin > Service Accounts
+ - Click "Create Service Account"
+ - Grant "Cloud Text-to-Speech User" role
+ - Create and download a JSON key file
+
+### 2. Configure Credentials
+
+Set the environment variable to point to your service account key:
+
+```bash
+export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account-key.json"
+```
+
+Or add it to your `.env` file:
+
+```bash
+echo 'GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json' >> .env
+```
+
+## Usage Examples
+
+### List Available Voices
+
+```bash
+# List all Google Cloud TTS voices (400+ voices)
+./md2audio -provider google -list-voices
+
+# Export voices to JSON file for reference
+./md2audio -provider google -export-voices google-voices.json
+```
+
+### Generate Audio from Markdown
+
+```bash
+# Process a single file with Neural2 voice (high quality)
+./md2audio -provider google -google-voice en-US-Neural2-F -f examples/demo_script.md
+
+# Process with British English voice
+./md2audio -provider google -google-voice en-GB-Neural2-A -f examples/demo_script.md
+
+# Generate MP3 files
+./md2audio -provider google -google-voice en-US-Neural2-F -format mp3 -f examples/demo_script.md
+
+# Process entire directory
+./md2audio -provider google -google-voice en-US-Neural2-C -d ./docs -o ./audio_output
+```
+
+### Advanced Options
+
+```bash
+# Adjust speaking rate (0.25 = very slow, 4.0 = very fast)
+./md2audio -provider google -google-voice en-US-Neural2-F -google-speed 1.5 -f script.md
+
+# Adjust pitch (-20.0 to 20.0 semitones)
+./md2audio -provider google -google-voice en-US-Neural2-F -google-pitch 2.0 -f script.md
+
+# Adjust volume gain (-96.0 to 16.0 dB)
+./md2audio -provider google -google-voice en-US-Neural2-F -google-volume 3.0 -f script.md
+
+# Use different language
+./md2audio -provider google -google-voice es-ES-Neural2-A -google-language es-ES -f spanish_script.md
+```
+
+## Voice Types
+
+Google Cloud TTS offers multiple voice quality tiers:
+
+### Neural2 (Recommended - Best Quality)
+
+- `en-US-Neural2-F` - Female, American English
+- `en-US-Neural2-M` - Male, American English
+- `en-GB-Neural2-A` - Female, British English
+- `en-GB-Neural2-B` - Male, British English
+
+### WaveNet (High Quality)
+
+- `en-US-Wavenet-F` - Female, American English
+- `en-US-Wavenet-A` - Male, American English
+
+### Standard (Basic Quality)
+
+- `en-US-Standard-A` - Female, American English
+- `en-US-Standard-D` - Male, American English
+
+### Studio (Premium Quality - Highest Fidelity)
+
+- `en-US-Studio-M` - Male voice optimized for studio recordings
+- `en-US-Studio-O` - Female voice optimized for studio recordings
+
+## Output Formats
+
+Google Cloud TTS supports:
+
+- **MP3** - Compressed, good for web use
+- **WAV** - Uncompressed, high quality (LINEAR16 encoding)
+- **OGG** - Compressed with Opus codec
+
+```bash
+# Generate WAV files
+./md2audio -provider google -google-voice en-US-Neural2-F -format wav -f script.md
+
+# Generate OGG files
+./md2audio -provider google -google-voice en-US-Neural2-F -format ogg -f script.md
+```
+
+## Timing Control
+
+Google Cloud TTS has the widest speaking rate range (0.25x - 4.0x):
+
+```bash
+# Slow speech for learning materials
+./md2audio -provider google -google-voice en-US-Neural2-F -google-speed 0.75 -f lesson.md
+
+# Fast speech for quick reviews
+./md2audio -provider google -google-voice en-US-Neural2-F -google-speed 1.5 -f summary.md
+```
+
+The tool also supports timing annotations in H2 headers:
+
+```markdown
+## Introduction (8s)
+This section will be adjusted to speak in approximately 8 seconds.
+
+## Quick Overview (3-5s)
+This will target 5 seconds (uses the end time).
+```
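+
+To check which target a header would produce, the annotation can be pulled out with `sed`. This is an illustrative sketch: `target_seconds` is a hypothetical helper, and it assumes whole-second values and that a range uses its end time, as the other provider guides describe.
+
+```bash
+# Hypothetical helper, not part of md2audio.
+# Prints the target duration in seconds, or nothing if the header has no annotation.
+target_seconds() {
+  printf '%s\n' "$1" |
+    sed -nE 's/^## .*\(([0-9]+-)?([0-9]+)s\)[[:space:]]*$/\2/p'
+}
+
+target_seconds "## Introduction (8s)"
+target_seconds "## Quick Overview (3-5s)"
+target_seconds "## Conclusion"   # no annotation, so nothing is printed
+```
+
+Running this over a script's headers before generating audio is a quick way to spot a mistyped annotation.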
+
+## Multi-Language Support
+
+Google Cloud TTS supports 50+ languages:
+
+```bash
+# Spanish
+./md2audio -provider google -google-voice es-ES-Neural2-A -google-language es-ES -f spanish.md
+
+# French
+./md2audio -provider google -google-voice fr-FR-Neural2-A -google-language fr-FR -f french.md
+
+# German
+./md2audio -provider google -google-voice de-DE-Neural2-F -google-language de-DE -f german.md
+
+# Japanese
+./md2audio -provider google -google-voice ja-JP-Neural2-B -google-language ja-JP -f japanese.md
+```
+
+## Pricing
+
+See [Google Cloud TTS Pricing](https://cloud.google.com/text-to-speech/pricing) for current rates.
+
+## Tips
+
+1. **Voice Selection**: Start with Neural2 voices for the best quality-to-cost ratio
+2. **Caching**: The first `-list-voices` call downloads all voices; subsequent calls use cache (instant)
+3. **Credentials**: Keep your service account key secure, never commit it to version control
+4. **IAM Permissions**: Ensure your service account has the "Cloud Text-to-Speech User" role
+5. **Rate Limits**: Google Cloud TTS has generous rate limits, but for bulk processing consider batching
+6. **Language Codes**: Use the same language code in voice name and `-google-language` flag
+
+## Troubleshooting
+
+### "Google Cloud credentials not found" error
+
+- Ensure `GOOGLE_APPLICATION_CREDENTIALS` is set correctly
+- Check that the service account key file exists and is readable
+- Verify the path doesn't contain typos
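+
+These checks can be scripted. The sketch below uses a hypothetical `check_google_credentials` helper (not part of md2audio) and only tests that the variable is set and that the file is readable; it does not validate the key itself.
+
+```bash
+# Hypothetical helper, not part of md2audio.
+check_google_credentials() {
+  if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
+    echo "GOOGLE_APPLICATION_CREDENTIALS is not set"
+    return 1
+  fi
+  if [ ! -r "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
+    echo "Key file missing or unreadable: $GOOGLE_APPLICATION_CREDENTIALS"
+    return 1
+  fi
+  echo "Key file found: $GOOGLE_APPLICATION_CREDENTIALS"
+}
+
+check_google_credentials || echo "Fix the path and try again"
+```
+
+If the variable is set only in `.env`, it will not be visible to this shell check unless it is also exported.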
+
+### "Permission denied" error
+
+- Check that your service account has the "Cloud Text-to-Speech User" role
+- Ensure the Cloud Text-to-Speech API is enabled in your project
+
+### Voices not appearing
+
+- Run with `-refresh-cache` to force update the voice cache
+- Check your internet connection
+- Verify API access from your network
+
+## Resources
+
+- [Google Cloud TTS Documentation](https://cloud.google.com/text-to-speech/docs)
+- [Voice List](https://cloud.google.com/text-to-speech/docs/voices)
+- [SSML Support](https://cloud.google.com/text-to-speech/docs/ssml) (future feature)
+- [Audio Profiles](https://cloud.google.com/text-to-speech/docs/audio-profiles) (future feature)
+- Check [Provider Comparison](../provider-comparison.md) for a detailed comparison
diff --git a/docs/providers/say.md b/docs/providers/say.md
new file mode 100644
index 0000000..3c6e1c1
--- /dev/null
+++ b/docs/providers/say.md
@@ -0,0 +1,171 @@
+# macOS say Provider
+
+The `say` provider uses the built-in macOS text-to-speech system.
+
+## Platform
+
+- **macOS only**
+- Built-in, no installation required
+
+## Features
+
+- Free (built-in with macOS)
+- ~70 voices in various languages
+- Multiple output formats (AIFF, M4A)
+- Wide speaking rate range (90-360 WPM)
+- Timing control support
+- Offline operation
+
+## Prerequisites
+
+- macOS operating system
+- No additional installation needed
+
+## Setup
+
+No setup required! The `say` command is built into macOS.
+
+## Usage
+
+### Basic Usage
+
+```bash
+# List available voices
+./md2audio -provider say -list-voices
+
+# Generate audio with default voice
+./md2audio -f script.md
+
+# Use voice preset
+./md2audio -f script.md -p british-female
+
+# Use specific voice
+./md2audio -f script.md -v Kate
+```
+
+### Voice Presets
+
+| Preset | Voice Name |
+|---------------------|------------|
+| `british-female` | Kate |
+| `british-male` | Daniel |
+| `us-female` | Samantha |
+| `us-male` | Alex |
+| `australian-female` | Karen |
+| `indian-female` | Veena |
+
+### Advanced Options
+
+```bash
+# Adjust speaking rate (lower = slower)
+./md2audio -f script.md -v Kate -r 170
+
+# Generate M4A instead of AIFF
+./md2audio -f script.md -v Kate -format m4a
+
+# Process entire directory
+./md2audio -d ./docs -p british-female -o ./audio
+```
+
+## Output Formats
+
+- **AIFF** (default) - Uncompressed, high quality
+- **M4A** - Compressed, smaller file size
+
+The tool uses AIFF internally and converts to M4A using `afconvert` if requested.
+
+## Timing Control
+
+The say provider supports timing annotations in H2 headers:
+
+```markdown
+## Introduction (8s)
+This section will be adjusted to approximately 8 seconds.
+
+## Main Content (5-10s)
+This will target 10 seconds (uses the end time).
+```
+
+**How it works:**
+
+- Counts words in the text
+- Calculates required WPM to fit target duration
+- Applies 0.95 adjustment factor for better accuracy
+- Wide range: 90-360 WPM
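+
+The steps above can be sketched in shell. The `estimate_wpm` helper is hypothetical, and it assumes the 0.95 factor scales the target duration; the tool may apply the factor differently.
+
+```bash
+# Hypothetical helper, not part of md2audio.
+# Assumes wpm = words / (target * 0.95) * 60, clamped to the 90-360 WPM range.
+estimate_wpm() {
+  # $1 = word count, $2 = target duration in seconds
+  awk -v words="$1" -v target="$2" 'BEGIN {
+    wpm = words / (target * 0.95) * 60
+    if (wpm < 90) wpm = 90     # slowest supported rate
+    if (wpm > 360) wpm = 360   # fastest supported rate
+    printf "%d\n", wpm         # truncate to a whole number
+  }'
+}
+
+estimate_wpm 20 8    # 20 words in an 8s section
+estimate_wpm 100 5   # too much text for 5s, so the rate is clamped
+```
+
+A result pinned at 90 or 360 indicates the text and the target duration are mismatched.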
+
+**Accuracy:**
+
+- Typical variance: 1-3 seconds from target
+- Best for general-purpose narration
+- Use `afinfo` to verify actual duration
+
+## Common Voice Languages
+
+Available voices include:
+
+- **English**: US, UK, Australian, Indian, Irish, South African
+- **Spanish**: Spain, Mexico, Argentina
+- **French**: France, Canadian
+- **German**: Germany
+- **Italian**: Italy
+- **Japanese**: Japan
+- **Korean**: Korea
+- **Chinese**: Mandarin, Cantonese
+- And many more...
+
+Run `./md2audio -list-voices` to see all available voices on your system.
+
+## Tips
+
+1. **Quality**: AIFF provides the best quality; M4A is more portable
+2. **Rate**: Default is 180 WPM; adjust between 90-360 for different pacing
+3. **Testing**: Use dry-run mode to preview before generating: `-dry-run`
+4. **Caching**: Voice list is cached for 30 days for faster lookups
+
+## Troubleshooting
+
+### Voice not found
+
+```bash
+# List all available voices
+./md2audio -provider say -list-voices
+
+# Use exact voice name
+./md2audio -f script.md -v "Samantha"
+```
+
+### Audio too fast/slow
+
+```bash
+# Slower speech (lower rate)
+./md2audio -f script.md -v Kate -r 150
+
+# Faster speech (higher rate)
+./md2audio -f script.md -v Kate -r 200
+```
+
+### M4A conversion fails
+
+- Ensure you have the latest macOS updates
+- The `afconvert` command should be available by default
+- Try generating AIFF first to verify the issue
+
+## Performance
+
+- Fast generation (local processing)
+- No API rate limits
+- Works offline
+- Voice cache updates instantly
+
+## Limitations
+
+- macOS only (not available on Windows or Linux)
+- Lower quality compared to Neural TTS services
+- Limited voice customization (no pitch/volume control)
+- Timing accuracy varies (±1-3 seconds typical)
+
+## Next Steps
+
+- Try [ElevenLabs](elevenlabs.md) for higher quality voices
+- Try [Google Cloud TTS](google.md) for enterprise features
+- Check [Provider Comparison](../provider-comparison.md) for a detailed comparison