Generate rich, AI-enhanced flashcard decks for Mochi from simple word lists. Each card is enriched with pronunciation, IPA, example sentences, etymology, and audio, all generated automatically.
- Claude AI Enhancement: Generates romanization, IPA transcription, example sentences, etymology, and part-of-speech tags for each word
- Azure Neural TTS: Produces natural-sounding audio for words and example sentences using male and female voices
- Claude Batch API: Processes hundreds of cards in a single batch request instead of sequential API calls
- Parallel Audio Generation: Generates audio files concurrently with a configurable worker count
- Mochi Export: Outputs a `.mochi` file ready to import, with embedded audio attachments and structured card templates
- Review Preservation: Detects existing `.mochi` files in the input directory and carries over your review/SRS progress
- Incremental Builds: Caches enhanced cards and audio files so re-runs skip already-processed content
- Google TTS: Quick text-to-speech via Google Translate (standalone command)
- Python 3.12+
- uv (package manager)
- An Anthropic API key for Claude AI
- An Azure Speech resource for neural TTS
- Azure CLI logged in (`az login`) for authentication
# Clone and install dependencies
git clone <repo-url>
cd language-flashcard-generator
uv sync

Create a `.env` file in the project root:
ANTHROPIC_API_KEY=sk-ant-...
SPEECH_ENDPOINT=https://<region>.api.cognitive.microsoft.com
SPEECH_RESOURCE_ID=/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.CognitiveServices/accounts/<name>

Azure TTS authenticates via `DefaultAzureCredential` (typically `az login`).
Create a CSV file with word,english pairs:
frequency_lists/zulu/zulu_common.csv
mina,I/me
wena,you (singular)
yena,he/she/him/her
thina,we/us

# configs/zulu_common.yaml
key: zulu_common
deck_name: Zulu Common Words
frequency_dir: frequency_lists/zulu
ai_provider: anthropic
model: claude-haiku-4-5-20251001
batch_size: 200
batch_check_interval: 10
rate_limit_delay: 1.0
max_retries: 3
use_batch_api: true
tts:
  enabled: true
  azure:
    male_voice: zu-ZA-ThembaNeural
    female_voice: zu-ZA-ThandoNeural
enhancement_fields:
  romanization: true
  pronunciation_ipa: true
  example_sentences: true
  etymology: false
  additional_meanings: true
  part_of_speech: true

# Build all cards using the batch API
uv run python main.py build-deck configs/zulu_common.yaml
# Build only the first 10 cards
uv run python main.py build-deck configs/zulu_common.yaml --size 10
# Resume from card 50
uv run python main.py build-deck configs/zulu_common.yaml --start 50
# Re-render .mochi from cached cards (no API calls)
uv run python main.py build-deck configs/zulu_common.yaml --render-only

The output `.mochi` file is written to `builds/<key>/output/<Deck_Name>.mochi`. Import it into Mochi.
Build a full flashcard deck from a config file.
uv run python main.py build-deck <config.yaml> [--size N] [--start N] [--render-only]
| Flag | Description |
|---|---|
| `--size N` | Limit to N cards |
| `--start N` | Start from card index N (for resuming) |
| `--render-only` | Re-render `.mochi` from cache without calling any APIs |
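
As a rough illustration of how `--start` and `--size` might select rows from a `word,english` CSV, here is a minimal sketch. The helper names `load_words` and `select_cards` are hypothetical, and the sketch assumes `--start` is a zero-based index; the actual logic in `enhancer.py` may differ.

```python
import csv
import io
from typing import Optional


def load_words(csv_text: str) -> list[tuple[str, str]]:
    """Parse word,english rows, skipping blank or malformed lines."""
    rows = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 2 and row[0].strip():
            rows.append((row[0].strip(), row[1].strip()))
    return rows


def select_cards(cards: list, start: int = 0, size: Optional[int] = None) -> list:
    """Apply --start first (assumed zero-based), then --size as a limit."""
    end = None if size is None else start + size
    return cards[start:end]


sample = "mina,I/me\nwena,you (singular)\nyena,he/she/him/her\nthina,we/us\n"
words = load_words(sample)
print(select_cards(words, start=1, size=2))
# [('wena', 'you (singular)'), ('yena', 'he/she/him/her')]
```

Using `csv.reader` rather than splitting on commas keeps the sketch correct for quoted English glosses that themselves contain commas.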
Quick text-to-speech using Google Translate TTS.
uv run python main.py google "Hello world" --lang en --output hello.mp3

Text-to-speech using Azure Neural TTS.
uv run python main.py azure "Sawubona" --voice zu-ZA-ThembaNeural --output sawubona.mp3

| Field | Default | Description |
|---|---|---|
| `key` | (required) | Unique build key; determines output directory under `builds/` |
| `deck_name` | `Enhanced Vocabulary` | Name of the Mochi deck |
| `frequency_dir` | (required) | Path to directory containing `.csv` frequency list files |
| `ai_provider` | `anthropic` | AI provider |
| `model` | `claude-3-5-haiku-20241022` | Claude model to use |
| `batch_size` | `10` | Number of cards per batch (both sequential and batch API) |
| `use_batch_api` | `false` | Use the Claude Message Batches API for bulk processing |
| `batch_check_interval` | `10` | Seconds between batch status polls |
| `rate_limit_delay` | `1.0` | Delay between sequential API calls (seconds) |
| `max_retries` | `3` | Max retry attempts for failed API calls |
| `tts.enabled` | `true` | Enable Azure TTS audio generation |
| `tts.azure.male_voice` | `zu-ZA-ThembaNeural` | Azure neural voice for male audio |
| `tts.azure.female_voice` | `zu-ZA-ThandoNeural` | Azure neural voice for female audio |
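
To make the defaults in the table concrete, the following sketch models them with standard-library dataclasses. The class and function names (`BuildConfig`, `TTSConfig`, `AzureVoices`, `config_from_dict`) are hypothetical; the project's own `models.py` uses Pydantic and may be organized differently. Only the fields listed in the table are covered, and the input is a plain dict such as a YAML parser would produce.

```python
from dataclasses import dataclass, field


@dataclass
class AzureVoices:
    male_voice: str = "zu-ZA-ThembaNeural"
    female_voice: str = "zu-ZA-ThandoNeural"


@dataclass
class TTSConfig:
    enabled: bool = True
    azure: AzureVoices = field(default_factory=AzureVoices)


@dataclass
class BuildConfig:
    key: str            # required
    frequency_dir: str  # required
    deck_name: str = "Enhanced Vocabulary"
    ai_provider: str = "anthropic"
    model: str = "claude-3-5-haiku-20241022"
    batch_size: int = 10
    use_batch_api: bool = False
    batch_check_interval: int = 10
    rate_limit_delay: float = 1.0
    max_retries: int = 3
    tts: TTSConfig = field(default_factory=TTSConfig)


def config_from_dict(raw: dict) -> BuildConfig:
    """Build a config from parsed YAML; missing required keys raise TypeError."""
    raw = dict(raw)  # copy so the caller's dict is not mutated
    tts_raw = dict(raw.pop("tts", None) or {})
    azure = AzureVoices(**tts_raw.pop("azure", {}))
    return BuildConfig(tts=TTSConfig(azure=azure, **tts_raw), **raw)


cfg = config_from_dict({
    "key": "zulu_common",
    "frequency_dir": "frequency_lists/zulu",
    "batch_size": 200,
    "use_batch_api": True,
})
print(cfg.model, cfg.batch_size, cfg.tts.azure.male_voice)
```

Values present in the dict override the defaults, and anything omitted falls back to the table's default column.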
In sequential mode (`use_batch_api: false`), a build proceeds as follows:
- Load word lists from CSV files in `frequency_dir`
- For each batch of `batch_size` cards:
  - Send each card to Claude for enhancement (romanization, IPA, examples, etc.)
  - Generate audio in parallel using Azure TTS
  - Cache results to `builds/<key>/cached_cards/`
- Package everything into a `.mochi` file
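
The sequential flow above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: `enhance` and `synthesize` are hypothetical stand-ins for the Claude and Azure TTS calls, and an in-memory dict stands in for the on-disk cache.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items: list, batch_size: int):
    """Yield consecutive slices of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


def enhance(word: str) -> dict:
    # Hypothetical placeholder for the per-card Claude request.
    return {"word": word, "ipa": f"/{word}/"}


def synthesize(card: dict) -> str:
    # Hypothetical placeholder for the Azure TTS call; returns an audio filename.
    return f"{card['word']}.mp3"


def build_sequential(words: list, batch_size: int, max_workers: int = 4) -> dict:
    cache = {}
    for batch in chunked(words, batch_size):
        cards = [enhance(w) for w in batch]  # one request per card
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            audio = list(pool.map(synthesize, cards))  # map preserves input order
        for card, path in zip(cards, audio):
            cache[card["word"]] = {**card, "audio": path}  # cache after each batch
    return cache


result = build_sequential(["mina", "wena", "yena"], batch_size=2)
print(sorted(result))
```

Threads suit this step because audio synthesis is network-bound; `max_workers` plays the role of the configurable worker count.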
In batch mode (`use_batch_api: true`), a build proceeds as follows:
- Load word lists, filter out already-cached cards
- Submit all uncached cards to the Claude Batch API in chunks of `batch_size`
- Poll for completion every `batch_check_interval` seconds
- Download results, parse enhanced content
- Generate all audio in parallel using Azure TTS
- Cache results and package into a `.mochi` file
Batch mode is significantly faster for large decks (100+ cards).
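
The polling step in batch mode can be pictured with a small loop like the one below. It is a generic sketch: `wait_for_batch` and its parameters are hypothetical, and the status source is injected as a callable so the example runs without network access. With the Anthropic Python SDK, that callable would typically wrap something like `client.messages.batches.retrieve(batch_id).processing_status`, but how the project wires this up is an assumption here.

```python
import time
from typing import Callable


def wait_for_batch(
    get_status: Callable[[], str],
    check_interval: float,
    max_polls: int = 1000,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until the status is 'ended'; return how many polls were made."""
    for attempt in range(1, max_polls + 1):
        if get_status() == "ended":
            return attempt
        sleep(check_interval)  # corresponds to batch_check_interval
    raise TimeoutError(f"batch still running after {max_polls} polls")


# Simulated status sequence standing in for real API responses.
statuses = iter(["in_progress", "in_progress", "ended"])
polls = wait_for_batch(lambda: next(statuses), check_interval=0.01)
print(polls)  # 3
```

Bounding the loop with `max_polls` keeps a stuck batch from blocking the build indefinitely.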
language-flashcard-generator/
├── main.py                     # CLI entry point (Typer)
├── enhancer.py                 # Core enhancement engine
├── models.py                   # Pydantic data models
├── configs/                    # YAML config files
│   └── zulu_common.yaml
├── frequency_lists/            # Input word lists
│   └── zulu/
│       ├── zulu_common.csv
│       └── zulu_medical.csv
├── builds/                     # Build artifacts (per-config)
│   └── zulu_common/
│       ├── cached_cards/       # JSON cache per card
│       ├── audio_files/        # Generated .mp3 files
│       ├── output/             # Final .mochi files
│       ├── input/              # Place existing .mochi here to preserve reviews
│       ├── logs/               # Enhancement logs
│       ├── progress.json       # Sequential mode progress
│       └── batch_progress.json # Batch mode progress
├── utils/
│   └── find_duplicates.py      # Utility to deduplicate frequency lists
├── pyproject.toml
└── .env                        # API keys (not committed)
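
The `cached_cards/` directory in the layout above holds one JSON file per card, which is what lets re-runs skip work. A minimal sketch of that idea follows; the `<word>.json` naming scheme and the `get_or_enhance` helper are assumptions for illustration, and the real cache may key files differently.

```python
import json
import tempfile
from pathlib import Path


def cache_path(cache_dir: Path, word: str) -> Path:
    # Assumed naming scheme: one <word>.json file per card.
    return cache_dir / f"{word}.json"


def get_or_enhance(cache_dir: Path, word: str, enhance) -> tuple[dict, bool]:
    """Return (card, was_cached); call enhance only on a cache miss."""
    path = cache_path(cache_dir, word)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8")), True
    card = enhance(word)
    path.write_text(json.dumps(card, ensure_ascii=False), encoding="utf-8")
    return card, False


calls = []


def fake_enhance(word: str) -> dict:
    # Stand-in for the real enhancement call; records each invocation.
    calls.append(word)
    return {"word": word, "english": "I/me"}


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp) / "cached_cards"
    cache_dir.mkdir()
    first = get_or_enhance(cache_dir, "mina", fake_enhance)
    second = get_or_enhance(cache_dir, "mina", fake_enhance)

print(first[1], second[1], calls)  # False True ['mina']
```

The second lookup is served from disk, so the enhancement function runs only once per word.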
Detect and remove duplicate words from frequency list files:
# Find duplicates
uv run python utils/find_duplicates.py frequency_lists/zulu/zulu_common.csv
# Remove duplicates and save to new file
uv run python utils/find_duplicates.py frequency_lists/zulu/zulu_common.csv --remove --output cleaned.csv

To carry over SRS review history when rebuilding a deck:
- Export your current deck from Mochi as a `.mochi` file
- Place it in `builds/<key>/input/`
- Run `build-deck`; review data (intervals, scores, timestamps) will be merged into the new export
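
Conceptually, the merge in the last step copies review history from matching old cards onto freshly generated ones. The sketch below shows the idea on plain dicts. The `word` key, the `reviews` field, and the sample values are hypothetical and chosen only for illustration; the actual `.mochi` data layout and the project's matching rule are not shown in this document.

```python
def merge_reviews(old_cards: list, new_cards: list, key: str = "word") -> list:
    """Copy review history from old cards onto new cards with the same key."""
    reviews_by_key = {c[key]: c["reviews"] for c in old_cards if c.get("reviews")}
    merged = []
    for card in new_cards:
        card = dict(card)  # shallow copy so the inputs stay unchanged
        if card[key] in reviews_by_key:
            card["reviews"] = reviews_by_key[card[key]]
        merged.append(card)
    return merged


# Illustrative data only; field names are assumptions.
old = [{"word": "mina", "reviews": [{"interval": 4, "remembered": True}]}]
new = [{"word": "mina", "ipa": "/mina/"}, {"word": "wena", "ipa": "/wena/"}]

merged = merge_reviews(old, new)
print(merged[0]["reviews"], "reviews" in merged[1])
```

Cards without a match in the old export simply start with no review history.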