Language Flashcard Generator

Generate rich, AI-enhanced flashcard decks for Mochi from simple word lists. Each card gains pronunciation, IPA, example sentences, etymology, and audio, all generated automatically.

Features

- Claude AI Enhancement: Generates romanization, IPA transcription, example sentences, etymology, and part-of-speech tags for each word
- Azure Neural TTS: Produces natural-sounding audio for words and example sentences using male and female voices
- Claude Batch API: Processes hundreds of cards in a single batch request instead of sequential API calls
- Parallel Audio Generation: Generates audio files concurrently with a configurable worker count
- Mochi Export: Outputs a .mochi file ready to import, with embedded audio attachments and structured card templates
- Review Preservation: Detects existing .mochi files in the input directory and carries over your review/SRS progress
- Incremental Builds: Caches enhanced cards and audio files so re-runs skip already-processed content
- Google TTS: Quick text-to-speech via Google Translate (standalone command)

Prerequisites

- Python 3.12+
- uv (package manager)
- An Anthropic API key for Claude AI
- An Azure Speech resource for neural TTS
- Azure CLI logged in (az login) for authentication

Setup

# Clone and install dependencies
git clone <repo-url>
cd language-flashcard-generator
uv sync

Create a .env file in the project root:

ANTHROPIC_API_KEY=sk-ant-...
SPEECH_ENDPOINT=https://<region>.api.cognitive.microsoft.com
SPEECH_RESOURCE_ID=/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.CognitiveServices/accounts/<name>

Azure TTS authenticates via DefaultAzureCredential (typically az login).
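As a quick sanity check before a build, the following minimal sketch reports which of the three settings above are missing. The helper name `missing_settings` is hypothetical and not part of the project. It assumes the .env values have already been loaded into the process environment, and it neither validates the values nor contacts Azure.

```python
import os
from collections.abc import Mapping

# Variable names taken from the .env example above.
REQUIRED_SETTINGS = ("ANTHROPIC_API_KEY", "SPEECH_ENDPOINT", "SPEECH_RESOURCE_ID")


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required settings that are unset or blank in `env`."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present.")
```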

Quick Start

1. Prepare a frequency list

Create a CSV file with word,english pairs:

frequency_lists/zulu/zulu_common.csv

mina,I/me
wena,you (singular)
yena,he/she/him/her
thina,we/us
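
A minimal sketch of how such a file could be parsed, assuming two columns and no header row, as in the sample above. `parse_word_pairs` is a hypothetical helper, not the project's own loader. It uses the standard csv module so that a quoted English gloss containing a comma still parses as one field.

```python
import csv
from collections.abc import Iterable


def parse_word_pairs(lines: Iterable[str]) -> list[tuple[str, str]]:
    """Parse (word, english) pairs from CSV lines, skipping incomplete rows."""
    pairs: list[tuple[str, str]] = []
    for row in csv.reader(lines):
        if len(row) < 2 or not row[0].strip():
            continue  # blank line or missing translation
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


# The sample rows from above; an open file object works the same way.
sample = ["mina,I/me", "wena,you (singular)", "yena,he/she/him/her", "thina,we/us"]
pairs = parse_word_pairs(sample)
```

To read from disk, open the file with `newline=""` and `encoding="utf-8"` and pass the file object to `parse_word_pairs`.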

2. Create a config

# configs/zulu_common.yaml
key: zulu_common
deck_name: Zulu Common Words
frequency_dir: frequency_lists/zulu

ai_provider: anthropic
model: claude-haiku-4-5-20251001

batch_size: 200
batch_check_interval: 10

rate_limit_delay: 1.0
max_retries: 3
use_batch_api: true

tts:
  enabled: true
  azure:
    male_voice: zu-ZA-ThembaNeural
    female_voice: zu-ZA-ThandoNeural

enhancement_fields:
  romanization: true
  pronunciation_ipa: true
  example_sentences: true
  etymology: false
  additional_meanings: true
  part_of_speech: true

3. Build the deck

# Build all cards using the batch API
uv run python main.py build-deck configs/zulu_common.yaml

# Build only the first 10 cards
uv run python main.py build-deck configs/zulu_common.yaml --size 10

# Resume from card 50
uv run python main.py build-deck configs/zulu_common.yaml --start 50

# Re-render .mochi from cached cards (no API calls)
uv run python main.py build-deck configs/zulu_common.yaml --render-only

The output .mochi file will be at builds/<key>/output/<Deck_Name>.mochi. Import it into Mochi.

CLI Commands

`build-deck`

Build a full flashcard deck from a config file.

uv run python main.py build-deck <config.yaml> [--size N] [--start N] [--render-only]

Flag	Description
`--size N`	Limit to N cards
`--start N`	Start from card index N (for resuming)
`--render-only`	Re-render `.mochi` from cache without calling any APIs
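
The effect of `--start` and `--size` can be pictured as a slice over the full card list. This is only a sketch: `select_cards` is a hypothetical name, and it assumes the start index is 0-based, which the description above does not state.

```python
from typing import Optional


def select_cards(cards: list, start: int = 0, size: Optional[int] = None) -> list:
    """Return the cards chosen by --start and --size (0-based start assumed)."""
    end = None if size is None else start + size
    return cards[start:end]
```

Under these assumptions, `select_cards(cards, start=50, size=10)` would cover indices 50 through 59.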

`google`

Quick text-to-speech using Google Translate TTS.

uv run python main.py google "Hello world" --lang en --output hello.mp3

`azure`

Text-to-speech using Azure Neural TTS.

uv run python main.py azure "Sawubona" --voice zu-ZA-ThembaNeural --output sawubona.mp3

Config Reference

Field	Default	Description
`key`	(required)	Unique build key; determines output directory under `builds/`
`deck_name`	`Enhanced Vocabulary`	Name of the Mochi deck
`frequency_dir`	(required)	Path to directory containing `.csv` frequency list files
`ai_provider`	`anthropic`	AI provider
`model`	`claude-3-5-haiku-20241022`	Claude model to use
`batch_size`	`10`	Number of cards per batch (both sequential and batch API)
`use_batch_api`	`false`	Use the Claude Message Batches API for bulk processing
`batch_check_interval`	`10`	Seconds between batch status polls
`rate_limit_delay`	`1.0`	Delay between sequential API calls (seconds)
`max_retries`	`3`	Max retry attempts for failed API calls
`tts.enabled`	`true`	Enable Azure TTS audio generation
`tts.azure.male_voice`	`zu-ZA-ThembaNeural`	Azure neural voice for male audio
`tts.azure.female_voice`	`zu-ZA-ThandoNeural`	Azure neural voice for female audio
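
The defaults in the table can be summarized as a small data model. The project keeps its models in models.py using Pydantic; the sketch below deliberately swaps in standard-library dataclasses so it runs without dependencies, and the class and function names (`DeckConfig`, `TTSConfig`, `AzureVoices`, `config_from_dict`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AzureVoices:
    male_voice: str = "zu-ZA-ThembaNeural"
    female_voice: str = "zu-ZA-ThandoNeural"


@dataclass
class TTSConfig:
    enabled: bool = True
    azure: AzureVoices = field(default_factory=AzureVoices)


@dataclass
class DeckConfig:
    key: str            # required
    frequency_dir: str  # required
    deck_name: str = "Enhanced Vocabulary"
    ai_provider: str = "anthropic"
    model: str = "claude-3-5-haiku-20241022"
    batch_size: int = 10
    use_batch_api: bool = False
    batch_check_interval: int = 10
    rate_limit_delay: float = 1.0
    max_retries: int = 3
    tts: TTSConfig = field(default_factory=TTSConfig)


def config_from_dict(raw: dict) -> DeckConfig:
    """Build a DeckConfig from a parsed YAML mapping, applying the table defaults."""
    data = dict(raw)
    data.pop("enhancement_fields", None)  # not covered by the table above
    tts_raw = dict(data.pop("tts", {}))
    azure = AzureVoices(**tts_raw.pop("azure", {}))
    return DeckConfig(tts=TTSConfig(azure=azure, **tts_raw), **data)
```

Omitting `key` or `frequency_dir` raises a `TypeError` here, which mirrors their "(required)" status in the table.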

How It Works

Sequential Mode (`use_batch_api: false`)

1. Load word lists from CSV files in frequency_dir
2. For each batch of batch_size cards:
   - Send each card to Claude for enhancement (romanization, IPA, examples, etc.)
   - Generate audio in parallel using Azure TTS
   - Cache results to builds/<key>/cached_cards/
3. Package everything into a .mochi file
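
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in enhancer.py: `enhance` and `synthesize` are stand-ins for the Claude and Azure TTS calls, `workers` is a hypothetical name for the worker count, each enhanced card is assumed to be a dict with a `"word"` key, and cache and audio files are assumed to be named after the word.

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def build_sequential(
    words: list[tuple[str, str]],
    cache_dir: Path,
    audio_dir: Path,
    enhance: Callable[[str, str], dict],  # stand-in for the Claude call
    synthesize: Callable[[str], bytes],   # stand-in for the Azure TTS call
    batch_size: int = 10,
    rate_limit_delay: float = 1.0,
    workers: int = 4,
) -> list[dict]:
    cache_dir.mkdir(parents=True, exist_ok=True)
    audio_dir.mkdir(parents=True, exist_ok=True)
    cards: list[dict] = []
    for start in range(0, len(words), batch_size):
        fresh: list[dict] = []
        for word, english in words[start:start + batch_size]:
            cache_file = cache_dir / f"{word}.json"
            if cache_file.exists():  # incremental build: reuse the cached card
                cards.append(json.loads(cache_file.read_text(encoding="utf-8")))
                continue
            fresh.append(enhance(word, english))
            time.sleep(rate_limit_delay)  # pause between sequential API calls
        # Audio for the newly enhanced cards is generated concurrently.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            clips = list(pool.map(lambda card: synthesize(card["word"]), fresh))
        for card, clip in zip(fresh, clips):
            (audio_dir / f"{card['word']}.mp3").write_bytes(clip)
            cache_file = cache_dir / f"{card['word']}.json"
            cache_file.write_text(json.dumps(card, ensure_ascii=False), encoding="utf-8")
            cards.append(card)
    return cards
```

Running the function a second time with the same directories skips the `enhance` and `synthesize` calls for every cached word, which is the behavior described under Incremental Builds.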

Batch Mode (`use_batch_api: true`)

1. Load word lists, filter out already-cached cards
2. Submit all uncached cards to the Claude Batch API in chunks of batch_size
3. Poll for completion every batch_check_interval seconds
4. Download results, parse enhanced content
5. Generate all audio in parallel using Azure TTS
6. Cache results and package into a .mochi file

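Batch mode is significantly faster for large decks (100+ cards), since enhancement requests are submitted in bulk instead of one at a time with rate_limit_delay between calls.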
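
The submit-and-poll part of batch mode might look like the sketch below. It does not call the Anthropic SDK: `submit`, `is_done`, and `fetch` are hypothetical stand-ins for creating a batch, checking its status, and downloading its results, and `run_batches` is likewise an illustrative name.

```python
import time
from typing import Callable


def run_batches(
    words: list[tuple[str, str]],
    cached: set[str],
    submit: Callable[[list[tuple[str, str]]], str],  # returns a batch id
    is_done: Callable[[str], bool],
    fetch: Callable[[str], list[dict]],
    batch_size: int = 10,
    batch_check_interval: float = 10,
) -> list[dict]:
    # Skip words that already have a cached card.
    pending = [pair for pair in words if pair[0] not in cached]
    # Submit the remaining words in chunks of batch_size.
    batch_ids = [
        submit(pending[i:i + batch_size])
        for i in range(0, len(pending), batch_size)
    ]
    results: list[dict] = []
    for batch_id in batch_ids:
        while not is_done(batch_id):
            time.sleep(batch_check_interval)  # wait before polling again
        results.extend(fetch(batch_id))
    return results
```

All chunks are submitted before any polling starts, so the batches can be processed on the server side while the client waits.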

Project Structure

language-flashcard-generator/
├── main.py                  # CLI entry point (Typer)
├── enhancer.py              # Core enhancement engine
├── models.py                # Pydantic data models
├── configs/                 # YAML config files
│   └── zulu_common.yaml
├── frequency_lists/         # Input word lists
│   └── zulu/
│       ├── zulu_common.csv
│       └── zulu_medical.csv
├── builds/                  # Build artifacts (per-config)
│   └── zulu_common/
│       ├── cached_cards/    # JSON cache per card
│       ├── audio_files/     # Generated .mp3 files
│       ├── output/          # Final .mochi files
│       ├── input/           # Place existing .mochi here to preserve reviews
│       ├── logs/            # Enhancement logs
│       ├── progress.json    # Sequential mode progress
│       └── batch_progress.json  # Batch mode progress
├── utils/
│   └── find_duplicates.py   # Utility to deduplicate frequency lists
├── pyproject.toml
└── .env                     # API keys (not committed)
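
For scripts that need to locate build artifacts, the per-config layout shown above can be captured in a small helper. `BuildPaths` is a hypothetical class written for illustration; only the directory and file names come from the tree.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class BuildPaths:
    """Locations of the per-config build artifacts listed in the tree above."""

    root: Path

    @classmethod
    def for_key(cls, key: str, builds_dir: Path = Path("builds")) -> "BuildPaths":
        return cls(root=builds_dir / key)

    @property
    def directories(self) -> dict[str, Path]:
        names = ("cached_cards", "audio_files", "output", "input", "logs")
        return {name: self.root / name for name in names}

    @property
    def progress(self) -> Path:
        return self.root / "progress.json"  # sequential mode progress

    @property
    def batch_progress(self) -> Path:
        return self.root / "batch_progress.json"  # batch mode progress

    def ensure(self) -> None:
        """Create any missing artifact directories."""
        for directory in self.directories.values():
            directory.mkdir(parents=True, exist_ok=True)
```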

Utilities

Find Duplicates

Detect and remove duplicate words from frequency list files:

# Find duplicates
uv run python utils/find_duplicates.py frequency_lists/zulu/zulu_common.csv

# Remove duplicates and save to new file
uv run python utils/find_duplicates.py frequency_lists/zulu/zulu_common.csv --remove --output cleaned.csv
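
The idea behind the utility can be sketched in a few lines. This is not the code in utils/find_duplicates.py: the function names are hypothetical, and comparing words case-insensitively and keeping the first occurrence are assumptions made for the example.

```python
import csv
from collections import defaultdict
from collections.abc import Iterable


def find_duplicates(lines: Iterable[str]) -> dict[str, list[int]]:
    """Map each repeated word to the 1-based row numbers where it appears."""
    seen: dict[str, list[int]] = defaultdict(list)
    for number, row in enumerate(csv.reader(lines), start=1):
        if row and row[0].strip():
            seen[row[0].strip().lower()].append(number)
    return {word: rows for word, rows in seen.items() if len(rows) > 1}


def remove_duplicates(lines: Iterable[str]) -> list[list[str]]:
    """Keep the first row for each word, preserving the original order."""
    kept: list[list[str]] = []
    seen: set[str] = set()
    for row in csv.reader(lines):
        if not row or not row[0].strip():
            continue
        key = row[0].strip().lower()
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept
```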

Review Preservation

To carry over SRS review history when rebuilding a deck:

1. Export your current deck from Mochi as a .mochi file
2. Place it in builds/<key>/input/
3. Run build-deck. Review data (intervals, scores, timestamps) will be merged into the new export
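
Conceptually, the merge copies review history from the old export onto the matching new cards. The sketch below shows only that matching step, under assumptions not confirmed by this README: cards are plain dicts, they are matched on a `"word"` field, and history lives in a `"reviews"` list. `merge_reviews` is a hypothetical name, and reading or writing the .mochi file itself is out of scope here.

```python
def merge_reviews(old_cards: list[dict], new_cards: list[dict]) -> list[dict]:
    """Return copies of new_cards carrying review history from old_cards."""
    history = {card["word"]: card.get("reviews", []) for card in old_cards}
    merged = []
    for card in new_cards:
        updated = dict(card)  # shallow copy; the input card is left untouched
        updated["reviews"] = history.get(card["word"], [])
        merged.append(updated)
    return merged
```

Cards with no match in the old export simply start with an empty history.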
