Terminal-based Chinese vocabulary trainer written in Julia.
The app focuses on active recall across three attributes of a word: Hanzi, Pinyin (with tone numbers), and English translation. It schedules reviews using a simple spaced-repetition model and saves your progress to JSON.
- Interactive CLI training sessions.
- Multiple directed tasks per word (e.g. `Hanzi -> Pinyin`, `Translation -> Hanzi`, etc.).
- Spaced repetition based on time since last review and current level.
- Persistent progress (`ChineseSave.json`) and a human-readable stats export (`ChineseStats.txt`).
- macOS helpers:
  - Text-to-speech via `say`.
  - Keyboard layout toggle via `osascript` (useful for Hanzi input).
Interactive training in the terminal:
Per-task review history (colors = tasks):
Per-word learning history:
This trainer models forgetting as an exponential decay process and uses it to:
- Decide which words are due for review.
- Update the learning level after each attempt.
- Keep separate progress for different skills (so you don’t end up “knowing how to write it” while forgetting how it sounds).
For each task, the script derives a priority from time since last review and the current level:
```
priority = minutes_since_last_review / 2^level
```
Intuition:
- Right after a review, `minutes_since_last_review` is small → low priority.
- A higher `level` means a longer effective half-life (`2^level`), so the same time gap becomes less urgent.
The session selects words whose global priority is high enough (plus level-0 words) and trains them in small random batches.
Priority is converted into a memory strength in (0, 1] using exponential decay, where `C` is a fixed decay constant:

```
memory_strength = 2^(-C * priority)
```
So the longer you wait (higher priority), the lower the memory strength becomes.
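The two formulas above can be sketched in a few lines of Julia. The function names and the value of `C` here are illustrative assumptions, not the script's actual identifiers:

```julia
# Illustrative sketch of the scheduling math; C is a tunable decay constant
# (the real value lives in the script).
const C = 1e-4

# Higher level => longer effective half-life => lower priority for the same gap.
priority(minutes_since_last_review, level) = minutes_since_last_review / 2^level

# Priority maps to an estimated memory strength in (0, 1].
memory_strength(p) = 2.0^(-C * p)
```

Right after a review `priority` is near 0, so `memory_strength` is near 1; as minutes accumulate, strength decays toward 0.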
Instead of a fixed `level += 1` / `level -= 1`, the update is probabilistic and can jump multiple levels.
- If you answered correctly, the chance to increase the level is higher when `memory_strength` was low. Meaning: if you hadn’t seen the item for a long time but still recalled it, that’s strong evidence of learning.
- If you answered incorrectly, the chance to decrease the level is higher when `memory_strength` was high. Meaning: if you reviewed recently (the model says you should remember) but failed, that’s a bad signal.
The script applies the update as a chain of Bernoulli trials, so in rare cases the level can go up/down by more than one.
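A Bernoulli chain of this kind can be sketched as follows. The exact probabilities and the halving rule are illustrative assumptions, not the script's actual code:

```julia
using Random

# Illustrative probabilistic level update. On success the step probability is
# 1 - memory_strength (surprising recalls push the level up); on failure it is
# memory_strength (recent-but-failed reviews pull it down).
function update_level(level::Int, memory_strength::Float64, correct::Bool;
                      rng = Random.default_rng())
    p = correct ? 1 - memory_strength : memory_strength
    step = correct ? 1 : -1
    # Chain of Bernoulli trials: keep stepping while the coin comes up heads,
    # so multi-level jumps are possible but increasingly rare.
    while rand(rng) < p
        level += step
        p /= 2  # each further jump is half as likely (illustrative choice)
    end
    return max(level, 0)  # levels never go negative
end
```

With `memory_strength == 1.0` a correct answer never raises the level (the model already expected success), and with `memory_strength == 0.0` a wrong answer never lowers it.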
Each word tracks statistics for directed tasks between the three attributes:
- Hanzi, Pinyin (tone numbers), Translation
This produces tasks like:
- `Hanzi -> Pinyin`, `Hanzi -> Translation`
- `Pinyin -> Hanzi`, `Pinyin -> Translation`
- `Translation -> Hanzi`, `Translation -> Pinyin`
The global level of a word is derived from these task levels (the weakest task dominates), which prevents “lopsided learning”.
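One simple way to realize "the weakest task dominates" is a plain minimum over the per-task levels (a sketch under that assumption; the script's actual derivation may weight tasks differently):

```julia
# Global level = the weakest directed-task level (illustrative).
global_level(task_levels) = minimum(values(task_levels))

levels = Dict("Hanzi->Pinyin" => 5, "Pinyin->Hanzi" => 2, "Translation->Hanzi" => 4)
```

Here the word would sit at global level 2 until the weak `Pinyin -> Hanzi` direction catches up.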
For a chosen word, the trainer selects the weakest starting attribute and then runs a short sequence:
- Start from Hanzi: warm-up Hanzi input → train `Hanzi -> Pinyin` and `Hanzi -> Translation`.
- Start from Pinyin (sound): warm-up with TTS → train `Pinyin -> Hanzi` and `Pinyin -> Translation`.
- Start from Translation: show translation/context → train `Translation -> Hanzi` and `Translation -> Pinyin`.
Translation recall is implemented as: type keywords → get a shortlist → choose the intended option.
Input format:
```
translation_kw1+translation_kw2;context_kw1+context_kw2
```
This allows you to remember the idea (key words) instead of reproducing the exact dictionary wording.
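A matcher for this keyword scheme could look like the sketch below. The parsing rule and function names are assumptions for illustration, not the script's actual code:

```julia
# Split "kw1+kw2;ctx1+ctx2" into translation keywords and optional
# context keywords (illustrative parsing of the input format above).
function parse_keywords(input::AbstractString)
    parts = split(input, ';'; limit = 2)
    translation_kws = split(parts[1], '+')
    context_kws = length(parts) == 2 ? split(parts[2], '+') : SubString{String}[]
    return translation_kws, context_kws
end

# A candidate entry matches when every typed keyword occurs (case-insensitively)
# in its translation text; matching entries form the shortlist.
matches(translation, kws) = all(kw -> occursin(lowercase(kw), lowercase(translation)), kws)
```

Typing `king+ruler` would then shortlist every entry whose translation contains both "king" and "ruler", and you pick the intended one from that list.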
- Julia (recent 1.x).
The main dependencies are declared in Project.toml (e.g. JSON, StatsBase).
Optional plotting is implemented via an extension and requires PyPlot.
To install the base dependencies for this repo:
```
julia --project=. -e 'import Pkg; Pkg.instantiate()'
```

From the repository folder:

```
./bin/trainchinese
```

Alternative (without the wrapper):

```
julia --project=. train_chinese.jl
```

Plotting uses PyPlot and is intentionally installed into the separate `cli/` environment.
Recommended:
```
./bin/trainchinese --install-plotting
./bin/trainchinese --plot-history
```

Manual (equivalent):

```
julia --project=cli -e 'import Pkg; Pkg.instantiate()'
```

If you want to run the trainer as a command (e.g. `trainchinese --help`), this repo includes a small wrapper script:
- macOS/Linux: `bin/trainchinese`
- Windows (PowerShell): `bin/trainchinese.ps1`
On macOS/Linux, make it executable and add it to your PATH (via symlink):
```
chmod +x bin/trainchinese
mkdir -p ~/.local/bin
ln -sf "$(pwd)/bin/trainchinese" ~/.local/bin/trainchinese
```

Then you can run:

```
trainchinese --help
trainchinese --stats
```

Common flags:
- `--stats`: print current pool statistics and exit
- `--plot-history`: plot per-task history (colors = tasks) and exit (requires `PyPlot`)
- `--plot-history --no-show`: generate the plot without opening a GUI window
- `--plot-history --save-plot FILE.png`: save the plot to a file and exit
- `--plot-word-history`: plot per-word learning history and exit (requires `PyPlot`)
- `--install-plotting`: install plotting deps into the `cli/` environment
- `--save FILE`: override save JSON (default: `ChineseSave.json`)
- `--vocab FILE`: override vocabulary TXT (default: `ChineseVocabulary.txt`)
- `--stats-out FILE`: override stats export TXT (default: `ChineseStats.txt`)
If you run without the wrapper, you may need to instantiate dependencies first:
```
julia --project=. -e 'import Pkg; Pkg.instantiate()'
```

The wrapper (`bin/trainchinese`) instantiates the `cli/` environment automatically on first run.
The trainer will:
- Load progress from `ChineseSave.json`.
- Load vocabulary from `ChineseVocabulary.txt` and merge new entries into the pool.
- Start an interactive training session and periodically write updated progress/stats.
This repo contains a `Project.toml` and a small module wrapper in `src/TrainChinese.jl`, so you can also use it from the Julia REPL:
```julia
import Pkg
Pkg.activate(".")
Pkg.instantiate()
using TrainChinese
TrainChinese.main()
```

This is the source vocabulary list. Each non-empty, non-comment line uses a `|`-separated format:
```
id | hanzi | pinyin | translation | optional context
```
Notes:
- `id` should be unique and stable (examples in the repo: `hsk1.4`, `duo.12`).
- Pinyin may contain tone marks in the file; the script converts them to tone numbers.
- `context` is optional and helps disambiguate similar translations.
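A minimal parser for one vocabulary line could look like this sketch (the real script additionally converts tone marks to tone numbers and skips comment lines; the function name is an assumption):

```julia
# Parse one |-separated vocabulary line into a named tuple.
function parse_vocab_line(line::AbstractString)
    fields = strip.(split(line, '|'))
    length(fields) >= 4 || error("expected at least 4 fields, got: $line")
    (id = fields[1], hanzi = fields[2], pinyin = fields[3],
     translation = fields[4],
     context = length(fields) >= 5 ? fields[5] : "")  # context is optional
end
```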
Saved progress (levels, last review timestamps, per-task history). This repo may contain a longer file as an example/test dataset.
A quick tabular export for human inspection (per-task levels + mean/min per word).
CC-CEDICT-based dictionary file used by the trainer for additional lookups.
- macOS: the script uses `say` and `osascript` in the interactive flow.
- Linux/Windows: you can still run the trainer, but you may want to disable or replace the macOS-specific helpers if they don’t exist on your system.
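One way to gate such helpers is a platform check, roughly like this sketch (`speak` is a hypothetical name, not a function the script exports):

```julia
# Call the macOS `say` tool when it is available; degrade gracefully elsewhere.
function speak(text::AbstractString)
    if Sys.isapple() && Sys.which("say") !== nothing
        run(`say $text`)  # backtick Cmd interpolation quotes the text safely
        return true
    end
    @info "TTS skipped: `say` is macOS-only" text
    return false
end
```

The same pattern (`Sys.isapple()` / `Sys.islinux()` / `Sys.iswindows()`) applies to the `osascript` keyboard-layout toggle.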
- Make TTS/layout helpers cross-platform (feature flags or OS detection).
- Add a small installed CLI wrapper (so you can run it without typing `julia --project=...`).
- Expand automated tests for parsing, scheduling, and persistence.
MIT License. See LICENSE.


