Concilium is a research-oriented Rust prototype for the controlled generation of a novel artificial language intended to model an unfamiliar, non-human linguistic system. The project does not attempt to imitate any documented natural language directly. Instead, it uses English glosses as a semantic indexing layer so that human-readable meanings can be attached to forms generated by a distinct phonological and grammatical system.
The present objective is exploratory: to investigate whether a computational pipeline can derive lexemes, clause structure, and sound-pattern regularities that are not inherited from attested human languages while remaining internally coherent enough to support translation and later expansion.
The working hypothesis behind Concilium is that a credible alien language should not be constructed as a simple substitution cipher over English. A stronger approach is to separate semantic intent from linguistic realization.
In this project:
- English provides glosses such as `i`, `you`, `see`, and `tree`.
- Concilium generates its own surface forms from a defined phonological inventory.
- Grammar is realized independently through its own word order and inflection rules.
- Sound changes are applied to establish a recognizable language identity rather than a random word list.
This means English functions as the annotation layer, not as the final language model.
Concilium is organized as a small research engine with clear domain boundaries:
- `phonology`: defines the phoneme inventory, weighted selection, syllable templates, and phonotactic constraints.
- `lexicon`: generates lexical forms from semantic glosses.
- `mutation`: applies ordered sound-change rules that give the language a distinct identity.
- `grammar`: realizes clauses according to Concilium word order and morphology.
- `evolution`: assembles the language from its blueprint and exposes translation-oriented lookup methods.
- `presets`: stores the current experimental configuration for the Concilium language.
This separation is intentional. It keeps the project extensible for later work on morphology, syntax, corpus generation, or comparative experiments.
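As an illustration of what the `phonology` boundary covers, here is a minimal sketch of syllable templates checked against an inventory. The `Inventory` type, its fields, and the `C`/`V` template notation are assumptions made for this example and are not taken from the project source. Weighted selection is left out, and every phoneme is treated as a single character.

```rust
/// Hypothetical phoneme inventory (illustrative only; the real definitions
/// live in the project's `phonology` module and may look different).
struct Inventory {
    consonants: &'static [char],
    vowels: &'static [char],
}

impl Inventory {
    /// Returns true if `syllable` fits `template`, where `C` stands for any
    /// consonant in the inventory and `V` for any vowel.
    fn matches(&self, syllable: &str, template: &str) -> bool {
        let sounds: Vec<char> = syllable.chars().collect();
        let slots: Vec<char> = template.chars().collect();
        sounds.len() == slots.len()
            && sounds.iter().zip(&slots).all(|(sound, slot)| match *slot {
                'C' => self.consonants.contains(sound),
                'V' => self.vowels.contains(sound),
                _ => false, // unknown template symbols never match
            })
    }

    /// Enumerates every syllable the template allows, in inventory order.
    /// A template with an unknown symbol yields an empty list.
    fn expand(&self, template: &str) -> Vec<String> {
        let mut out = vec![String::new()];
        for slot in template.chars() {
            let choices: &[char] = match slot {
                'C' => self.consonants,
                'V' => self.vowels,
                _ => &[],
            };
            let mut next = Vec::new();
            for prefix in &out {
                for sound in choices {
                    next.push(format!("{}{}", prefix, sound));
                }
            }
            out = next;
        }
        out
    }
}
```

With consonants `k`, `s`, `d` and vowels `a`, `i`, the template `CV` expands to six candidate syllables, and `kas` matches `CVC` while `aks` does not. A real generator would sample from such candidates by weight instead of enumerating them.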
The current Concilium configuration is defined in `src/presets.rs`.
- Language name: `Concilium`
- Word order: `SOV`
- Plural marking: suffix `-en`
- Past marking: prefix `ka-`
- Example sound changes include `k -> kh`, `s -> sh` before vowels, and `a -> ae` between consonants.
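The three example sound changes can be read as an ordered rule pipeline. The sketch below is one simplified interpretation: it assumes `k -> kh` applies everywhere, treats `a e i o u` as the only vowels, and works on lowercase single words. All function names are hypothetical, and the actual rules and environments in `src/presets.rs` may be stricter, so real engine output can differ from what this produces.

```rust
/// Assumed vowel set for this sketch; every other character counts as a consonant.
fn is_vowel(c: char) -> bool {
    matches!(c, 'a' | 'e' | 'i' | 'o' | 'u')
}

/// Rule 1: k -> kh (assumed unconditional here).
fn k_to_kh(input: &str) -> String {
    input.replace('k', "kh")
}

/// Rule 2: s -> sh when the next sound is a vowel.
fn s_to_sh_before_vowel(input: &str) -> String {
    let chars: Vec<char> = input.chars().collect();
    let mut out = String::new();
    for (i, &c) in chars.iter().enumerate() {
        out.push(c);
        let next_is_vowel = chars.get(i + 1).map_or(false, |&n| is_vowel(n));
        if c == 's' && next_is_vowel {
            out.push('h');
        }
    }
    out
}

/// Rule 3: a -> ae when both neighbours are consonants.
fn a_to_ae_between_consonants(input: &str) -> String {
    let chars: Vec<char> = input.chars().collect();
    let mut out = String::new();
    for (i, &c) in chars.iter().enumerate() {
        out.push(c);
        let prev_is_consonant = i > 0 && !is_vowel(chars[i - 1]);
        let next_is_consonant = chars.get(i + 1).map_or(false, |&n| !is_vowel(n));
        if c == 'a' && prev_is_consonant && next_is_consonant {
            out.push('e');
        }
    }
    out
}

/// Applies the rules in a fixed order; each rule sees the previous rule's output.
fn apply_sound_changes(root: &str) -> String {
    let rules: [fn(&str) -> String; 3] =
        [k_to_kh, s_to_sh_before_vowel, a_to_ae_between_consonants];
    rules.iter().fold(root.to_string(), |word, rule| rule(&word))
}
```

Under these assumptions, `apply_sound_changes("kas")` yields `khaes` and `apply_sound_changes("sika")` yields `shikha`. Because each rule consumes the output of the one before it, reordering the array can change the result for rules that feed or block one another.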
Because Concilium is SOV, English clauses must often be reordered during translation. A gloss sequence equivalent to English `I see you` is realized structurally as `I you see`.
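The reordering and affixation can be sketched at the gloss level as follows. The `Clause` struct and `realize_sov` function are hypothetical names for this example. The sketch assumes that the past prefix attaches to the verb and the plural suffix to the object noun, and it writes hyphens only to make morpheme boundaries visible; the engine's own `grammar` module may differ on all of these points.

```rust
/// Affixes taken from the configuration listed above.
const PLURAL_SUFFIX: &str = "en";
const PAST_PREFIX: &str = "ka";

/// A simple transitive clause given in English gloss order (hypothetical type).
struct Clause<'a> {
    subject: &'a str,
    verb: &'a str,
    object: &'a str,
    past: bool,
    plural_object: bool,
}

/// Realizes the clause in subject-object-verb order with optional affixes.
fn realize_sov(clause: &Clause<'_>) -> String {
    let verb = if clause.past {
        format!("{}-{}", PAST_PREFIX, clause.verb)
    } else {
        clause.verb.to_string()
    };
    let object = if clause.plural_object {
        format!("{}-{}", clause.object, PLURAL_SUFFIX)
    } else {
        clause.object.to_string()
    };
    // SOV: the verb comes last.
    format!("{} {} {}", clause.subject, object, verb)
}
```

For the clause `I see you` with no marking, this returns `I you see`. With `past` and `plural_object` set on `I see tree`, it returns `I tree-en ka-see`. In the full pipeline the glosses would then be replaced by generated Concilium forms.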
Concilium uses a multi-mode output system. To run the engine, you must specify which output you want to generate.

This generates a markdown table of all English glosses and their Concilium translations in `Words.md`:

    cargo run -- words

This translates all English sentences found in the `data/` directory and outputs them to `Sentences.md`:

    cargo run -- sentences

This translates paragraph entries from `data/english_paragraphs.md` and outputs them to `Paragraphs.md`:

    cargo run -- paragraphs

The engine loads corpus data from the `data/` directory. It supports:

- `.md` files: Each line (except headings) is treated as a sentence, and words are extracted for the lexicon.
- `.json` files: Extracts string values to populate the lexicon and sentence list.
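The `.md` handling described above could look roughly like the following sketch. The function name `parse_markdown_corpus` is hypothetical, and several details are assumptions: blank lines are skipped, a heading is any line starting with `#`, words are lowercased, and tokens are split on anything that is not a letter or an apostrophe. JSON loading is omitted because it would need a parsing crate.

```rust
use std::collections::BTreeSet;

/// Splits markdown text into sentences (one per non-heading line) and a
/// sorted set of lowercase words for the lexicon.
fn parse_markdown_corpus(text: &str) -> (Vec<String>, BTreeSet<String>) {
    let mut sentences = Vec::new();
    let mut words = BTreeSet::new();
    for line in text.lines() {
        let line = line.trim();
        // Skip blank lines and markdown headings.
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        sentences.push(line.to_string());
        // Tokenize on anything that is neither a letter nor an apostrophe.
        for token in line.split(|c: char| !c.is_alphabetic() && c != '\'') {
            if !token.is_empty() {
                words.insert(token.to_lowercase());
            }
        }
    }
    (sentences, words)
}
```

A `BTreeSet` is used so that the word list is deduplicated and comes out in a stable order, which keeps generated tables reproducible between runs. To load a real file, the text could be read with `std::fs::read_to_string` and passed in.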
The project includes basic tests for:
- phonotactic validity
- environment-sensitive sound change application
- grammatical clause realization
- language generation and clause translation availability
Run:

    cargo test

Use the helper script in `gen/` to generate Concilium audio with Kokoro. It defaults to full sentence lines, says the English sentence first and then the Concilium version, and now renders the Concilium side from converted phonemes instead of making Kokoro guess from English-like spellings.
Create and activate a local virtual environment:

    cd gen
    python3 -m venv venv
    source venv/bin/activate
    python -m ensurepip --upgrade

Install a CPU-only PyTorch wheel first so `kokoro` does not pull the large CUDA build by default:

    pip install torch --index-url https://download.pytorch.org/whl/cpu
    pip install 'kokoro>=0.3.4' soundfile

Kokoro also expects `espeak-ng` to be available on the system. The default voice is now Onyx (`am_onyx`), which uses the American English pipeline. Then generate sentence audio files from `Sentences.md`:

    python kokoro_words.py --input ../Sentences.md --output-dir ../audio/kokoro_sentences

Useful options:

    python kokoro_words.py --input ../Sentences.md --output-dir ../audio/kokoro_sentences --limit 1
    python kokoro_words.py --input ../Sentences.md --output-dir ../audio/kokoro_sentences --voice am_onyx --speed 0.85
    python kokoro_words.py --input ../Sentences.md --output-dir ../audio/kokoro_sentences --prompt-style concilium
    python kokoro_words.py --input ../Sentences.md --output-dir ../audio/kokoro_sentences --concilium-only
    python kokoro_words.py --input ../Sentences.md --output-dir ../audio/kokoro_sentences --concilium-render spoken
    python kokoro_words.py --input ../Paragraphs.md --unit paragraphs --output-dir ../audio/kokoro_paragraphs --limit 1
    python kokoro_words.py --input ../Sentences.md --unit words --output-dir ../audio/kokoro_words
    python kokoro_words.py --input ../Sentences.md --export-overrides-template pronunciation_overrides.tsv

The script writes `.wav` files plus a `manifest.tsv` file with the original pronunciation hints and the prompt sent to Kokoro.
The script reads gen/pronunciation_overrides.tsv by default. This lets you keep using the pronunciation column in your translated markdown while manually correcting any word that Kokoro still says badly.
Important:
- In the default `bilingual` mode, the English sentence still comes first.
- `--concilium-only` skips the English lead-in entirely.
- Overrides only affect the Concilium side of the sentence.
- By default the Concilium side is rendered from phonemes.
- Use `--prompt-style concilium` only if you explicitly want audio without the English lead-in.
Format:

    concilium pronunciation spoken
    dimdran [d-ee-m-dr-ah-n] deem-drahn
    krukhdraes [kr-oo-kh-dr-eye-s] krookh-drice

How it works:

- `concilium`: the Concilium word as it appears in your translated markdown
- `pronunciation`: the bracketed pronunciation from the markdown table
- `spoken`: the exact English-ish prompt Kokoro should read for that word
You can generate a starter template from any translated markdown file:

    python kokoro_words.py --input ../Sentences.md --export-overrides-template pronunciation_overrides.tsv

Then edit the `spoken` column by ear until the pronunciation sounds right.
This repository should be understood as an experimental linguistic generator rather than a finished conlang toolkit. The current implementation establishes a disciplined baseline for further research into:
- alien phonotactics
- non-human semantic categorization
- extended morphology
- script design
- corpus generation
- machine-assisted translation between English glosses and Concilium