Concilium

Abstract

Concilium is a research-oriented Rust prototype for the controlled generation of a novel artificial language intended to model an unfamiliar, non-human linguistic system. The project does not attempt to imitate any documented natural language directly. Instead, it uses English glosses as a semantic indexing layer so that human-readable meanings can be attached to forms generated by a distinct phonological and grammatical system.

The present objective is exploratory: to investigate whether a computational pipeline can derive lexemes, clause structure, and sound-pattern regularities that are not inherited from attested human languages while remaining internally coherent enough to support translation and later expansion.

Research Aim

The working hypothesis behind Concilium is that a credible alien language should not be constructed as a simple substitution cipher over English. A stronger approach is to separate semantic intent from linguistic realization.

In this project:

English provides glosses such as i, you, see, and tree.
Concilium generates its own surface forms from a defined phonological inventory.
Grammar is realized independently through its own word order and inflection rules.
Sound changes are applied to establish a recognizable language identity rather than a random word list.

This means English functions as the annotation layer, not as the final language model.

Method

Concilium is organized as a small research engine with clear domain boundaries:

phonology: defines the phoneme inventory, weighted selection, syllable templates, and phonotactic constraints.
lexicon: generates lexical forms from semantic glosses.
mutation: applies ordered sound-change rules that give the language a distinct identity.
grammar: realizes clauses according to Concilium word order and morphology.
evolution: assembles the language from its blueprint and exposes translation-oriented lookup methods.
presets: stores the current experimental configuration for the Concilium language.

This separation is intentional. It keeps the project extensible for later work on morphology, syntax, corpus generation, or comparative experiments.
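
As a rough illustration of how the phonology and lexicon stages could fit together, the sketch below derives a form from a gloss using weighted phoneme selection, syllable templates, and one phonotactic check. Every name in it (Lcg, pick_weighted, form_for_gloss, is_valid), the inventory, the weights, and the "at most two consonants in a row" constraint are assumptions made for this example; the real definitions live in the crate's own modules and may look quite different.

```rust
// Hypothetical sketch: none of these names or values come from the Concilium source.

/// Small linear congruential generator, used only to avoid external crates.
struct Lcg(u64);

impl Lcg {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0 >> 33
    }
}

// Assumed inventory with integer weights (higher weight = chosen more often).
const CONSONANTS: &[(&str, u64)] = &[("k", 3), ("d", 3), ("r", 2), ("m", 2), ("s", 1)];
const VOWELS: &[(&str, u64)] = &[("a", 3), ("i", 2), ("u", 1)];
const TEMPLATES: [&str; 2] = ["CV", "CVC"];

/// Pick one phoneme according to its weight.
fn pick_weighted<'a>(rng: &mut Lcg, items: &[(&'a str, u64)]) -> &'a str {
    let total: u64 = items.iter().map(|(_, w)| *w).sum();
    let mut roll = rng.next_u64() % total;
    for (item, weight) in items {
        if roll < *weight {
            return *item;
        }
        roll -= *weight;
    }
    unreachable!("roll is always below the total weight")
}

/// Fill a template such as "CVC" slot by slot.
fn syllable(rng: &mut Lcg, template: &str) -> String {
    template
        .chars()
        .map(|slot| match slot {
            'C' => pick_weighted(rng, CONSONANTS),
            _ => pick_weighted(rng, VOWELS),
        })
        .collect()
}

/// Seed the generator from the gloss (FNV-1a hash) so a gloss always maps to the same form.
fn seed_from_gloss(gloss: &str) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for byte in gloss.bytes() {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Generate a one- or two-syllable form for a gloss.
fn form_for_gloss(gloss: &str) -> String {
    let mut rng = Lcg(seed_from_gloss(gloss));
    let count = 1 + (rng.next_u64() % 2) as usize;
    (0..count)
        .map(|_| {
            let template = TEMPLATES[(rng.next_u64() % 2) as usize];
            syllable(&mut rng, template)
        })
        .collect()
}

/// Assumed phonotactic check: known phonemes only, at most two consonants in a row.
fn is_valid(form: &str) -> bool {
    let mut consonant_run = 0;
    for ch in form.chars() {
        if "aiu".contains(ch) {
            consonant_run = 0;
        } else if "kdrms".contains(ch) {
            consonant_run += 1;
            if consonant_run > 2 {
                return false;
            }
        } else {
            return false;
        }
    }
    !form.is_empty()
}

fn main() {
    for gloss in ["i", "you", "see", "tree"] {
        let form = form_for_gloss(gloss);
        println!("{gloss} -> {form} (valid: {})", is_valid(&form));
    }
}
```

Seeding from the gloss keeps the sketch reproducible: the English gloss only selects a random stream, so the resulting form shares no letters or structure with the English word by design.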

Current Linguistic Profile

The current Concilium configuration is defined in src/presets.rs.

Language name: Concilium
Word order: SOV
Plural marking: suffix -en
Past marking: prefix ka-
Example sound changes include k -> kh, s -> sh before vowels, and a -> ae between consonants.
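
The three example sound changes can be sketched as an ordered rule list, where each rule sees the output of the previous one. This is a minimal illustration, not the code in src/presets.rs or the mutation module: the type and function names are hypothetical, k -> kh is treated as unconditional, the vowel set is assumed to be a, e, i, o, u, and the rules are applied in the order they are listed above. The actual environments and ordering may differ.

```rust
// Hypothetical sketch of ordered, environment-sensitive sound changes.

fn is_vowel(c: char) -> bool {
    matches!(c, 'a' | 'e' | 'i' | 'o' | 'u')
}

/// Context in which a rule is allowed to fire.
#[derive(Clone, Copy)]
enum Env {
    Anywhere,
    BeforeVowel,
    BetweenConsonants,
}

struct Rule {
    from: char,
    to: &'static str,
    env: Env,
}

/// Apply one rule in a single left-to-right pass over the word.
fn apply_rule(word: &str, rule: &Rule) -> String {
    let chars: Vec<char> = word.chars().collect();
    let mut out = String::new();
    for (i, &c) in chars.iter().enumerate() {
        let prev = if i > 0 { Some(chars[i - 1]) } else { None };
        let next = chars.get(i + 1).copied();
        let env_ok = match rule.env {
            Env::Anywhere => true,
            Env::BeforeVowel => next.map_or(false, is_vowel),
            Env::BetweenConsonants => {
                prev.map_or(false, |p| !is_vowel(p)) && next.map_or(false, |n| !is_vowel(n))
            }
        };
        if c == rule.from && env_ok {
            out.push_str(rule.to);
        } else {
            out.push(c);
        }
    }
    out
}

/// Apply every rule in order; later rules see the output of earlier ones.
fn apply_all(word: &str, rules: &[Rule]) -> String {
    rules
        .iter()
        .fold(word.to_string(), |current, rule| apply_rule(&current, rule))
}

/// The three example changes from the profile, in the order listed (assumed).
fn example_rules() -> Vec<Rule> {
    vec![
        Rule { from: 'k', to: "kh", env: Env::Anywhere },
        Rule { from: 's', to: "sh", env: Env::BeforeVowel },
        Rule { from: 'a', to: "ae", env: Env::BetweenConsonants },
    ]
}

fn main() {
    let rules = example_rules();
    for word in ["kasa", "das"] {
        println!("{word} -> {}", apply_all(word, &rules));
    }
}
```

Under these assumptions, the made-up input kasa passes through khasa and khasha before ending as khaesha. The first a becomes ae only because the h introduced by the earlier k -> kh rule counts as a consonant, which is one reason rule order matters.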

Because Concilium is SOV, English clauses must often be reordered during translation. A gloss sequence equivalent to English I see you is realized structurally as I you see.
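
The reordering and the two affixes from the profile can be illustrated with a small clause realizer. The sketch below works on English glosses rather than generated Concilium forms and separates affixes with hyphens, interlinear-gloss style, so the structure stays visible. The struct and function names are hypothetical, and attaching ka- to the verb is an assumption; the profile only states that past is marked with that prefix.

```rust
// Hypothetical sketch: SOV ordering with plural -en and past ka-.

struct Noun {
    gloss: String,
    plural: bool,
}

struct Clause {
    subject: Noun,
    object: Option<Noun>,
    verb: String,
    past: bool,
}

/// Convenience constructor used by the examples below.
fn noun(gloss: &str, plural: bool) -> Noun {
    Noun { gloss: gloss.to_string(), plural }
}

/// Plural marking: suffix -en.
fn realize_noun(n: &Noun) -> String {
    if n.plural {
        format!("{}-en", n.gloss)
    } else {
        n.gloss.clone()
    }
}

/// Past marking: prefix ka- (assumed here to attach to the verb).
fn realize_verb(verb: &str, past: bool) -> String {
    if past {
        format!("ka-{verb}")
    } else {
        verb.to_string()
    }
}

/// Emit subject, then object (if any), then verb.
fn realize_clause(c: &Clause) -> String {
    let mut parts = vec![realize_noun(&c.subject)];
    if let Some(object) = &c.object {
        parts.push(realize_noun(object));
    }
    parts.push(realize_verb(&c.verb, c.past));
    parts.join(" ")
}

fn main() {
    let present = Clause {
        subject: noun("I", false),
        object: Some(noun("you", false)),
        verb: "see".to_string(),
        past: false,
    };
    let past = Clause {
        subject: noun("I", false),
        object: Some(noun("tree", true)),
        verb: "see".to_string(),
        past: true,
    };
    println!("{}", realize_clause(&present)); // I you see
    println!("{}", realize_clause(&past)); // I tree-en ka-see
}
```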

Running The Prototype

Concilium uses a multi-mode output system. To run the engine, you must specify which output you want to generate:

Generate Lexicon (Words)

This generates a markdown table of all English glosses and their Concilium translations in Words.md.

cargo run -- words

Generate Sentences

This translates all English sentences found in the data/ directory and outputs them to Sentences.md.

cargo run -- sentences

Generate Paragraphs

This translates paragraph entries from data/english_paragraphs.md and outputs them to Paragraphs.md.

cargo run -- paragraphs
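
For orientation, a mode argument like the ones above could be dispatched roughly as follows. This is only a sketch of the pattern: the Mode enum and the parse_mode and output_file functions are hypothetical names, and the real entry point may parse its arguments differently. The three mode names and output files are the ones documented in this section.

```rust
// Hypothetical sketch of dispatching on the mode argument.
use std::env;

#[derive(Debug, PartialEq, Clone, Copy)]
enum Mode {
    Words,
    Sentences,
    Paragraphs,
}

/// Map the first command-line argument to a mode, if it is recognized.
fn parse_mode(arg: &str) -> Option<Mode> {
    match arg {
        "words" => Some(Mode::Words),
        "sentences" => Some(Mode::Sentences),
        "paragraphs" => Some(Mode::Paragraphs),
        _ => None,
    }
}

/// Output file documented for each mode.
fn output_file(mode: Mode) -> &'static str {
    match mode {
        Mode::Words => "Words.md",
        Mode::Sentences => "Sentences.md",
        Mode::Paragraphs => "Paragraphs.md",
    }
}

fn main() {
    // `cargo run -- words` passes "words" as the first argument after the binary name.
    match env::args().nth(1).as_deref().and_then(parse_mode) {
        Some(mode) => println!("mode {:?} would write {}", mode, output_file(mode)),
        None => eprintln!("usage: cargo run -- <words|sentences|paragraphs>"),
    }
}
```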

Data Corpus

The engine loads corpus data from the data/ directory. It supports:

.md files: Each line (except headings) is treated as a sentence, and words are extracted for the lexicon.
.json files: Extracts string values to populate the lexicon and sentence list.
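
The .md handling described above can be sketched as a single pass over the file contents. The function name parse_markdown_corpus is hypothetical, and several details are assumptions made for the example: headings are recognized by a leading #, blank lines are skipped, and words are lowercased and split on any non-alphabetic character. The engine's actual tokenization may differ, and .json handling is not shown.

```rust
// Hypothetical sketch of turning a Markdown corpus file into sentences and lexicon words.
use std::collections::BTreeSet;

/// Returns the sentence lines in file order and the set of distinct lowercase words.
fn parse_markdown_corpus(text: &str) -> (Vec<String>, BTreeSet<String>) {
    let mut sentences = Vec::new();
    let mut words = BTreeSet::new();
    for line in text.lines() {
        let line = line.trim();
        // Skip blank lines (assumed) and headings.
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        sentences.push(line.to_string());
        // Split on anything that is not a letter, dropping empty fragments.
        for word in line.split(|c: char| !c.is_alphabetic()) {
            if !word.is_empty() {
                words.insert(word.to_lowercase());
            }
        }
    }
    (sentences, words)
}

fn main() {
    let text = "# Greetings\nI see you.\n\nYou see the tree.\n";
    let (sentences, words) = parse_markdown_corpus(text);
    println!("sentences: {:?}", sentences);
    println!("words: {:?}", words);
}
```

A BTreeSet keeps the extracted glosses deduplicated and in a stable order, which is convenient if the lexicon table is meant to be reproducible between runs.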

Validation

The project includes basic tests for:

phonotactic validity
environment-sensitive sound change application
grammatical clause realization
language generation and clause translation availability

Run:

cargo test

Generate Audio With Kokoro

Use the helper script in gen/ to generate Concilium audio with Kokoro. By default it processes full sentence lines, speaks the English sentence first and then the Concilium version, and renders the Concilium side from converted phonemes rather than letting Kokoro guess from English-like spellings.

Create and activate a local virtual environment:

cd gen
python3 -m venv venv
source venv/bin/activate
python -m ensurepip --upgrade

Install a CPU-only PyTorch wheel first so kokoro does not pull the large CUDA build by default:

pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install 'kokoro>=0.3.4' soundfile

Kokoro also expects espeak-ng to be installed on the system. The default voice is Onyx (am_onyx), which uses the American English pipeline. To generate sentence audio files from Sentences.md, run:

python kokoro_words.py --input ../Sentences.md --output-dir ../audio/kokoro_sentences

Useful options:

python kokoro_words.py --input ../Sentences.md --output-dir ../audio/kokoro_sentences --limit 1
python kokoro_words.py --input ../Sentences.md --output-dir ../audio/kokoro_sentences --voice am_onyx --speed 0.85
python kokoro_words.py --input ../Sentences.md --output-dir ../audio/kokoro_sentences --prompt-style concilium
python kokoro_words.py --input ../Sentences.md --output-dir ../audio/kokoro_sentences --concilium-only
python kokoro_words.py --input ../Sentences.md --output-dir ../audio/kokoro_sentences --concilium-render spoken
python kokoro_words.py --input ../Paragraphs.md --unit paragraphs --output-dir ../audio/kokoro_paragraphs --limit 1
python kokoro_words.py --input ../Sentences.md --unit words --output-dir ../audio/kokoro_words
python kokoro_words.py --input ../Sentences.md --export-overrides-template pronunciation_overrides.tsv

The script writes .wav files plus a manifest.tsv file with the original pronunciation hints and the prompt sent to Kokoro.

Pronunciation Overrides

The script reads gen/pronunciation_overrides.tsv by default. This lets you keep using the pronunciation column in your translated markdown while manually correcting any word that Kokoro still mispronounces.

Important:

In the default bilingual mode, the English sentence still comes first.
--concilium-only skips the English lead-in entirely.
Overrides only affect the Concilium side of the sentence.
By default the Concilium side is rendered from phonemes.
Use --prompt-style concilium only if you explicitly want audio without the English lead-in.

Format:

concilium	pronunciation	spoken
dimdran	[d-ee-m-dr-ah-n]	deem-drahn
krukhdraes	[kr-oo-kh-dr-eye-s]	krookh-drice

How it works:

concilium: the Concilium word as it appears in your translated markdown
pronunciation: the bracketed pronunciation from the markdown table
spoken: the exact English-ish prompt Kokoro should read for that word

You can generate a starter template from any translated markdown file:

python kokoro_words.py --input ../Sentences.md --export-overrides-template pronunciation_overrides.tsv

Then edit the spoken column by ear until the pronunciation sounds right.

Research Notes

This repository should be understood as an experimental linguistic generator rather than a finished conlang toolkit. The current implementation establishes a disciplined baseline for further research into:

alien phonotactics
non-human semantic categorization
extended morphology
script design
corpus generation
machine-assisted translation between English glosses and Concilium
