paul-mothapo/Concilium

Concilium

Abstract

Concilium is a research-oriented Rust prototype for the controlled generation of a novel artificial language intended to model an unfamiliar, non-human linguistic system. The project does not attempt to imitate any documented natural language directly. Instead, it uses English glosses as a semantic indexing layer so that human-readable meanings can be attached to forms generated by a distinct phonological and grammatical system.

The present objective is exploratory: to investigate whether a computational pipeline can derive lexemes, clause structure, and sound-pattern regularities that are not inherited from attested human languages while remaining internally coherent enough to support translation and later expansion.

Research Aim

The working hypothesis behind Concilium is that a credible alien language should not be constructed as a simple substitution cipher over English. A stronger approach is to separate semantic intent from linguistic realization.

In this project:

  • English provides glosses such as i, you, see, and tree.
  • Concilium generates its own surface forms from a defined phonological inventory.
  • Grammar is realized independently through its own word order and inflection rules.
  • Sound changes are applied to establish a recognizable language identity rather than a random word list.

This means English functions as the annotation layer, not as the final language model.

Method

Concilium is organized as a small research engine with clear domain boundaries:

  • phonology: defines the phoneme inventory, weighted selection, syllable templates, and phonotactic constraints.
  • lexicon: generates lexical forms from semantic glosses.
  • mutation: applies ordered sound-change rules that give the language a distinct identity.
  • grammar: realizes clauses according to Concilium word order and morphology.
  • evolution: assembles the language from its blueprint and exposes translation-oriented lookup methods.
  • presets: stores the current experimental configuration for the Concilium language.

This separation is intentional. It keeps the project extensible for later work on morphology, syntax, corpus generation, or comparative experiments.
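As an illustration of what the phonology stage describes (weighted selection over an inventory, driven by syllable templates), here is a minimal sketch. It is not taken from the repository: the function names `pick_weighted` and `build_syllable`, the `C`/`V` template notation, and the sample inventory are all assumptions, and random rolls are passed in by the caller so the example needs no external crate.

```rust
/// Hypothetical helper: picks an item from weighted choices given a roll
/// in the range 0..total_weight. Returns None if the roll is out of range.
fn pick_weighted<'a>(choices: &[(&'a str, u32)], roll: u32) -> Option<&'a str> {
    let mut remaining = roll;
    for &(item, weight) in choices {
        if remaining < weight {
            return Some(item);
        }
        remaining -= weight;
    }
    None
}

/// Hypothetical helper: fills a template such as "CVC" slot by slot,
/// drawing consonants for 'C' and vowels for 'V'. Any other template
/// character, an empty pool, or running out of rolls yields None.
fn build_syllable<'a>(
    template: &str,
    consonants: &[(&'a str, u32)],
    vowels: &[(&'a str, u32)],
    rolls: &mut impl Iterator<Item = u32>,
) -> Option<String> {
    let mut out = String::new();
    for slot in template.chars() {
        let pool = match slot {
            'C' => consonants,
            'V' => vowels,
            _ => return None,
        };
        let total: u32 = pool.iter().map(|&(_, w)| w).sum();
        if total == 0 {
            return None;
        }
        // Reduce the raw roll into the valid range for this pool.
        let roll = rolls.next()? % total;
        out.push_str(pick_weighted(pool, roll)?);
    }
    Some(out)
}
```

Passing rolls in as an iterator keeps generation reproducible in tests; a real run would feed it from a random number generator.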

Current Linguistic Profile

The current Concilium configuration is defined in src/presets.rs.

  • Language name: Concilium
  • Word order: SOV
  • Plural marking: suffix -en
  • Past marking: prefix ka-
  • Example sound changes include k -> kh, s -> sh before vowels, and a -> ae between consonants.
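The three example sound changes can be sketched as an ordered pipeline. This is only an illustration under stated assumptions: the rule order shown, the treatment of `k -> kh` as unconditional, the five-letter vowel set, and the names `apply_stage` and `apply_sound_changes` are not taken from src/presets.rs, which may condition or order the rules differently.

```rust
/// Assumption: the vowel set is a, e, i, o, u.
fn is_vowel(c: char) -> bool {
    matches!(c, 'a' | 'e' | 'i' | 'o' | 'u')
}

/// Assumption: any other lowercase ASCII letter counts as a consonant.
fn is_consonant(c: char) -> bool {
    c.is_ascii_lowercase() && !is_vowel(c)
}

/// Runs one rule over a word. The rule sees (previous, current, next)
/// characters from the input to this stage and may return a replacement.
fn apply_stage(
    input: &str,
    rule: impl Fn(Option<char>, char, Option<char>) -> Option<&'static str>,
) -> String {
    let chars: Vec<char> = input.chars().collect();
    let mut out = String::new();
    for (i, &c) in chars.iter().enumerate() {
        let prev = if i > 0 { Some(chars[i - 1]) } else { None };
        let next = chars.get(i + 1).copied();
        match rule(prev, c, next) {
            Some(replacement) => out.push_str(replacement),
            None => out.push(c),
        }
    }
    out
}

/// Applies the three example changes in a fixed (assumed) order.
fn apply_sound_changes(word: &str) -> String {
    // 1. k -> kh in every position.
    let s1 = apply_stage(word, |_, c, _| if c == 'k' { Some("kh") } else { None });
    // 2. s -> sh when the next character is a vowel.
    let s2 = apply_stage(&s1, |_, c, next| match (c, next) {
        ('s', Some(n)) if is_vowel(n) => Some("sh"),
        _ => None,
    });
    // 3. a -> ae when both neighbours are consonants.
    apply_stage(&s2, |prev, c, next| match (prev, c, next) {
        (Some(p), 'a', Some(n)) if is_consonant(p) && is_consonant(n) => Some("ae"),
        _ => None,
    })
}
```

Because each stage reads the output of the previous one, order matters: under these assumptions the made-up input `kasa` becomes `khasa`, then `khasha`, then `khaesha`.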

Because Concilium is SOV, English clauses must often be reordered during translation. A gloss sequence equivalent to English I see you is realized structurally as I you see.
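The reordering and the two affixes listed above can be sketched as follows. The `Clause` struct and `realize` function are hypothetical names for illustration, not the repository's grammar API, and the example operates on English glosses as stand-ins where the real engine would use generated Concilium forms.

```rust
/// Hypothetical clause description; field names are assumptions.
struct Clause<'a> {
    subject: &'a str,
    object: &'a str,
    verb: &'a str,
    past: bool,
    object_plural: bool,
}

/// Realizes a clause in SOV order, with the past prefix "ka" on the verb
/// and the plural suffix "en" on the object.
fn realize(clause: &Clause<'_>) -> String {
    let verb = if clause.past {
        format!("ka{}", clause.verb)
    } else {
        clause.verb.to_string()
    };
    let object = if clause.object_plural {
        format!("{}en", clause.object)
    } else {
        clause.object.to_string()
    };
    // Subject first, then object, then verb.
    format!("{} {} {}", clause.subject, object, verb)
}
```

With subject `i`, object `you`, and verb `see`, this sketch yields `i you see`, matching the reordering described above.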

Running The Prototype

The engine has three output modes. Pass the mode name as the first argument to select which output to generate:

Generate Lexicon (Words)

This writes a Markdown table of all English glosses and their Concilium translations to Words.md.

cargo run -- words

Generate Sentences

This translates all English sentences found in the data/ directory and outputs them to Sentences.md.

cargo run -- sentences

Generate Paragraphs

This translates paragraph entries from data/english_paragraphs.md and outputs them to Paragraphs.md.

cargo run -- paragraphs
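For readers curious how such a mode argument might be handled, here is a minimal sketch. The `Mode` enum and its methods are hypothetical and not taken from the repository's `main.rs`; only the three mode names and the output file names come from the text above.

```rust
/// Hypothetical representation of the three documented modes.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Mode {
    Words,
    Sentences,
    Paragraphs,
}

impl Mode {
    /// Maps the first command-line argument to a mode, if recognized.
    fn parse(arg: &str) -> Option<Mode> {
        match arg {
            "words" => Some(Mode::Words),
            "sentences" => Some(Mode::Sentences),
            "paragraphs" => Some(Mode::Paragraphs),
            _ => None,
        }
    }

    /// Output file documented for each mode.
    fn output_file(self) -> &'static str {
        match self {
            Mode::Words => "Words.md",
            Mode::Sentences => "Sentences.md",
            Mode::Paragraphs => "Paragraphs.md",
        }
    }
}
```

In a binary, the argument would typically come from `std::env::args().nth(1)`, with an unrecognized or missing value producing a usage message.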

Data Corpus

The engine loads corpus data from the data/ directory. It supports:

  • .md files: Each line (except headings) is treated as a sentence, and words are extracted for the lexicon.
  • .json files: Extracts string values to populate the lexicon and sentence list.
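The .md rule above can be sketched roughly as follows. This is an assumption-laden illustration, not the repository's loader: the function name `parse_markdown_corpus` is hypothetical, and skipping blank lines, lowercasing words, and splitting on non-alphabetic characters are choices made for the example.

```rust
use std::collections::BTreeSet;

/// Returns (sentences, unique lowercase words) from Markdown text.
/// Lines starting with '#' are treated as headings and skipped.
fn parse_markdown_corpus(text: &str) -> (Vec<String>, BTreeSet<String>) {
    let mut sentences = Vec::new();
    let mut words = BTreeSet::new();
    for line in text.lines() {
        let line = line.trim();
        // Assumption: blank lines carry no sentence.
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        sentences.push(line.to_string());
        // Assumption: words are maximal runs of alphabetic characters.
        for word in line.split(|c: char| !c.is_alphabetic()) {
            if !word.is_empty() {
                words.insert(word.to_lowercase());
            }
        }
    }
    (sentences, words)
}
```

A `BTreeSet` keeps the extracted glosses deduplicated and in a stable order, which is convenient if the lexicon table should be reproducible between runs.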

Validation

The project includes basic tests for:

  • phonotactic validity
  • environment-sensitive sound change application
  • grammatical clause realization
  • language generation and clause translation availability

Run:

cargo test

Generate Audio With Kokoro

Use the helper script in gen/ to generate Concilium audio with Kokoro. By default it processes full sentence lines, speaks the English sentence first and then the Concilium version, and renders the Concilium side from converted phonemes rather than letting Kokoro guess from English-like spellings.

Create and activate a local virtual environment:

cd gen
python3 -m venv venv
source venv/bin/activate
python -m ensurepip --upgrade

Install a CPU-only PyTorch wheel first so kokoro does not pull the large CUDA build by default:

pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install 'kokoro>=0.3.4' soundfile

Kokoro also expects espeak-ng to be available on the system. The default voice is Onyx (am_onyx), which uses the American English pipeline. With the dependencies installed, generate sentence audio files from Sentences.md:

python kokoro_words.py --input ../Sentences.md --output-dir ../audio/kokoro_sentences

Useful options:

python kokoro_words.py --input ../Sentences.md --output-dir ../audio/kokoro_sentences --limit 1
python kokoro_words.py --input ../Sentences.md --output-dir ../audio/kokoro_sentences --voice am_onyx --speed 0.85
python kokoro_words.py --input ../Sentences.md --output-dir ../audio/kokoro_sentences --prompt-style concilium
python kokoro_words.py --input ../Sentences.md --output-dir ../audio/kokoro_sentences --concilium-only
python kokoro_words.py --input ../Sentences.md --output-dir ../audio/kokoro_sentences --concilium-render spoken
python kokoro_words.py --input ../Paragraphs.md --unit paragraphs --output-dir ../audio/kokoro_paragraphs --limit 1
python kokoro_words.py --input ../Sentences.md --unit words --output-dir ../audio/kokoro_words
python kokoro_words.py --input ../Sentences.md --export-overrides-template pronunciation_overrides.tsv

The script writes .wav files plus a manifest.tsv file with the original pronunciation hints and the prompt sent to Kokoro.

Pronunciation Overrides

The script reads gen/pronunciation_overrides.tsv by default. This lets you keep using the pronunciation column in your translated markdown while manually correcting any word that Kokoro still mispronounces.

Important:

  • In the default bilingual mode, the English sentence still comes first.
  • --concilium-only skips the English lead-in entirely.
  • Overrides only affect the Concilium side of the sentence.
  • By default the Concilium side is rendered from phonemes.
  • Use --prompt-style concilium only if you explicitly want audio without the English lead-in.

Format:

concilium	pronunciation	spoken
dimdran	[d-ee-m-dr-ah-n]	deem-drahn
krukhdraes	[kr-oo-kh-dr-eye-s]	krookh-drice

How it works:

  • concilium: the Concilium word as it appears in your translated markdown
  • pronunciation: the bracketed pronunciation from the markdown table
  • spoken: the exact English-ish prompt Kokoro should read for that word
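To make the three-column layout concrete, here is a small sketch of reading such a file. The actual helper script is written in Python; this Rust version only illustrates the format, and the `Override` struct and `parse_overrides` function are hypothetical names. It assumes the first line is the header row and that columns are separated by single tab characters.

```rust
/// One row of the overrides table; field names mirror the header.
#[derive(Debug, PartialEq)]
struct Override {
    concilium: String,
    pronunciation: String,
    spoken: String,
}

/// Parses tab-separated text, skipping the header and any row that
/// has fewer than three columns or an empty first column.
fn parse_overrides(tsv: &str) -> Vec<Override> {
    tsv.lines()
        .skip(1) // header row: concilium, pronunciation, spoken
        .filter_map(|line| {
            let mut cols = line.split('\t').map(str::trim);
            let concilium = cols.next()?;
            let pronunciation = cols.next()?;
            let spoken = cols.next()?;
            if concilium.is_empty() {
                return None;
            }
            Some(Override {
                concilium: concilium.to_string(),
                pronunciation: pronunciation.to_string(),
                spoken: spoken.to_string(),
            })
        })
        .collect()
}
```

Rows that do not fit the layout are dropped silently here; a stricter reader might report them instead.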

You can generate a starter template from any translated markdown file:

python kokoro_words.py --input ../Sentences.md --export-overrides-template pronunciation_overrides.tsv

Then edit the spoken column by ear until the pronunciation sounds right.

Research Notes

This repository should be understood as an experimental linguistic generator rather than a finished conlang toolkit. The current implementation establishes a disciplined baseline for further research into:

  • alien phonotactics
  • non-human semantic categorization
  • extended morphology
  • script design
  • corpus generation
  • machine-assisted translation between English glosses and Concilium

About

From the Latin word for council. Short, elegant, and scholarly. A research project or a linguistic engine.
