Montre

A modern, embeddable corpus query engine with first-class support for aligned corpora.

montre (/mɔ̃tʁ/): “shows,” “reveals,” “makes visible” — from French montrer, “to show.” The Latin root is monstrare, “to point out, indicate.”

Montre needs no server, no external services, and no prerequisites.

A corpus is a self-contained directory with its own data, indexes, and (optionally) alignments. Build it in one line from your annotation files, or from a TOML manifest describing multiple components.

Montre is designed to be used from the CLI or embedded directly in Julia or Python.

Install

curl -fsSL https://raw.githubusercontent.com/myersm0/montre/main/install.sh | sh

Quick start

# Build a corpus from a directory of CoNLL-U files:
montre build -i data/maupassant/ -o my-corpus/

# Query
montre query my-corpus/ '[pos="ADJ"] [pos="NOUN"]'

# Count
montre count my-corpus/ '[pos="ADJ"] [pos="NOUN"]'
montre count my-corpus/ '[pos="NOUN"]' --by-document
montre count my-corpus/ '[pos="NOUN"]' --by-component

# Filter
montre query my-corpus/ '[pos="ADJ"] [pos="NOUN"]' --document la-parure
montre query my-corpus/ '[pos="ADJ"] [pos="NOUN"]' --component fr

# Inspect
montre info my-corpus/
montre docs my-corpus/
montre layers my-corpus/
montre vocab my-corpus/ pos
montre vocab my-corpus/ lemma --top 50 --component fr
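
For scripting, the commands above can be driven from Python with the standard library alone. The sketch below only assembles argument lists from the subcommands and flags shown in this section. The helper names `montre_argv` and `run_montre` are hypothetical, and actually running a command assumes the `montre` binary is on your PATH.

```python
import subprocess

def montre_argv(subcommand, corpus, argument=None, **flags):
    """Build an argument list for the montre CLI (hypothetical helper).

    Keyword flags map to long options: document="la-parure" becomes
    --document la-parure, and by_document=True becomes --by-document.
    """
    argv = ["montre", subcommand, corpus]
    if argument is not None:
        argv.append(argument)            # a query string or a layer name
    for name, value in flags.items():
        option = "--" + name.replace("_", "-")
        if value is True:
            argv.append(option)          # boolean switch, no value
        elif value is not None and value is not False:
            argv.extend([option, str(value)])
    return argv

def run_montre(argv):
    """Run the command and return its standard output as text."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout

count_cmd = montre_argv("count", "my-corpus/", '[pos="NOUN"]', by_document=True)
query_cmd = montre_argv("query", "my-corpus/", '[pos="ADJ"] [pos="NOUN"]', document="la-parure")
print(count_cmd)
print(query_cmd)
```

Passing the query as a single list element avoids shell quoting problems with the brackets and double quotes in the query syntax.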

Query language

Montre uses a CQL-based language, extended with labels, constraints, and alignment-aware operations.

Core patterns

# Token queries
[pos="NOUN"]
[lemma="maison"]
[word="chat" & pos="NOUN"]
[lemma=/^un.*/]
[pos!="PUNCT"]

# Sequences
[pos="DET"] [pos="ADJ"]* [pos="NOUN"]

# Quantifiers
[pos="ADJ"]+
[pos="ADJ"]*
[pos="ADJ"]?
[pos="ADJ"]{2,4}

# Alternation
([pos="ADJ"] | [pos="ADV"])+ [pos="NOUN"]
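
To make the meaning of sequences and quantifiers concrete, here is a toy illustration in Python that is independent of Montre. It encodes each token's POS tag as one character so that an ordinary regular expression can stand in for `[pos="DET"] [pos="ADJ"]* [pos="NOUN"]`. The tag codes, the function name `find_spans`, and the leftmost, greedy, non-overlapping matching of `re.finditer` are assumptions of this sketch and say nothing about how Montre itself matches.

```python
import re

# One character per tag, so a regex offset is also a token offset.
CODES = {"DET": "D", "ADJ": "A", "NOUN": "N"}

def find_spans(tags, pattern):
    """Return (start, end) token spans where the encoded pattern matches."""
    encoded = "".join(CODES.get(tag, "x") for tag in tags)  # "x" = any other tag
    return [match.span() for match in re.finditer(pattern, encoded)]

words = ["le", "petit", "chat", "noir", "voit", "un", "grand", "vieux", "chien"]
tags = ["DET", "ADJ", "NOUN", "ADJ", "VERB", "DET", "ADJ", "ADJ", "NOUN"]

# DA*N stands in for: [pos="DET"] [pos="ADJ"]* [pos="NOUN"]
for start, end in find_spans(tags, "DA*N"):
    print(start, end, " ".join(words[start:end]))
```

On this sentence the sketch prints `0 3 le petit chat` and `5 9 un grand vieux chien`.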

Structural constraints

[pos="DET"] [pos="NOUN"] within s
[lemma="chat"] within doc

Morphological features

These queries require a corpus built with the --decompose-feats flag (or decompose_feats = true in a TOML manifest).

[pos="NOUN" & feats.Number="Plur"]
[feats.Gender="Masc" & feats.Tense="Past"]
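
As background, the `feats.*` attributes correspond to the key/value pairs in the CoNLL-U FEATS column, where pairs are separated by `|` and `_` marks an empty column. The Python sketch below shows that decomposition on a single token. The function names and the token dictionary are hypothetical, and how Montre stores decomposed features internally is not covered here.

```python
def parse_feats(feats):
    """Split a CoNLL-U FEATS string into a {feature: value} dict."""
    if feats in ("", "_"):               # "_" marks an empty FEATS column
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

def matches(token, pos, **required):
    """Check a token against a POS tag and required feature values."""
    feats = parse_feats(token["feats"])
    return token["pos"] == pos and all(
        feats.get(name) == value for name, value in required.items()
    )

token = {"word": "maisons", "pos": "NOUN", "feats": "Gender=Fem|Number=Plur"}
print(parse_feats(token["feats"]))
# Rough analogue of: [pos="NOUN" & feats.Number="Plur"]
print(matches(token, "NOUN", Number="Plur"))
```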

Component and document filtering

[pos="NOUN"] within component:fr
[pos="ADJ"] [pos="NOUN"] within doc:"la-parure","boule-de-suif"

Labeled captures and global constraints

a:[pos="NOUN"] []* b:[pos="NOUN"] :: a.lemma = b.lemma
a:[pos="ADJ"] b:[pos="NOUN"] :: a.lemma != b.lemma
a:[] []{0,20} b:[] :: distance(a,b) >= 5

Global constraints (the part after ::) are evaluated on each complete match, using the spans bound to the labels.
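
The first query above can be read as "two nouns, in order, that share a lemma". The following Python sketch mimics that reading on a toy token list: it first collects candidate label bindings and then applies the constraint to each. It enumerates every ordered noun pair, which is an assumption of the sketch; whether Montre reports all such pairs for `[]*` is not stated here, and the function name is hypothetical.

```python
from itertools import combinations

def same_lemma_noun_pairs(tokens):
    """Yield (a, b) index pairs: both nouns, a before b, equal lemmas."""
    nouns = [i for i, tok in enumerate(tokens) if tok["pos"] == "NOUN"]
    for a, b in combinations(nouns, 2):      # a < b because nouns is sorted
        if tokens[a]["lemma"] == tokens[b]["lemma"]:   # the :: constraint
            yield a, b

tokens = [
    {"word": "maisons", "lemma": "maison", "pos": "NOUN"},
    {"word": "et", "lemma": "et", "pos": "CCONJ"},
    {"word": "jardins", "lemma": "jardin", "pos": "NOUN"},
    {"word": "de", "lemma": "de", "pos": "ADP"},
    {"word": "la", "lemma": "le", "pos": "DET"},
    {"word": "maison", "lemma": "maison", "pos": "NOUN"},
]
print(list(same_lemma_noun_pairs(tokens)))
```

Here the only surviving pair is `(0, 5)`: "maisons" and "maison" share the lemma "maison", while "jardins" matches neither.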

Parallel corpus support

Montre was designed from the ground up specifically for parallel corpora.

Montre treats a parallel corpus as a single object with multiple components and explicit alignment relations, rather than as separate corpora joined at query time.

Key features

Multiple components (languages, editions, translations)
Named alignments at any span level (sentence, paragraph, stanza)
Multiple competing alignment sets (LaBSE, vecalign, manual)
Alignment projection between components

Example

# Query French, project to English
[lemma="maison"] within component:fr =labse=>

This enables:

tracing translations across languages
detecting omissions or expansions
comparing editions or variants
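
Conceptually, projection follows alignment edges from the spans that contain a hit to the aligned spans in another component. The Python sketch below illustrates the idea with a plain dictionary of sentence-level edges. The sentence ids, the edge representation, and both function names are invented for illustration and do not reflect Montre's storage format or API.

```python
def project(hit_sentences, edges):
    """Map each source sentence id to its aligned target sentence ids."""
    return {sid: edges.get(sid, []) for sid in hit_sentences}

def omissions(source_ids, edges):
    """Source sentences with no aligned target (candidate omissions)."""
    return [sid for sid in source_ids if not edges.get(sid)]

# Hypothetical sentence-level alignment: source id -> target ids.
edges = {"fr-1": ["en-1"], "fr-2": ["en-2", "en-3"], "fr-3": []}

print(project(["fr-2"], edges))
print(omissions(["fr-1", "fr-2", "fr-3", "fr-4"], edges))
```

A source sentence aligned to two target sentences ("fr-2") suggests an expansion in translation, while sentences with no edges ("fr-3", "fr-4") are candidates for omissions.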

Build a multi-component corpus

[corpus]
name = "isosceles"
decompose_feats = true

[components.maupassant-fr]
path = "data/maupassant/fr/conllu/"
language = "fr"

[components.maupassant-en]
path = "data/maupassant/en/conllu/"
language = "en"

[alignments.labse]
source = "maupassant-fr"
target = "maupassant-en"
edges = "alignments/labse/"
source_layer = "sentence"
target_layer = "sentence"

montre build -m corpus.toml -o my-corpus/

Performance

Montre is competitive with established corpus engines while prioritizing structural flexibility and embeddability.

Measured on a 1.5M-token corpus (Maupassant, French/English) on an Apple M4 Max:

| Query | Matches | Time |
|---|---|---|
| `[pos="NOUN"]` | 244,184 | 0.6 ms |
| `[pos="ADJ"] [pos="NOUN"]` | 30,672 | 12 ms |
| `[pos="ADJ"]? [pos="NOUN"]` | 272,019 | 71 ms |
| `([pos="ADJ"] \| [pos="ADV"])+ [pos="NOUN"]` | 33,444 | 27 ms |
| `([pos="ADJ"] \| [pos="DET"])+ [pos="NOUN"]` | 198,735 | 71 ms |

Key properties:

Quantifiers use a run-based execution model (scales with matches, not corpus size)
--count-only avoids hit allocation entirely (nanosecond-scale for simple queries)
Memory-mapped indexes reduce load time and memory footprint by an order of magnitude
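
One plausible reading of "run-based" is that a quantified token such as `[pos="ADJ"]+` is evaluated over maximal runs of consecutive matching positions rather than by scanning every token. The Python sketch below shows only that grouping step. This interpretation, the function name, and the sample positions are assumptions; the engine's internals are not described in this document.

```python
def runs(positions):
    """Group sorted token positions into maximal runs of consecutive positions.

    Returns (start, end) pairs with end exclusive.
    """
    result = []
    for pos in positions:
        if result and result[-1][1] == pos:
            result[-1] = (result[-1][0], pos + 1)   # extend the current run
        else:
            result.append((pos, pos + 1))           # start a new run
    return result

# Hypothetical sorted positions of ADJ tokens, as an index lookup might return them.
adj_positions = [1, 6, 7, 12, 13, 14]
print(runs(adj_positions))
```

The sample input yields `[(1, 2), (6, 8), (12, 15)]`. The loop touches each matching position once, so its cost depends on the number of matches and not on the length of the corpus.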

Bindings

Montre exposes a C FFI for embedding in other languages.

Julia (almost complete)

The Julia bindings are provided by the Montre.jl package.

using Montre

corpus = open_corpus("./my-corpus")
hits = query(corpus, "[pos=\"ADJ\"] [pos=\"NOUN\"]")

for line in concordance(corpus, hits)
    println(line)
end

Python (early)

Bindings via PyO3 are in progress.

import montre

corpus = montre.open("./my-corpus")
for hit in corpus.query('[pos="DET"] [pos="NOUN"]'):
    print(hit.start, hit.end)

Roadmap

Coming soon:

Statistics: group, collocation
Python bindings (feature-complete, pip install)
REPL (persistent corpus session)
TUI for interactive exploration
Support for additional input formats (VRT, Stanza JSON, TEI)

Citing Montre

A paper describing Montre is in preparation. In the meantime, if you use Montre in published research, please cite:

@software{myers-montre,
  author       = {Myers, Michael J.},
  title        = {Montre: A Modern Corpus Query Engine for Aligned Corpora},
  year         = {2026},
  url          = {https://github.com/myersm0/montre},
  version      = {0.4.0}
}

License

Apache-2.0
