Languages Command

The languages command detects languages used in the text content of a GEDCOM file -- notes, biographical stories, and event descriptions.

Usage

gedcom-tools languages <file> [options]

Options

Option	Description
`--format {text,json}`	Output format (default: text)
`-v, --verbose`	Show timing for each processing phase
`-q, --quiet`	One-line summary
`--min-length N`	Minimum text length for detection (default: 10). Shorter texts are skipped as unreliable
`--language LANG`	Show records in a specific language (name or ISO 639-1 code)
`--show-text`	Show the detected text for each match (requires `--language`)

Modes

Aggregate Mode (default)

Without --language, the command scans all text content and shows a breakdown table of languages by category.

Filter Mode (`--language`)

With --language, the command lists the specific persons, notes, and events that contain text in that language. The option accepts language names ("English", "Greek"), ISO 639-1 codes ("en", "el"), or the special value "unknown" for texts that could not be classified. Matching is case-insensitive.
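
The lookup described above can be pictured as a small normalization step. This is a minimal sketch, not the tool's actual code: `resolve_language` and the `LANGUAGES` table are hypothetical names, and the table holds only a few of the supported languages.

```python
from typing import Optional, Tuple

# Hypothetical subset of the supported languages: ISO 639-1 code -> name.
LANGUAGES = {"en": "English", "el": "Greek", "fr": "French", "de": "German"}


def resolve_language(value: str) -> Optional[Tuple[str, Optional[str]]]:
    """Map user input to (name, code); return None if it is not recognized."""
    key = value.strip().lower()  # case-insensitive matching
    if key == "unknown":
        return ("Unknown", None)  # special value for unclassifiable texts
    if key in LANGUAGES:  # input was an ISO 639-1 code
        return (LANGUAGES[key], key)
    for code, name in LANGUAGES.items():  # input was a language name
        if name.lower() == key:
            return (name, code)
    return None
```

A caller would treat a `None` result as the usage error listed under Exit Codes.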

How It Works

1. Encoding is detected and the language detection models are loaded
2. A note index is built so pointer notes (NOTE @N1@) can be resolved and analyzed once (cached)
3. INDI and FAM records are walked, classifying each note into its category
4. Unreferenced top-level notes are classified last
5. Results are sorted by total count (aggregate) or by xref (filter)

In verbose mode, each of these five phases is shown with timing.
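
The counting part of these phases can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: `tally_languages`, the `NoteRef` tuple, and the `detect` callback are hypothetical, the detector is supplied by the caller, and counting a shared pointer note once per reference is a guess.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

# (record type "INDI" or "FAM", event tag or None, inline text or pointer such as "@N1@")
NoteRef = Tuple[str, Optional[str], str]


def tally_languages(
    top_level_notes: Dict[str, str],
    refs: List[NoteRef],
    detect: Callable[[str], str],
    min_length: int = 10,
) -> Tuple[List[Tuple[str, Counter]], int]:
    cache: Dict[str, str] = {}  # pointer xref -> detected language
    referenced = set()
    counts: Dict[str, Counter] = {}
    skipped = 0

    def record(language: str, category: str) -> None:
        counts.setdefault(language, Counter())[category] += 1

    # Walk notes attached to INDI and FAM records.
    for record_type, event_tag, value in refs:
        is_pointer = value in top_level_notes
        text = top_level_notes[value] if is_pointer else value
        if is_pointer:
            referenced.add(value)
        if len(text) < min_length:
            skipped += 1  # too short to classify reliably
            continue
        if is_pointer:
            if value not in cache:  # run detection once per pointer note
                cache[value] = detect(text)
            language = cache[value]
        else:
            language = detect(text)
        is_story = record_type == "INDI" and event_tag is None
        record(language, "stories" if is_story else "events")

    # Unreferenced top-level notes are classified last.
    for xref, text in top_level_notes.items():
        if xref in referenced:
            continue
        if len(text) < min_length:
            skipped += 1
            continue
        record(detect(text), "notes")

    # Aggregate mode sorts by total count, largest first.
    rows = sorted(counts.items(), key=lambda kv: -sum(kv[1].values()))
    return rows, skipped
```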

Supported Languages

Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hungarian, Italian, Japanese, Korean, Latin, Norwegian Bokmal, Norwegian Nynorsk, Polish, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish, Ukrainian.

Output

Text Output (Aggregate)

File: tree.ged
Encoding: UTF-8

=== Language Detection ===
  Texts analyzed: 42 (5 skipped, too short)

  Language             Notes  Stories  Events   Total
  ─────────────────────────────────────────────────────
  English                 10       15       8      33
  Greek                    2        4       3       9
  ─────────────────────────────────────────────────────
  Total                   12       19      11      42

  Distinct languages: 2 (excluding unknown)

  Notes   = standalone top-level notes
  Stories = biographical notes on individuals
  Events  = notes on births, deaths, marriages, and other events
  Tip: use --language <name> to list individual records in that language.

Text Output (Filter)

File: tree.ged
Encoding: UTF-8

=== Greek (el) ===
  Texts analyzed: 42 (5 skipped, too short)

  Persons with biographical notes (3):
    Eleni Papadopoulos (@I5@)
    Nikolaos Andreou (@I12@)
    Maria Konstantinou (@I44@)

  Standalone notes (1):
    @N7@

  Events with notes (3):
    @I5@  BIRT  — Eleni Papadopoulos
    @F3@  MARR
    @F3@  (family note)

FAM-level notes that aren't under an event sub-record show as "(family note)" with a null event tag.

Text Output (Filter with `--show-text`)

When --show-text is used, the detected text is shown indented below each match. Newlines in the original text are collapsed to spaces.

File: tree.ged
Encoding: UTF-8

=== Greek (el) ===
  Texts analyzed: 42 (5 skipped, too short)

  Persons with biographical notes (2):
    Eleni Papadopoulos (@I5@)
      Γεννήθηκε στην Αθήνα και μεγάλωσε στη Θεσσαλονίκη
    Nikolaos Andreou (@I12@)
      Σπούδασε ιατρική στο Πανεπιστήμιο Αθηνών

  Events with notes (1):
    @I5@  BIRT  — Eleni Papadopoulos
      Γεννήθηκε στο νοσοκομείο της Αθήνας τον Ιούνιο
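
The newline handling for text output could look like the sketch below. `collapse_newlines` is a hypothetical helper, and replacing a run of consecutive line breaks with a single space is an assumption; the tool may emit one space per newline instead.

```python
import re


def collapse_newlines(text: str) -> str:
    """Replace each run of line breaks with one space for single-line display."""
    return re.sub(r"(?:\r?\n)+", " ", text).strip()
```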

JSON Output (Aggregate)

{
  "file": "tree.ged",
  "mode": "aggregate",
  "encoding": { "detected": "UTF-8", "has_bom": false, "declared": "UTF-8" },
  "languages": [
    { "language": "English", "code": "en", "notes": 10, "stories": 15, "events": 8, "total": 33 },
    { "language": "Greek", "code": "el", "notes": 2, "stories": 4, "events": 3, "total": 9 }
  ],
  "summary": { "total_texts": 42, "skipped_short": 5, "distinct_languages": 2, "min_length": 10 },
  "categories": {
    "notes": "Standalone top-level notes",
    "stories": "Biographical notes on individuals",
    "events": "Notes on births, deaths, marriages, and other events"
  }
}
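
A script consuming this report might sanity-check the counts before using them. The sketch below uses only the keys shown in the sample; `totals_are_consistent` is a hypothetical helper, and the expectation that the per-language totals add up to `total_texts` is an assumption drawn from the sample, not a documented guarantee.

```python
import json


def totals_are_consistent(report: dict) -> bool:
    """Check that each row sums to its total and rows sum to total_texts."""
    rows = report["languages"]
    rows_ok = all(
        r["notes"] + r["stories"] + r["events"] == r["total"] for r in rows
    )
    grand_total = sum(r["total"] for r in rows)
    return rows_ok and grand_total == report["summary"]["total_texts"]


# Trimmed copy of the sample report above.
sample = json.loads("""
{
  "mode": "aggregate",
  "languages": [
    {"language": "English", "code": "en", "notes": 10, "stories": 15, "events": 8, "total": 33},
    {"language": "Greek", "code": "el", "notes": 2, "stories": 4, "events": 3, "total": 9}
  ],
  "summary": {"total_texts": 42, "skipped_short": 5, "distinct_languages": 2}
}
""")
print(totals_are_consistent(sample))
```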

JSON Output (Filter)

{
  "file": "tree.ged",
  "mode": "filter",
  "encoding": { "detected": "UTF-8", "has_bom": false, "declared": "UTF-8" },
  "language": "Greek",
  "code": "el",
  "persons": [
    { "xref": "@I5@", "name": "Eleni Papadopoulos" },
    { "xref": "@I12@", "name": "Nikolaos Andreou" },
    { "xref": "@I44@", "name": "Maria Konstantinou" }
  ],
  "notes": [
    { "xref": "@N7@" }
  ],
  "events": [
    { "parent_xref": "@I5@", "event_tag": "BIRT", "name": "Eleni Papadopoulos" },
    { "parent_xref": "@F3@", "event_tag": "MARR", "name": null },
    { "parent_xref": "@F3@", "event_tag": null, "name": null }
  ],
  "summary": {
    "person_count": 3,
    "note_count": 1,
    "event_count": 3,
    "total_matches": 7,
    "total_texts": 42,
    "skipped_short": 5,
    "min_length": 10
  }
}
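
When reading a filter report, the null `event_tag` and `name` values need explicit handling. A minimal sketch, using only keys from the sample above; `label_events` and `matches_add_up` are hypothetical helpers, and treating `total_matches` as the sum of the three counts is inferred from the sample.

```python
from typing import List, Optional, Tuple


def label_events(report: dict) -> List[Tuple[str, str, Optional[str]]]:
    """Return (parent_xref, label, name), labelling FAM-level notes explicitly."""
    labelled = []
    for event in report["events"]:
        # A null event_tag marks a family-level note not tied to an event.
        label = event["event_tag"] or "(family note)"
        labelled.append((event["parent_xref"], label, event["name"]))
    return labelled


def matches_add_up(summary: dict) -> bool:
    """Check total_matches against the person, note, and event counts."""
    parts = summary["person_count"] + summary["note_count"] + summary["event_count"]
    return parts == summary["total_matches"]


# Trimmed copy of the sample report above.
sample = {
    "events": [
        {"parent_xref": "@I5@", "event_tag": "BIRT", "name": "Eleni Papadopoulos"},
        {"parent_xref": "@F3@", "event_tag": "MARR", "name": None},
        {"parent_xref": "@F3@", "event_tag": None, "name": None},
    ],
    "summary": {"person_count": 3, "note_count": 1, "event_count": 3, "total_matches": 7},
}
for row in label_events(sample):
    print(row)
```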

JSON Output (Filter with `--show-text`)

When --show-text is used, each person, note, and event object includes a "texts" array with the full detected text (newlines preserved).

{
  "persons": [
    { "xref": "@I5@", "name": "Eleni Papadopoulos", "texts": ["Γεννήθηκε στην Αθήνα..."] }
  ],
  "notes": [
    { "xref": "@N7@", "texts": ["Σημείωση για την οικογένεια..."] }
  ],
  "events": [
    { "parent_xref": "@I5@", "event_tag": "BIRT", "name": "Eleni Papadopoulos", "texts": ["Γεννήθηκε στο νοσοκομείο..."] }
  ]
}

Quiet Mode

Aggregate: 2 language(s) detected across 42 text(s)

Filter: Greek: 3 persons, 1 note, 3 events

Quiet mode prints nothing when there are no results.

Exit Codes

Code	Meaning
0	Success
1	Error during processing
2	Usage error (file not found, invalid language)
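
A wrapper script can branch on these codes. The sketch below assumes `gedcom-tools` is on the PATH and that options may follow the file argument as in the usage line; `explain_exit` and `run_languages` are hypothetical helper names.

```python
import json
import subprocess

# Meanings taken from the table above.
EXIT_MEANINGS = {
    0: "success",
    1: "error during processing",
    2: "usage error",
}


def explain_exit(code: int) -> str:
    """Translate an exit code into the meaning listed in the table."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_languages(path: str) -> dict:
    """Run the command in JSON mode and return the parsed report."""
    proc = subprocess.run(
        ["gedcom-tools", "languages", path, "--format", "json"],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        raise RuntimeError(f"{explain_exit(proc.returncode)}: {proc.stderr.strip()}")
    return json.loads(proc.stdout)
```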

Notes

A person with multiple notes in the target language appears only once in filter results (deduplicated by xref)
event_tag is null for FAM direct notes -- these are family-level notes not attached to a specific event
Pointer notes are resolved and their text is analyzed once; the result is cached to avoid duplicate detection
--min-length and --language can be combined
--show-text requires --language -- using it without --language exits with code 2
Passing an unrecognized language name prints supported languages to stderr and exits with code 2
In quiet text mode, --show-text is silently ignored (the one-line summary is unchanged). In quiet JSON mode, texts arrays are still included
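
The rule that --show-text requires --language can be illustrated with Python's `argparse`, whose `parser.error()` prints a message to stderr and exits with status 2. This is a sketch of the rule only, covering a subset of the options; it is not taken from the tool's source, and `build_parser` and `parse_args` are hypothetical names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="gedcom-tools languages")
    parser.add_argument("file")
    parser.add_argument("--format", choices=["text", "json"], default="text")
    parser.add_argument("--min-length", type=int, default=10, metavar="N")
    parser.add_argument("--language", metavar="LANG")
    parser.add_argument("--show-text", action="store_true")
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # --show-text is only meaningful in filter mode.
    if args.show_text and args.language is None:
        parser.error("--show-text requires --language")  # exits with status 2
    return args
```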

Related Commands

search -- find individuals using flexible query syntax
validate -- check file structure and data issues
stats -- summary statistics for a GEDCOM file
isolated -- detect unconnected individuals
compare -- cross-file individual matching
