Modern Composition #426

@wismill

NOTE: This document is a draft.

Introduction

Compose sequences are already powerful, but they look limited compared to what macOS offers.

macOS uses a state machine, which is quite powerful. In fact, the current implementation of Compose in xkbcommon also uses a state machine internally, but we do not use its full power.

I propose we change that and create a new format in order to:

  • Avoid repeating sequences that share the same prefix. This may also speed up parsing.
  • Simplify sequences by making keysyms implicit.
  • Allow setting a custom “feedback” string that is displayed while composing.
  • Allow recursive sequences.
  • Allow producing output and then continuing to compose. This feature could be interpreted as a custom locked layer.
  • Allow defining a “terminator” to control the behavior when there is no matching sequence:
    • Control whether to output something or not.
    • Control whether to continue composing or not.
    • Make both of the above depend on predicates on the input (see “filter”).
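
The first goal, storing each shared prefix only once, amounts to folding the flat sequences into a trie. The following Python sketch illustrates the idea only; `build_trie` is a hypothetical helper and does not reflect how xkbcommon stores its Compose table.

```python
def build_trie(sequences):
    """Fold flat (keysym sequence, output) pairs into a nested dict.

    Each shared prefix is stored once; a leaf holds the output string.
    Raises ValueError when one sequence is a prefix of another.
    """
    root = {}
    for keys, output in sequences:
        node = root
        for key in keys[:-1]:
            node = node.setdefault(key, {})
            if not isinstance(node, dict):
                raise ValueError(f"{keys!r} extends a complete sequence")
        last = keys[-1]
        if last in node:
            raise ValueError(f"{keys!r} conflicts with an existing sequence")
        node[last] = output
    return root


# Legacy-style flat entries taken from the examples in this document.
legacy = [
    (("dead_acute", "a"), "á"),
    (("dead_acute", "dead_acute", "o"), "ő"),
    (("dead_acute", "dead_acute", "u"), "ű"),
]
trie = build_trie(legacy)
```

Here `dead_acute` appears once at the top level and once as a nested key, instead of five times across three flat lines.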

Proposed changes

  • Start versioning the Compose file format. The current (legacy) format has implicit version 1.
  • For the newer format, an explicit version number is required.
  • Add a new environment variable, XKB_COMPOSE_FILE, to select which Compose file to load. This keeps compatibility with X11 and its XCOMPOSEFILE variable. XKB_COMPOSE_FILE takes precedence over XCOMPOSEFILE.
  • Some features of the new format may not be supported by apps using xkbcommon, so we should guard them with flags.
  • Refactor the Compose table to handle the new format.
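
The precedence between the two environment variables could look like the Python sketch below. The function name `compose_file_from_env` is hypothetical, treating empty values as unset is an assumption of this sketch, and any further lookup steps (user or locale files) are left out.

```python
import os


def compose_file_from_env(environ=None):
    """Return the Compose file path selected by the environment, or None.

    XKB_COMPOSE_FILE is consulted first; XCOMPOSEFILE is the X11 fallback.
    Empty values are treated as unset (an assumption of this sketch).
    """
    env = os.environ if environ is None else environ
    for name in ("XKB_COMPOSE_FILE", "XCOMPOSEFILE"):
        path = env.get(name)
        if path:
            return path
    return None
```

With both variables set, the XKB_COMPOSE_FILE value is returned; with only XCOMPOSEFILE set, behavior is the same as today.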

New Compose file format

The new Compose file format is based on a restricted set of features of YAML 1.2.
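
Since legacy files have an implicit version 1 and the new format requires an explicit one, a loader has to tell the two apart before choosing a parser. A minimal Python sketch, assuming the version is declared with a `compose version` key on the first content line as in the documented example; `detect_compose_version` is a hypothetical name and the line-based check stands in for a real YAML parser.

```python
import re

_VERSION_RE = re.compile(r"^compose version:\s*(\d+)\s*(?:#.*)?$")


def detect_compose_version(text):
    """Guess the format version of a Compose file from its first content line.

    Legacy files carry no version marker, so anything else yields 1.
    """
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        match = _VERSION_RE.match(stripped)
        return int(match.group(1)) if match else 1
    return 1  # empty file: treat as legacy
```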

Documented example:

# First document is reserved for configuration
compose version: 2  # mandatory format version. Legacy files have implicit version: 1.
--- # Start a new YAML document
# States are identified by a name. TODO: recommendations for standard dead keys
acute:
  # Optional corresponding keysyms. If none: custom state
  keysym: dead_acute
  # If set, the following string is displayed while composing
  feedback: "´"
  # State transitions
  transitions:
    # Implicit entry of one character.
    # Equivalent to legacy: <dead_acute> <a>: "á" aacute
    # Equivalent to new: {char: á, keysym: aacute, next: __none__}
    a: "á"
    # Implicit entry of multiple characters.
    # Equivalent to legacy: <dead_acute> <q>: "q́"
    # Equivalent to new: {string: "q́", keysym: __none__, next: __none__}
    q: "q́"
    # Explicit entry of one character without keysym.
    # Equivalent to legacy: <dead_acute> <e>: "é" eacute
    # Equivalent to new: {char: é, keysym: eacute, next: __none__}
    e: {char: "é"}
    # Explicit entry of one character with keysym.
    # Equivalent to legacy: <dead_acute> <i>: "í" iacute
    # Equivalent to new: {char: í, keysym: iacute, next: __none__}
    i: {char: "í", keysym: iacute}
    # Explicit entry of multiple characters.
    # Equivalent to legacy: <dead_acute> <x>: "x́"
    # Equivalent to new: {string: "x́", keysym: __none__, next: __none__}
    x: {string: "x́"}
    # Chained dead key
    # Equivalent to legacy:
    #   <dead_acute> <dead_macron> <e>: U1E17 "ḗ"
    #   <dead_acute> <dead_macron> <o>: U1E53 "ṓ"
    # Equivalent to new: {char: __none__, next: macron_and_acute}
    dead_macron: {next: macron_and_acute}
    # Sequences (avoid creating explicit intermediate states, e.g. “double_acute”)
    # Equivalent to legacy: <dead_acute> <dead_acute> <o>: "ő" odoubleacute
    dead_acute o: "ő" # U+0151 LATIN SMALL LETTER O WITH DOUBLE ACUTE
    # Equivalent to legacy: <dead_acute> <dead_acute> <u>: "ű" udoubleacute
    dead_acute u: "ű" # U+0171 LATIN SMALL LETTER U WITH DOUBLE ACUTE
    # Loop. Equivalent to: {next: acute}
    # No legacy equivalent
    dead_acute dead_acute: {next: __loop__}
    # TODO: how to handle overlaps?
    dead_acute dead_acute o: 🦧
    # Wildcard (aka “terminator”): match any input.
    # Here we match any input, then discard it and stop.
    # This is the default behaviour (no need to set it) and
    # corresponds to the legacy behaviour.
    _: {next: __none__}
macron_and_acute:
  # NOTE: custom state (no associated keysym)
  feedback: "\u02DD" # U+02DD DOUBLE ACUTE ACCENT
  transitions:
    e: "ḗ" # U+1E17 LATIN SMALL LETTER E WITH MACRON AND ACUTE
    o: "ṓ" # U+1E53 LATIN SMALL LETTER O WITH MACRON AND ACUTE
    # Wildcard: match any input, discard it, output "\u02DD" and stop
    _: {char: "\u02DD"}
compose:
  keysym: Multi_key
  transitions:
    # Some classical XCompose sequences
    period period: "…"
    period minus: "·"
    period equal: "•"
    f o r a l l: "∀" # U+2200 FOR ALL
    # Chained dead key (level 1)
    m: {next: math}
math:
  keysym: 0x11000000 # custom keysym
  transitions:
    # Chained dead keys (level 2)
    i: {next: math-italic}
    b: {next: math-bold}
    s: {next: math-double-struck}
    # Wildcard: match any input, output it unchanged, then stop
    _: {keysym: __input__}
math-italic:
  transitions:
    a: {char: "𝑎", next: __loop__}
    i: {char: "𝑖", next: __loop__}
    # Wildcard: match any input, output it unchanged, then loop
    _: {keysym: __input__, next: __loop__}
math-bold:
  transitions:
    a: "𝐚"
    i: "𝐢"
    # Wildcard with built-in filters
    _:
      # Discard but keep looping
      - {filter: __letter__, next: __loop__}
      # Output unchanged and loop
      - {filter: __number__, keysym: __input__, next: __loop__}
      - {filter: __punctuation__, keysym: __input__, next: __loop__}
      # Output unchanged and stop
      - {keysym: __input__}
math-double-struck:
  feedback: 𝔸
  transitions:
    e: {char: 𝕖}
    E: {char: 𝔼, keysym: U1D53C}
--- # Start a new YAML document
# Include locale Compose
!include "%L"
--- # Start a new YAML document
# Include custom Compose file
!include "%H/path/to/other-compose-file"
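
To make the wildcard, `__loop__`, `__input__` and filter semantics of the example concrete, here is a small Python interpreter for two of its states. It is one possible reading of the draft, not a reference implementation: `step`, `run` and the `FILTERS` table are hypothetical, inputs are simplified to single characters instead of keysyms, and the filter predicates are stand-ins for whatever the built-in filters would actually match.

```python
NONE, LOOP, INPUT = "__none__", "__loop__", "__input__"

# Hypothetical stand-ins for the built-in filters named in the example.
FILTERS = {
    "__letter__": str.isalpha,
    "__number__": str.isdigit,
    "__punctuation__": lambda s: not s.isalnum() and not s.isspace(),
}

# Two states from the documented example, written in explicit form.
STATES = {
    "math-italic": {
        "a": {"char": "𝑎", "next": LOOP},
        "i": {"char": "𝑖", "next": LOOP},
        "_": [{"keysym": INPUT, "next": LOOP}],
    },
    "math-bold": {
        "a": {"char": "𝐚"},
        "i": {"char": "𝐢"},
        "_": [
            {"filter": "__letter__", "next": LOOP},
            {"filter": "__number__", "keysym": INPUT, "next": LOOP},
            {"filter": "__punctuation__", "keysym": INPUT, "next": LOOP},
            {"keysym": INPUT},
        ],
    },
}


def step(state, key):
    """Feed one input to `state`; return (output or None, next state or None)."""
    transitions = STATES[state]
    entry = None if key == "_" else transitions.get(key)
    if entry is None:
        # Wildcard: take the first rule whose filter accepts the input.
        # With no matching rule, discard the input and stop (legacy behaviour).
        rules = transitions.get("_", [])
        entry = next(
            (r for r in rules if "filter" not in r or FILTERS[r["filter"]](key)),
            {},
        )
    output = entry.get("char") or entry.get("string")
    if output is None and entry.get("keysym") == INPUT:
        output = key  # pass the input through unchanged
    target = entry.get("next", NONE)
    if target == LOOP:
        target = state
    return output, (None if target == NONE else target)


def run(state, keys):
    """Feed keys until composing stops; return the concatenated output."""
    out = []
    for key in keys:
        output, state = step(state, key)
        if output:
            out.append(output)
        if state is None:
            break
    return "".join(out)
```

For instance, `run("math-italic", "aixa")` keeps looping and passes the unmapped `x` through, while in `math-bold` an unmapped letter is discarded without leaving the state.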

Partially converted en_US.UTF-8/Compose:

compose version: 2
---
acute:
  keysym: dead_acute
  feedback: "´"
  transitions:
    space: "'"
    dead_acute: "´"
    A: Á
    E: É
    I: Í
    J: "J́" # LATIN CAPITAL LETTER J plus COMBINING ACUTE
    O: Ó
    #
    dead_diaeresis: {next: diaeresis_and_acute}
    Multi_key quotedbl: {next: diaeresis_and_acute}
    Udiaeresis: Ǘ # LATIN CAPITAL LETTER U WITH DIAERESIS AND ACUTE
    #
    dead_abovering: {next: abovering_and_acute}
    Multi_key o: {next: abovering_and_acute}
    Aring: "Ǻ" # LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE
    #
diaeresis_and_acute:
  transitions:
    space: "΅" # GREEK DIALYTIKA TONOS
    U: Ǘ # LATIN CAPITAL LETTER U WITH DIAERESIS AND ACUTE
    #
abovering_and_acute:
  transitions:
    A: "Ǻ" # LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE
    #
cedilla:
  keysym: dead_cedilla
  transitions:
    space: "¸"
    c: ç
    C: Ç
    #
compose:
  keysym: Multi_key
  transitions:
    apostrophe: {next: acute}
    comma: {next: cedilla}
    # TODO: Check how to handle the following (they overlap with the previous entry but are unrelated to the cedilla)
    #       Maybe use: `comma: {filter: __letter__, next: cedilla}` ?
    comma apostrophe: "‚" # SINGLE LOW-9 QUOTATION MARK
    comma quotedbl: "„" # DOUBLE LOW-9 QUOTATION MARK
    comma minus: "¬" # NOT SIGN
    #
#
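
The overlap mentioned in the TODO (a key such as `comma` that is both a complete transition and the prefix of longer sequences) can at least be detected mechanically. The Python sketch below only reports such pairs and takes no position on how they should be resolved; `find_overlaps` is a hypothetical helper.

```python
def find_overlaps(transitions):
    """Return (shorter, longer) pairs where one key sequence prefixes another.

    Keys are whitespace-separated keysym names, as in the examples above.
    """
    split_keys = sorted({tuple(key.split()) for key in transitions})
    overlaps = []
    for i, short in enumerate(split_keys):
        for longer in split_keys[i + 1:]:
            if longer[:len(short)] != short:
                break  # sorted order: no later key can start with `short`
            overlaps.append((" ".join(short), " ".join(longer)))
    return overlaps


# The `compose` state from the partially converted file.
compose = {
    "apostrophe": {"next": "acute"},
    "comma": {"next": "cedilla"},
    "comma apostrophe": "‚",
    "comma quotedbl": "„",
    "comma minus": "¬",
}
overlaps = find_overlaps(compose)
```

A parser could use such a check to reject ambiguous files, or to decide where a filter on the shorter entry is needed.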

X11 data

We could reuse the new format for the Compose file templates in the libX11 repository:

  • We convert all the legacy files into the new format.
  • We write a script to translate the new format to the legacy format.
  • We could add extra fields for development purposes; these would be filtered out when generating the new-format files.
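
A translation script from the new format to the legacy one could start from something like the Python sketch below. It assumes the YAML has already been loaded into plain dicts, `to_legacy` is a hypothetical name, and it does not derive implicit keysyms from characters. Transitions with no legacy equivalent (wildcards, loops, output followed by further composing) are simply skipped.

```python
def to_legacy(states, start):
    """Yield legacy Compose lines for every sequence reachable from `start`.

    Transitions with no legacy equivalent (wildcards, loops) are skipped.
    """
    def walk(name, prefix, seen):
        for keys, entry in states[name]["transitions"].items():
            if keys == "_":
                continue  # wildcard: no legacy equivalent
            if isinstance(entry, str):
                entry = {"string": entry}  # implicit entry
            sequence = prefix + keys.split()
            target = entry.get("next", "__none__")
            output = entry.get("char") or entry.get("string")
            if output is not None and target == "__none__":
                lhs = " ".join(f"<{key}>" for key in sequence)
                escaped = output.replace("\\", "\\\\").replace('"', '\\"')
                keysym = entry.get("keysym")
                yield f'{lhs}: "{escaped}"' + (f" {keysym}" if keysym else "")
            elif output is None and target in states and target not in seen:
                # Chained state: continue with the longer prefix.
                yield from walk(target, sequence, seen | {target})

    yield from walk(start, [states[start]["keysym"]], {start})


# A reduced version of the documented example.
states = {
    "acute": {
        "keysym": "dead_acute",
        "transitions": {
            "a": "á",
            "i": {"char": "í", "keysym": "iacute"},
            "dead_macron": {"next": "macron_and_acute"},
            "dead_acute o": "ő",
            "dead_acute dead_acute": {"next": "__loop__"},
            "_": {"next": "__none__"},
        },
    },
    "macron_and_acute": {"transitions": {"e": "ḗ", "o": "ṓ"}},
}
lines = list(to_legacy(states, "acute"))
```

On this input the sketch yields five legacy lines, all starting with `<dead_acute>`; the loop and the wildcard produce nothing.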
