Closed as not planned
Labels: compose, discussion: backward compatibility, enhancement
Modern Composition
NOTE: This document is a draft.
Introduction
Compose sequences are already powerful, but they look limited compared to macOS.
macOS uses a state machine, which is quite powerful. In fact, the current implementation of Compose in xkbcommon also uses a state machine internally, but we do not use its full power.
I propose that we change this and create a new format in order to:
- Avoid repeating sequences that share the same prefix. This may also speed up parsing.
- Simplify sequences, e.g. with implicit keysyms.
- Allow setting a custom “feedback” string while composing.
- Allow recursive sequences.
- Allow outputting a result and then continuing to compose. This feature could be interpreted as a custom locked layer.
- Allow defining a “terminator” to control the behavior when there is no matching sequence:
  - Control whether to output something or not.
  - Control whether to continue composing or not.
  - Make these choices depend on predicates on the input (see “filter”).
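To make these semantics concrete, here is a minimal Python sketch of a composer driven by such a state machine. It is not xkbcommon code. The state table is a hand-written subset of the draft example further down, written as Python dicts instead of YAML. Keys are represented by the character they produce, which is a simplifying assumption. The filter names are mapped to hypothetical character predicates. In the sketch, `next: __none__` ends composing, `__loop__` stays in the current state, `keysym: __input__` echoes the input, and `_` is the wildcard “terminator”.

```python
import unicodedata

# Hypothetical sketch of the draft semantics, not xkbcommon code.
NONE, LOOP, INPUT = "__none__", "__loop__", "__input__"

# Assumption: keys are represented by the character they produce, so the
# built-in filters can be plain character predicates.
FILTERS = {
    "__letter__": str.isalpha,
    "__number__": str.isdigit,
    "__punctuation__": lambda c: unicodedata.category(c).startswith("P"),
}

# Hand-written subset of the draft example. Only single-key transitions are
# modelled; multi-key entries such as "dead_acute o" are not.
STATES = {
    "compose": {"transitions": {"m": {"next": "math"}}},
    "math": {"transitions": {
        "i": {"next": "math-italic"},
        "b": {"next": "math-bold"},
        "_": {"keysym": INPUT},  # echo the input, then stop
    }},
    "math-italic": {"transitions": {
        "a": {"char": "𝑎", "next": LOOP},
        "i": {"char": "𝑖", "next": LOOP},
        "_": {"keysym": INPUT, "next": LOOP},  # echo the input, keep looping
    }},
    "math-bold": {"transitions": {
        "a": "𝐚",
        "i": "𝐢",
        "_": [
            {"filter": "__letter__", "next": LOOP},  # discard, keep looping
            {"filter": "__number__", "keysym": INPUT, "next": LOOP},
            {"filter": "__punctuation__", "keysym": INPUT, "next": LOOP},
            {"keysym": INPUT},  # echo the input, then stop
        ],
    }},
}


def select(entry, key):
    """Normalize an entry; for a filter list, pick the first matching item."""
    if isinstance(entry, str):
        return {"string": entry}  # implicit entry: output it, then stop
    if isinstance(entry, list):
        for item in entry:
            name = item.get("filter")
            if name is None or FILTERS[name](key):
                return item
        return {"next": NONE}  # no item matched: discard and stop
    return entry


def step(state, key):
    """Return (output, next_state); next_state is None once composing ends."""
    transitions = STATES[state]["transitions"]
    # No match and no wildcard: discard and stop (the legacy behaviour).
    entry = select(transitions.get(key, transitions.get("_", {"next": NONE})), key)
    if entry.get("keysym") == INPUT:
        output = key
    else:
        # Output keysyms (e.g. keysym: iacute) are ignored in this sketch.
        output = entry.get("char", entry.get("string", ""))
    target = entry.get("next", NONE)
    if target == NONE:
        return output, None
    return output, state if target == LOOP else target


def compose(keys, start="compose"):
    """Feed keys starting in `start`; return (output text, final state)."""
    state, out = start, []
    for key in keys:
        if state is None:  # composing has ended: keys pass through unchanged
            out.append(key)
            continue
        text, state = step(state, key)
        out.append(text)
    return "".join(out), state
```

For example, `compose(["m", "i", "a", "1"])` returns `("𝑎1", "math-italic")`: the composer stays in the italic state like a locked layer. The sketch assumes that items in a filter list are tried in order and that the first match wins.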
Proposed changes
- Start versioning the Compose file format. The current (legacy) format has implicit version 1.
- For the newer format, an explicit version number is required.
- Use an additional environment variable, `XKB_COMPOSE_FILE`, to detect which Compose file to load. This way we keep compatibility with X11 and its `XCOMPOSEFILE` variable. `XKB_COMPOSE_FILE` takes precedence over `XCOMPOSEFILE`.
- Some features of the new format may not be supported by apps using xkbcommon, so they should be guarded by flags.
- Refactor the Compose table to handle the new format.
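Here is a minimal sketch of the proposed variable precedence. `compose_file_from_env` is a hypothetical helper and not part of xkbcommon's API. The sketch treats an empty variable as unset, which is an assumption. It leaves out the remaining fallback locations that would be tried when neither variable is set.

```python
import os
from typing import Mapping, Optional


def compose_file_from_env(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the Compose file path requested by the environment, if any.

    Hypothetical helper: the proposed XKB_COMPOSE_FILE takes precedence over
    the X11 XCOMPOSEFILE variable. None means "use the usual fallbacks",
    which are not modelled here.
    """
    for name in ("XKB_COMPOSE_FILE", "XCOMPOSEFILE"):
        path = env.get(name)
        if path:  # assumption: an empty value counts as unset
            return path
    return None
```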
New Compose file format
The new Compose file format is based on a restricted subset of YAML 1.2.
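Legacy files carry no version, so a loader has to tell the two formats apart before parsing. Here is a minimal sketch of that check. It assumes that `compose version: N` is the first non-comment line of the first YAML document, as in the example that follows. Any other first line is taken to mean the legacy format, with implicit version 1. `compose_format_version` is a hypothetical name, and a real implementation would parse the YAML properly.

```python
import re

# Header line of the draft format, e.g. "compose version: 2 # comment".
VERSION_RE = re.compile(r"^compose version:\s*(\d+)\s*(?:#.*)?$")


def compose_format_version(text: str) -> int:
    """Guess the format version of a Compose file (hypothetical helper)."""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        match = VERSION_RE.match(stripped)
        if match:
            return int(match.group(1))
        return 1  # first significant line is not a version header: legacy
    return 1  # empty or comment-only file: treat as legacy
```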
Documented example:
```yaml
# First document is reserved for configuration
compose version: 2 # mandatory format version. Legacy files have implicit version: 1.
--- # Start a new YAML document
# States are identified by a name. TODO: recommendations for standard dead keys
acute:
  # Optional corresponding keysym. If none: custom state
  keysym: dead_acute
  # If set, the following string is displayed while composing
  feedback: "´"
  # State transitions
  transitions:
    # Implicit entry of one character.
    # Equivalent to legacy: <dead_acute> <a>: "á" aacute
    # Equivalent to new: {char: á, keysym: aacute, next: __none__}
    a: "á"
    # Implicit entry of multiple characters.
    # Equivalent to legacy: <dead_acute> <q>: "q́"
    # Equivalent to new: {string: "q́", keysym: __none__, next: __none__}
    q: "q́"
    # Explicit entry of one character without an explicit keysym (it is implicit).
    # Equivalent to legacy: <dead_acute> <e>: "é" eacute
    # Equivalent to new: {char: é, keysym: eacute, next: __none__}
    e: {char: "é"}
    # Explicit entry of one character with keysym.
    # Equivalent to legacy: <dead_acute> <i>: "í" iacute
    # Equivalent to new: {char: í, keysym: iacute, next: __none__}
    i: {char: "í", keysym: iacute}
    # Explicit entry of multiple characters.
    # Equivalent to legacy: <dead_acute> <x>: "x́"
    # Equivalent to new: {string: "x́", keysym: __none__, next: __none__}
    x: {string: "x́"}
    # Chained dead key
    # Equivalent to legacy:
    # <dead_acute> <dead_macron> <e>: U1E17 "ḗ"
    # <dead_acute> <dead_macron> <o>: U1E53 "ṓ"
    # Equivalent to new: {char: __none__, next: macron_and_acute}
    dead_macron: {next: macron_and_acute}
    # Sequences (avoid creating explicit intermediate states, e.g. “double_acute”)
    # Equivalent to legacy: <dead_acute> <dead_acute> <o>: "ő" odoubleacute
    dead_acute o: "ő" # U+0151 LATIN SMALL LETTER O WITH DOUBLE ACUTE
    # Equivalent to legacy: <dead_acute> <dead_acute> <u>: "ű" udoubleacute
    dead_acute u: "ű" # U+0171 LATIN SMALL LETTER U WITH DOUBLE ACUTE
    # Loop. Equivalent to: {next: acute}
    # No legacy equivalent
    dead_acute dead_acute: {next: __loop__}
    # TODO: how to handle overlaps?
    dead_acute dead_acute o: 🦧
    # Wildcard (aka “terminator”): match any input.
    # Here we match any input, then discard it and stop.
    # This is the default behaviour (no need to set it) and
    # corresponds to the legacy behaviour.
    _: {next: __none__}
macron_and_acute:
  # NOTE: custom state (no associated keysym)
  feedback: "\u02DD" # U+02DD DOUBLE ACUTE ACCENT
  transitions:
    e: "ḗ" # U+1E17 LATIN SMALL LETTER E WITH MACRON AND ACUTE
    o: "ṓ" # U+1E53 LATIN SMALL LETTER O WITH MACRON AND ACUTE
    # Wildcard: match any input, discard it, output "\u02DD" and stop
    _: {char: "\u02DD"}
compose:
  keysym: Multi_key
  transitions:
    # Some classical XCompose sequences
    period period: "…"
    period minus: "·"
    period equal: "•"
    f o r a l l: "∀" # U+2200 FOR ALL
    # Chained dead key (level 1)
    m: {next: math}
math:
  keysym: 0x11000000 # custom keysym
  transitions:
    # Chained dead keys (level 2)
    i: {next: math-italic}
    b: {next: math-bold}
    s: {next: math-double-struck}
    # Wildcard: match any input, output it unchanged, then stop
    _: {keysym: __input__}
math-italic:
  transitions:
    a: {char: "𝑎", next: __loop__}
    i: {char: "𝑖", next: __loop__}
    # Wildcard: match any input, output it unchanged, then loop
    _: {keysym: __input__, next: __loop__}
math-bold:
  transitions:
    a: "𝐚"
    i: "𝐢"
    # Wildcard with built-in filters
    _:
      # Discard but keep looping
      - {filter: __letter__, next: __loop__}
      # Output unchanged and loop
      - {filter: __number__, keysym: __input__, next: __loop__}
      - {filter: __punctuation__, keysym: __input__, next: __loop__}
      # Output unchanged and stop
      - {keysym: __input__}
math-double-struck:
  feedback: 𝔸
  transitions:
    e: {char: 𝕖}
    E: {char: 𝔼, keysym: U1D53C}
--- # Start a new YAML document
# Include locale Compose
!include "%L"
--- # Start a new YAML document
# Include custom Compose file
!include "%H/path/to/other-compose-file"
```

Partially converted en_US.UTF8/Compose:
```yaml
compose version: 2
---
acute:
  keysym: dead_acute
  feedback: "´"
  transitions:
    space: "'"
    dead_acute: "´"
    A: Á
    E: É
    I: Í
    J: J́ # LATIN CAPITAL LETTER J plus COMBINING ACUTE
    O: Ó
    # …
    dead_diaeresis: {next: diaeresis_and_acute}
    Multi_key quotedbl: {next: diaeresis_and_acute}
    Udiaeresis: Ǘ # LATIN CAPITAL LETTER U WITH DIAERESIS AND ACUTE
    # …
    dead_abovering: {next: abovering_and_acute}
    Multi_key o: {next: abovering_and_acute}
    Aring: "Ǻ" # LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE
    # …
diaeresis_and_acute:
  transitions:
    space: "΅" # GREEK DIALYTIKA TONOS
    U: Ǘ # LATIN CAPITAL LETTER U WITH DIAERESIS AND ACUTE
    # …
abovering_and_acute:
  transitions:
    A: "Ǻ" # LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE
    # …
cedilla:
  keysym:
  transitions:
    space: "¸"
    c: ç
    C: Ç
    # …
compose:
  keysym: Multi_key
  transitions:
    apostrophe: {next: acute}
    comma: {next: cedilla}
    # TODO: Check how to handle the following (they overlap with the previous entry but are unrelated to cedilla)
    # Maybe use: `comma: {filter: __letter__, next: cedilla}` ?
    comma apostrophe: "‚" # SINGLE LOW-9 QUOTATION MARK
    comma quotedbl: "„" # DOUBLE LOW-9 QUOTATION MARK
    comma minus: "¬" # NOT SIGN
    # …
# …
```

X11 data
We could reuse the new format for the Compose file templates in the libX11 repository:
- Convert all the legacy files into the new format.
- Write a script to translate the new format into the legacy format.
- Optionally add extra fields for development; they would then be filtered out when outputting the new format.
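Here is a minimal sketch of the new-to-legacy translation script. It assumes the YAML has already been loaded into plain Python dicts, for example with a YAML library (not shown). The script walks the states depth-first from a state bound to a keysym and emits one legacy line per terminal transition.

Some constructs have no legacy equivalent and are skipped:
- wildcards
- filter lists
- `__loop__` transitions
- outputs that continue composing

Implicit keysyms (such as `aacute` for `á`) would need a character-to-keysym table, so only explicit keysyms are written. All function names here are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Set

NONE, LOOP = "__none__", "__loop__"


def _legacy_line(seq: List[str], text: str, keysym: Optional[str]) -> str:
    """Format one legacy line, e.g. <dead_acute> <i> : "í" iacute."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    line = " ".join(f"<{k}>" for k in seq) + f' : "{escaped}"'
    return f"{line} {keysym}" if keysym and keysym != NONE else line


def _walk(states: Dict[str, dict], name: str, prefix: List[str],
          visiting: Set[str]) -> Iterator[str]:
    visiting = visiting | {name}  # states on the current path, to stop cycles
    for key, entry in states[name]["transitions"].items():
        if key == "_" or isinstance(entry, list):
            continue  # wildcards and filter lists: no legacy equivalent
        seq = prefix + key.split()  # "dead_acute o" -> ["dead_acute", "o"]
        if isinstance(entry, str):
            entry = {"string": entry}  # implicit entry
        target = entry.get("next", NONE)
        if target == NONE:
            text = entry.get("char", entry.get("string"))
            if text is not None:
                yield _legacy_line(seq, text, entry.get("keysym"))
        elif target != LOOP and target not in visiting:
            # Chained state: only its terminal transitions become legacy lines.
            yield from _walk(states, target, seq, visiting)


def to_legacy(states: Dict[str, dict], start: str) -> Iterator[str]:
    """Yield legacy Compose lines for every sequence starting at `start`."""
    keysym = states[start].get("keysym")
    if keysym is None:
        raise ValueError(f"state {start!r} has no keysym to start a sequence")
    return _walk(states, start, [keysym], set())
```

The per-path `visiting` set serves two purposes. It stops mutually referencing states from recursing forever. It still lets the same state be reached from several parents, such as `diaeresis_and_acute` via both `dead_diaeresis` and `Multi_key quotedbl`.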