
Support Wayland environments on Linux #3

@savetheclocktower

Description


Have you checked for existing feature requests?

  • Completed

Summary

This library supports X11 environments on Linux (somewhat), but it can't currently detect the proper keyboard layout under Wayland. But let's back up a bit, since this is the first issue on this repo and some folks may not even know the repo exists.


First of all: why does this library exist? For the reasons in this blog post.

Pulsar, like Atom before it, offers very powerful and nuanced interpretation of keystrokes. For instance: if a user types @, Pulsar’s interpretation of that keystroke may vary based on how the keyboard produced the character. Quoth the blog post:

Take for example the Swiss-German keyboard layout on macOS. If you want to type an @ character on that layout, you need to hold alt and press the g key.

And later:

We quickly discovered that the new DOM APIs weren’t enough. While the KeyboardEvent.key property accurately reports the typed character, it doesn’t tell us whether that character depends on the current combination of modifier keys. So, for example, if we see an event with a key property of @ and an altKey property of true, should we interpret it as alt-@ or @? If @ is a printed key on the current layout, we want to honor the modifier in the keystroke descriptor, but if the user was holding alt just to access the @ key, we don’t want to include the alt- modifier in the description.

It’s a bit of an outlier, but it’s a fair point. If your keyboard layout requires you to press alt-g to make an @, then pressing alt-g could be interpreted as @, alt-g, or even alt-@, depending on your outlook. And what about ctrl-alt-g? Should that be interpreted as ctrl-@? ctrl-alt-g? ctrl-alt-@?

Someone might be tempted to write code that handles this scenario as follows:

  1. Inspecting the keyboard event, it would see a code property of KeyG, an altKey property of true, and a key property of @.
  2. Since KeyG was pressed, the code would assume that, had alt not been pressed, the keyboard would’ve produced a g character.
  3. On other platforms, it would typically interpret this as alt-g. But since we’re on macOS, special logic prevents “shadowing” of alt-modified keys that produce ASCII characters, so this would be interpreted as @.
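The three steps above can be sketched roughly like this. It is a hypothetical illustration of the tempting approach, not Pulsar’s actual code: `naiveUnmodifiedCharacter`, `naiveDescribe`, the `platform` strings, and the plain-object event shape are all assumptions made for the example.

```javascript
// Hypothetical sketch of the naive reasoning above; not Pulsar's actual code.
// It assumes (wrongly) that a code like "KeyG" always yields the letter in its name.
function naiveUnmodifiedCharacter(code) {
  // "KeyG" -> "g", "Digit1" -> "1"; any other code is treated as unknown.
  const match = /^(?:Key([A-Z])|Digit([0-9]))$/.exec(code);
  if (!match) return null;
  return (match[1] ?? match[2]).toLowerCase();
}

function naiveDescribe(event, platform) {
  // Step 2: assume the key would have produced the QWERTY letter in its name.
  const base = naiveUnmodifiedCharacter(event.code);
  if (!event.altKey) return event.key;
  if (base === null || event.key === base) return `alt-${event.key}`;
  // Step 3: alt changed the character. On macOS, don't let alt- "shadow"
  // a key that produced an ASCII character; report the character instead.
  const isAscii = /^[\x20-\x7e]$/.test(event.key);
  if (platform === 'darwin' && isAscii) return event.key;
  return `alt-${base}`;
}

// Step 1: the Swiss-German macOS case from the blog post (alt+g produces "@").
const swissGermanAltG = { code: 'KeyG', key: '@', altKey: true };
console.log(naiveDescribe(swissGermanAltG, 'darwin')); // "@"
console.log(naiveDescribe(swissGermanAltG, 'linux'));  // "alt-g"
```

The weak point is `naiveUnmodifiedCharacter`: it reads a character out of the name of a physical key.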

But there’s a problem that the excerpted text hints at: despite what it looks like, you cannot assume that the KeyG key (a physical keyboard key) would’ve produced a g character when pressed. The KeyG code simply means “the place where G is on a QWERTY keyboard.”

Codes like KeyG and Digit1 describe physical keys in the positions typical of the vast majority of keyboards in use today. Their names correspond to QWERTY positioning and do not move around if the keyboard layout changes. For instance, a French-style AZERTY keyboard layout maps KeyQ to a and KeyW to z.
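To make that concrete, here is a small sketch with two hand-written maps from physical code to the character produced with no modifiers. They cover only four letter keys and are illustrative stand-ins, not data pulled from any OS; `letterFromCodeName` is a hypothetical helper representing the “read it off the code name” heuristic.

```javascript
// Same physical codes, different characters, depending on the active layout.
// Only four letter keys are listed; a real layout covers many more.
const qwerty = new Map([
  ['KeyQ', 'q'], ['KeyW', 'w'], ['KeyA', 'a'], ['KeyZ', 'z'],
]);
const azerty = new Map([
  ['KeyQ', 'a'], ['KeyW', 'z'], ['KeyA', 'q'], ['KeyZ', 'w'],
]);

// The code-name heuristic ("KeyQ" -> "q") only agrees with QWERTY.
function letterFromCodeName(code) {
  return code.startsWith('Key') ? code.slice(3).toLowerCase() : null;
}

for (const code of qwerty.keys()) {
  // code, heuristic guess, QWERTY character, AZERTY character
  console.log(code, letterFromCodeName(code), qwerty.get(code), azerty.get(code));
}
// KeyQ q q a
// KeyW w w z
// KeyA a a q
// KeyZ z z w
```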

Luckily, there’s now an API in Chromium that allows us to know which character would’ve been typed had no modifiers been present:

// Chromium-only; `await` needs an async function or an ES module.
const layoutMap = await navigator.keyboard.getLayoutMap();
layoutMap.get('KeyG'); // 'g' on a QWERTY layout

This is a major step toward what we need. Does this mean we don’t have to write a bunch of native code to handle this now?

Not quite.

Challenges

In most cases, we only care about comparing the character that a keystroke produced (if any) to the one it would’ve produced without any modifiers.
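For the alt-only case, that comparison is short. The sketch below assumes `layoutMap` is anything with a `get(code)` method; in Chromium that could be the `KeyboardLayoutMap` returned by `navigator.keyboard.getLayoutMap()`, but a plain `Map` stands in here so it runs outside the browser. `describeAltKeystroke` is a hypothetical helper, not code from this library or from Pulsar.

```javascript
// Hypothetical helper; `layoutMap` is anything with get(code). In Chromium it could be
// the KeyboardLayoutMap from navigator.keyboard.getLayoutMap(); here a Map stands in.
function describeAltKeystroke(event, layoutMap) {
  if (!event.altKey) return event.key;
  // The character this physical key produces with no modifiers held.
  const unmodified = layoutMap.get(event.code);
  // Unknown key, or alt didn't change the character: keep alt as a modifier.
  if (unmodified === undefined || event.key === unmodified) return `alt-${event.key}`;
  // alt changed the character, so it was only needed to produce event.key.
  return event.key;
}

// Only the one key this example needs.
const layout = new Map([['KeyG', 'g']]);
// Swiss-German on macOS: alt+g delivers "@".
console.log(describeAltKeystroke({ code: 'KeyG', key: '@', altKey: true }, layout)); // "@"
// A layout where alt leaves the character alone.
console.log(describeAltKeystroke({ code: 'KeyG', key: 'g', altKey: true }, layout)); // "alt-g"
```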

When the shift key is involved, though, it’s trickier. Suppose the user pressed ctrl+shift+alt+x:

  • We’d look at the character produced by the event and compare it to the one ctrl+shift+x would’ve produced.
  • Are they equal? If so, alt didn’t change the character that was delivered to us, so we’ll preserve the modifier, since it wasn’t a necessary part of producing the character we reacted to.
  • If they’re not equal, it means the user only pressed alt in order to produce the character we reacted to, so it’s not something we should treat as a modifier.
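Assuming we already had a per-key table listing what each key produces at each modifier level, the shift-aware comparison could look like the sketch below. The table shape, its property names, and the characters for `KeyX` are invented for illustration; ctrl is ignored because it doesn’t affect the lookup in this sketch.

```javascript
// Hypothetical per-key table of modifier levels; the KeyX characters are invented.
const keymap = {
  KeyX: { unmodified: 'x', withShift: 'X', withAltGraph: '»', withShiftAltGraph: '›' },
};

// Returns true if alt should stay in the keystroke as a modifier.
function altIsModifier(event, keymap) {
  if (!event.altKey) return false;
  const levels = keymap[event.code];
  if (!levels) return true; // unknown key: nothing to compare against
  // What the same keystroke (shift included) would have produced without alt.
  const withoutAlt = event.shiftKey ? levels.withShift : levels.unmodified;
  // Same character: alt wasn't needed to produce it, so keep the modifier.
  // Different character: alt was only used to reach event.key.
  return event.key === withoutAlt;
}

// ctrl+shift+alt+x where alt changed the character vs. where it didn't.
console.log(altIsModifier({ code: 'KeyX', key: '›', ctrlKey: true, shiftKey: true, altKey: true }, keymap)); // false
console.log(altIsModifier({ code: 'KeyX', key: 'X', ctrlKey: true, shiftKey: true, altKey: true }, keymap)); // true
```

The important detail is choosing `withShift` rather than `unmodified` as the baseline whenever shift is held.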

As far as I can tell, the KeyboardLayoutMap API can’t do stuff like this; it can only tell us which character a key would’ve produced if pressed with no modifiers. The use case is mainly for gaming; you want to be able to tell a user, e.g., which keys correspond to the standard WASD layout. (On an AZERTY keyboard, the same keys would spell out ZQSD.)
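A sketch of that gaming use case, assuming `layoutMap` is anything with a `get(code)` method (in Chromium, the result of `navigator.keyboard.getLayoutMap()`). `movementKeyLabels` is a hypothetical helper, and the AZERTY map below is a hand-written stand-in containing only the four keys it needs.

```javascript
// Label the keys in the physical WASD positions using the active layout.
function movementKeyLabels(layoutMap) {
  return ['KeyW', 'KeyA', 'KeyS', 'KeyD']
    .map((code) => (layoutMap.get(code) ?? '?').toUpperCase())
    .join('');
}

// Stand-in for an AZERTY KeyboardLayoutMap (only the keys used here).
const azerty = new Map([['KeyW', 'z'], ['KeyA', 'q'], ['KeyS', 's'], ['KeyD', 'd']]);
console.log(movementKeyLabels(azerty)); // "ZQSD"
```

Note that the map has no notion of a shift or AltGraph level; one character per code is all it offers.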

So we still need a way to loop through all keys present on the keyboard and ask the OS what character, if any, would be produced if we pressed that key…

  • …on its own.
  • …with Shift pressed.
  • …with AltGraph pressed.
  • …with Shift and AltGraph pressed.
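The loop over those four states might be shaped like the sketch below. `lookupCharacter(code, { shift, altGraph })` is a hypothetical callback standing in for the native, per-OS query (X11 today, whatever Wayland ends up needing); it is assumed to return a string, or `null` when the key produces no character. `buildKeymap` and `fakeLookup` are likewise illustrative names, and `fakeLookup` describes an invented layout.

```javascript
// The four modifier states to query for every physical key.
const LEVELS = [
  { name: 'unmodified', shift: false, altGraph: false },
  { name: 'withShift', shift: true, altGraph: false },
  { name: 'withAltGraph', shift: false, altGraph: true },
  { name: 'withShiftAltGraph', shift: true, altGraph: true },
];

// Ask `lookupCharacter` (hypothetical native query) about every code at every level.
function buildKeymap(codes, lookupCharacter) {
  const keymap = {};
  for (const code of codes) {
    const entry = {};
    for (const { name, shift, altGraph } of LEVELS) {
      entry[name] = lookupCharacter(code, { shift, altGraph });
    }
    keymap[code] = entry;
  }
  return keymap;
}

// Invented lookup for demonstration: AltGraph+G yields "@", Shift+AltGraph+G nothing.
function fakeLookup(code, { shift, altGraph }) {
  if (code !== 'KeyG') return null;
  if (altGraph) return shift ? null : '@';
  return shift ? 'G' : 'g';
}

console.log(buildKeymap(['KeyG', 'Space'], fakeLookup));
```

Keeping the OS query behind a single callback means only that callback would need a Wayland-specific implementation; the JavaScript side just consumes the resulting table.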

Wayland support on Linux

The crucial failing here is that our current keyboard-handling code on Linux presumes an X11 environment, so the associated APIs fail when we’re on Wayland.

We could migrate entirely to libxkbcommon, or we could add a new code path for Wayland environments and leave the current X11-specific code in its own branch. Sadly, my early experiments with the libxkbcommon API have not been incredibly fruitful.

If Claude is to be believed (a pretty big if), the techniques we use to spy on the current input mechanism in X11 don’t have direct equivalents in Wayland. To my increasing disbelief, I’m having trouble finding a way to ask a question like “which keymap is the current user typing with right now?” in an imperative fashion. The techniques I’ve found outside of Claude for this sort of thing seem to involve reimplementing a nontrivial portion of the Wayland protocol itself, pairing it with libxkbcommon, and listening for, e.g., keymap events.

The upside of this is that it would allow keyboard-layout to subscribe to keymap layout switches on Linux, a feature it’s never had before. The downside is how complex it is and how ill-equipped I feel to write it.

Conclusions

I’ll throw myself against a brick wall for a little while. Meanwhile, the new KeyboardLayoutMap API in the browser is huge, even though it’s not 100% of what we want. At some point I’ll probably play around with it so I can get a sense of how much functionality we’d be giving up if we had only KeyboardLayoutMap to fall back on. (Not easy to tell at a glance; amazingly, for how much code has been written to avoid this problem, there aren’t a lot of specs documenting how this should behave.)

What benefits does this feature provide?

Preserves the status quo for Linux users; prevents users of non-US keyboard layouts from having their keystrokes misinterpreted.

Any alternatives?

Yes, as explained above:

  • The KeyboardLayoutMap API in the browser
  • It seems a bit batty, but wayland-client is a pure-JS implementation of a Wayland client. It's got types like wl_keyboard that seem to contain keymap metadata. It would be hard to use this alongside native code, but maybe it can be used as a sanity check, or as a way of iterating more quickly, or as a spec helper.

Other examples:

No response

Metadata


Assignees

No one assigned

Labels

enhancement (New feature or request)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
