Text primitives extract #500

waywardmonkeys · 2025-12-30T19:41:40Z

This PR introduces a new core vocabulary crate, text_primitives, and
updates parley/fontique to use it for shared “leaf” text types
(OpenType tags/settings, font attributes, generic families, language
tags, bidi controls, and wrap/break enums).

The goal is to keep these fundamental types small, reusable,
no_std-friendly, and insulated from higher-level dependencies,
while making it easier for the rest of the text stack to share
the same representations.

What’s in `text_primitives`

- Tag/Setting (OpenType interop), with Tag stored as [u8; 4].
- Font attributes: FontWeight, FontWidth, FontStyle.
- CSS-ish wrap/break controls: WordBreak, OverflowWrap, TextWrapMode, plus BaseDirection.
- Bidi controls: BidiControl, BidiDirection, BidiOverride.
- GenericFamily (CSS generic family vocabulary).
- Language as a compact, zero-allocation language[-Script][-REGION] prefix type, with strict parsing and a parse_prefix API for extracting the remainder.
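For readers unfamiliar with the "Tag stored as [u8; 4]" choice, here is a minimal sketch of what such a type can look like. The `Tag` below is a hypothetical stand-in written for illustration; its method names (`new`, `parse`, `to_bytes`, `to_u32`) are assumptions and may not match the crate's actual API.

```rust
/// Hypothetical stand-in for an OpenType tag; not the crate's actual definition.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Tag([u8; 4]);

impl Tag {
    /// Builds a tag from exactly four bytes, e.g. `Tag::new(b"wght")`.
    pub const fn new(bytes: &[u8; 4]) -> Self {
        Self(*bytes)
    }

    /// Parses a tag from a string of exactly four printable ASCII bytes.
    pub fn parse(s: &str) -> Option<Self> {
        match s.as_bytes() {
            &[a, b, c, d] if [a, b, c, d].iter().all(|x| (0x20..=0x7E).contains(x)) => {
                Some(Self([a, b, c, d]))
            }
            _ => None,
        }
    }

    /// Returns the bytes in the order they appear in a font file.
    pub const fn to_bytes(self) -> [u8; 4] {
        self.0
    }

    /// Packs the tag into a big-endian `u32`, the form many OpenType APIs expect.
    pub const fn to_u32(self) -> u32 {
        u32::from_be_bytes(self.0)
    }
}
```

Storing the bytes directly keeps the type endian-neutral and makes conversion to a packed integer an explicit step at the interop boundary.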

Notable behavior / API changes

- Enums remain exhaustive (avoids #[non_exhaustive]) where stability/panic-safety matters.
- CSS parsing helpers are explicitly named parse_css (font weight/width/style).
- Display for font attributes is aligned with valid CSS output (e.g. numeric font-weight prints as a number; oblique prints as oblique <angle>deg).
- Language parsing is stricter to avoid silent data loss; trailing subtags are structurally validated then discarded; extlang is rejected; parse_prefix returns (Language, remainder).
- Added #[must_use] to Language accessors.
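To illustrate the "Display aligned with valid CSS output" point, here is a small sketch. The `FontStyle` and `FontWeight` types below are hypothetical stand-ins, and the exact variants and formatting rules are assumptions made for this example, not the crate's definitions.

```rust
use core::fmt;

/// Hypothetical stand-in; the crate's real `FontStyle` may differ.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum FontStyle {
    Normal,
    Italic,
    /// Oblique with an optional angle in degrees.
    Oblique(Option<f32>),
}

impl fmt::Display for FontStyle {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Normal => f.write_str("normal"),
            Self::Italic => f.write_str("italic"),
            Self::Oblique(None) => f.write_str("oblique"),
            // CSS spells the angle as `<number>deg`.
            Self::Oblique(Some(angle)) => write!(f, "oblique {}deg", angle),
        }
    }
}

/// Hypothetical stand-in; assumed here to always print as a plain number.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct FontWeight(f32);

impl FontWeight {
    pub const fn new(weight: f32) -> Self {
        Self(weight)
    }
}

impl fmt::Display for FontWeight {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // A bare number is a valid CSS `font-weight` value.
        write!(f, "{}", self.0)
    }
}
```

With these definitions, `FontStyle::Oblique(Some(14.0))` formats as `oblique 14deg` and `FontWeight::new(850.0)` formats as `850`, both of which can be pasted into a stylesheet unchanged.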

Integration changes

- parley and fontique now use text_primitives types instead of defining/duplicating equivalents.
- Fontconfig numeric mappings remain in fontique.

waywardmonkeys · 2025-12-30T19:42:19Z

As discussed in #495.

This needs some bike shedding by @dfrg and maybe others.

It needs a better name for this crate as well.

text_primitives/src/tag.rs

text_primitives/src/font.rs

taj-p

Nice work!! This is great. I've left some minor comments and questions. If @dfrg is happy with the high level approach (please see the below comment @dfrg about ICU4X), then I'm happy to approve the impl details once they're addressed.

text_primitives/src/text.rs

text_primitives/src/font.rs

taj-p · 2026-01-02T18:38:38Z

text_primitives/src/font.rs

+    /// assert_eq!(FontWeight::parse("850"), Some(FontWeight::new(850.0)));
+    /// assert_eq!(FontWeight::parse("invalid"), None);
+    /// ```
+    pub fn parse(s: &str) -> Option<Self> {


From an API perspective, it's odd that we support values like SEMI_LIGHT and EXTRA_BOLD, and that displaying 100 even outputs the string "thin", yet we can only parse CSS-style strings ("normal", "bold", or a number).

I think this should be named parse_css (for all three types) so that it's honest about what it does and leaves room for a more permissive parse or FromStr impl later.

This inconsistency bothered me too (and I think I'm the one that originally wrote it!). +1 to renaming to parse_css and adding more permissive conversions for those that want to capture more values.

Maybe add a to_css method that returns only valid CSS values as well.

These are all renamed to parse_css.

I didn't add a to_css.
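For context on what a CSS-only parser implies, here is a rough sketch of the behavior described in this thread: only "normal", "bold", or a number is accepted. The `FontWeight` type below is a hypothetical stand-in; the whitespace trimming and the rejection of non-finite values are assumptions for this example, and CSS range checking is left out.

```rust
/// Hypothetical stand-in for the crate's font weight type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct FontWeight(f32);

impl FontWeight {
    pub const NORMAL: Self = Self(400.0);
    pub const BOLD: Self = Self(700.0);

    pub const fn new(weight: f32) -> Self {
        Self(weight)
    }

    /// Parses only what CSS `font-weight` allows: `normal`, `bold`, or a number.
    /// Names such as "thin" or "semi-light" are rejected on purpose, leaving
    /// room for a more permissive parser under a different name.
    pub fn parse_css(s: &str) -> Option<Self> {
        match s.trim() {
            "normal" => Some(Self::NORMAL),
            "bold" => Some(Self::BOLD),
            // `f32::from_str` accepts "inf" and "NaN", so filter those out.
            other => other
                .parse::<f32>()
                .ok()
                .filter(|w| w.is_finite())
                .map(Self),
        }
    }
}
```

Keeping the permissive names out of `parse_css` means the function name states exactly which grammar it implements.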

text_primitives/src/language.rs

taj-p · 2026-01-02T20:10:33Z

text_primitives/src/language.rs

+impl FromStr for Language {
+    type Err = ParseLanguageError;
+
+    fn from_str(s: &str) -> Result<Self, Self::Err> {


This parsing does not error for invalid regions or scripts. For example, the following invalid subtags are dropped. I'm not sure what the correct behaviour is, but it's odd for a parse function and ParseLanguageError to not error on malformed input.

Since this is a primitive crate, it might be worth leaning towards stricter parsing, because otherwise consumers will suffer silent data loss: "en-Latin-US" loses "US" with no indication, which makes debugging difficult.

Maybe it's worth adding InvalidScript and InvalidRegion to ParseLanguageError. For example:

    #[test]
    fn invalid_script_drops_subsequent_region() {
        // "Latin" (5 chars) fails script check, fails region check,
        // and "US" is never examined because we only check one subtag for region
        let lang = Language::parse("en-Latin-US").unwrap();
        assert_eq!(lang.as_str(), "en");
        assert_eq!(lang.script(), None);
        assert_eq!(lang.region(), None); // US is lost!
    }

I believe ICU4X errors on invalid subtags

Generally agree with making the parsing a bit more strict. One thing I'd like to test is how the web handles malformed languages. If we set lang="tr-Latin-TR" do browsers still handle casing as Turkish?

Additionally, perhaps we should also return the remainder of the string even on a successful parse, allowing the user to process the remaining subtags. For example, "tr-Latin-TR" might result in (Language("tr"), "Latin-TR")

There's now a parse_prefix (matching how we do things in color roughly).
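To make the "(Language, remainder)" idea from this thread concrete, here is a deliberately small sketch. Everything in it is hypothetical: `LanguagePrefix`, `PrefixError`, and this `parse_prefix` are written for illustration and are not the crate's types. It assumes the remainder is returned without its leading hyphen, and it does not validate trailing subtags or reject extlang the way the PR describes.

```rust
/// Borrowed view of a `language[-Script][-REGION]` prefix (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct LanguagePrefix<'a> {
    pub language: &'a str,
    pub script: Option<&'a str>,
    pub region: Option<&'a str>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PrefixError {
    /// The primary language subtag is not 2 or 3 ASCII letters.
    InvalidLanguage,
}

fn is_alpha(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| b.is_ascii_alphabetic())
}

fn is_region(s: &str) -> bool {
    // Two letters ("US") or three digits ("419").
    is_alpha(s, 2) || (s.len() == 3 && s.bytes().all(|b| b.is_ascii_digit()))
}

/// Splits off the next `-`-separated subtag and returns it with what follows.
fn next_subtag(s: &str) -> (&str, &str) {
    match s.split_once('-') {
        Some((head, tail)) => (head, tail),
        None => (s, ""),
    }
}

/// Parses the leading subtags and hands back whatever was not consumed.
pub fn parse_prefix(s: &str) -> Result<(LanguagePrefix<'_>, &str), PrefixError> {
    let (language, mut rest) = next_subtag(s);
    if !(is_alpha(language, 2) || is_alpha(language, 3)) {
        return Err(PrefixError::InvalidLanguage);
    }

    // Optional script: exactly four letters, e.g. "Latn".
    let mut script = None;
    let (candidate, tail) = next_subtag(rest);
    if is_alpha(candidate, 4) {
        script = Some(candidate);
        rest = tail;
    }

    // Optional region, taken from whatever follows the script (if any).
    let mut region = None;
    let (candidate, tail) = next_subtag(rest);
    if is_region(candidate) {
        region = Some(candidate);
        rest = tail;
    }

    Ok((LanguagePrefix { language, script, region }, rest))
}
```

Under these assumptions, "tr-Latin-TR" yields the language "tr" with the remainder "Latin-TR", so the caller can see that something was left unparsed instead of losing it silently.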

text_primitives/src/language.rs

Also, fix up some expect strings.

This matches what we do in other crates.

waywardmonkeys · 2026-01-03T05:36:34Z

I think that I've hit all of the feedback points.

No one has yet suggested a new name.

taj-p

LGTM with comments addressed and @dfrg happy with high level direction 🙏 .

So good 🎉 🚀

text_primitives/src/language.rs

text_primitives/src/bidi.rs

taj-p · 2026-01-05T03:06:51Z

text_primitives/src/language.rs

+    }
+}
+
+fn parse_language_prefix(s: &str) -> Result<(Language, &str), ParseLanguageError> {


I had a rough read of the implementation. It looks good! That said, I haven't given this a super thorough read through - I'm relying on the tests, some more tests I wrote, and, if there are perf gains to be had, I imagine we can find them if this ever shows up in a perf profile.

taj-p · 2026-01-05T03:11:35Z

As for the name, I don't mind text_primitives. Happy for someone else to weigh in. Considering it's not as yet a published crate, there might not be a lot of urgency (?). Other options:

text_types
text_attrs
text_props
??

> No one has yet suggested a new name.

What's your preference @waywardmonkeys ?

waywardmonkeys · 2026-01-05T07:21:26Z

> No one has yet suggested a new name.
>
> What's your preference @waywardmonkeys ?

My preference was to not think about it and hope someone else would. :)

DJMcNab

I think that we can land this with just @taj-p's approval; I do agree it would be helpful to have confirmation from Chad that they don't want to see anything changed, but it's pretty clear from their comments in this thread and elsewhere (i.e. in #495) that they are at least not opposed to this direction.

As such, any tweaks from Chad can just as easily come in a follow-up (or, worst case, we can always revert). So to unblock follow-ups (#495), I think we should land it.

waywardmonkeys · 2026-01-05T10:18:46Z

> Nice work!! This is great. I've left some minor comments and questions. If @dfrg is happy with the high level approach (please see the below comment @dfrg about ICU4X), then I'm happy to approve the impl details once they're addressed.

The "below comment" is #500 (comment)

dfrg · 2026-01-05T11:22:54Z

I still think we should reconsider the name but I’m happy with the current state of the code so fine with seeing this merged to unblock other PRs. Thanks all!

waywardmonkeys added 10 commits December 31, 2025 01:29

Add text_primitives crate

70dfda1

Update Cargo.lock

14b4b14

fontique: use text_primitives font attributes

091cb86

text_primitives: fix FontWeight::from_fontconfig interpolation

101d452

text_primitives: use integer keyword mapping in Display

81860e1

fontique: use text_primitives GenericFamily

e79e2bc

text_primitives: optional bytemuck impls for GenericFamily

b4b6cc1

fontique: forward bytemuck feature to text_primitives

30d0068

parley: use text_primitives Tag and Setting

f9bbeb7

parley: use text_primitives wrap enums

f008adf

Fix ci

c0b0e5e

waywardmonkeys requested a review from dfrg December 30, 2025 19:50

nicoburns reviewed Dec 30, 2025


text_primitives/src/tag.rs (outdated, resolved)

dfrg reviewed Dec 30, 2025


text_primitives/src/font.rs (outdated, resolved)

waywardmonkeys added 4 commits December 31, 2025 10:46

text_primitives: store Tag as [u8; 4]

91bd7f1

fontique: move from_fontconfig conversions out of text_primitives

49de9f0

text_primitives: move GenericFamily into its own module

8bb4d46

text_primitives: document and test font parsing

af78b7c

taj-p reviewed Jan 2, 2026


waywardmonkeys added 10 commits January 3, 2026 10:48

text_primitives: impl core::error::Error for ParseLanguageError

c16480e

Make wrap/style enums exhaustive

7035d72

text_primitives: rename CSS parsers to parse_css

3b8b3d6

text_primitives: const + inline trivial font accessors

e22f016

text_primitives: restore/extend font docs

a8b4b5c

text_primitives: stricter Language parsing

f8d8f15

fontique: document + reorder FromFontconfig mappings

f030add

text_primitives: improve FontStyle + GenericFamily variant docs

4d5f0a4

Get rid of tag_to_harfrust

3b1606f

text_primitives: Always inline adjustments on Language

8c9b751

Also, fix up some expect strings.

waywardmonkeys added 8 commits January 3, 2026 11:42

text_primitives: Add doc comment about features

2e2a6e9

This matches what we do in other crates.

ci: Check cargo rdme for text_primitives

ded0b34

text_primitives: Mention bytemuck feature

a79b852

text_primitives: Inline Language::parse

1352d7b

text_primitives: add must_use to Language accessors

11022ca

fontique: mention fonts.conf docs on FromFontconfig

f7363c3

text_primitives: clarify OverflowWrap::Normal docs

5938bab

text_primitives: add Language::parse_prefix

f50a13d

taj-p approved these changes Jan 5, 2026


waywardmonkeys added 3 commits January 5, 2026 14:08

text_primitives: clarify Language docs

8bf67a8

text_primitives: make bidi enums exhaustive

76b73e3

text_primitives: accept 4-char digit variants in Language

4c53e3b

text_primitives: make GenericFamily exhaustive

ca4e3aa

DJMcNab approved these changes Jan 5, 2026


waywardmonkeys added this pull request to the merge queue Jan 5, 2026

Merged via the queue into linebender:main with commit 67aa796 Jan 5, 2026
24 checks passed

waywardmonkeys deleted the text-primitives-extract branch January 5, 2026 10:37
