@waywardmonkeys
Contributor

@waywardmonkeys waywardmonkeys commented Dec 30, 2025

This PR introduces a new core vocabulary crate, text_primitives, and
updates parley/fontique to use it for shared “leaf” text types
(OpenType tags/settings, font attributes, generic families, language
tags, bidi controls, and wrap/break enums).

The goal is to keep these fundamental types small, reusable,
no_std-friendly, and insulated from higher-level dependencies,
while making it easier for the rest of the text stack to share
the same representations.

What’s in text_primitives

  • Tag/Setting (OpenType interop) with Tag stored as [u8; 4].
  • Font attributes: FontWeight, FontWidth, FontStyle.
  • CSS-ish wrap/break controls: WordBreak, OverflowWrap, TextWrapMode, plus BaseDirection.
  • Bidi controls: BidiControl, BidiDirection, BidiOverride.
  • GenericFamily (CSS generic family vocabulary).
  • Language as a compact, zero-allocation language[-Script][-REGION]
    prefix type, with strict parsing and a parse_prefix API for
    extracting the remainder.
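
As a rough illustration of the first bullet, a tag stored as `[u8; 4]` could look like the sketch below. The `Tag` type shown here and its `new`, `parse`, `to_bytes`, and `to_u32` methods are assumptions made for the example, not the crate's confirmed API.

```rust
/// Hypothetical stand-in for an OpenType tag; not the crate's actual definition.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct Tag([u8; 4]);

impl Tag {
    /// Builds a tag from exactly four bytes, e.g. `Tag::new(b"wght")`.
    pub const fn new(bytes: &[u8; 4]) -> Self {
        Self(*bytes)
    }

    /// Parses a string of exactly four printable ASCII bytes.
    /// Anything else (wrong length, non-ASCII) yields `None`.
    pub fn parse(s: &str) -> Option<Self> {
        match *s.as_bytes() {
            [b0, b1, b2, b3] => {
                let bytes = [b0, b1, b2, b3];
                if bytes.iter().all(|x| (b' '..=b'~').contains(x)) {
                    Some(Self(bytes))
                } else {
                    None
                }
            }
            _ => None,
        }
    }

    /// Returns the raw bytes of the tag.
    pub const fn to_bytes(self) -> [u8; 4] {
        self.0
    }

    /// Returns the big-endian `u32` form of the tag.
    pub const fn to_u32(self) -> u32 {
        u32::from_be_bytes(self.0)
    }
}
```

A fixed four-byte array keeps the type `Copy`, allocation-free, and usable without `std`, which fits the stated goal of small, no_std-friendly leaf types.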

Notable behavior / API changes

  • Enums remain exhaustive (avoids #[non_exhaustive]) where stability/panic-safety matters.
  • CSS parsing helpers are explicitly named parse_css (font weight/width/style).
  • Display for font attributes is aligned with valid CSS output (e.g.
    numeric font-weight prints as a number; oblique prints as
    oblique <angle>deg).
  • Language parsing is stricter to avoid silent data loss; trailing subtags
    are structurally validated then discarded; extlang is rejected;
    parse_prefix returns (Language, remainder).
  • Added #[must_use] to Language accessors.
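
To make the Display bullet concrete, here is a minimal sketch of CSS-aligned output for font attributes. The type shapes (an `f32` newtype for weight, an `Oblique(Option<f32>)` variant for style) are assumptions for illustration and may not match the actual definitions in text_primitives.

```rust
use core::fmt;

/// Hypothetical numeric font weight (assumed representation).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct FontWeight(f32);

impl FontWeight {
    pub const fn new(value: f32) -> Self {
        Self(value)
    }
}

impl fmt::Display for FontWeight {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Always print the number, which is valid CSS for `font-weight`,
        // rather than a non-CSS keyword such as "thin".
        write!(f, "{}", self.0)
    }
}

/// Hypothetical font style (assumed representation).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum FontStyle {
    Normal,
    Italic,
    /// Oblique with an optional angle in degrees.
    Oblique(Option<f32>),
}

impl fmt::Display for FontStyle {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Normal => f.write_str("normal"),
            Self::Italic => f.write_str("italic"),
            Self::Oblique(None) => f.write_str("oblique"),
            // CSS syntax: `oblique <angle>deg`.
            Self::Oblique(Some(angle)) => write!(f, "oblique {}deg", angle),
        }
    }
}
```

With these impls, `FontWeight::new(850.0)` formats as `850` and `FontStyle::Oblique(Some(14.0))` formats as `oblique 14deg`.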

Integration changes

  • parley and fontique now use text_primitives types instead of defining/duplicating
    equivalents.
  • Fontconfig numeric mappings remain in fontique.

@waywardmonkeys
Contributor Author

As discussed in #495.

This needs some bikeshedding by @dfrg and maybe others.

It needs a better name for this crate as well.

@waywardmonkeys waywardmonkeys requested a review from dfrg December 30, 2025 19:50
Contributor

@taj-p taj-p left a comment

Nice work!! This is great. I've left some minor comments and questions. If @dfrg is happy with the high level approach (please see the below comment @dfrg about ICU4X), then I'm happy to approve the impl details once they're addressed.

/// assert_eq!(FontWeight::parse("850"), Some(FontWeight::new(850.0)));
/// assert_eq!(FontWeight::parse("invalid"), None);
/// ```
pub fn parse(s: &str) -> Option<Self> {
Contributor

From an API perspective, it's odd that we support values like SEMI_LIGHT, EXTRA_BOLD, etc., and that displaying 100 even outputs the string "thin", yet we can only parse CSS-style strings ("normal", "bold", or a number).

I think this should be named parse_css (for all three types) so that it's honest about what it does and leaves room for a more permissive parse or FromStr impl later.
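
A minimal sketch of what a CSS-only parser under that name might look like. The `FontWeight` representation, the constants, and the exact acceptance rules are assumptions for illustration; in particular, this sketch does not range-check the number, which the real implementation may do.

```rust
/// Hypothetical numeric font weight (assumed representation).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct FontWeight(f32);

impl FontWeight {
    pub const NORMAL: Self = Self(400.0);
    pub const BOLD: Self = Self(700.0);

    pub const fn new(value: f32) -> Self {
        Self(value)
    }

    /// Parses only what CSS `font-weight` spells out here:
    /// the keywords "normal" and "bold", or a plain number.
    /// Names such as "extra-bold" are deliberately rejected.
    pub fn parse_css(s: &str) -> Option<Self> {
        match s.trim() {
            "normal" => Some(Self::NORMAL),
            "bold" => Some(Self::BOLD),
            other => other
                .parse::<f32>()
                .ok()
                // `f32::from_str` accepts "inf" and "NaN"; reject those.
                .filter(|value| value.is_finite())
                .map(Self),
        }
    }
}
```

Keeping the CSS grammar behind an explicitly named method leaves `FromStr` or a more permissive `parse` free for later use.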

Collaborator

This inconsistency bothered me too (and I think I'm the one that originally wrote it!). +1 to renaming to parse_css and adding more permissive conversions for those that want to capture more values.

Maybe add a to_css method that returns only valid CSS values as well.

Contributor Author

These are all renamed to parse_css.

I didn't add a to_css.

impl FromStr for Language {
    type Err = ParseLanguageError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
Contributor

This parsing does not error for invalid regions or scripts: invalid subtags are silently dropped, as the test below shows. I'm not sure what the correct behaviour is, but it's odd for a parse function with a ParseLanguageError type not to error on malformed input.

Since this is a primitive crate, it might be worth leaning towards stricter parsing, because otherwise consumers will suffer silent data loss: "en-Latin-US" loses "US" with no indication, which makes debugging difficult.

Maybe it's worth adding InvalidScript and InvalidRegion to ParseLanguageError. For example:

    #[test]
    fn invalid_script_drops_subsequent_region() {
        // "Latin" (5 chars) fails script check, fails region check, 
        // and "US" is never examined because we only check one subtag for region
        let lang = Language::parse("en-Latin-US").unwrap();
        assert_eq!(lang.as_str(), "en");
        assert_eq!(lang.script(), None);
        assert_eq!(lang.region(), None); // US is lost!
    }

I believe ICU4X errors on invalid subtags.

Collaborator

Generally agree with making the parsing a bit more strict. One thing I'd like to test is how the web handles malformed languages. If we set lang="tr-Latin-TR" do browsers still handle casing as Turkish?

Additionally, perhaps we should also return the remainder of the string even on a successful parse, allowing the user to process the remaining subtags. For example, "tr-Latin-TR" might result in (Language("tr"), "Latin-TR")
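
A minimal sketch of that prefix-plus-remainder shape, assuming a simplified grammar: a 2 or 3 letter language, an optional 4 letter script, and an optional region (2 letters or 3 digits). `LanguagePrefix`, `ParseError`, and this `parse_prefix` are hypothetical names for illustration, and the sketch skips case normalization and validation of the remainder.

```rust
/// Hypothetical parsed prefix; borrows from the input instead of allocating.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct LanguagePrefix<'a> {
    pub language: &'a str,
    pub script: Option<&'a str>,
    pub region: Option<&'a str>,
}

/// Hypothetical error type for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ParseError {
    /// The first subtag is not 2 or 3 ASCII letters.
    InvalidLanguage,
}

fn is_alpha(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| b.is_ascii_alphabetic())
}

fn is_region(s: &str) -> bool {
    is_alpha(s, 2) || (s.len() == 3 && s.bytes().all(|b| b.is_ascii_digit()))
}

/// Splits off the next `-`-separated subtag, returning it and what follows.
fn split_subtag(s: &str) -> (&str, &str) {
    match s.split_once('-') {
        Some((head, tail)) => (head, tail),
        None => (s, ""),
    }
}

/// Parses `language[-Script][-REGION]` and returns the unparsed remainder.
pub fn parse_prefix(s: &str) -> Result<(LanguagePrefix<'_>, &str), ParseError> {
    let (language, mut rest) = split_subtag(s);
    if !(is_alpha(language, 2) || is_alpha(language, 3)) {
        return Err(ParseError::InvalidLanguage);
    }

    let mut script = None;
    let mut region = None;

    // Consume a script only if the next subtag has the script shape.
    let (next, after) = split_subtag(rest);
    if is_alpha(next, 4) {
        script = Some(next);
        rest = after;
    }

    // Consume a region only if the next subtag has the region shape.
    let (next, after) = split_subtag(rest);
    if is_region(next) {
        region = Some(next);
        rest = after;
    }

    Ok((LanguagePrefix { language, script, region }, rest))
}
```

Under these assumptions, "tr-Latin-TR" yields the language "tr" with the remainder "Latin-TR", so the caller can see that subtags were left unconsumed instead of losing them silently.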

Contributor Author

There's now a parse_prefix (roughly matching how we do things in color).

@waywardmonkeys
Contributor Author

I think that I've hit all of the feedback points.

No one has yet suggested a new name.

Contributor

@taj-p taj-p left a comment

LGTM with comments addressed and @dfrg happy with high level direction 🙏.

So good 🎉 🚀

}
}

fn parse_language_prefix(s: &str) -> Result<(Language, &str), ParseLanguageError> {
Contributor

I had a rough read of the implementation. It looks good! That said, I haven't given this a super thorough read through - I'm relying on the tests, some more tests I wrote, and, if there are perf gains to be had, I imagine we can find them if this ever shows up in a perf profile.

@taj-p
Contributor

taj-p commented Jan 5, 2026

As for the name, I don't mind text_primitives. Happy for someone else to weigh in. Considering it's not yet a published crate, there might not be a lot of urgency (?). Other options:

  • text_types
  • text_attrs
  • text_props
  • ??

> No one has yet suggested a new name.

What's your preference @waywardmonkeys ?

@waywardmonkeys
Contributor Author

> > No one has yet suggested a new name.
>
> What's your preference @waywardmonkeys ?

My preference was to not think about it and hope someone else would. :)

Member

@DJMcNab DJMcNab left a comment

I think that we can land this with just @taj-p's approval. I do agree it would be helpful to have confirmation from Chad that they don't want to see anything changed, but it's pretty clear from their comments in this thread and elsewhere (i.e. in #495) that they are at least not opposed to this direction.

As such, any tweaks from Chad can just as easily come in a follow-up (or, worst case, we can always revert). So to unblock follow-ups (#495), I think we should land it.

@waywardmonkeys
Contributor Author

> Nice work!! This is great. I've left some minor comments and questions. If @dfrg is happy with the high level approach (please see the below comment @dfrg about ICU4X), then I'm happy to approve the impl details once they're addressed.

The "below comment" is #500 (comment)

@waywardmonkeys waywardmonkeys added this pull request to the merge queue Jan 5, 2026
Merged via the queue into linebender:main with commit 67aa796 Jan 5, 2026
24 checks passed
@waywardmonkeys waywardmonkeys deleted the text-primitives-extract branch January 5, 2026 10:37
@dfrg
Collaborator

dfrg commented Jan 5, 2026

I still think we should reconsider the name but I’m happy with the current state of the code so fine with seeing this merged to unblock other PRs. Thanks all!
