-
Notifications
You must be signed in to change notification settings - Fork 57
Text primitives extract #500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Text primitives extract #500
Conversation
taj-p
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
text_primitives/src/font.rs
Outdated
| /// assert_eq!(FontWeight::parse("850"), Some(FontWeight::new(850.0))); | ||
| /// assert_eq!(FontWeight::parse("invalid"), None); | ||
| /// ``` | ||
| pub fn parse(s: &str) -> Option<Self> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From an API perspective, it's odd for us to support values like SEMI_LIGHT, EXTRA_BOLD, etc and even displaying 100 outputs the string "thin" but we're only able to parse CSS style strings ("normal", "bold", or a number).
I think this should be named parse_css (for all three types) so that it's honest about what it does and leaves room for a more permissive parse or FromStr impl later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This inconsistency bothered me too (and I think I'm the one that originally wrote it!). +1 to renaming to parse_css and adding more permissive conversions for those that want to capture more values.
Maybe add a to_css method that returns only valid CSS values as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are all renamed to parse_css.
I didn't add a to_css.
| impl FromStr for Language { | ||
| type Err = ParseLanguageError; | ||
|
|
||
| fn from_str(s: &str) -> Result<Self, Self::Err> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This parsing does not error for invalid regions or scripts. For example, the following invalid subtags are dropped. I'm not sure what the correct behaviour is, but it's odd for a parse function and ParseLanguageError to not error on malformed input.
Since this is a primitive crate, it might be worth leaning towards stricter parsing because otherwise consumers will suffer silent data loss: "en-Latin-US" loses "US" with no indication making debugging difficult.
Maybe it's worth adding InvalidScript and InvalidRegion to ParseLanguageError. For example:
#[test]
fn invalid_script_drops_subsequent_region() {
// "Latin" (5 chars) fails script check, fails region check,
// and "US" is never examined because we only check one subtag for region
let lang = Language::parse("en-Latin-US").unwrap();
assert_eq!(lang.as_str(), "en");
assert_eq!(lang.script(), None);
assert_eq!(lang.region(), None); // US is lost!
}I believe ICU4X errors on invalid subtags
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally agree with making the parsing a bit more strict. One thing I'd like to test is how the web handles malformed languages. If we set lang="tr-Latin-TR" do browsers still handle casing as Turkish?
Additionally, perhaps we should also return the remainder of the string even on a successful parse, allowing the user to process the remaining subtags. For example, "tr-Latin-TR" might result in (Language("tr"), "Latin-TR")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's now a parse_prefix (matching how we do things in color roughly).
Also, fix up some expect strings.
This matches what we do in other crates.
|
I think that I've hit all of the feedback points. No one has yet suggested a new name. |
taj-p
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with comments addressed and @dfrg happy with high level direction 🙏 .
So good 🎉 🚀
| } | ||
| } | ||
|
|
||
| fn parse_language_prefix(s: &str) -> Result<(Language, &str), ParseLanguageError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a rough read of the implementation. It looks good! That said, I haven't given this a super thorough read through - I'm relying on the tests, some more tests I wrote, and, if there are perf gains to be had, I imagine we can find them if this ever shows up in a perf profile.
|
As for the name, I don't mind
What's your preference @waywardmonkeys ? |
My preference was to not think about it and hope someone else would. :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we can land this with just @taj-p's approval; I do agree it would be helpful to have confirmation from Chad that they don't want to see anything changed, but it's pretty clear from their comments in this thread and elsewhere (i.e. in #495) that they is at least not opposed to this direction.
As such, any tweaks from Chad can just as easily come in a follow-up (or, worst case, we can always revert). So to unblock follow-ups (#495), I think we should land it.
The "below comment" is #500 (comment) |
|
I still think we should reconsider the name but I’m happy with the current state of the code so fine with seeing this merged to unblock other PRs. Thanks all! |
This PR introduces a new core vocabulary crate,
text_primitives, andupdates
parley/fontiqueto use it for shared “leaf” text types(OpenType tags/settings, font attributes, generic families, language
tags, bidi controls, and wrap/break enums).
The goal is to keep these fundamental types small, reusable,
no_std-friendly, and insulated from higher-level dependencies,while making it easier for the rest of the text stack to share
the same representations.
What’s in
text_primitivesTag/Setting(OpenType interop) withTagstored as[u8; 4].FontWeight,FontWidth,FontStyle.WordBreak,OverflowWrap,TextWrapMode, plusBaseDirection.BidiControl,BidiDirection,BidiOverride.GenericFamily(CSS generic family vocabulary).Languageas a compact, zero-allocationlanguage[-Script][-REGION]prefix type, with strict parsing and a
parse_prefixAPI forextracting the remainder.
Notable behavior / API changes
#[non_exhaustive]) where stability/panic-safety matters.parse_css(font weight/width/style).Displayfor font attributes is aligned with valid CSS output (e.g.numeric font-weight prints as a number; oblique prints as
oblique <angle>deg).Languageparsing is stricter to avoid silent data loss; trailing subtagsare structurally validated then discarded;
extlangis rejected;parse_prefixreturns(Language, remainder).#[must_use]toLanguageaccessors.Integration changes
parleyandfontiquenow usetext_primitivestypes instead of defining/duplicatingequivalents.
fontique