UTS 55 conformance

[Unicode Technical Standard #55](https://www.unicode.org/reports/tr55) defines the concept of an ["identifier word boundary"](https://www.unicode.org/reports/tr55/#Identifier-Chunks):

> An *identifier word boundary* is defined by the following rules, using the notation from *Section 1.1, [Notation](https://www.unicode.org/reports/tr29/#Notation)*, in *Unicode Standard Annex #29, Unicode Text Segmentation* [[UAX29](https://www.unicode.org/reports/tr55/#UAX29)].
>
> > *Treat a letter followed by a sequence of nonspacing or enclosing marks as that letter.*
>
> *The regular expressions for the following rules incorporate this one; only the text descriptions rely on it.*
>
> > *🐫 CamelBoundary. An identifier word boundary exists after a lowercase or non-Greek titlecase letter followed by an uppercase or titlecase letter:*
>
> `[ \p{Ll} [\p{Lt}-\p{Grek}] ] [\p{Mn}\p{Me}]*` ÷ `[\p{Lu}\p{Lt}]`
>
> > *🎩 HATBoundary. An identifier word boundary exists before an uppercase or titlecase letter followed by a lowercase letter, or before a non-Greek titlecase letter:*
>
> ÷ `[\p{Lu}\p{Lt}] [\p{Mn}\p{Me}]* \p{Ll}  |  [\p{Lt}-\p{Grek}]`
>
> > *🐍 snake_boundary. An identifier word boundary exists either side of a Punctuation character which is not an Other_Punctuation character:*
>
> ÷ `[\p{P}-\p{Po}]`
>
> `[\p{P}-\p{Po}]` ÷
>
> > *No other identifier word boundaries exist.*
>
> Any × Any
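To make the rules concrete, here is a minimal Rust sketch that evaluates the three rules at every position strictly inside a string and splits it there. It is not `heck`'s API, and all function names (`split_identifier`, `boundaries`, `camel_boundary`, etc.) are hypothetical. Because Rust's standard library has no General_Category lookup, the sketch makes loud approximations: `\p{Ll}`/`\p{Lu}` use `char::is_lowercase`/`char::is_uppercase` (the derived Lowercase/Uppercase properties, which are slightly broader), `\p{Lt}` is a hardcoded list assumed complete as of Unicode 15, and only a few mark and punctuation ranges are recognized. A conforming implementation would need full Unicode property tables.

```rust
// APPROXIMATE classifiers: std has no General_Category lookup.

/// \p{Ll}, approximated by the derived Lowercase property (Ll + Other_Lowercase).
fn is_lower(c: char) -> bool {
    c.is_lowercase()
}

/// \p{Lu}, approximated by the derived Uppercase property (Lu + Other_Uppercase).
fn is_upper(c: char) -> bool {
    c.is_uppercase()
}

/// [\p{Lt}-\p{Grek}]: the Latin titlecase digraphs ǅ ǈ ǋ ǲ.
fn is_non_greek_titlecase(c: char) -> bool {
    matches!(c, '\u{01C5}' | '\u{01C8}' | '\u{01CB}' | '\u{01F2}')
}

/// \p{Lt}: Latin digraphs plus Greek capitals with prosgegrammeni
/// (list assumed complete as of Unicode 15).
fn is_titlecase(c: char) -> bool {
    is_non_greek_titlecase(c)
        || matches!(c,
            '\u{1F88}'..='\u{1F8F}' | '\u{1F98}'..='\u{1F9F}' | '\u{1FA8}'..='\u{1FAF}'
            | '\u{1FBC}' | '\u{1FCC}' | '\u{1FFC}')
}

/// [\p{Mn}\p{Me}], PARTIAL: only two combining-mark blocks.
fn is_mark(c: char) -> bool {
    matches!(c, '\u{0300}'..='\u{036F}' | '\u{20D0}'..='\u{20F0}')
}

/// [\p{P}-\p{Po}], PARTIAL: ASCII connector/dash/bracket characters and U+2010..U+2015.
fn is_punct_not_po(c: char) -> bool {
    matches!(c, '_' | '-' | '(' | ')' | '[' | ']' | '{' | '}' | '\u{2010}'..='\u{2015}')
}

/// 🐫 CamelBoundary: (Ll | non-Greek Lt) marks* ÷ (Lu | Lt)
fn camel_boundary(cs: &[char], i: usize) -> bool {
    if !(is_upper(cs[i]) || is_titlecase(cs[i])) {
        return false;
    }
    let mut j = i; // walk back over marks attached to the preceding letter
    while j > 0 && is_mark(cs[j - 1]) {
        j -= 1;
    }
    j > 0 && (is_lower(cs[j - 1]) || is_non_greek_titlecase(cs[j - 1]))
}

/// 🎩 HATBoundary: ÷ (Lu | Lt) marks* Ll  |  ÷ non-Greek Lt
fn hat_boundary(cs: &[char], i: usize) -> bool {
    if is_non_greek_titlecase(cs[i]) {
        return true;
    }
    if !(is_upper(cs[i]) || is_titlecase(cs[i])) {
        return false;
    }
    let mut k = i + 1; // skip marks attached to the uppercase/titlecase letter
    while k < cs.len() && is_mark(cs[k]) {
        k += 1;
    }
    k < cs.len() && is_lower(cs[k])
}

/// 🐍 snake_boundary: ÷ on either side of a P-but-not-Po character
fn snake_boundary(cs: &[char], i: usize) -> bool {
    is_punct_not_po(cs[i - 1]) || is_punct_not_po(cs[i])
}

/// Byte offsets of identifier word boundaries strictly inside `s`.
fn boundaries(s: &str) -> Vec<usize> {
    let indexed: Vec<(usize, char)> = s.char_indices().collect();
    let cs: Vec<char> = indexed.iter().map(|&(_, c)| c).collect();
    (1..cs.len())
        .filter(|&i| camel_boundary(&cs, i) || hat_boundary(&cs, i) || snake_boundary(&cs, i))
        .map(|i| indexed[i].0)
        .collect()
}

/// Splits `s` at every boundary; punctuation such as `_` becomes its own piece.
fn split_identifier(s: &str) -> Vec<&str> {
    let mut pieces = Vec::new();
    let mut start = 0;
    for b in boundaries(s) {
        pieces.push(&s[start..b]);
        start = b;
    }
    if start < s.len() {
        pieces.push(&s[start..]);
    }
    pieces
}
```

For example, `split_identifier("getHTTPResponse_code")` yields `["get", "HTTP", "Response", "_", "code"]`: CamelBoundary fires before `H`, HATBoundary before `R`, and snake_boundary on both sides of `_`. By contrast, `"a.b"` stays whole because `.` is Other_Punctuation. Because marks are skipped before a letter is classified, `"e\u{301}X"` splits as `["e\u{301}", "X"]`, which is how the sketch handles the spec's "treat a letter followed by marks as that letter" rule.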

It would be nice if `heck` followed this spec when splitting identifiers into words.
