@macchiati macchiati commented Dec 24, 2025

Now uses inclusion instead of exclusion.

The source is in https://datatracker.ietf.org/doc/html/rfc5322#section-3.2.3

atext           =   ALPHA / DIGIT /    ; Printable US-ASCII
                       "!" / "#" /        ;  characters not including
                       "$" / "%" /        ;  specials.  Used for atoms.
                       "&" / "'" /
                       "*" / "+" /
                       "-" / "/" /
                       "=" / "?" /
                       "^" / "_" /
                       "`" / "{" /
                       "|" / "}" /
                       "~"
=>
  
  atext           =   [[a-zA-Z][0-9][
                       \! \#
                       \$ \%
                       \& \'
                       \* \+
                       \- \/
                       \= \?
                       \^ \_
                       \` \{
                       \| \}
                       \~]]

=>
atext = [[a-zA-Z][0-9][_ \- ! ? ' \{ \} * / \& # % ` \^ + = | ~ \$]]
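
As a sanity check on the rewrite above, here is a minimal stdlib-only sketch (not the unicodetools code) that compares the punctuation listed in the RFC 5322 rule with the reordered list in the flattened set. The class and method names are made up for illustration.

```java
import java.util.Set;
import java.util.TreeSet;

class AtextCheck {
    // Punctuation from the RFC 5322 section 3.2.3 atext rule, in RFC order.
    static final String RFC_SPECIALS = "!#$%&'*+-/=?^_`{|}~";

    // The same punctuation in the order used by the flattened set above.
    static final String FLATTENED_SPECIALS = "_-!?'{}*/&#%`^+=|~$";

    // Collects the characters of a string into a set so order is ignored.
    static Set<Character> toSet(String s) {
        Set<Character> result = new TreeSet<>();
        for (char c : s.toCharArray()) {
            result.add(c);
        }
        return result;
    }

    // True if c is ALPHA, DIGIT, or one of the atext punctuation characters.
    static boolean isAtext(char c) {
        return (c >= 'a' && c <= 'z')
                || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9')
                || RFC_SPECIALS.indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        // Both lists should describe the same 19 characters.
        System.out.println(toSet(RFC_SPECIALS).equals(toSet(FLATTENED_SPECIALS))); // true
        // '+' is atext; '@' and ',' are specials and must be excluded.
        System.out.println(isAtext('+') + " " + isAtext('@') + " " + isAtext(',')); // true false false
    }
}
```

The comma is not in either list, which is the point of switching to inclusion.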

@macchiati macchiati changed the title from "Fix comma" to "Fix comma in UTS 58 data files and code" Dec 24, 2025
3D000..3FC3F # 18.0 [11328] (U+3D000..U+3FC3F) SEAL CHARACTER-3D000..SEAL CHARACTER-3FC3F
E0100..E01EF # 4.0 [240] (U+E0100..U+E01EF) VARIATION SELECTOR-17..VARIATION SELECTOR-256

# Total code points: 162119

I think this delta might be because of v18 characters.

I'm not sure what is going on, because main doesn't seem to have the environment variable from Robin's PR that lets us switch between v17 and dev.

You need to pass -DPOST_SYNCHRONIZED_17 on the Maven command line when invoking the generator, so that it generates the 17 files rather than the dev=18 ones.

.freeze();
static final UnicodeSet validEmailLocalPart =
new UnicodeSet(
"[\\p{XID_Continue}\\p{block=basic_latin}-\\p{Cc}]",

The merge went poorly here.
I think you want:

    static final UnicodeSet validEmailLocalPart =
            new UnicodeSet(
                            "[\\p{XID_Continue}-\\p{block=basic_latin}]",
                            new ParsePosition(0),
                            VersionedSymbolTable.frozenAt(UNICODE_VERSION))
                    .addAll(EMAIL_ASCII_INCLUDES)
                    .freeze();
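
To illustrate why the merged pattern matters, here is a rough stdlib-only sketch of the two set expressions. It rests on several assumptions: `Character.isUnicodeIdentifierPart` stands in for `\p{XID_Continue}` (it is only an approximation), `EMAIL_ASCII_INCLUDES` is assumed to be the atext set from the PR description, and the class and method names are hypothetical. It does not use ICU or `VersionedSymbolTable`.

```java
class LocalPartSketch {
    // Assumption: EMAIL_ASCII_INCLUDES corresponds to RFC 5322 atext punctuation
    // plus ASCII letters and digits.
    static final String ATEXT_SPECIALS = "!#$%&'*+-/=?^_`{|}~";

    static boolean isAtext(int cp) {
        return (cp >= 'a' && cp <= 'z')
                || (cp >= 'A' && cp <= 'Z')
                || (cp >= '0' && cp <= '9')
                || (cp <= 0x7F && ATEXT_SPECIALS.indexOf(cp) >= 0);
    }

    // Basic Latin block is U+0000..U+007F.
    static boolean inBasicLatin(int cp) {
        return cp >= 0 && cp <= 0x7F;
    }

    // Approximation of XID_Continue using the JDK's identifier tables.
    static boolean idContinueApprox(int cp) {
        return Character.isUnicodeIdentifierPart(cp) && !Character.isIdentifierIgnorable(cp);
    }

    // Reading of the merged pattern: (XID_Continue + Basic Latin) - Cc.
    static boolean mergedVersion(int cp) {
        return (idContinueApprox(cp) || inBasicLatin(cp))
                && Character.getType(cp) != Character.CONTROL;
    }

    // Reading of the suggested pattern: (XID_Continue - Basic Latin) + ASCII includes.
    static boolean suggestedVersion(int cp) {
        return (idContinueApprox(cp) && !inBasicLatin(cp)) || isAtext(cp);
    }

    public static void main(String[] args) {
        // Under these assumptions the merged form admits the comma
        // and the suggested form does not.
        System.out.println(mergedVersion(',') + " " + suggestedVersion(','));
        // U+00E9 (e with acute) is accepted by the suggested form.
        System.out.println(suggestedVersion(0x00E9));
    }
}
```

Under this reading, the merged pattern lets through every non-control ASCII character, including the comma and `@`, while the suggested one only admits the explicitly included ASCII characters.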

@macchiati macchiati closed this Dec 26, 2025
@macchiati macchiati deleted the med-uts58-fix-comma branch December 26, 2025 00:28