7 changes: 4 additions & 3 deletions unicodetools/data/linkification/dev/LinkEmail.txt
@@ -1,5 +1,5 @@
# LinkEmail.txt
-# Date: 2025-12-24, 02:37:15 GMT
+# Date: 2025-12-24, 21:06:25 GMT
# © 2025 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use and license, see https://www.unicode.org/terms_of_use.html
@@ -26,7 +26,8 @@
#
0021 # 1.1 (!) EXCLAMATION MARK
0023..0027 # 1.1 [5] (#..') NUMBER SIGN..APOSTROPHE
-002A..0039 # 1.1 [16] (*..9) ASTERISK..DIGIT NINE
+002A..002B # 1.1 [2] (*..+) ASTERISK..PLUS SIGN
+002D..0039 # 1.1 [13] (-..9) HYPHEN-MINUS..DIGIT NINE
003D # 1.1 (=) EQUALS SIGN
003F # 1.1 (?) QUESTION MARK
0041..005A # 1.1 [26] (A..Z) LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
@@ -1331,4 +1332,4 @@ FFDA..FFDC # 1.1 [3] (ᅳ..ᅵ) HALFWIDTH HANGUL LETTER EU..HALFWIDTH HAN
3D000..3FC3F # 18.0 [11328] (U+3D000..U+3FC3F) SEAL CHARACTER-3D000..SEAL CHARACTER-3FC3F
E0100..E01EF # 4.0 [240] (U+E0100..U+E01EF) VARIATION SELECTOR-17..VARIATION SELECTOR-256

-# Total code points: 162119
Member Author

I think this delta might be because of v18 characters.

I'm not sure what is going on, because main doesn't seem to have the environment variable from Robin's PR that lets us switch between v17 and dev.

Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to put -DPOST_SYNCHRONIZED_17 on the Maven command line when invoking the generator, so that it generates the 17 files rather than the dev=18 ones.

+# Total code points: 149240
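
For readers checking the range change in this file: the hunk replaces 002A..0039 with 002A..002B and 002D..0039, which amounts to dropping U+002C COMMA from the set. The sketch below reproduces that arithmetic. `RangeSplit` and its methods are hypothetical names used only for illustration; they are not part of unicodetools.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not from unicodetools): removes one code point from an inclusive range. */
final class RangeSplit {
    /** Returns the inclusive sub-ranges of [start, end] that remain after removing one code point. */
    static List<int[]> without(int start, int end, int removed) {
        List<int[]> parts = new ArrayList<>();
        if (removed < start || removed > end) {
            // The code point is outside the range, so the range is unchanged.
            parts.add(new int[] {start, end});
            return parts;
        }
        if (removed > start) {
            parts.add(new int[] {start, removed - 1});
        }
        if (removed < end) {
            parts.add(new int[] {removed + 1, end});
        }
        return parts;
    }

    /** Formats a range the way the data file does: START..END [count]. */
    static String format(int[] range) {
        int count = range[1] - range[0] + 1;
        return String.format("%04X..%04X [%d]", range[0], range[1], count);
    }

    public static void main(String[] args) {
        // Removing U+002C from 002A..0039 (16 code points) leaves 2 + 13 = 15.
        for (int[] range : without(0x002A, 0x0039, 0x002C)) {
            System.out.println(format(range));
        }
    }
}
```

The two resulting counts, 2 and 13, match the bracketed counts in the new lines of the hunk.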
@@ -194,11 +194,14 @@ private LinkTermination(String uset) {
}
}

// Note: the source standards are painful to read.
// https://en.wikipedia.org/wiki/Email_address#Local-part is much easier
// https://datatracker.ietf.org/doc/html/rfc5322#section-3.2.3 has the full list for ASCII part
// See also https://en.wikipedia.org/wiki/Email_address#Local-part
// We add dot (ascii '.'), and then check after for the special dot constraints.

static final UnicodeSet EMAIL_EXCLUDES =
new UnicodeSet("[\\u0020 ; \\: \" ( ) \\[ \\] @ \\\\ < >]").freeze();
static final UnicodeSet EMAIL_ASCII_INCLUDES =
new UnicodeSet("[[a-zA-Z][0-9][_ \\- ! ? ' \\{ \\} * / \\& # % ` \\^ + = | ~ \\$]]")
.add('.')
.freeze();
static final UnicodeSet validEmailLocalPart =
new UnicodeSet(
"[\\p{XID_Continue}\\p{block=basic_latin}-\\p{Cc}]",
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The merge went poorly here.
I think you want:

    static final UnicodeSet validEmailLocalPart =
            new UnicodeSet(
                            "[\\p{XID_Continue}-\\p{block=basic_latin}]",
                            new ParsePosition(0),
                            VersionedSymbolTable.frozenAt(UNICODE_VERSION))
                    .addAll(EMAIL_ASCII_INCLUDES)
                    .freeze();
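
To make the intended composition concrete, here is a minimal, dependency-free sketch of the same idea: the non-ASCII part comes from an identifier-continue property, the ASCII part comes only from the explicit include list, and the dot is checked separately afterwards. It rests on several assumptions. `LocalPartSketch` is a hypothetical class, `Character.isUnicodeIdentifierPart` is only a rough JDK stand-in for `\p{XID_Continue}` (it is not identical and is not pinned to a Unicode version the way `VersionedSymbolTable` is), and the dot constraints shown (no leading, trailing, or consecutive dots) are assumed from the RFC 5322 dot-atom rule rather than taken from this PR.

```java
/** Hypothetical sketch, not the unicodetools implementation. */
final class LocalPartSketch {
    // ASCII characters from EMAIL_ASCII_INCLUDES in the diff, plus the added dot.
    private static final String ASCII_INCLUDES =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
                    + "_-!?'{}*/&#%`^+=|~$.";

    /** True if a single code point may appear in a local part under this sketch. */
    static boolean isAllowedCodePoint(int cp) {
        if (cp < 0x80) {
            // ASCII is admitted only through the explicit include list.
            return ASCII_INCLUDES.indexOf(cp) >= 0;
        }
        // Assumption: approximate stand-in for \p{XID_Continue} outside Basic Latin.
        return Character.isUnicodeIdentifierPart(cp);
    }

    /** True if every code point is allowed and the assumed dot constraints hold. */
    static boolean isValidLocalPart(String s) {
        if (s.isEmpty() || s.startsWith(".") || s.endsWith(".") || s.contains("..")) {
            return false;
        }
        return s.codePoints().allMatch(LocalPartSketch::isAllowedCodePoint);
    }

    public static void main(String[] args) {
        System.out.println(isValidLocalPart("first.last"));
        System.out.println(isValidLocalPart("a,b"));
        System.out.println(isValidLocalPart("a..b"));
    }
}
```

Subtracting all of Basic Latin before adding the include list back, as in the suggestion above, keeps excluded ASCII characters such as the comma from entering the set through a broader property.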
