
in icu_utf8_char(): codepoint value 0x110000 is accepted, even if it generates invalid UTF-8 #5

@verdy-p

In static int icu_utf8_char(lua_State *L), the following range test is incorrect:

        else if (codePoint > 0x110000) {
            return luaL_argerror(L,i+1,"invalid codepoint");
        }
        else {
            // 00000000 000zzzzz yyyyyyyy xxxxxxxx
            // 11110zzz 10zzyyyy 10yyyyxx 10xxxxxx
            ...

It seems you inverted the original test, which was:

        else if (codePoint < 0x110000) {
            // 00000000 000zzzzz yyyyyyyy xxxxxxxx
            // 11110zzz 10zzyyyy 10yyyyxx 10xxxxxx
            ...
        }
        else {
            return luaL_argerror(L,i+1,"invalid codepoint");
        }

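But you forgot the equal sign: the negation of < 0x110000 is >= 0x110000 (equivalently > 0x10FFFF), not > 0x110000. So the code point 0x110000 is accepted as if it were valid and will generate an invalid UTF-8 sequence. With the bit layout shown in the comment above, that sequence is (0xF4,0x90,0x80,0x80), which is ill-formed because a lead byte 0xF4 may only be followed by a byte in 0x80 to 0x8F. More generally, the bytes 0xC0, 0xC1, and 0xF5 to 0xFF are NEVER part of any valid UTF-8 text.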
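
The boundary fix can be sketched in isolation as follows. This is a minimal standalone sketch, not the binding's actual code: utf8_encode is a hypothetical helper name, it returns 0 where icu_utf8_char would call luaL_argerror, and it assumes the four-byte bit layout given in the comments quoted above. It deliberately does not check surrogates (U+D800 to U+DFFF), which are outside the scope of this report.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only.
 * Encodes one code point as UTF-8 into out[] and returns the number of
 * bytes written, or 0 if the code point is above U+10FFFF.
 * Surrogates (U+D800..U+DFFF) are NOT rejected in this sketch. */
static size_t utf8_encode(uint32_t codePoint, unsigned char out[4])
{
    if (codePoint < 0x80) {
        /* 0xxxxxxx */
        out[0] = (unsigned char)codePoint;
        return 1;
    }
    else if (codePoint < 0x800) {
        /* 110yyyxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (codePoint >> 6));
        out[1] = (unsigned char)(0x80 | (codePoint & 0x3F));
        return 2;
    }
    else if (codePoint < 0x10000) {
        /* 1110yyyy 10yyyyxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (codePoint >> 12));
        out[1] = (unsigned char)(0x80 | ((codePoint >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (codePoint & 0x3F));
        return 3;
    }
    else if (codePoint >= 0x110000) {
        /* Note ">=": 0x110000 itself is already out of range. */
        return 0;
    }
    else {
        /* 11110zzz 10zzyyyy 10yyyyxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (codePoint >> 18));
        out[1] = (unsigned char)(0x80 | ((codePoint >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((codePoint >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (codePoint & 0x3F));
        return 4;
    }
}
```

With this test, the last valid code point 0x10FFFF still encodes to (0xF4,0x8F,0xBF,0xBF), while 0x110000 is rejected instead of being encoded. Writing the check as codePoint > 0x10FFFF would be equivalent.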
