Look at library size #17

@walling

We could potentially compress the udata better. I've been researching this a bit, and we could shave off a good number of bytes by changing the data layout and storing it in base-36 (which is fast for JavaScript to decode with parseInt).
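To make the base-36 idea concrete, here is a minimal sketch of the round trip. It is not the actual udata format: the comma-separated layout and the helper names `encodeBase36` and `decodeBase36` are assumptions made up for illustration. Only `Number.prototype.toString(36)` and `parseInt(s, 36)` are standard JavaScript.

```js
// Hypothetical helpers, not part of the library: pack non-negative
// integers as comma-separated base-36 strings and unpack them again.
function encodeBase36(values) {
  return values.map((v) => v.toString(36)).join(",");
}

function decodeBase36(text) {
  // parseInt with radix 36 accepts the digits 0-9 and a-z.
  return text === "" ? [] : text.split(",").map((s) => parseInt(s, 36));
}

const sample = [0, 35, 36, 0x1f600];
const packed = encodeBase36(sample);
const unpacked = decodeBase36(packed);

console.log(packed);
console.log(unpacked);
```

Each value takes fewer characters than in decimal or hex, which is where the savings would come from. How much it helps in practice depends on the real data layout.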

I also think it's an issue that the code points are laid out in this binary format: yyyyyxxxxxxxxyyyyyyyy. This makes the x=0 section quite big, even though text often uses only Latin-1 characters and nothing outside the BMP. A better format would be xxxxxxxxxxxxxyyyyyyyy. This creates more data rows, but you have to decompress less data on average, assuming that normal text only draws on a few Unicode scripts. Or maybe we should split how BMP and non-BMP code points are stored.
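A rough sketch of the two layouts, assuming the bit diagrams read most significant bit first over a 21-bit code point (5 + 8 + 8 bits today, 13 + 8 bits proposed). The function names `splitCurrent` and `splitProposed` are hypothetical and only illustrate how a code point would map to a row `x` and an offset `y`:

```js
// Assumed reading of yyyyyxxxxxxxxyyyyyyyy:
// bits 16-20 and bits 0-7 form y, bits 8-15 form x.
function splitCurrent(cp) {
  const x = (cp >> 8) & 0xff;
  const y = ((cp >> 16) << 8) | (cp & 0xff);
  return { x, y };
}

// Assumed reading of xxxxxxxxxxxxxyyyyyyyy:
// the top 13 bits form x, the low 8 bits form y.
function splitProposed(cp) {
  return { x: cp >> 8, y: cp & 0xff };
}

// U+0041 (Latin-1 range) and U+10041 (outside the BMP) share row x=0
// in the current layout, but land in different rows in the proposed one.
console.log(splitCurrent(0x41), splitCurrent(0x10041));
console.log(splitProposed(0x41), splitProposed(0x10041));
```

Under this reading, row x=0 in the current layout collects 256 entries from every plane, which would explain why that section is so large. With the proposed layout a Latin-1 lookup only touches a 256-entry row.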

I just need to look at my research files again and write up the findings in this issue.
