Look at library size #17

We could probably compress `udata` further. I've been researching this a bit, and we could shave off a good number of bytes by changing the data layout and saving it in base-36, which is fast for JavaScript to decode with `parseInt`.
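To illustrate the base-36 idea, here is a minimal sketch. It assumes, hypothetically, that the data can be stored as a list of non-negative integers separated by commas; the real `udata` format may differ, and `encodeBase36`/`decodeBase36` are illustrative names, not part of the library. Each number is written with `Number.prototype.toString(36)` and read back with `parseInt(token, 36)`.

```javascript
// Hypothetical encoder: write each non-negative integer in base-36
// and join the tokens with commas (parseInt needs a separator, since
// it would otherwise read across token boundaries).
function encodeBase36(values) {
  return values.map((v) => v.toString(36)).join(",");
}

// Hypothetical decoder: split on commas and parse each token as base-36.
function decodeBase36(text) {
  return text.split(",").map((token) => parseInt(token, 36));
}

// Compare the size of a decimal encoding with a base-36 encoding.
const sample = [65, 255, 4095, 65535, 1114111];
const decimal = sample.join(",");
const base36 = encodeBase36(sample);

console.log(decimal, decimal.length);
console.log(base36, base36.length);
console.log(decodeBase36(base36));
```

Base-36 uses the digits `0-9` and `a-z`, so larger numbers need fewer characters than in decimal; how much this saves overall depends on the actual distribution of values in `udata`.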

I also think it's a problem that the code points are laid out in this binary format: `yyyyyxxxxxxxxyyyyyyyy`. Because x is the middle 8 bits, the x=0 section holds the first 256 code points of every plane, which makes it quite big, even though text often uses only `latin1` characters and no characters outside the BMP. A better format would be `xxxxxxxxxxxxxyyyyyyyy`. This creates more data rows, but you have to decompress less data on average, assuming that normal text only uses a few Unicode scripts. Alternatively, we could store BMP and non-BMP code points differently.
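The difference between the two layouts can be sketched as below. This assumes, based on the wording above, that x selects a data row (section) that is decoded on demand and y indexes within that row; `splitCurrent`, `splitProposed`, `rowsNeeded` and the join helpers are hypothetical names, not library functions.

```javascript
// Current layout: yyyyy xxxxxxxx yyyyyyyy (21 bits).
// x = bits 8..15; y = top 5 bits followed by the low 8 bits (13 bits total).
function splitCurrent(codePoint) {
  const x = (codePoint >> 8) & 0xff;
  const y = ((codePoint >> 16) << 8) | (codePoint & 0xff);
  return { x, y };
}

function joinCurrent({ x, y }) {
  return ((y >> 8) << 16) | (x << 8) | (y & 0xff);
}

// Proposed layout: xxxxxxxxxxxxx yyyyyyyy (21 bits).
// x = top 13 bits; y = low 8 bits.
function splitProposed(codePoint) {
  return { x: codePoint >> 8, y: codePoint & 0xff };
}

function joinProposed({ x, y }) {
  return (x << 8) | y;
}

// Collect the rows that must be decoded to look up every code point in `text`.
// for...of iterates a string by code point, so astral characters count once.
function rowsNeeded(text, split) {
  const rows = new Set();
  for (const ch of text) rows.add(split(ch.codePointAt(0)).x);
  return rows;
}

// U+0041 'A' and U+10041 share row 0 in the current layout,
// but land in rows 0 and 256 in the proposed one.
console.log(splitCurrent(0x41), splitCurrent(0x10041));
console.log(splitProposed(0x41), splitProposed(0x10041));
console.log(rowsNeeded("café", splitCurrent), rowsNeeded("café", splitProposed));
```

With 17 planes, the current layout has 256 rows of up to 4352 valid code points each, while the proposed layout has up to 4352 rows of 256 code points each. Latin-1 text needs row 0 in both cases, but in the proposed layout that row contains only U+0000 to U+00FF.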

I just need to look at my research files again and write down the main points in this issue.
