I know this guy ;) (from Redis)
Did you hand-pick the codebook dictionary? If so, how?
Have you thought about using the most frequent n-grams in the target language(s)?
For example, the top N (say 32) n-grams from Norvig's ngrams2,3,4,5,6,7,8,9.csv?
How do you pick them optimally, for minimum overlap and better compression ratios? For instance,
tion and ation are the most common 4- and 5-letter n-grams respectively, and tio is the 6th most common 3-letter n-gram.
I think you'd get much better compression ratios that way.
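A minimal sketch of one way to handle the overlap problem, not a description of how smaz builds its table: pick n-grams greedily by estimated bytes saved, and after each pick mark its occurrences as used so that substrings such as tion inside ation are not counted twice. The names `pick_codebook`, `count_free`, `is_free` and the `MAX_NGRAM` limit are hypothetical, and the score (occurrences times n minus 1) assumes every codebook entry is replaced by a single output byte.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_NGRAM 5   /* longest n-gram considered (hypothetical limit) */

/* Returns 1 if none of text[pos .. pos+n-1] was claimed by an earlier pick. */
static int is_free(const char *covered, size_t pos, size_t n) {
    for (size_t k = 0; k < n; k++)
        if (covered[pos + k]) return 0;
    return 1;
}

/* Counts non-overlapping, still unclaimed occurrences of gram[0..n-1]. */
static int count_free(const char *text, size_t len, const char *covered,
                      const char *gram, size_t n) {
    int count = 0;
    size_t i = 0;
    while (i + n <= len) {
        if (is_free(covered, i, n) && memcmp(text + i, gram, n) == 0) {
            count++;
            i += n;
        } else {
            i++;
        }
    }
    return count;
}

/* Greedily fills book[] with up to `want` n-grams taken from text.
 * Returns the number of entries picked, or -1 if allocation fails. */
static int pick_codebook(const char *text, char book[][MAX_NGRAM + 1], int want) {
    size_t len = strlen(text);
    char *covered = calloc(len + 1, 1);
    int picked = 0;
    if (covered == NULL) return -1;

    while (picked < want) {
        int best_score = 0;
        size_t best_pos = 0, best_n = 0;

        for (size_t n = 2; n <= MAX_NGRAM; n++) {
            for (size_t i = 0; i + n <= len; i++) {
                if (!is_free(covered, i, n)) continue;
                int count = count_free(text, len, covered, text + i, n);
                int score = count * (int)(n - 1);   /* bytes saved with 1-byte codes */
                /* A gram seen only once is not worth a codebook slot. */
                if (count >= 2 && score > best_score) {
                    best_score = score;
                    best_pos = i;
                    best_n = n;
                }
            }
        }
        if (best_score == 0) break;   /* nothing repeats any more */

        memcpy(book[picked], text + best_pos, best_n);
        book[picked][best_n] = '\0';

        /* Claim every unclaimed occurrence so overlapping grams lose those hits. */
        for (size_t i = 0; i + best_n <= len; ) {
            if (is_free(covered, i, best_n) &&
                memcmp(text + i, book[picked], best_n) == 0) {
                memset(covered + i, 1, best_n);
                i += best_n;
            } else {
                i++;
            }
        }
        picked++;
    }
    free(covered);
    return picked;
}
```

On the toy input `"the ration,the nation;the station.action motion"`, tion scores 5 x 3 = 15 and ation only 3 x 4 = 12, so tion is picked first; ation then has no unclaimed occurrences left and is never chosen. The brute-force counting is quadratic, so a real corpus would need a hash table or suffix array instead.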
I want to test it, but couldn't find any docs on the codebook format.
So what are these characters?
static char *Smaz_cb[241] = {
"\002s,\266", "\003had\232\002leW", "\003on \216", "", "\001yS",
"\002ma\255\002li\227", "\003or \260", "", "\002ll\230\003s t\277",
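Judging only from the bytes in the excerpt above, each string appears to be one slot holding zero or more packed entries: a length byte, that many characters, then one more byte that is presumably the code assigned to the string. Under that assumption (it is a reading of the excerpt, not documented behaviour), a slot can be unpacked with a small helper. `parse_slot` and `MAX_ENTRY` are hypothetical names, and the helper trusts the slot to be well formed.

```c
#include <string.h>

#define MAX_ENTRY 8   /* hypothetical cap on entry length for this demo */

/* Unpacks one slot, assumed to be laid out as repeated
 * [length byte][length characters][code byte], ending at a NUL.
 * Fills grams[] and codes[] and returns the number of entries found. */
static int parse_slot(const char *slot, char grams[][MAX_ENTRY],
                      unsigned char codes[], int max_entries) {
    int n = 0;
    while (slot[0] != '\0' && n < max_entries) {
        size_t len = (unsigned char)slot[0];
        if (len >= MAX_ENTRY) break;            /* would not fit in grams[n] */
        memcpy(grams[n], slot + 1, len);        /* the n-gram itself */
        grams[n][len] = '\0';
        codes[n] = (unsigned char)slot[1 + len]; /* byte following the n-gram */
        slot += len + 2;                         /* skip length, text, code */
        n++;
    }
    return n;
}
```

Read this way, `"\003had\232\002leW"` holds two entries: had with code 154 (octal 232) and le with code 87 (the character W), while the empty strings are slots with no entries.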