codebook with the most frequent ngrams in language/s #11

I know this guy ;) (from Redis).
Did you hand-pick the codebook dictionary? If so, how?
Have you thought about using the most frequent ngrams in the target language(s)?
E.g. the top N (say 32) ngrams from [Norvig](http://norvig.com/mayzner.html)'s `ngrams2.csv` through `ngrams9.csv`?
How do you pick them optimally, for minimum overlap and a better compression ratio? For example, `tion` and `ation` are the most common 4- and 5-letter ngrams respectively, and `tio` is the 6th most common 3-letter ngram, yet all three overlap (`tio` is inside `tion`, which is inside `ation`).
I think you'd get a much better compression ratio.
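
To make the overlap question concrete, here is a minimal sketch of one possible greedy selection. It is not how smaz builds its codebook. It assumes you have a sample text and a list of candidate ngrams, and that every chosen ngram is replaced by a single code byte. The names `scan` and `pick_codebook` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Count non-overlapping occurrences of pat in text that touch no byte
 * already covered by an earlier pick. If mark is nonzero, also flag the
 * matched bytes as covered. */
static size_t scan(const char *text, size_t tlen, unsigned char *covered,
                   const char *pat, size_t plen, int mark)
{
    size_t count = 0, i = 0;
    while (plen > 0 && i + plen <= tlen) {
        int usable = memcmp(text + i, pat, plen) == 0;
        for (size_t j = 0; usable && j < plen; j++)
            if (covered[i + j]) usable = 0;
        if (usable) {
            if (mark) memset(covered + i, 1, plen);
            count++;
            i += plen;
        } else {
            i++;
        }
    }
    return count;
}

/* Greedily choose up to k candidates, re-scoring after every pick so that
 * text already claimed by a longer ngram is not counted again.
 * Returns how many entries were written to out. */
size_t pick_codebook(const char *text, const char *const *cands, size_t ncand,
                     const char **out, size_t k)
{
    assert(text && cands && out);
    size_t tlen = strlen(text), picked = 0;
    unsigned char *covered = calloc(tlen ? tlen : 1, 1);
    unsigned char *taken = calloc(ncand ? ncand : 1, 1);
    if (!covered || !taken) { free(covered); free(taken); return 0; }

    while (picked < k) {
        size_t best = ncand, best_gain = 0;
        for (size_t c = 0; c < ncand; c++) {
            size_t len = strlen(cands[c]);
            if (taken[c] || len < 2) continue; /* 1-byte ngrams save nothing */
            /* Assumed cost model: each occurrence shrinks to one code byte. */
            size_t gain = scan(text, tlen, covered, cands[c], len, 0) * (len - 1);
            if (gain > best_gain) { best_gain = gain; best = c; }
        }
        if (best == ncand) break; /* no remaining candidate saves any bytes */
        scan(text, tlen, covered, cands[best], strlen(cands[best]), 1);
        taken[best] = 1;
        out[picked++] = cands[best];
    }
    free(covered);
    free(taken);
    return picked;
}
```

With the sample text `"nation station ration"` and the candidates `tio`, `tion`, `ation`, this picks `ation` first and then stops, because every remaining occurrence of `tion` and `tio` is already covered. Ranking by raw frequency alone would have spent three codebook slots on the same text.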

I want to test it, but I couldn't find any docs.
So what are these characters?
```
static char *Smaz_cb[241] = {
"\002s,\266", "\003had\232\002leW", "\003on \216", "", "\001yS",
"\002ma\255\002li\227", "\003or \260", "", "\002ll\230\003s t\277",
```
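
My best guess at the layout, not confirmed by any documentation: each string is one hash bucket holding zero or more records, and each record is a length byte, that many ngram bytes, and then one code byte. Under that assumption, a small parser (the names `cb_entry` and `parse_bucket` are made up for illustration) would look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One decoded record from a bucket string. */
struct cb_entry {
    const char *ngram;   /* points into the bucket; NOT NUL-terminated */
    size_t len;          /* number of bytes in ngram */
    unsigned char code;  /* byte assumed to be emitted for this ngram */
};

/* Split one bucket into records, assuming the layout
 * <len><len ngram bytes><code> repeated until a zero length byte.
 * Intended for string literals, whose terminating NUL ends the scan.
 * Returns the number of records written to out (at most max). */
size_t parse_bucket(const char *bucket, struct cb_entry *out, size_t max)
{
    const unsigned char *p = (const unsigned char *)bucket;
    size_t n = 0;
    assert(bucket && out);
    while (*p != 0 && n < max) {
        size_t len = p[0];
        out[n].ngram = (const char *)(p + 1);
        out[n].len = len;
        out[n].code = p[1 + len];
        n++;
        p += len + 2;    /* skip the length byte, the ngram, and the code byte */
    }
    return n;
}
```

Read this way, `"\003had\232\002leW"` holds two records: `had` with code `\232` (154 decimal) and `le` with code `W` (87 decimal), and the empty strings are simply empty buckets.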

