-
Notifications
You must be signed in to change notification settings - Fork 8
Open
Description
These two dictionaries should emit the same codebook because the ordering of the frequencies (the dictionary values) is the same in both cases:
>>> d1 = {x: x for x in range(1, 17)}
>>> d1
{1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10, 11: 11, 12: 12, 13: 13, 14: 14, 15: 15, 16: 16}
>>> d2 = {x: x*x*100 for x in range(1, 17)}
>>> d2
{1: 100, 2: 400, 3: 900, 4: 1600, 5: 2500, 6: 3600, 7: 4900, 8: 6400, 9: 8100, 10: 10000, 11: 12100, 12: 14400, 13: 16900, 14: 19600, 15: 22500, 16: 25600}
However the library emit different codes:
>>> huffman.codebook(d1.items())
{1: '1111110', 2: '1111111', 3: '111110', 4: '10100', 5: '10101', 6: '11110', 7: '0100', 8: '0101', 9: '1011', 10: '1100', 11: '1101', 12: '1110', 13: '000', 14: '001', 15: '011', 16: '100'}
>>> huffman.codebook(d2.items())
{1: '101101000', 2: '101101001', 3: '10110101', 4: '1011011', 5: '101100', 6: '01010', 7: '01011', 8: '10111', 9: '0100', 10: '1010', 11: '000', 12: '001', 13: '011', 14: '100', 15: '110', 16: '111'}
Sometimes the number of bits varies. E.g. d1[1] is encoded using 7 bits but d2[1] uses 9 bits. So this is not the most optimal code (as claimed on the homepage).
And even when the number of bits match, the values of the huffman code differ:
>>> d1 = {x: x for x in range(1, 5)}
>>> d2 = {x: x*x*1000 for x in range(1, 5)}
>>> huffman.codebook(d1.items())
{1: '110', 2: '111', 3: '10', 4: '0'}
>>> huffman.codebook(d2.items())
{1: '000', 2: '001', 3: '01', 4: '1'}
The same frequencies should lead to the same tree and hence the same huffman codes, however this is not the case with this library.
Metadata
Metadata
Assignees
Labels
No labels