
Fix off-by-one in HuffCDIC record iteration for AZW3/MOBI #2

Open
Imaclean74 wants to merge 2 commits into zacharydenton:master from Imaclean74:fix/huffcdic-off-by-one
Conversation


Imaclean74 commented on Feb 15, 2026

Fixes #3

Summary

huff_record_count in the MOBI header counts ALL records in the HUFF/CDIC block (1 HUFF + N CDICs). The import loop skips the HUFF record by starting at huff_record_index + 1, but iterates huff_record_count times instead of huff_record_count - 1. This reads one record past the CDIC data, which is typically an image (JPEG) or font record. Since these don't start with CDIC, load_cdic() returns InvalidData("Invalid CDIC header").

Every AZW3/MOBI file using Huffman compression (compression type 0x4448 / 'DH') is affected — all such files fail with Invalid CDIC header on export.

Fix

Change the loop bound to for i in 0..self.mobi.huff_record_count.saturating_sub(1) in both azw3.rs and mobi.rs.
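The corrected iteration can be sketched as follows. The field names (huff_record_index, huff_record_count) match the MOBI header fields named in this PR, but the helper function is illustrative, not the actual azw3.rs/mobi.rs code:

```rust
// Sketch of the corrected loop. `cdic_record_indices` is a hypothetical
// stand-in that returns the record indices the CDIC loader should visit.
fn cdic_record_indices(huff_record_index: u32, huff_record_count: u32) -> Vec<u32> {
    // Skip the HUFF record at huff_record_index; the remaining
    // count - 1 records are the CDICs.
    (0..huff_record_count.saturating_sub(1))
        .map(|i| huff_record_index + 1 + i)
        .collect()
}

fn main() {
    // huff_record_count = 3 means 1 HUFF + 2 CDICs (records 101 and 102).
    assert_eq!(cdic_record_indices(100, 3), vec![101, 102]);
    // saturating_sub keeps a malformed count of 0 from underflowing.
    assert!(cdic_record_indices(100, 0).is_empty());
}
```

saturating_sub(1) rather than a plain - 1 also guards against a malformed header reporting a count of 0, which would otherwise panic on underflow in a debug build.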

Concrete example

An AZW3 with huff_record_count=3 and huff_record_index=100:

Record 100: HUFF record (Huffman tables)
Record 101: CDIC record 1
Record 102: CDIC record 2
Record 103: JPEG image  ← bug reads this as CDIC, gets "Invalid CDIC header"

The loop starts at 101 (skipping the HUFF), so it should iterate 3 - 1 = 2 times (records 101–102), not 3 times (101–103).
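The difference between the two iteration counts can be shown with a toy model of the record layout above (100 = HUFF, 101-102 = CDIC, 103 = JPEG); the function names here are illustrative:

```rust
// Toy record table matching the concrete example in the PR description.
fn record_type(idx: u32) -> &'static str {
    match idx {
        100 => "HUFF",
        101 | 102 => "CDIC",
        103 => "JPEG",
        _ => "other",
    }
}

// Read `n` record types starting just after the HUFF record.
fn read_after_huff(huff_record_index: u32, n: u32) -> Vec<&'static str> {
    (0..n).map(|i| record_type(huff_record_index + 1 + i)).collect()
}

fn main() {
    let (huff_record_index, huff_record_count) = (100, 3);
    // Buggy: iterates `count` times and touches record 103 (the JPEG).
    assert_eq!(
        read_after_huff(huff_record_index, huff_record_count),
        ["CDIC", "CDIC", "JPEG"]
    );
    // Fixed: iterates `count - 1` times, reading only the CDICs.
    assert_eq!(
        read_after_huff(huff_record_index, huff_record_count - 1),
        ["CDIC", "CDIC"]
    );
}
```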

Verification

Confirmed by four independent implementations of the same format — all subtract 1 from the count:

Implementation                  How it handles the count
calibre huffcdic.py             huffs[0] = HUFF, huffs[1:] = CDICs
KindleUnpack mobi_header.py     range(1, huffnum) — loops from 1 to count-1
libmobi compression.c           reads huff_rec_count - 1 CDICs explicitly
SumatraPDF MobiDoc.cpp          cdicsCount = huffmanRecCount - 1

Test plan

  • Unit tests added in huffcdic.rs covering valid HUFF/CDIC loading and rejection of non-CDIC data (the exact failure mode)
  • Full boko test suite passes (527 tests, 0 failures)
  • Tested with a real HuffCDIC-compressed AZW3 (huff_record_count=3, 1 HUFF + 2 CDICs) — opens and exports successfully with fix, fails with Invalid CDIC header without

A stripped AZW3 fixture (47KB, no copyrighted content) is available for manual reproduction — attached in the comment below.
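The failure mode the unit tests exercise is the magic-byte check at the start of CDIC loading. A minimal sketch, assuming load_cdic() begins by comparing the record's first four bytes to "CDIC" (check_cdic_magic is a hypothetical stand-in, not the real huffcdic.rs function):

```rust
// Hypothetical sketch of the check that produces "Invalid CDIC header".
fn check_cdic_magic(record: &[u8]) -> Result<(), String> {
    if record.len() < 4 || &record[..4] != b"CDIC" {
        return Err("Invalid CDIC header".to_string());
    }
    Ok(())
}

fn main() {
    // A real CDIC record begins with the CDIC magic bytes.
    assert!(check_cdic_magic(b"CDIC\x00\x00\x00\x10").is_ok());
    // JPEG bytes (SOI marker FF D8) are rejected, which is the exact error
    // the off-by-one produces when it reads an image record as a CDIC.
    assert!(check_cdic_magic(&[0xFF, 0xD8, 0xFF, 0xE0]).is_err());
}
```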

huff_record_count includes the HUFF header record itself, but the loop
starts at huff_record_index + 1 to skip the header and read only CDIC
records. Without subtracting 1, the loop reads one record past the
actual CDIC data, misreading unrelated records (typically images or
fonts) or failing outright.

Use saturating_sub(1) to correctly iterate only the CDIC records.
Tests cover:
- Valid HUFF table loading and rejection of bad magic
- Valid CDIC dictionary loading
- CDIC rejection of JPEG bytes (the exact error produced by the
  off-by-one bug when it reads past the CDIC records into an image)
- HuffCdicReader construction with correct vs poisoned CDIC list
Imaclean74 (Author) commented:

Here's a stripped AZW3 fixture (47KB) for manual reproduction. It was derived from a commercial AZW3 with all text content, images, and metadata removed — only the PDB/MOBI headers, HUFF/CDIC records, and the poison JPEG record (trimmed to 16 bytes) are preserved.

Without the fix, Book::open() succeeds but export() fails when the CDIC loading loop reads the JPEG record at huff_record_index + huff_record_count:

InvalidData("Invalid CDIC header")


Development

Successfully merging this pull request may close: HuffCDIC-compressed AZW3/MOBI files fail with 'Invalid CDIC header'