
Fix off-by-one in HuffCDIC record iteration for AZW3/MOBI #2

Open
Imaclean74 wants to merge 2 commits into zacharydenton:master from Imaclean74:fix/huffcdic-off-by-one
Conversation


Imaclean74 commented on Feb 15, 2026

Fixes #3

Summary

huff_record_count in the MOBI header counts ALL records in the HUFF/CDIC block (1 HUFF + N CDICs). The import loop skips the HUFF record by starting at huff_record_index + 1, but iterates huff_record_count times instead of huff_record_count - 1. This reads one record past the CDIC data, which is typically an image (JPEG) or font record. Since these don't start with CDIC, load_cdic() returns InvalidData("Invalid CDIC header").

Every AZW3/MOBI file using Huffman compression (compression type 0x4448 / 'DH') is affected — all such files fail with Invalid CDIC header on export.

Fix

Change the loop bound to for i in 0..self.mobi.huff_record_count.saturating_sub(1) in both azw3.rs and mobi.rs.
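The corrected iteration can be sketched as follows. The field names (huff_record_index, huff_record_count) match the MOBI header fields named in this PR, but the helper function is illustrative, not the actual azw3.rs/mobi.rs code:

```rust
// Sketch of the corrected loop. `cdic_record_indices` is a hypothetical
// stand-in that returns the record indices the CDIC loader should visit.
fn cdic_record_indices(huff_record_index: u32, huff_record_count: u32) -> Vec<u32> {
    // Skip the HUFF record at huff_record_index; the remaining
    // count - 1 records are the CDICs.
    (0..huff_record_count.saturating_sub(1))
        .map(|i| huff_record_index + 1 + i)
        .collect()
}

fn main() {
    // huff_record_count = 3 means 1 HUFF + 2 CDICs (records 101 and 102).
    assert_eq!(cdic_record_indices(100, 3), vec![101, 102]);
    // saturating_sub keeps a malformed count of 0 from underflowing.
    assert!(cdic_record_indices(100, 0).is_empty());
}
```

saturating_sub(1) rather than a plain - 1 also guards against a malformed header reporting a count of 0, which would otherwise panic on underflow in a debug build.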

Concrete example

An AZW3 with huff_record_count=3 and huff_record_index=100:

Record 100: HUFF record (Huffman tables)
Record 101: CDIC record 1
Record 102: CDIC record 2
Record 103: JPEG image  ← bug reads this as CDIC, gets "Invalid CDIC header"

The loop starts at 101 (skipping the HUFF), so it should iterate 3 - 1 = 2 times (records 101–102), not 3 times (101–103).
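The difference between the two iteration counts can be shown with a toy model of the record layout above (100 = HUFF, 101-102 = CDIC, 103 = JPEG); the function names here are illustrative:

```rust
// Toy record table matching the concrete example in the PR description.
fn record_type(idx: u32) -> &'static str {
    match idx {
        100 => "HUFF",
        101 | 102 => "CDIC",
        103 => "JPEG",
        _ => "other",
    }
}

// Read `n` record types starting just after the HUFF record.
fn read_after_huff(huff_record_index: u32, n: u32) -> Vec<&'static str> {
    (0..n).map(|i| record_type(huff_record_index + 1 + i)).collect()
}

fn main() {
    let (huff_record_index, huff_record_count) = (100, 3);
    // Buggy: iterates `count` times and touches record 103 (the JPEG).
    assert_eq!(
        read_after_huff(huff_record_index, huff_record_count),
        ["CDIC", "CDIC", "JPEG"]
    );
    // Fixed: iterates `count - 1` times, reading only the CDICs.
    assert_eq!(
        read_after_huff(huff_record_index, huff_record_count - 1),
        ["CDIC", "CDIC"]
    );
}
```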

Verification

Confirmed by four independent implementations of the same format — all subtract 1 from the count:

Implementation                  How it handles the count
calibre huffcdic.py             huffs[0] = HUFF, huffs[1:] = CDICs
KindleUnpack mobi_header.py     range(1, huffnum) — loops from 1 to count-1
libmobi compression.c           reads huff_rec_count - 1 CDICs explicitly
SumatraPDF MobiDoc.cpp          cdicsCount = huffmanRecCount - 1

Test plan

  • Unit tests added in huffcdic.rs covering valid HUFF/CDIC loading and rejection of non-CDIC data (the exact failure mode)
  • Full boko test suite passes (527 tests, 0 failures)
  • Tested with a real HuffCDIC-compressed AZW3 (huff_record_count=3, 1 HUFF + 2 CDICs) — opens and exports successfully with fix, fails with Invalid CDIC header without

A stripped AZW3 fixture (47KB, no copyrighted content) is available for manual reproduction — attached in the comment below.
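The failure mode the unit tests exercise is the magic-byte check at the start of CDIC loading. A minimal sketch, assuming load_cdic() begins by comparing the record's first four bytes to "CDIC" (check_cdic_magic is a hypothetical stand-in, not the real huffcdic.rs function):

```rust
// Hypothetical sketch of the check that produces "Invalid CDIC header".
fn check_cdic_magic(record: &[u8]) -> Result<(), String> {
    if record.len() < 4 || &record[..4] != b"CDIC" {
        return Err("Invalid CDIC header".to_string());
    }
    Ok(())
}

fn main() {
    // A real CDIC record begins with the CDIC magic bytes.
    assert!(check_cdic_magic(b"CDIC\x00\x00\x00\x10").is_ok());
    // JPEG bytes (SOI marker FF D8) are rejected, which is the exact error
    // the off-by-one produces when it reads an image record as a CDIC.
    assert!(check_cdic_magic(&[0xFF, 0xD8, 0xFF, 0xE0]).is_err());
}
```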

huff_record_count includes the HUFF header record itself, but the loop
starts at huff_record_index + 1 to skip the header and read only CDIC
records. Without subtracting 1, the loop reads one record past the
actual CDIC data, misreading unrelated records (typically images or
fonts) or failing outright.

Use saturating_sub(1) to correctly iterate only the CDIC records.
Tests cover:
- Valid HUFF table loading and rejection of bad magic
- Valid CDIC dictionary loading
- CDIC rejection of JPEG bytes (the exact error produced by the
  off-by-one bug when it reads past the CDIC records into an image)
- HuffCdicReader construction with correct vs poisoned CDIC list
Imaclean74 (Author) commented:

Here's a stripped AZW3 fixture (47KB) for manual reproduction. It was derived from a commercial AZW3 with all text content, images, and metadata removed — only the PDB/MOBI headers, HUFF/CDIC records, and the poison JPEG record (trimmed to 16 bytes) are preserved.

Without the fix, Book::open() succeeds but export() fails when the CDIC loading loop reads the JPEG record at huff_record_index + huff_record_count:

InvalidData("Invalid CDIC header")


Development

Successfully merging this pull request may close: HuffCDIC-compressed AZW3/MOBI files fail with 'Invalid CDIC header'