Fix off-by-one in HuffCDIC record iteration for AZW3/MOBI#2
Open
Imaclean74 wants to merge 2 commits intozacharydenton:masterfrom
Open
Fix off-by-one in HuffCDIC record iteration for AZW3/MOBI#2Imaclean74 wants to merge 2 commits intozacharydenton:masterfrom
Imaclean74 wants to merge 2 commits intozacharydenton:masterfrom
Conversation
huff_record_count includes the HUFF header record itself, but the loop starts at huff_record_index + 1 to skip the header and read only CDIC records. Without subtracting 1, the loop reads one record past the actual CDIC data, which can read unrelated data or cause errors. Use saturating_sub(1) to correctly iterate only the CDIC records.
Tests cover: - Valid HUFF table loading and rejection of bad magic - Valid CDIC dictionary loading - CDIC rejection of JPEG bytes (the exact error produced by the off-by-one bug when it reads past the CDIC records into an image) - HuffCdicReader construction with correct vs poisoned CDIC list
Author
|
Here's a stripped AZW3 fixture (47KB) for manual reproduction. It was derived from a commercial AZW3 with all text content, images, and metadata removed — only the PDB/MOBI headers, HUFF/CDIC records, and the poison JPEG record (trimmed to 16 bytes) are preserved. Without the fix, |
Author
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #3
Summary
huff_record_countin the MOBI header counts ALL records in the HUFF/CDIC block (1 HUFF + N CDICs). The import loop skips the HUFF record by starting athuff_record_index + 1, but iterateshuff_record_counttimes instead ofhuff_record_count - 1. This reads one record past the CDIC data, which is typically an image (JPEG) or font record. Since these don't start withCDIC,load_cdic()returnsInvalidData("Invalid CDIC header").Every AZW3/MOBI file using Huffman compression (compression type
0x4448/'DH') is affected — all such files fail withInvalid CDIC headeron export.Fix
for i in 0..self.mobi.huff_record_count.saturating_sub(1)in bothazw3.rsandmobi.rs.Concrete example
An AZW3 with
huff_record_count=3andhuff_record_index=100:The loop starts at 101 (skipping the HUFF), so it should iterate
3 - 1 = 2times (records 101–102), not 3 times (101–103).Verification
Confirmed by four independent implementations of the same format — all subtract 1 from the count:
huffcdic.pyhuffs[0]= HUFF,huffs[1:]= CDICsmobi_header.pyrange(1, huffnum)— loops 1 to count-1compression.chuff_rec_count - 1CDICs explicitlyMobiDoc.cppcdicsCount = huffmanRecCount - 1Test plan
huffcdic.rscovering valid HUFF/CDIC loading and rejection of non-CDIC data (the exact failure mode)Invalid CDIC headerwithoutA stripped AZW3 fixture (47KB, no copyrighted content) is available for manual reproduction — attached in the comment below.