…cessarily bloats c-decompressor
4f2fe1b to c52f165
Pull request overview
Copilot reviewed 51 out of 53 changed files in this pull request and generated 5 comments.
    index = self._bit_reader.read(self.window_bits)

    string = self._window_buffer.get(index, match_size)

    # Write up to end of buffer (no wrap)
    remaining = self._window_buffer.size - self._window_buffer.pos
    window_write = min(match_size, remaining)
    self._window_buffer.write_bytes(string[:window_write])
Extended-match tokens are specified as “no wrap-around” (source must not cross the window boundary). The implementation currently uses _window_buffer.get(index, match_size), which wraps modulo the window size, so a malformed stream could silently read across the boundary instead of failing. Consider validating index + match_size <= window_size for extended matches and raising an error if it would wrap, aligning behavior with the C implementation’s bounds checks.
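One possible shape for the suggested check, as a minimal sketch. `WindowBuffer` here is a hypothetical stand-in for the real ring buffer (only the wrapping `get` behaviour described above is modelled), and `get_extended_match` is an illustrative helper name, not something from the PR.

```python
class WindowBuffer:
    """Hypothetical stand-in for the decompressor's ring buffer."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.buffer = bytearray(size)

    def get(self, index: int, length: int) -> bytes:
        # Wraps modulo the window size, which is the behaviour the comment flags.
        return bytes(self.buffer[(index + i) % self.size] for i in range(length))


def get_extended_match(window: WindowBuffer, index: int, match_size: int) -> bytes:
    """Fetch an extended match, rejecting sources that would cross the window end."""
    if index < 0 or match_size < 0 or index + match_size > window.size:
        raise ValueError(
            f"extended match would wrap: index={index}, "
            f"size={match_size}, window={window.size}"
        )
    return window.get(index, match_size)
```

A match that ends exactly at the window boundary is still accepted; only a source range that would continue past it is rejected.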
    print(f"{file_path.name}: {compressed_size:,} (**{ratio:.3f}**)")

    avg = sum(ratios) / len(ratios)
    print(f"Average Ratio: {avg}")
avg = sum(ratios) / len(ratios) will raise ZeroDivisionError if all files are empty/missing or otherwise skipped. Consider guarding for len(ratios) == 0 and exiting with a clearer message. Also, the per-file print includes Markdown formatting (**...**), which is surprising for a CLI script unless the output is intended to be pasted into Markdown.
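A small sketch of the guard, assuming the script can return a non-zero exit code when nothing was measured. The function names `average_ratio` and `report` are illustrative and do not come from the PR.

```python
import sys


def average_ratio(ratios: list[float]) -> float:
    """Return the mean ratio, failing clearly when there is nothing to average."""
    if not ratios:
        raise ValueError("no files were measured; cannot compute an average ratio")
    return sum(ratios) / len(ratios)


def report(ratios: list[float]) -> int:
    """Print the average and return a process exit code."""
    try:
        avg = average_ratio(ratios)
    except ValueError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 1
    print(f"Average Ratio: {avg:.3f}")
    return 0
```

The caller could then end with `sys.exit(report(ratios))`, so an empty run produces a readable message instead of a traceback.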
    rle_count = self._bit_reader.read_huffman()
    rle_count <<= _LEADING_RLE_HUFFMAN_BITS
    rle_count += self._bit_reader.read(_LEADING_RLE_HUFFMAN_BITS)
    rle_count += 1 + 1
In the extended RLE path, read_huffman() can return the _FLUSH sentinel on malformed/corrupted input, which would raise a TypeError when shifting/adding. Consider explicitly rejecting _FLUSH here (e.g., raise ValueError / EOFError) before doing bit operations, so invalid streams fail deterministically and with a clearer error.
    match_size = self._bit_reader.read_huffman()
    match_size <<= _LEADING_EXTENDED_MATCH_HUFFMAN_BITS
    match_size += self._bit_reader.read(_LEADING_EXTENDED_MATCH_HUFFMAN_BITS)
    match_size += self.min_pattern_size + 11 + 1
In the extended-match path, the secondary read_huffman() used for the size payload can also return _FLUSH on malformed input, which would break the arithmetic and produce a confusing exception. Add an explicit check to disallow _FLUSH (and potentially validate the decoded size range) before computing match_size.
        decompresses.append(viper_decompress)
        Decompressors.append(NativeDecompressor)
        decompresses.append(native_decompress)
    except ImportError:
'except' clause does nothing but pass and there is no explanatory comment.
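One way to make the intent explicit, sketched with a hypothetical `load_optional` helper. It assumes the import is for an optional backend that may legitimately be missing, which the excerpt suggests but does not confirm.

```python
import importlib


def load_optional(module_name: str):
    """Import an optional backend, returning None when it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # The backend is optional: skip it here so callers can
        # test only the implementations that are available.
        return None
```

Even if the bare `pass` is kept, a one-line comment like the one above tells the reader that the silence is deliberate.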