-
Notifications
You must be signed in to change notification settings - Fork 39
Add raw LZ4 block format support for cross-language compatibility #39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Add lz4_compress_raw() and lz4_uncompress_raw() functions to enable byte-exact compatibility with Python lz4.block, Rust lz4_flex, and Go pierrec/lz4. Changes: - Add php_lz4_compress_raw() static function (bypasses 4-byte size header) - Add php_lz4_uncompress_raw() static function (requires max_size parameter) - Add ZEND_FUNCTION wrappers for both new functions - Add argument info structures for parameter validation - Register new functions in lz4_functions array API: - string lz4_compress_raw(string $data, int $level = 0) Compresses data with NO size header (raw LZ4 block) - string lz4_uncompress_raw(string $data, int $max_size) Decompresses raw LZ4 block (max_size required) Tests: - tests/raw_001.phpt - Basic roundtrip functionality - tests/raw_002.phpt - Python compatibility test vectors - tests/raw_003.phpt - Error handling validation - tests/raw_004.phpt - Compression level testing Validation: - All 4 new tests pass - All 14 existing tests pass (backward compatibility maintained) - Bidirectional cross-language compatibility verified: * Python → PHP decompression: 4/4 test vectors pass * PHP → Python decompression: 4/4 test vectors pass Resolves incompatibility with ByteStorage envelope format used by CacheKit Python/Rust implementations.
WalkthroughThe PR adds raw (headerless) LZ4 block compression and decompression capabilities to the PHP extension. Two new public functions, Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant lz4_compress_raw
participant php_lz4_compress_raw
participant LZ4 Lib
participant User2
participant lz4_uncompress_raw
participant php_lz4_uncompress_raw
participant LZ4 Lib2
User->>lz4_compress_raw: (data, level)
lz4_compress_raw->>php_lz4_compress_raw: data, level
php_lz4_compress_raw->>LZ4 Lib: LZ4_compressBound
LZ4 Lib-->>php_lz4_compress_raw: max_len
php_lz4_compress_raw->>php_lz4_compress_raw: allocate buffer
alt level <= 0
php_lz4_compress_raw->>LZ4 Lib: LZ4_compress_default
else
php_lz4_compress_raw->>LZ4 Lib: LZ4_compress_HC
end
LZ4 Lib-->>php_lz4_compress_raw: compressed_len or error
php_lz4_compress_raw-->>lz4_compress_raw: raw compressed data
lz4_compress_raw-->>User: PHP string (compressed)
User2->>lz4_uncompress_raw: (data, max_size)
lz4_uncompress_raw->>php_lz4_uncompress_raw: data, max_size
php_lz4_uncompress_raw->>php_lz4_uncompress_raw: validate max_size > 0
php_lz4_uncompress_raw->>php_lz4_uncompress_raw: allocate buffer (max_size + 1)
php_lz4_uncompress_raw->>LZ4 Lib2: LZ4_decompress_safe
LZ4 Lib2-->>php_lz4_uncompress_raw: decompressed_len or error
php_lz4_uncompress_raw-->>lz4_uncompress_raw: raw decompressed data
lz4_uncompress_raw-->>User2: PHP string (decompressed)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lz4.c (1)
728-767: Consider validating max_size range before castingThe wrapper correctly implements the raw decompression API following existing patterns. However, line 754 casts
zend_longtoconst intwithout validating thatmax_sizefits withinINT_MAX. On 64-bit systems, if a user providesmax_size > INT_MAX, silent truncation could occur.Note: This is consistent with the existing
lz4_uncompressfunction (line 609), so it's a pre-existing pattern. Consider adding validation for both functions:+ if (max_size > INT_MAX) { + zend_error(E_WARNING, + "lz4_uncompress_raw : max_size exceeds maximum allowed value"); + RETURN_FALSE; + } + /* Call internal decompression function */ if (php_lz4_uncompress_raw(Z_STRVAL_P(data), Z_STRLEN_P(data), (const int)max_size,
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
lz4.c(5 hunks)tests/raw_001.phpt(1 hunks)tests/raw_002.phpt(1 hunks)tests/raw_003.phpt(1 hunks)tests/raw_004.phpt(1 hunks)
🔇 Additional comments (8)
lz4.c (4)
77-78: LGTM: Function declarations and registrationThe new raw API functions are properly declared and registered following the extension's existing patterns. The argument info correctly specifies required/optional parameters for each function.
Also applies to: 103-111, 123-124
302-349: LGTM: Raw compression implementationThe implementation correctly produces headerless LZ4 blocks by directly writing compressed data without the 4-byte size prefix. The level validation, error handling, and memory management are all consistent with the existing
php_lz4_compressfunction.
351-387: LGTM: Raw decompression implementation with one noteThe implementation correctly handles raw LZ4 blocks requiring an explicit
max_sizeparameter. Error handling and memory management follow existing patterns.The
max_size + 1allocation (line 367) matches the pattern inphp_lz4_uncompressand provides space for null termination.
691-726: LGTM: Public compress wrapperThe PHP-visible wrapper correctly parses parameters, validates input types, and manages memory. The implementation follows the existing extension patterns with proper version compatibility handling for PHP 5 vs 7+.
tests/raw_001.phpt (1)
1-26: LGTM: Comprehensive basic functionality testThe test correctly verifies:
- Basic compression/decompression roundtrip
- Exact byte-level output (hex validation) for cross-language compatibility
- Proper handling of the original data length requirement in
lz4_uncompress_rawtests/raw_002.phpt (1)
1-33: LGTM: Excellent cross-language compatibility validationThe test vectors validate byte-exact output against Python's
lz4.blockimplementation, which is essential for the stated goal of cross-language interoperability. The test includes varied data patterns (literals and repeated sequences) to exercise different compression paths.tests/raw_003.phpt (1)
1-43: LGTM: Thorough error handling coverageThe test comprehensively validates error handling for:
- Invalid
max_sizevalues (zero and negative)- Corrupted compressed data
- Incorrect
max_sizeparameter (too small for decompressed data)All error paths in
php_lz4_uncompress_raware exercised, ensuring robust validation.tests/raw_004.phpt (1)
1-46: LGTM: Complete compression level validationThe test validates:
- Default fast compression (level 0)
- High compression mode (levels 1 and 9)
- Roundtrip integrity across all compression levels
- Invalid level rejection with appropriate error message
The use of
--EXPECTF--with%d bytespatterns allows for variable compression ratios while still validating that compression occurred.
Add LZ4 Block Format Specification Compliance
Problem
The current
lz4_compress()andlz4_uncompress()functions violate the official LZ4 Block Format specification by embedding a 4-byte size header in the compressed output.From the LZ4 Block Format spec (Section: Metadata):
This deviation breaks interoperability with all standard LZ4 Block implementations:
lz4.block(https://pypi.org/project/lz4/)lz4_flex(https://crates.io/crates/lz4_flex)pierrec/lz4(https://github.com/pierrec/lz4)LZ4_compress_default()(raw output)Solution
Add two new functions that produce specification-compliant LZ4 blocks:
Technical Details
Current behavior (spec violation):
New raw functions (spec-compliant):
The size header approach was chosen for API convenience but creates a proprietary format incompatible with the LZ4 ecosystem.
Cross-Language Compatibility
Test Results:
Example (PHP ↔ Python):
Use Case: Multi-Language Cache Systems
Enables ByteStorage pattern for shared caches:
Compatible with Python, Rust, Go, Node.js - any language with standard LZ4 block support.
Testing
New tests: 4 PHPT tests (100% pass)
raw_001.phpt- Basic roundtripraw_002.phpt- Python compatibility (test vectors)raw_003.phpt- Error handlingraw_004.phpt- Compression levelsRegression tests: All 14 existing tests pass (100%)
Cross-language validation: 8/8 test vectors pass
Backward Compatibility
Zero breaking changes
Performance
Same as existing functions (both call
LZ4_compress_default()directly), but:Why Not Use Frame Format?
The LZ4 Frame Format (magic
0x184D2204) is designed for self-contained files/streams with embedded metadata. It adds:For cache systems and databases, Block format is preferred:
References
Implementation
lz4.c(179 lines added)This PR restores LZ4 specification compliance while maintaining full backward compatibility.
Summary by CodeRabbit
Release Notes