Skip to content

3623: [C] Fix negative block size validation in datafile reader#26

Open
martin-augment wants to merge 1 commit intomainfrom
pr-3623-2026-01-13-20-05-49
Open

3623: [C] Fix negative block size validation in datafile reader#26
martin-augment wants to merge 1 commit intomainfrom
pr-3623-2026-01-13-20-05-49

Conversation

@martin-augment
Copy link
Copy Markdown
Owner

3623: To review by AI

The file_read_block_count() function in datafile.c reads block size
using zigzag encoding, which can produce negative numbers from
malicious Avro container files. These negative values were passed
directly to avro_malloc(), causing allocation failures.

This patch adds validation to reject negative block size values with
a clear error message before attempting memory allocation.

Bug: Negative block size from varint decoding causes
     allocation-size-too-big when cast to size_t

Impact: DoS via crafted .avro file

Co-Authored-By: Claude <noreply@anthropic.com>
@github-actions github-actions bot added the C label Jan 13, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Jan 13, 2026

Walkthrough

The change adds a validation check in lang/c/src/datafile.c within the file_read_block_count function to detect negative block lengths. When the decoded block length is determined to be negative, the code sets an "Invalid block size" error and returns EINVAL. This validation occurs immediately after the block size is read and before any memory operations such as resizing or allocation of block data are performed, preventing negative values from affecting subsequent processing steps.

✨ Finishing touches
  • 📝 Generate docstrings

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2a739f8 and 68e67d5.

📒 Files selected for processing (1)
  • lang/c/src/datafile.c
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: claude-review
🔇 Additional comments (1)
lang/c/src/datafile.c (1)

454-457: Good security fix for the negative block size validation.

The placement is correct—immediately after decoding the block size and before any memory operations. This prevents negative values (from crafted zigzag-encoded varints) from being cast to size_t and causing allocation failures or excessive memory allocation attempts.

The error message with PRId64 provides useful debugging context for malformed files.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@augmentcode
Copy link
Copy Markdown

augmentcode bot commented Jan 13, 2026

🤖 Augment PR Summary

Summary: Prevents the C datafile reader from accepting negative Avro block sizes, avoiding invalid allocations/reads when files are corrupt.

Changes:

  • Added a negative-size check in file_read_block_count that sets an error and returns EINVAL when the decoded block size is < 0.

🤖 Was this summary useful? React with 👍 or 👎

Copy link
Copy Markdown

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. No suggestions at this time.

Comment augment review to trigger a new review at any time.

@claude
Copy link
Copy Markdown

claude bot commented Jan 13, 2026

PR Review: Fix negative block size validation in datafile reader

Summary

This PR adds a critical security fix to validate that block sizes read from Avro datafiles are non-negative before attempting memory allocation. The fix prevents potential undefined behavior and security issues from maliciously crafted files.


Code Quality ✅

Positive aspects:

  • Clean, focused change that addresses a specific security issue
  • Follows existing error handling patterns in the codebase using avro_set_error() and check_prefix macro
  • Uses proper format specifier PRId64 for portable int64_t printing
  • Placed in the optimal location (after reading, before allocation)
  • Returns appropriate error code EINVAL consistent with validation failures

Style consistency:

  • Formatting and indentation match the existing codebase
  • Error message format is consistent with other validation errors in the file

Security Analysis 🔒

Critical security fix:
This addresses a high-severity vulnerability where negative block sizes could lead to:

  1. Integer overflow when casting to size_t: The variable len is int64_t but gets passed to avro_malloc(len) and avro_realloc(..., len) which expect size_t (unsigned). A negative int64_t value would wrap to a very large positive size_t value on line 463 and 460.

  2. Excessive memory allocation: After the wrap-around, the allocator could attempt to allocate gigabytes of memory, leading to:

    • Denial of service
    • Out-of-memory crashes
    • Potential heap corruption
  3. Bypass of existing checks: The check on line 467 (if (len > 0)) would not catch negative values that already caused allocation issues.

Attack vector:
A maliciously crafted Avro datafile with a negative block size value could exploit this before the fix.


Potential Issues & Recommendations

1. Missing NULL check after allocation ⚠️

Issue: Lines 460 and 463 perform allocations but don't verify the result before using the pointer.

if (r->current_blockdata && len > r->current_blocklen) {
    r->current_blockdata = (char *) avro_realloc(r->current_blockdata, r->current_blocklen, len);
    r->current_blocklen = len;  // Used without NULL check
} else if (!r->current_blockdata) {
    r->current_blockdata = (char *) avro_malloc(len);
    r->current_blocklen = len;  // Used without NULL check
}

if (len > 0) {
    check_prefix(rval, avro_read(r->reader, r->current_blockdata, len), ...);
    // Dereferences r->current_blockdata without NULL check

Recommendation:
Add NULL checks after allocation attempts:

if (r->current_blockdata && len > r->current_blocklen) {
    char *new_data = (char *) avro_realloc(r->current_blockdata, r->current_blocklen, len);
    if (!new_data) {
        avro_set_error("Cannot allocate memory for file block");
        return ENOMEM;
    }
    r->current_blockdata = new_data;
    r->current_blocklen = len;
} else if (!r->current_blockdata) {
    r->current_blockdata = (char *) avro_malloc(len);
    if (!r->current_blockdata) {
        avro_set_error("Cannot allocate memory for file block");
        return ENOMEM;
    }
    r->current_blocklen = len;
}

Note: This is a pre-existing issue, not introduced by this PR, but worth addressing.

2. Maximum block size validation ⚠️

Issue: While negative sizes are now rejected, there's no upper bound check. An attacker could still provide an extremely large positive value (e.g., INT64_MAX) causing similar DoS issues.

Recommendation:
Consider adding a maximum size check:

if (len < 0) {
    avro_set_error("Invalid block size: %" PRId64, len);
    return EINVAL;
}
if (len > MAX_BLOCK_SIZE) {  // Define reasonable limit
    avro_set_error("Block size too large: %" PRId64, len);
    return EINVAL;
}

Check how other Avro implementations handle this (Java, Python, etc.).

3. len == 0 edge case ℹ️

Observation: When len == 0:

  • No allocation occurs (correct)
  • The if (len > 0) check on line 467 prevents reading/decoding (correct)
  • But avro_reader_memory_set_source() is called on line 475 with potentially stale data

Question: Is a zero-length block valid per the Avro spec? If not, should it also be rejected?


Test Coverage ❓

Missing:

  • No test case appears to be included with this PR
  • Regression testing for negative block sizes is essential

Recommendations:

  1. Add a test with a maliciously crafted file containing a negative block size
  2. Verify the error message and return code
  3. Ensure no memory corruption or leaks occur
  4. Consider fuzzing tests for various edge cases

Example test structure:

// Create a malformed avro file with negative block size
// Attempt to read it
// Verify EINVAL is returned
// Verify error message contains "Invalid block size"

Performance Considerations ✅

Impact: Negligible

  • Single integer comparison added to the read path
  • No performance degradation expected

Conclusion

Overall assessment: This is a critical security fix that should be merged, but with considerations for follow-up improvements.

Recommendation:

  • ✅ Approve for merge (addresses critical vulnerability)
  • 📝 File follow-up issues for:
    1. Missing NULL checks after allocations (separate PR)
    2. Maximum block size validation (separate PR)
    3. Test coverage for this fix (can be same PR or follow-up)

Priority: High - this prevents potential security exploits from maliciously crafted files.

Great work identifying and fixing this vulnerability! 🎉

@martin-augment
Copy link
Copy Markdown
Owner Author

1. Missing NULL check after allocation ⚠️

Issue: Lines 460 and 463 perform allocations but don't verify the result before using the pointer.

if (r->current_blockdata && len > r->current_blocklen) {
    r->current_blockdata = (char *) avro_realloc(r->current_blockdata, r->current_blocklen, len);
    r->current_blocklen = len;  // Used without NULL check
} else if (!r->current_blockdata) {
    r->current_blockdata = (char *) avro_malloc(len);
    r->current_blocklen = len;  // Used without NULL check
}

if (len > 0) {
    check_prefix(rval, avro_read(r->reader, r->current_blockdata, len), ...);
    // Dereferences r->current_blockdata without NULL check

Recommendation: Add NULL checks after allocation attempts:

if (r->current_blockdata && len > r->current_blocklen) {
    char *new_data = (char *) avro_realloc(r->current_blockdata, r->current_blocklen, len);
    if (!new_data) {
        avro_set_error("Cannot allocate memory for file block");
        return ENOMEM;
    }
    r->current_blockdata = new_data;
    r->current_blocklen = len;
} else if (!r->current_blockdata) {
    r->current_blockdata = (char *) avro_malloc(len);
    if (!r->current_blockdata) {
        avro_set_error("Cannot allocate memory for file block");
        return ENOMEM;
    }
    r->current_blocklen = len;
}

Note: This is a pre-existing issue, not introduced by this PR, but worth addressing.

value:useful; category:bug; feedback:The Claude AI reviewer is correct! This is an old but a valid issue that has to be fixed! The result of malloc()/realloc() should be checked for NULL before being used. Prevents null pointer usage

@martin-augment
Copy link
Copy Markdown
Owner Author

2. Maximum block size validation ⚠️

Issue: While negative sizes are now rejected, there's no upper bound check. An attacker could still provide an extremely large positive value (e.g., INT64_MAX) causing similar DoS issues.

Recommendation: Consider adding a maximum size check:

if (len < 0) {
    avro_set_error("Invalid block size: %" PRId64, len);
    return EINVAL;
}
if (len > MAX_BLOCK_SIZE) {  // Define reasonable limit
    avro_set_error("Block size too large: %" PRId64, len);
    return EINVAL;
}

Check how other Avro implementations handle this (Java, Python, etc.).

value:useful; category:bug; feedback:The Claude AI reviewer is correct! An attacker could craft an ,avro file with a really big long value, like i64::MAX, and cause an out of memory error on the machine that reads this file. Prevents an attack with a perfectly valid Avro file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants