Skip to content

3646: AVRO-4228: [c++] Fix BinaryDecoder::arrayNext() to handle negative block counts#33

Open
martin-augment wants to merge 2 commits intomainfrom
pr-3646-2026-02-09-06-29-51
Open

3646: AVRO-4228: [c++] Fix BinaryDecoder::arrayNext() to handle negative block counts#33
martin-augment wants to merge 2 commits intomainfrom
pr-3646-2026-02-09-06-29-51

Conversation

@martin-augment
Copy link
Copy Markdown
Owner

3646: To review by AI

@github-actions github-actions bot added the C++ label Feb 9, 2026
@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug in the C++ Avro BinaryDecoder that prevented the correct processing of arrays encoded with negative block counts, a feature defined in the Avro specification. The change involves a targeted fix in the arrayNext() method to use the appropriate decoding logic, complemented by the addition of a comprehensive unit test. This enhancement improves the robustness and adherence of the Avro C++ library to the Avro data serialization standard.

Highlights

  • BinaryDecoder Fix: The BinaryDecoder::arrayNext() method in the C++ Avro library has been updated to correctly handle negative block counts by calling doDecodeItemCount() instead of doDecodeLong(). This ensures compliance with the Avro specification for array encoding.
  • New Test Case: A new test case, testArrayNegativeBlockCount(), has been added to CodecTests.cc. This test specifically validates the decoder's ability to correctly parse arrays that include negative block counts and their associated byte-size prefixes, ensuring the fix works as expected.
Changelog
  • lang/c++/impl/BinaryDecoder.cc
    • Modified BinaryDecoder::arrayNext() to invoke doDecodeItemCount() for accurate array item count retrieval, resolving issues with negative block counts.
  • lang/c++/test/CodecTests.cc
    • Introduced testArrayNegativeBlockCount() to verify the BinaryDecoder's handling of Avro arrays with negative block counts and byte-size information.
    • Integrated the new test case into the existing test suite.
Activity
  • The pull request was created by martin-augment.
  • The pull request is currently awaiting review.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Feb 9, 2026

Walkthrough

This change modifies the arrayNext() implementation in BinaryDecoder.cc to decode item counts using doDecodeItemCount() instead of directly decoding and casting a raw long value. The new approach aligns with the semantics used in arrayStart() and mapStart(), introducing handling for negative indicator values. A corresponding test function testArrayNegativeBlockCount is added to verify the behavior of decoding Avro arrays that contain negative block counts followed by additional encoded data.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch pr-3646-2026-02-09-06-29-51

No actionable comments were generated in the recent review. 🎉


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link
Copy Markdown

claude bot commented Feb 9, 2026

Code Review: AVRO-4228 - Fix BinaryDecoder::arrayNext() to handle negative block counts

Summary

This PR fixes a critical bug where arrayNext() was not properly handling negative block counts as specified by the Avro binary encoding specification. The fix aligns arrayNext() with the already-correct arrayStart() implementation.

Code Quality ✅

Excellent fix! The change is:

  • Minimal and focused: Single line change that directly addresses the root cause
  • Consistent: Now matches the implementation pattern used by arrayStart(), mapStart(), and mapNext()
  • Well-tested: Includes a comprehensive test case with clear documentation

Bug Analysis 🐛

The Problem:
The old implementation did not handle negative block counts. It incorrectly:

  1. Returned the raw zigzag-decoded long value (which could be negative)
  2. Failed to consume the byte-size when a negative block count was encountered
  3. Cast the potentially negative value to size_t, causing undefined behavior

The Fix:
The new implementation correctly delegates to doDecodeItemCount() which:

  1. Decodes the count
  2. If negative, consumes the following byte-size via doDecodeLong()
  3. Returns abs(count) as the number of items

Test Coverage ✅

The test case testArrayNegativeBlockCount() is excellent:

Strengths:

  • Clear inline documentation explaining the encoding
  • Tests the specific scenario: array with mixed negative and positive block counts
  • Verifies both block boundaries and item values
  • Uses realistic data that could occur in production

Suggestions for Enhancement (optional, not blocking):

  1. Consider adding an edge case test with only negative blocks
  2. Could test a single-block array with negative count
  3. Might add a test that verifies the decoder position after reading

Performance ✅

No performance concerns. The change:

  • Removes a premature cast and delegates to an existing helper
  • The helper doDecodeItemCount() is already used by arrayStart(), mapStart(), and mapNext()
  • May slightly improve performance by avoiding undefined behavior

Security ✅

Security improvement! This fix addresses:

  1. Undefined Behavior: Casting negative values to size_t is undefined behavior in C++
  2. Stream Desynchronization: Not consuming the byte-size would cause the decoder to misinterpret subsequent data
  3. Potential Buffer Issues: A large negative number cast to size_t could become a huge positive number

Consistency Check 🔍

Verified that the pattern is now consistent across all similar methods:

  • ✅ arrayStart() uses doDecodeItemCount()
  • ✅ arrayNext() now uses doDecodeItemCount()
  • ✅ mapStart() uses doDecodeItemCount()
  • ✅ mapNext() uses doDecodeItemCount()

Recommendations

  1. Merge this PR ✅ - This is a critical correctness fix
  2. Consider backporting - If there are stable branches, this bug fix should be backported
  3. Documentation - The Avro specification compliance is improved; no doc changes needed
  4. Follow-up (optional) - Add a few more edge case tests as suggested above

Final Verdict: LGTM 🚀

This is a textbook example of a good bug fix: minimal change, proper understanding of the specification, comprehensive test coverage, and improved correctness. The PR is ready to merge.


Review generated by Claude Code

@augmentcode
Copy link
Copy Markdown

augmentcode bot commented Feb 9, 2026

🤖 Augment PR Summary

Summary: Fixes C++ binary decoding of array blocks when block counts are negative (Avro’s “byte-sized block” encoding).

Changes:

  • Update BinaryDecoder::arrayNext() to use the shared doDecodeItemCount() logic, ensuring negative block counts consume the following block byte-size long.
  • Add a unit test that constructs a binary-encoded array using a negative block count and verifies values are decoded correctly.
  • Register the new test in the codec test suite.

Technical Notes: Negative counts per spec indicate abs(count) items follow and a block byte-size long is present in the stream; failing to consume that byte-size desynchronizes decoding.

🤖 Was this summary useful? React with 👍 or 👎

Copy link
Copy Markdown

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 2 suggestions posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.

// Per the Avro spec, a negative count means: abs(count) items follow,
// preceded by a long byte-size of the block.
//
// Block 1: count=-3, bytesize=3, items: 10, 20, 30
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test puts the negative block count in the first block (handled by arrayStart()), so arrayNext() never encounters a negative count; as written, it likely would have passed before this PR too. Consider adding a later block with a negative count so the test actually exercises the arrayNext() fix.

Severity: medium

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback: The Augment AI reviewer is correct! The new test passes even without the fix, so it does not prevent regressions at all. The test should be improved to use a negative count for the non-first block(s) too.


size_t BinaryDecoder::arrayNext() {
return static_cast<size_t>(doDecodeLong());
return doDecodeItemCount();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doDecodeItemCount() negates result when it’s negative; if the encoded count is INT64_MIN, -result overflows (undefined behavior). Since arrayNext() now routes through this path, consider explicitly guarding that value and throwing instead of overflowing.

Severity: medium

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:delightful; category:security; feedback: The Augment AI reviewer is correct! If the count value is size_t's min then the negation will lead to an overflow and even more issues because of this. Prevents a vulnerability if an attacker crafts an Avro file with such min count.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly fixes an issue in BinaryDecoder::arrayNext() to handle negative block counts as specified by the Avro specification. This addresses a bug where negative block counts were not being handled, which could lead to incorrect parsing and potential resource exhaustion. While the fix reuses existing logic and adds a new test case, a critical vulnerability remains in doDecodeItemCount(): negating LONG_MIN results in undefined behavior and a potential denial of service via a near-infinite loop. There is also a suggestion to make the new test case more robust and concise.


size_t BinaryDecoder::arrayNext() {
return static_cast<size_t>(doDecodeLong());
return doDecodeItemCount();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The call to doDecodeItemCount() on line 173 introduces a potential undefined behavior and denial of service. The doDecodeItemCount() function (defined on line 163) contains an expression -result on line 167 that overflows when result is LONG_MIN (-9223372036854775808). An attacker can trigger this by providing a specially crafted Avro varint (e.g., 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0x01). In a 2's complement environment, this typically results in a very large positive value being returned, which can cause the caller to enter a near-infinite loop, leading to a denial of service. It is recommended to handle the LONG_MIN case in doDecodeItemCount() or use a safer negation like static_cast<size_t>(-(result + 1)) + 1.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:delightful; category:security; feedback: The Gemini AI reviewer is correct! If the count value is size_t's min then the negation will lead to an overflow and even more issues because of this. Prevents a vulnerability if an attacker crafts an Avro file with such min count.

Comment on lines +2131 to +2136
BOOST_CHECK_EQUAL(result.size(), 5u);
BOOST_CHECK_EQUAL(result[0], 10);
BOOST_CHECK_EQUAL(result[1], 20);
BOOST_CHECK_EQUAL(result[2], 30);
BOOST_CHECK_EQUAL(result[3], 40);
BOOST_CHECK_EQUAL(result[4], 50);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This series of checks can be made more robust and concise. If the initial size check fails, the subsequent element access result[i] could lead to out-of-bounds access and a test crash, because BOOST_CHECK does not abort the test. Using BOOST_CHECK_EQUAL_COLLECTIONS is a better approach as it is idiomatic in this file, checks for size equality internally, and compares all elements in a single statement. This makes the test safer and more readable.

    const std::vector<int32_t> expected = {10, 20, 30, 40, 50};
    BOOST_CHECK_EQUAL_COLLECTIONS(result.begin(), result.end(), expected.begin(), expected.end());

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-to-have; category:bug; feedback: The Gemini AI reviewer is correct! Using BOOST_CHECK_EQUAL_COLLECTIONS will provide better error messages when asserting the values in a collection, like in this test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant