3646: AVRO-4228: [c++] Fix BinaryDecoder::arrayNext() to handle negative block counts by martin-augment · Pull Request #33 · martin-augment/avro

martin-augment · 2026-02-09T06:29:55Z

3646: To review by AI

…unts

gemini-code-assist · 2026-02-09T06:30:17Z

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug in the C++ Avro BinaryDecoder that prevented the correct processing of arrays encoded with negative block counts, a feature defined in the Avro specification. The change involves a targeted fix in the arrayNext() method to use the appropriate decoding logic, complemented by the addition of a comprehensive unit test. This enhancement improves the robustness and adherence of the Avro C++ library to the Avro data serialization standard.

Highlights

BinaryDecoder Fix: The BinaryDecoder::arrayNext() method in the C++ Avro library has been updated to correctly handle negative block counts by calling doDecodeItemCount() instead of doDecodeLong(). This ensures compliance with the Avro specification for array encoding.
New Test Case: A new test case, testArrayNegativeBlockCount(), has been added to CodecTests.cc. This test specifically validates the decoder's ability to correctly parse arrays that include negative block counts and their associated byte-size prefixes, ensuring the fix works as expected.

Changelog

lang/c++/impl/BinaryDecoder.cc
- Modified BinaryDecoder::arrayNext() to invoke doDecodeItemCount() for accurate array item count retrieval, resolving issues with negative block counts.
lang/c++/test/CodecTests.cc
- Introduced testArrayNegativeBlockCount() to verify the BinaryDecoder's handling of Avro arrays with negative block counts and byte-size information.
- Integrated the new test case into the existing test suite.

Activity

The pull request was created by martin-augment.
The pull request is currently awaiting review.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

coderabbitai · 2026-02-09T06:30:20Z

Walkthrough

This change modifies the arrayNext() implementation in BinaryDecoder.cc to decode item counts using doDecodeItemCount() instead of directly decoding and casting a raw long value. The new approach aligns with the semantics used in arrayStart() and mapStart(), introducing handling for negative indicator values. A corresponding test function testArrayNegativeBlockCount is added to verify the behavior of decoding Avro arrays that contain negative block counts followed by additional encoded data.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch pr-3646-2026-02-09-06-29-51

No actionable comments were generated in the recent review. 🎉

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

claude · 2026-02-09T06:31:24Z

Code Review: AVRO-4228 - Fix BinaryDecoder::arrayNext() to handle negative block counts

Summary

This PR fixes a critical bug where arrayNext() was not properly handling negative block counts as specified by the Avro binary encoding specification. The fix aligns arrayNext() with the already-correct arrayStart() implementation.

Code Quality ✅

Excellent fix! The change is:

Minimal and focused: Single line change that directly addresses the root cause
Consistent: Now matches the implementation pattern used by arrayStart(), mapStart(), and mapNext()
Well-tested: Includes a comprehensive test case with clear documentation

Bug Analysis 🐛

The Problem:
The old implementation did not handle negative block counts. It incorrectly:

Returned the raw zigzag-decoded long value (which could be negative)
Failed to consume the byte-size when a negative block count was encountered
Cast the potentially negative value to size_t, causing undefined behavior

The Fix:
The new implementation correctly delegates to doDecodeItemCount() which:

Decodes the count
If negative, consumes the following byte-size via doDecodeLong()
Returns abs(count) as the number of items

Test Coverage ✅

The test case testArrayNegativeBlockCount() is excellent:

Strengths:

Clear inline documentation explaining the encoding
Tests the specific scenario: array with mixed negative and positive block counts
Verifies both block boundaries and item values
Uses realistic data that could occur in production

Suggestions for Enhancement (optional, not blocking):

Consider adding an edge case test with only negative blocks
Could test a single-block array with negative count
Might add a test that verifies the decoder position after reading

Performance ✅

No performance concerns. The change:

Removes a premature cast and delegates to an existing helper
The helper doDecodeItemCount() is already used by arrayStart(), mapStart(), and mapNext()
May slightly improve performance by avoiding undefined behavior

Security ✅

Security improvement! This fix addresses:

Undefined Behavior: Casting negative values to size_t is undefined behavior in C++
Stream Desynchronization: Not consuming the byte-size would cause the decoder to misinterpret subsequent data
Potential Buffer Issues: A large negative number cast to size_t could become a huge positive number

Consistency Check 🔍

Verified that the pattern is now consistent across all similar methods:

✅ arrayStart() uses doDecodeItemCount()
✅ arrayNext() now uses doDecodeItemCount()
✅ mapStart() uses doDecodeItemCount()
✅ mapNext() uses doDecodeItemCount()

Recommendations

Merge this PR ✅ - This is a critical correctness fix
Consider backporting - If there are stable branches, this bug fix should be backported
Documentation - The Avro specification compliance is improved; no doc changes needed
Follow-up (optional) - Add a few more edge case tests as suggested above

Final Verdict: LGTM 🚀

This is a textbook example of a good bug fix: minimal change, proper understanding of the specification, comprehensive test coverage, and improved correctness. The PR is ready to merge.

Review generated by Claude Code

augmentcode · 2026-02-09T06:31:53Z

🤖 Augment PR Summary

Summary: Fixes C++ binary decoding of array blocks when block counts are negative (Avro’s “byte-sized block” encoding).

Changes:

Update BinaryDecoder::arrayNext() to use the shared doDecodeItemCount() logic, ensuring negative block counts consume the following block byte-size long.
Add a unit test that constructs a binary-encoded array using a negative block count and verifies values are decoded correctly.
Register the new test in the codec test suite.

Technical Notes: Negative counts per spec indicate abs(count) items follow and a block byte-size long is present in the stream; failing to consume that byte-size desynchronizes decoding.

_{🤖 Was this summary useful? React with 👍 or 👎}

augmentcode

Review completed. 2 suggestions posted.

Comment augment review to trigger a new review at any time.

augmentcode · 2026-02-09T06:31:55Z

lang/c++/test/CodecTests.cc

+    // Per the Avro spec, a negative count means: abs(count) items follow,
+    // preceded by a long byte-size of the block.
+    //
+    // Block 1: count=-3, bytesize=3, items: 10, 20, 30


This test puts the negative block count in the first block (handled by arrayStart()), so arrayNext() never encounters a negative count; as written, it likely would have passed before this PR too. Consider adding a later block with a negative count so the test actually exercises the arrayNext() fix.

Severity: medium

_{🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.}

value:useful; category:bug; feedback: The Augment AI reviewer is correct! The new test passes even without the fix, so it does not prevent regressions at all. The test should be improved to use a negative count for the non-first block(s) too.

augmentcode · 2026-02-09T06:31:55Z

lang/c++/impl/BinaryDecoder.cc


 size_t BinaryDecoder::arrayNext() {
-    return static_cast<size_t>(doDecodeLong());
+    return doDecodeItemCount();


doDecodeItemCount() negates result when it’s negative; if the encoded count is INT64_MIN, -result overflows (undefined behavior). Since arrayNext() now routes through this path, consider explicitly guarding that value and throwing instead of overflowing.

Severity: medium

_{🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.}

value:delightful; category:security; feedback: The Augment AI reviewer is correct! If the count value is size_t's min then the negation will lead to an overflow and even more issues because of this. Prevents a vulnerability if an attacker crafts an Avro file with such min count.

gemini-code-assist

Code Review

This pull request correctly fixes an issue in BinaryDecoder::arrayNext() to handle negative block counts as specified by the Avro specification. This addresses a bug where negative block counts were not being handled, which could lead to incorrect parsing and potential resource exhaustion. While the fix reuses existing logic and adds a new test case, a critical vulnerability remains in doDecodeItemCount(): negating LONG_MIN results in undefined behavior and a potential denial of service via a near-infinite loop. There is also a suggestion to make the new test case more robust and concise.

gemini-code-assist · 2026-02-09T06:33:07Z

lang/c++/impl/BinaryDecoder.cc


 size_t BinaryDecoder::arrayNext() {
-    return static_cast<size_t>(doDecodeLong());
+    return doDecodeItemCount();


The call to doDecodeItemCount() on line 173 introduces a potential undefined behavior and denial of service. The doDecodeItemCount() function (defined on line 163) contains an expression -result on line 167 that overflows when result is LONG_MIN (-9223372036854775808). An attacker can trigger this by providing a specially crafted Avro varint (e.g., 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0x01). In a 2's complement environment, this typically results in a very large positive value being returned, which can cause the caller to enter a near-infinite loop, leading to a denial of service. It is recommended to handle the LONG_MIN case in doDecodeItemCount() or use a safer negation like static_cast<size_t>(-(result + 1)) + 1.

value:delightful; category:security; feedback: The Gemini AI reviewer is correct! If the count value is size_t's min then the negation will lead to an overflow and even more issues because of this. Prevents a vulnerability if an attacker crafts an Avro file with such min count.

gemini-code-assist · 2026-02-09T06:33:07Z

lang/c++/test/CodecTests.cc

+    BOOST_CHECK_EQUAL(result.size(), 5u);
+    BOOST_CHECK_EQUAL(result[0], 10);
+    BOOST_CHECK_EQUAL(result[1], 20);
+    BOOST_CHECK_EQUAL(result[2], 30);
+    BOOST_CHECK_EQUAL(result[3], 40);
+    BOOST_CHECK_EQUAL(result[4], 50);


This series of checks can be made more robust and concise. If the initial size check fails, the subsequent element access result[i] could lead to out-of-bounds access and a test crash, because BOOST_CHECK does not abort the test. Using BOOST_CHECK_EQUAL_COLLECTIONS is a better approach as it is idiomatic in this file, checks for size equality internally, and compares all elements in a single statement. This makes the test safer and more readable.

const std::vector<int32_t> expected = {10, 20, 30, 40, 50}; BOOST_CHECK_EQUAL_COLLECTIONS(result.begin(), result.end(), expected.begin(), expected.end());

value:good-to-have; category:bug; feedback: The Gemini AI reviewer is correct! Using BOOST_CHECK_EQUAL_COLLECTIONS will provide better error messages when asserting the values in a collection, like in this test.

Gabriel Feyer added 2 commits February 8, 2026 12:29

AVRO-4228: Fix BinaryDecoder::arrayNext() to handle negative block co…

8be6e9c

…unts

AVRO-4228: Add test for arrayNext() with negative block counts

dfe6815

github-actions bot added the C++ label Feb 9, 2026

augmentcode bot reviewed Feb 9, 2026

View reviewed changes

gemini-code-assist bot reviewed Feb 9, 2026

View reviewed changes

Conversation

martin-augment commented Feb 9, 2026

Uh oh!

gemini-code-assist bot commented Feb 9, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

coderabbitai bot commented Feb 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Uh oh!

claude bot commented Feb 9, 2026

Code Review: AVRO-4228 - Fix BinaryDecoder::arrayNext() to handle negative block counts

Summary

Code Quality ✅

Bug Analysis 🐛

Test Coverage ✅

Performance ✅

Security ✅

Consistency Check 🔍

Recommendations

Final Verdict: LGTM 🚀

Uh oh!

augmentcode bot commented Feb 9, 2026

Uh oh!

augmentcode bot left a comment

Choose a reason for hiding this comment

Uh oh!

augmentcode bot Feb 9, 2026

Choose a reason for hiding this comment

Uh oh!

martin-augment Feb 9, 2026

Choose a reason for hiding this comment

Uh oh!

augmentcode bot Feb 9, 2026

Choose a reason for hiding this comment

Uh oh!

martin-augment Feb 9, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Feb 9, 2026

Choose a reason for hiding this comment

Uh oh!

martin-augment Feb 9, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 9, 2026

Choose a reason for hiding this comment

Uh oh!

martin-augment Feb 9, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai bot commented Feb 9, 2026 •

edited

Loading