2 changes: 1 addition & 1 deletion lang/c++/impl/BinaryDecoder.cc
@@ -170,7 +170,7 @@ size_t BinaryDecoder::doDecodeItemCount() {
}

size_t BinaryDecoder::arrayNext() {
-    return static_cast<size_t>(doDecodeLong());
+    return doDecodeItemCount();

doDecodeItemCount() negates result when it’s negative; if the encoded count is INT64_MIN, -result overflows (undefined behavior). Since arrayNext() now routes through this path, consider explicitly guarding that value and throwing instead of overflowing.
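One possible shape for such a guard is sketched below. `toItemCount` is a hypothetical free function used only for illustration; it is not part of the Avro C++ API, and it leaves out reading the block's byte size, which the real decoder still has to consume after a negative count. The exception type is also an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper (not the Avro API): turns a decoded block count into an
// item count and rejects the single value whose negation would overflow.
inline std::size_t toItemCount(std::int64_t count) {
    if (count == std::numeric_limits<std::int64_t>::min()) {
        // -INT64_MIN is not representable in int64_t, so refuse it up front.
        throw std::range_error("invalid block count: INT64_MIN");
    }
    // Safe now: every other negative value has a representable magnitude.
    return static_cast<std::size_t>(count < 0 ? -count : count);
}
```

With this guard, `toItemCount(-3)` yields 3 and the crafted minimum value fails fast instead of reaching the negation.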

Severity: medium

Owner Author

value:delightful; category:security; feedback: The Augment AI reviewer is correct! If the encoded count is INT64_MIN (the minimum int64_t value), negating it overflows, which is undefined behavior and can cause further problems in the caller. Guarding against this value prevents a vulnerability when an attacker crafts an Avro file with such a minimum count.

security-medium medium

The call to doDecodeItemCount() on line 173 introduces potential undefined behavior and a denial of service. The doDecodeItemCount() function (defined on line 163) contains the expression -result on line 167, which overflows when result is INT64_MIN (-9223372036854775808). An attacker can trigger this by providing a specially crafted Avro varint (e.g., 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0x01). On a two's-complement platform, this typically results in a very large positive value being returned, which can cause the caller to enter a near-infinite loop, leading to a denial of service. It is recommended to handle the INT64_MIN case in doDecodeItemCount() or to use a safer negation such as static_cast<size_t>(-(result + 1)) + 1.
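To check the byte sequence and the suggested negation, here is a small standalone sketch. `decodeZigzagLong`, `magnitude`, and `kCraftedCount` are illustrative names and not Avro functions, and the sketch returns `std::uint64_t` rather than `size_t` so the result does not depend on the platform's pointer width.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// The crafted count from the comment above: nine 0xff bytes, then 0x01.
const std::uint8_t kCraftedCount[] = {0xff, 0xff, 0xff, 0xff, 0xff,
                                      0xff, 0xff, 0xff, 0xff, 0x01};

// Illustrative zigzag varint reader (a stand-in, not avro::BinaryDecoder).
inline std::int64_t decodeZigzagLong(const std::uint8_t* buf, std::size_t len) {
    std::uint64_t encoded = 0;
    std::size_t i = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (i >= len) throw std::runtime_error("unexpected end of input");
        const std::uint8_t b = buf[i++];
        encoded |= static_cast<std::uint64_t>(b & 0x7f) << shift;
        if ((b & 0x80) == 0) {
            // Zigzag decode: (n >> 1) ^ -(n & 1), done in unsigned arithmetic.
            return static_cast<std::int64_t>(
                (encoded >> 1) ^ (std::uint64_t{0} - (encoded & 1)));
        }
    }
    throw std::runtime_error("varint too long");
}

// Overflow-free absolute value: -(v + 1) is always representable for v < 0.
inline std::uint64_t magnitude(std::int64_t v) {
    return v < 0 ? static_cast<std::uint64_t>(-(v + 1)) + 1
                 : static_cast<std::uint64_t>(v);
}
```

The crafted bytes decode to INT64_MIN, and `magnitude` maps that to 2^63 without overflow. The safer negation only removes the undefined behavior; a count of 2^63 would still drive a very long loop, so an explicit rejection or upper bound is still needed to address the denial of service.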

Owner Author

value:delightful; category:security; feedback: The Gemini AI reviewer is correct! If the encoded count is INT64_MIN (the minimum int64_t value), negating it overflows, which is undefined behavior and can cause further problems in the caller. Guarding against this value prevents a vulnerability when an attacker crafts an Avro file with such a minimum count.

}

size_t BinaryDecoder::skipArray() {
37 changes: 37 additions & 0 deletions lang/c++/test/CodecTests.cc
@@ -2100,6 +2100,42 @@ static void testJsonCodecReinit() {
}
}

static void testArrayNegativeBlockCount() {
    // Array of ints [10, 20, 30, 40, 50] encoded with a negative block count.
    // Per the Avro spec, a negative count means: abs(count) items follow,
    // preceded by a long byte-size of the block.
    //
    // Block 1: count=-3, bytesize=3, items: 10, 20, 30

This test puts the negative block count in the first block (handled by arrayStart()), so arrayNext() never encounters a negative count; as written, it likely would have passed before this PR too. Consider adding a later block with a negative count so the test actually exercises the arrayNext() fix.
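A byte layout that places the negative count in the second block could look like the sketch below. To keep it self-contained it uses a tiny stand-in reader (`TinyReader`, `decodeBlockedInts`, and `negativeSecondBlockData` are hypothetical names, not Avro APIs). In the real test the same bytes would be fed through `arrayStart()` and `arrayNext()`.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Block 1: count=2, items 10, 20. Block 2: count=-3, bytesize=3, items 30, 40, 50.
inline std::vector<std::uint8_t> negativeSecondBlockData() {
    return {
        0x04,             // zigzag(2) = 4
        0x14, 0x28,       // zigzag ints: 10, 20
        0x05,             // zigzag(-3) = 5 (negative count in a later block)
        0x06,             // zigzag(3) = 6 (byte size of block 2)
        0x3c, 0x50, 0x64, // zigzag ints: 30, 40, 50
        0x00              // terminal
    };
}

// Minimal stand-in for a binary decoder; illustrative only.
struct TinyReader {
    const std::uint8_t* data;
    std::size_t size;
    std::size_t pos;

    TinyReader(const std::uint8_t* d, std::size_t n) : data(d), size(n), pos(0) {}

    std::int64_t readLong() {
        std::uint64_t encoded = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            if (pos >= size) throw std::runtime_error("unexpected end of input");
            const std::uint8_t b = data[pos++];
            encoded |= static_cast<std::uint64_t>(b & 0x7f) << shift;
            if ((b & 0x80) == 0) {
                return static_cast<std::int64_t>(
                    (encoded >> 1) ^ (std::uint64_t{0} - (encoded & 1)));
            }
        }
        throw std::runtime_error("varint too long");
    }

    // A negative count is followed by the block's byte size, which is skipped.
    std::size_t readItemCount() {
        const std::int64_t count = readLong();
        if (count >= 0) return static_cast<std::size_t>(count);
        readLong();  // discard the byte size
        return static_cast<std::size_t>(-(count + 1)) + 1;
    }
};

inline std::vector<std::int32_t> decodeBlockedInts(const std::vector<std::uint8_t>& bytes) {
    TinyReader r(bytes.data(), bytes.size());
    std::vector<std::int32_t> out;
    for (std::size_t n = r.readItemCount(); n != 0; n = r.readItemCount()) {
        for (std::size_t i = 0; i < n; ++i) {
            out.push_back(static_cast<std::int32_t>(r.readLong()));
        }
    }
    return out;
}
```

With this layout, the pre-fix `arrayNext()`, which cast the raw long to `size_t`, would turn the second count of -3 into an enormous item count, so a test built on these bytes should fail without the fix.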

Severity: medium

Owner Author

value:useful; category:bug; feedback: The Augment AI reviewer is correct! The new test passes even without the fix, so it does not prevent regressions at all. The test should be improved to use a negative count for the non-first block(s) too.

    // Block 2: count=2, items: 40, 50
    // Terminal: count=0
    const uint8_t data[] = {
        0x05,             // zigzag(-3) = 5
        0x06,             // zigzag(3) = 6 (byte-size of block)
        0x14, 0x28, 0x3c, // zigzag ints: 10, 20, 30
        0x04,             // zigzag(2) = 4
        0x50, 0x64,       // zigzag ints: 40, 50
        0x00              // terminal
    };

    InputStreamPtr is = memoryInputStream(data, sizeof(data));
    DecoderPtr d = binaryDecoder();
    d->init(*is);

    std::vector<int32_t> result;
    for (size_t n = d->arrayStart(); n != 0; n = d->arrayNext()) {
        for (size_t i = 0; i < n; ++i) {
            result.push_back(d->decodeInt());
        }
    }

    BOOST_CHECK_EQUAL(result.size(), 5u);
    BOOST_CHECK_EQUAL(result[0], 10);
    BOOST_CHECK_EQUAL(result[1], 20);
    BOOST_CHECK_EQUAL(result[2], 30);
    BOOST_CHECK_EQUAL(result[3], 40);
    BOOST_CHECK_EQUAL(result[4], 50);
Comment on lines +2131 to +2136

medium

This series of checks can be made more robust and concise. If the initial size check fails, the subsequent element access result[i] could lead to out-of-bounds access and a test crash, because BOOST_CHECK does not abort the test. Using BOOST_CHECK_EQUAL_COLLECTIONS is a better approach as it is idiomatic in this file, checks for size equality internally, and compares all elements in a single statement. This makes the test safer and more readable.

    const std::vector<int32_t> expected = {10, 20, 30, 40, 50};
    BOOST_CHECK_EQUAL_COLLECTIONS(result.begin(), result.end(), expected.begin(), expected.end());

Owner Author

value:good-to-have; category:bug; feedback: The Gemini AI reviewer is correct! Using BOOST_CHECK_EQUAL_COLLECTIONS will provide better error messages when asserting the values in a collection, like in this test.

}

static void testByteCount() {
OutputStreamPtr os1 = memoryOutputStream();
EncoderPtr e1 = binaryEncoder();
@@ -2125,6 +2161,7 @@ init_unit_test_suite(int, char *[]) {
ts->add(BOOST_PARAM_TEST_CASE(&avro::testJson, avro::jsonData,
ENDOF(avro::jsonData)));
ts->add(BOOST_TEST_CASE(avro::testJsonCodecReinit));
ts->add(BOOST_TEST_CASE(avro::testArrayNegativeBlockCount));
ts->add(BOOST_TEST_CASE(avro::testByteCount));

return ts;