3527: Avro-4193: [c] segfault cannot handle empty byte array#15
3527: Avro-4193: [c] segfault cannot handle empty byte array#15martin-augment wants to merge 2 commits intomainfrom
Conversation
When avro-c tries to encode an empty byte array it will avro_malloc(0) which on some architectures will return NULL. Make sure this is not interpreted as an error or or dereferenced causing a segfault.
WalkthroughModified error handling for zero-size memory allocations across three C files. In Changes
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
| { | ||
| char *bytes_copy = (char *) avro_malloc(size); | ||
| if (!bytes_copy) { | ||
| if (!bytes_copy && size) { |
There was a problem hiding this comment.
Allowing avro_malloc(0) to return NULL means the subsequent memcpy may be invoked with NULL pointers when size == 0, which can be undefined behavior on some platforms. Please ensure the copy isn’t performed when size is zero (related issue also in avro_bytes_set).
🤖 React with 👍 or 👎 to let us know if the comment was useful.
There was a problem hiding this comment.
value:useful; category:bug; feedback:The AI reviewer is correct that now if the bytes_copy is NULL and size is 0 the if check won't be true and memcpy() will be called with a NULL argument as a destination. The size should be checked for 0 as a first step in this function and return early.
| int rval; | ||
| char *bytes_copy = (char *) avro_malloc(size); | ||
| if (!bytes_copy) { | ||
| if (!bytes_copy && size) { |
There was a problem hiding this comment.
As in avro_bytes, the later memcpy can receive NULL pointers when size == 0 due to the relaxed allocation check, which may be undefined behavior; avoid invoking the copy in the zero-length case. (Related to the earlier occurrence in this file)
🤖 React with 👍 or 👎 to let us know if the comment was useful.
There was a problem hiding this comment.
value:useful; category:bug; feedback:The AI reviewer is correct that now if the bytes_copy is NULL and size is 0 the if check won't be true and memcpy() will be called with a NULL argument as a destination. The size should be checked for 0 as a first step in this function and return early.
|
|
||
| static int test_empty_bytes(void) | ||
| { | ||
| char bytes[] = { }; |
There was a problem hiding this comment.
char bytes[] = { }; is not valid standard C and may not compile across compilers; zero-length arrays and empty initializer lists are non-portable. This can break the build on stricter C compilers.
🤖 React with 👍 or 👎 to let us know if the comment was useful.
There was a problem hiding this comment.
value:useful; category:bug; feedback:The array initialiser is not supported by all C compilers. It is better to use either char *bytes = NULL; or char bytes[0]; instead.
| avro_datum_decref(datum); | ||
| avro_datum_decref(expected_datum); | ||
|
|
||
| avro_schema_decref(writer_schema); |
There was a problem hiding this comment.
writer_schema is decref’d at line 230 and again here, which will underflow the refcount and likely cause a double free. This is a correctness issue that can crash the test.
🤖 React with 👍 or 👎 to let us know if the comment was useful.
There was a problem hiding this comment.
value:useful; category:bug; feedback:prevented a double decrement of the reference count of a schema. This won't lead to double free but it is still a bug.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
lang/c/src/value-json.c (1)
231-250: Inconsistent NULL handling for AVRO_FIXED.Line 244 passes
encodeddirectly tojson_stringn_nocheckwithout the NULL guard that Line 129 uses for AVRO_BYTES. Sinceencode_utf8_bytescan return NULL for zero-length input, a zero-length FIXED value would pass NULL tojson_stringn_nocheck.Apply this diff to add consistent NULL handling:
- json_t *result = json_stringn_nocheck((const char *) encoded, encoded_size); + json_t *result = json_stringn_nocheck((const char *) encoded ? encoded : "", encoded_size);lang/c/src/datum.c (2)
156-170: Undefined behavior: memcpy with NULL destination.When
sizeis 0 andbytes_copyis NULL (line 159), thememcpyat line 163 invokes undefined behavior. Per C11 §7.24.1/2, memcpy requires non-NULL pointers even when size is 0.Apply this diff to guard the memcpy:
char *bytes_copy = (char *) avro_malloc(size); if (!bytes_copy && size) { avro_set_error("Cannot copy bytes content"); return NULL; } - memcpy(bytes_copy, bytes, size); + if (size > 0) { + memcpy(bytes_copy, bytes, size); + }
196-210: Undefined behavior: memcpy with NULL destination.Same issue as
avro_bytes: whensizeis 0 andbytes_copyis NULL (line 200), thememcpyat line 204 invokes undefined behavior.Apply this diff:
char *bytes_copy = (char *) avro_malloc(size); if (!bytes_copy && size) { avro_set_error("Cannot copy bytes content"); return ENOMEM; } - memcpy(bytes_copy, bytes, size); + if (size > 0) { + memcpy(bytes_copy, bytes, size); + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
lang/c/src/datum.c(2 hunks)lang/c/src/value-json.c(2 hunks)lang/c/tests/test_avro_data.c(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
lang/c/tests/test_avro_data.c (3)
lang/c/src/schema.c (2)
avro_schema_bytes(245-253)avro_schema_decref(225-233)lang/c/src/datum.c (3)
avro_givebytes(172-176)avro_datum_decref(1245-1250)avro_givebytes_set(212-216)lang/c/src/datum_equal.c (1)
avro_datum_equal(125-192)
🔇 Additional comments (6)
lang/c/src/value-json.c (2)
65-65: LGTM! Correct handling of zero-size allocations.The condition properly allows NULL result from
avro_mallocwhenutf8_lenis zero, since malloc(0) behavior is implementation-defined in C.
129-129: Good defensive guard for zero-length bytes.The ternary operator prevents passing NULL to
json_stringn_nocheckwhen the encoded pointer is NULL (zero-length case).lang/c/src/datum.c (2)
159-159: Correct zero-size allocation handling.The condition properly allows NULL from
avro_mallocwhen size is zero, addressing the implementation-defined behavior of malloc(0).
200-200: Correct zero-size allocation handling.Consistent with line 159, this properly allows NULL from
avro_mallocwhen size is zero.lang/c/tests/test_avro_data.c (2)
42-56: LGTM! Correct NULL guard for zero-size deallocations.The added
if (ptr)check properly handles the case where a zero-size allocation returns NULL, preventing dereferencing NULL pointer in the size validation logic.
690-690: LGTM! Proper test registration.The new test is correctly registered in the test suite array.
| static int test_empty_bytes(void) | ||
| { | ||
| char bytes[] = { }; | ||
| avro_schema_t writer_schema = avro_schema_bytes(); | ||
| avro_datum_t datum; | ||
| avro_datum_t expected_datum; | ||
|
|
||
| datum = avro_givebytes(bytes, sizeof(bytes), NULL); | ||
| write_read_check(writer_schema, datum, NULL, NULL, "bytes"); | ||
| test_json(datum, "\"\""); | ||
| avro_datum_decref(datum); | ||
| avro_schema_decref(writer_schema); | ||
|
|
||
| datum = avro_givebytes(NULL, 0, NULL); | ||
| avro_givebytes_set(datum, bytes, sizeof(bytes), NULL); | ||
| expected_datum = avro_givebytes(bytes, sizeof(bytes), NULL); | ||
| if (!avro_datum_equal(datum, expected_datum)) { | ||
| fprintf(stderr, | ||
| "Expected equal bytes instances.\n"); | ||
| exit(EXIT_FAILURE); | ||
| } | ||
| avro_datum_decref(datum); | ||
| avro_datum_decref(expected_datum); | ||
|
|
||
| avro_schema_decref(writer_schema); | ||
| return 0; | ||
| } |
There was a problem hiding this comment.
Good test coverage for empty bytes, but double schema decref.
The test properly exercises empty byte array handling including JSON encoding. However, there's a double decref of writer_schema at lines 230 and 243.
Apply this diff to remove the duplicate decref:
write_read_check(writer_schema, datum, NULL, NULL, "bytes");
test_json(datum, "\"\"");
avro_datum_decref(datum);
- avro_schema_decref(writer_schema);
datum = avro_givebytes(NULL, 0, NULL);
avro_givebytes_set(datum, bytes, sizeof(bytes), NULL);Note: The existing test_bytes function (lines 184-217) has the same issue with double decref at lines 195 and 215.
🤖 Prompt for AI Agents
In lang/c/tests/test_avro_data.c around lines 219-245, the test_empty_bytes
function calls avro_schema_decref(writer_schema) twice (once after the first
datum block and again before return); remove the second/deeper duplicate decref
so writer_schema is decref'd only once. Also apply the same fix to the earlier
test_bytes function (around lines 184-217) where
avro_schema_decref(writer_schema) is duplicated (remove the later duplicate) to
ensure each schema is decref'd exactly once.
There was a problem hiding this comment.
value:useful; category:bug; feedback:prevented a double decrement of the reference count of a schema. This won't lead to double free but it is still a bug.
There was a problem hiding this comment.
@martin-augment, thank you for confirming! I appreciate the clarification that while it wouldn't cause a double free, preventing the incorrect reference count decrement is still valuable for maintaining proper memory management semantics in the codebase. Glad I could help catch this before it was merged! 🐰
(\(\
( -.-)
o_(")(")
[Happy rabbit - glad to help!]
Code Review: AVRO-4193 - Support for Empty Byte ArraysSummaryThis PR addresses a critical bug where ✅ Strengths
🐛 Potential IssuesCritical: Undefined Behavior in
|
|
Findings:
Everything else in the patch looks reasonable once that is addressed. |
Code Review: Avro-4193 - Fix segfault for empty byte arraysSummaryThis PR fixes a segfault issue when handling empty byte arrays (size = 0) in the Avro C library. The changes address the case where Code Quality and Best Practices✅ Strengths
|
value:useful; category:bug; feedback:The AI reviewer is correct that now if the bytes_copy is NULL and size is 0 the if check won't be true and memcpy() will be called with a NULL argument as a destination. The size should be checked for 0 as a first step in this function and return early. |
value:useful; category:bug; feedback:The AI reviewer is correct that now if the bytes_copy is NULL and size is 0 the if check won't be true and memcpy() will be called with a NULL argument as a destination. The size should be checked for 0 as a first step in this function and return early. |
value:useful; category:bug; feedback:prevented a double decrement of the reference count of a schema. This won't lead to double free but it is still a bug. https://github.com/apache/avro/blob/c39d0ac3276c55edc60615784f947293369ce624/lang/c/src/schema.c#L228 |
value:useful; category:bug; feedback:The array initialiser is not supported by all C compilers. It is better to use either char *bytes = NULL; or char bytes[0]; instead. |
value:useful; category:bug; feedback:prevented a double decrement of the reference count of a schema. This won't lead to double free but it is still a bug. https://github.com/apache/avro/blob/c39d0ac3276c55edc60615784f947293369ce624/lang/c/src/schema.c#L228 |
3527: To review by AI