-
Notifications
You must be signed in to change notification settings - Fork 0
3635: Fixes a memory leak issue in the Avro C library under abnormal data scenarios. #31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -136,8 +136,10 @@ static int read_bytes(avro_reader_t reader, char **bytes, int64_t * len) | |
| avro_set_error("Cannot allocate buffer for bytes value"); | ||
| return ENOMEM; | ||
| } | ||
| AVRO_READ(reader, *bytes, *len); | ||
|
|
||
| (*bytes)[*len] = '\0'; | ||
| AVRO_SAFE_READ(reader, *bytes, *len, *len+1); | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. value:useful; category:bug; feedback:The Gemini AI reviewer is correct! The addition of the extra null byte is problematic! If it is added before the safe read then it may lead to integer overflow when len=i64::MAX or the safe read fails then the appended \0 should be removed too. Adding it after the safe is safer because it does not have these issues. |
||
|
|
||
| return 0; | ||
| } | ||
|
|
||
|
|
@@ -180,21 +182,30 @@ size_bytes(avro_writer_t writer, const char *bytes, const int64_t len) | |
| static int read_string(avro_reader_t reader, char **s, int64_t *len) | ||
| { | ||
| int64_t str_len = 0; | ||
| int64_t max_available = -1; | ||
| int rval; | ||
| check_prefix(rval, read_long(reader, &str_len), | ||
| "Cannot read string length: "); | ||
| if (str_len < 0) { | ||
| avro_set_error("Invalid string length: %" PRId64, str_len); | ||
| return EINVAL; | ||
| } | ||
| // max := r.tail - r.head + 1; if max >= 0 && size > max | ||
| max_available = avro_max_read(reader); | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. While adding the 🤖 Was this useful? React with 👍 or 👎
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. value:useful; category:bug; feedback:The Augment AI reviewer is correct! The addition of the extra null byte is problematic! If it is added before the safe read then it may lead to integer overflow when len=i64::MAX or the safe read fails then the appended \0 should be removed too. Adding it after the safe is safer because it does not have these issues. |
||
| if (max_available >= 0 && str_len > max_available) { | ||
| avro_set_error("mem io: String length %" PRId64 " is greater than available buffer size %" PRId64, | ||
| str_len, max_available); | ||
| return ERANGE; | ||
| } | ||
|
Comment on lines
+193
to
+199
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fix inaccurate comment and misleading error message.
📝 Suggested fix- // max := r.tail - r.head + 1; if max >= 0 && size > max
+ // Bounds check: reject if string length exceeds available bytes
max_available = avro_max_read(reader);
if (max_available >= 0 && str_len > max_available) {
- avro_set_error("mem io: String length %" PRId64 " is greater than available buffer size %" PRId64,
+ avro_set_error("String length %" PRId64 " exceeds available buffer %" PRId64,
str_len, max_available);
return ERANGE;
}🤖 Prompt for AI Agents |
||
|
|
||
| *len = str_len + 1; | ||
| *s = (char *) avro_malloc(*len); | ||
| if (!*s) { | ||
| avro_set_error("Cannot allocate buffer for string value"); | ||
| return ENOMEM; | ||
| } | ||
| (*s)[str_len] = '\0'; | ||
| AVRO_READ(reader, *s, str_len); | ||
| AVRO_SAFE_READ(reader, *s, str_len, *len); | ||
| return 0; | ||
| } | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -274,6 +274,20 @@ int avro_read(avro_reader_t reader, void *buf, int64_t len) | |
| return EINVAL; | ||
| } | ||
|
|
||
|
|
||
| int64_t avro_max_read(avro_reader_t reader) | ||
| { | ||
| if (is_memory_io(reader)) { | ||
| struct _avro_reader_memory_t *mem_reader = avro_reader_to_memory(reader); | ||
| return mem_reader->len - mem_reader->read; | ||
| } else if (is_file_io(reader)) { | ||
| struct _avro_reader_file_t *file_reader = avro_reader_to_file(reader); | ||
| return bytes_available(file_reader); | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The implementation of return -1;There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In Other Locations
🤖 Was this useful? React with 👍 or 👎
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. value:useful; category:bug; feedback:The Augment AI reviewer is correct! The new functionality is not the same for memory based reads and file-based ones. For inmemory reads it returns the complete length of the bytes which could be read, while for files it returns only the free bytes in the buffer, not the bytes remaining to be read from the file itself. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. File reader incorrectly rejects valid string readsHigh Severity The Additional Locations (1)
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. value:useful; category:bug; feedback:The Bugbot AI reviewer is correct! The new functionality is not the same for memory based reads and file-based ones. For inmemory reads it returns the complete length of the bytes which could be read, while for files it returns only the free bytes in the buffer, not the bytes remaining to be read from the file itself. |
||
| } | ||
| return -1; | ||
| } | ||
|
Comment on lines
+277
to
+288
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chain🏁 Script executed: # Search for all usages of avro_max_read
rg -n "avro_max_read" --type=cRepository: martin-augment/avro Length of output: 268 🏁 Script executed: # Check the implementation of bytes_available
rg -n -B5 -A10 "bytes_available" lang/c/src/io.cRepository: martin-augment/avro Length of output: 2994 🏁 Script executed: # Look at read_string implementation in encoding_binary.c
rg -n -B3 -A15 "read_string" lang/c/src/encoding_binary.cRepository: martin-augment/avro Length of output: 1265 🏁 Script executed: # Check file reader structure and implementation
rg -n "_avro_reader_file_t" --type=cRepository: martin-augment/avro Length of output: 902 🏁 Script executed: # Check file reader initialization and buffer state
rg -n -B2 -A8 "avro_reader_new_file" lang/c/src/io.cRepository: martin-augment/avro Length of output: 45 🏁 Script executed: # Verify the file reader buffer initialization and typical usage patterns
rg -n -B5 -A10 "reader->buffer" lang/c/src/io.c | head -60Repository: martin-augment/avro Length of output: 1627 File reader returns only buffered bytes, not total remaining bytes. For file-based readers,
When Note: The error message in 🤖 Prompt for AI Agents
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct! The new functionality is not the same for memory based reads and file-based ones. For inmemory reads it returns the complete length of the bytes which could be read, while for files it returns only the free bytes in the buffer, not the bytes remaining to be read from the file itself. |
||
|
|
||
|
|
||
| static int avro_skip_memory(struct _avro_reader_memory_t *reader, int64_t len) | ||
| { | ||
| if (len > 0) { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The AVRO_SAFE_READ macro introduces complex logic, including conditional memory deallocation and an early return. While effective for safety, macros with control flow can sometimes lead to unexpected behavior, make debugging more challenging, and reduce readability compared to a static inline function. A function would provide better type checking and scope management, reducing the risk of misuse.
Consider encapsulating this logic in a static inline function for improved maintainability and robustness, if the project's coding standards allow for it.