1 change: 1 addition & 0 deletions lang/c/src/avro/io.h
@@ -52,6 +52,7 @@ avro_reader_memory_set_source(avro_reader_t reader, const char *buf, int64_t len
void
avro_writer_memory_set_dest(avro_writer_t writer, const char *buf, int64_t len);

int64_t avro_max_read(avro_reader_t reader);
int avro_read(avro_reader_t reader, void *buf, int64_t len);
int avro_skip(avro_reader_t reader, int64_t len);
int avro_write(avro_writer_t writer, void *buf, int64_t len);
3 changes: 3 additions & 0 deletions lang/c/src/encoding.h
@@ -99,6 +99,9 @@ typedef struct avro_encoding_t avro_encoding_t;
#define AVRO_SKIP(reader, len) \
{ int rval = avro_skip( reader, len); if (rval) return rval; }

#define AVRO_SAFE_READ(reader, buf, len, mem_size) \
{ int rval = avro_read( reader, buf, len ); if(rval) { if(buf) avro_free(buf, mem_size); buf = NULL; return rval; } }
Comment on lines +102 to +103

medium

The AVRO_SAFE_READ macro introduces complex logic, including conditional memory deallocation and an early return. While effective for safety, macros with control flow can sometimes lead to unexpected behavior, make debugging more challenging, and reduce readability compared to a static inline function. A function would provide better type checking and scope management, reducing the risk of misuse.

Consider encapsulating this logic in a static inline function for improved maintainability and robustness, if the project's coding standards allow for it.
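A minimal sketch of the static inline alternative, assuming a simplified reader: `demo_reader_t`, `demo_read` and `demo_safe_read` are hypothetical stand-ins for `avro_reader_t`, `avro_read` and the macro, and plain `free()` replaces `avro_free(ptr, size)`. Unlike the macro, the function cannot return from its caller, so the error code has to be propagated explicitly.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified in-memory reader (not Avro's real struct). */
typedef struct {
    const char *buf;
    int64_t len;   /* total bytes available */
    int64_t read;  /* bytes consumed so far */
} demo_reader_t;

/* Stand-in for avro_read(): copies len bytes, or fails with ENOSPC
 * instead of reading past the end of the buffer. */
static int demo_read(demo_reader_t *r, void *dst, int64_t len)
{
    if (len < 0 || len > r->len - r->read) {
        return ENOSPC;
    }
    memcpy(dst, r->buf + r->read, (size_t) len);
    r->read += len;
    return 0;
}

/* Function form of AVRO_SAFE_READ: on failure it frees the buffer, clears
 * the caller's pointer and returns the error. It does not return from the
 * caller, so the caller must propagate the result itself. */
static inline int demo_safe_read(demo_reader_t *r, char **buf, int64_t len)
{
    int rval = demo_read(r, *buf, len);
    if (rval) {
        free(*buf);   /* free(NULL) is a no-op, so no NULL check is needed */
        *buf = NULL;
    }
    return rval;
}
```

A call site would then read `if ((rval = demo_safe_read(r, &s, n)) != 0) return rval;`, which keeps the early return visible where it happens instead of hiding it inside a macro.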


extern const avro_encoding_t avro_binary_encoding; /* in
* encoding_binary
*/
15 changes: 13 additions & 2 deletions lang/c/src/encoding_binary.c
@@ -136,8 +136,10 @@ static int read_bytes(avro_reader_t reader, char **bytes, int64_t * len)
avro_set_error("Cannot allocate buffer for bytes value");
return ENOMEM;
}
AVRO_READ(reader, *bytes, *len);

(*bytes)[*len] = '\0';
AVRO_SAFE_READ(reader, *bytes, *len, *len+1);

security-critical critical

The read_bytes function is missing the avro_max_read check that was added to read_string. This allows an attacker to trigger large memory allocations by supplying a large length in the binary data, leading to a denial of service (OOM). Additionally, the expression *len + 1 used in the allocation (line 134) and in the AVRO_SAFE_READ call (line 141) is susceptible to integer overflow: if *len is INT64_MAX, *len + 1 overflows int64_t (undefined behavior for a signed type), and on 32-bit systems the result can truncate to 0 when converted to size_t, leading to a tiny allocation followed by an out-of-bounds write at line 140.
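One way to address the overflow concern is to compute the allocation size in `size_t` only after validating the length. This is a sketch; `alloc_size_with_nul` is a hypothetical helper, not part of the Avro C API. It also rejects `INT64_MAX` because `read_string` stores `str_len + 1` back into an `int64_t`.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: computes len + 1 (room for the trailing NUL) as a
 * size_t. Rejects negative lengths, lengths whose successor would not fit
 * in int64_t, and lengths whose successor would not fit in size_t. */
static int alloc_size_with_nul(int64_t len, size_t *out)
{
    if (len < 0 || len == INT64_MAX) {
        return EINVAL;   /* corrupt length, or len + 1 overflows int64_t */
    }
    if ((uint64_t) len >= (uint64_t) SIZE_MAX) {
        return ERANGE;   /* only reachable where size_t is narrower than 64 bits */
    }
    *out = (size_t) len + 1;
    return 0;
}
```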

Owner Author

value:useful; category:bug; feedback:The Gemini AI reviewer is correct! The addition of the extra null byte is problematic! If it is added before the safe read, it may lead to integer overflow when len=i64::MAX, and if the safe read fails, the appended \0 has to be removed too. Adding it after the safe read is safer because it avoids both issues.
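A sketch of the ordering described in this reply, combined with the missing bounds check, assuming a simplified in-memory reader (`demo_reader_t`, `demo_read` and `demo_read_bytes` are hypothetical, not the Avro C API): validate the length against the unread input, allocate, read, and write the `'\0'` only after the read succeeds.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory reader for this sketch; not Avro's struct. */
typedef struct {
    const char *buf;
    int64_t len;   /* total bytes in buf */
    int64_t read;  /* bytes consumed so far */
} demo_reader_t;

/* Stand-in for avro_read(): fails instead of reading past the end. */
static int demo_read(demo_reader_t *r, void *dst, int64_t len)
{
    if (len < 0 || len > r->len - r->read) {
        return ENOSPC;
    }
    memcpy(dst, r->buf + r->read, (size_t) len);
    r->read += len;
    return 0;
}

/* read_bytes-style decoding: bound len by the unread input, size the
 * buffer without overflow, read, and write the NUL only on success. */
static int demo_read_bytes(demo_reader_t *r, int64_t len, char **out)
{
    *out = NULL;
    if (len < 0 || len > r->len - r->read) {
        return ERANGE;                 /* corrupt or truncated length */
    }
    if ((uint64_t) len >= (uint64_t) SIZE_MAX) {
        return ENOMEM;                 /* len + 1 would not fit in size_t */
    }
    char *buf = malloc((size_t) len + 1);
    if (buf == NULL) {
        return ENOMEM;
    }
    int rval = demo_read(r, buf, len);
    if (rval) {
        free(buf);                     /* nothing to undo: no NUL was written */
        return rval;
    }
    buf[len] = '\0';                   /* terminate after the read succeeded */
    *out = buf;
    return 0;
}
```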


return 0;
}

@@ -180,21 +182,30 @@ size_bytes(avro_writer_t writer, const char *bytes, const int64_t len)
static int read_string(avro_reader_t reader, char **s, int64_t *len)
{
int64_t str_len = 0;
int64_t max_available = -1;
int rval;
check_prefix(rval, read_long(reader, &str_len),
"Cannot read string length: ");
if (str_len < 0) {
avro_set_error("Invalid string length: %" PRId64, str_len);
return EINVAL;
}
// max := r.tail - r.head + 1; if max >= 0 && size > max
max_available = avro_max_read(reader);

While adding the max_available guard, consider also guarding against overflow in the str_len + 1 size computation/allocation. With a very large str_len from corrupted input, signed overflow or a wrap when converting to size_t could lead to an undersized allocation and an out-of-bounds read/write.

Owner Author

value:useful; category:bug; feedback:The Augment AI reviewer is correct! The addition of the extra null byte is problematic! If it is added before the safe read, it may lead to integer overflow when len=i64::MAX, and if the safe read fails, the appended \0 has to be removed too. Adding it after the safe read is safer because it avoids both issues.

if (max_available >= 0 && str_len > max_available) {
avro_set_error("mem io: String length %" PRId64 " is greater than available buffer size %" PRId64,
str_len, max_available);
return ERANGE;
}
Comment on lines +193 to +199

⚠️ Potential issue | 🟡 Minor

Fix inaccurate comment and misleading error message.

  1. Line 193: The comment formula max := r.tail - r.head + 1 doesn't match the implementation (len - read for memory, end - cur for file; no +1).
  2. Line 196: The error message prefix "mem io:" is misleading since this check applies to all reader types (memory and file).
📝 Suggested fix
-    //  max := r.tail - r.head + 1; if max >= 0 && size > max
+	// Bounds check: reject if string length exceeds available bytes
 	max_available = avro_max_read(reader);
 	if (max_available >= 0 && str_len > max_available) {
-	    avro_set_error("mem io: String length %" PRId64 " is greater than available buffer size %" PRId64,
+		avro_set_error("String length %" PRId64 " exceeds available buffer %" PRId64,
 				str_len, max_available);
 		return ERANGE;
 	}
🤖 Prompt for AI Agents
In `@lang/c/src/encoding_binary.c` around lines 193 - 199, The comment above the
max_available check is inaccurate and the error message wrongly prefixes "mem
io:" even though the check is used for all reader types; update the comment to
reflect the actual computation used by avro_max_read (e.g., "max := len - read
for memory / end - cur for file" without "+1") and change the avro_set_error
call in the block using max_available, str_len and avro_max_read to use a
generic prefix such as "io:" or "reader:" (remove "mem io:") so the message
correctly reflects both memory and file readers.


*len = str_len + 1;
*s = (char *) avro_malloc(*len);
if (!*s) {
avro_set_error("Cannot allocate buffer for string value");
return ENOMEM;
}
(*s)[str_len] = '\0';
AVRO_READ(reader, *s, str_len);
AVRO_SAFE_READ(reader, *s, str_len, *len);
return 0;
}

14 changes: 14 additions & 0 deletions lang/c/src/io.c
@@ -274,6 +274,20 @@ int avro_read(avro_reader_t reader, void *buf, int64_t len)
return EINVAL;
}


int64_t avro_max_read(avro_reader_t reader)
{
if (is_memory_io(reader)) {
struct _avro_reader_memory_t *mem_reader = avro_reader_to_memory(reader);
return mem_reader->len - mem_reader->read;
} else if (is_file_io(reader)) {
struct _avro_reader_file_t *file_reader = avro_reader_to_file(reader);
return bytes_available(file_reader);

security-high high

The implementation of avro_max_read for file-based readers is incorrect. It returns the number of bytes currently in the internal buffer (reader->end - reader->cur), which is at most 4096 bytes. This causes read_string to incorrectly reject any string longer than 4KB with an ERANGE error, even if the file contains the full string. For file readers where the total size is not readily available in the buffer, this should return -1 to bypass the length check and avoid a denial of service for legitimate data.

return -1;
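The `-1` approach can be sketched as below, assuming a simplified reader type; `demo_kind_t`, `demo_reader_t`, `demo_max_read` and `demo_length_ok` are hypothetical stand-ins for `is_memory_io()`/`is_file_io()`, `avro_max_read()` and the check in `read_string()`. File readers report "unknown" because their buffer can be refilled, so the caller skips the length check instead of rejecting valid data.

```c
#include <stdint.h>

/* Hypothetical reader kinds standing in for is_memory_io()/is_file_io(). */
typedef enum { DEMO_MEMORY, DEMO_FILE } demo_kind_t;

typedef struct {
    demo_kind_t kind;
    int64_t len;       /* memory: total bytes */
    int64_t read;      /* memory: bytes consumed */
    int64_t buffered;  /* file: bytes in the internal buffer (ignored on purpose) */
} demo_reader_t;

/* Returns the number of bytes known to remain, or -1 when unknown.
 * Buffered bytes are not an upper bound for a file reader, so it reports -1. */
static int64_t demo_max_read(const demo_reader_t *r)
{
    if (r->kind == DEMO_MEMORY) {
        return r->len - r->read;
    }
    return -1;
}

/* Caller-side check in the style of read_string(): reject only when the
 * remaining byte count is known and too small. */
static int demo_length_ok(const demo_reader_t *r, int64_t str_len)
{
    int64_t max_available = demo_max_read(r);
    return !(max_available >= 0 && str_len > max_available);
}
```

The trade-off is that file readers get no early rejection of oversized lengths, so the OOM concern raised for read_bytes still applies to them.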

In avro_max_read(), the file-IO branch returns bytes_available(file_reader), which only reflects the current internal buffer (and can be 0/undefined before the first fill). Callers like read_string() can then incorrectly reject valid reads when the requested size exceeds the currently buffered bytes.

Other Locations
  • lang/c/src/encoding_binary.c:194

Owner Author

value:useful; category:bug; feedback:The Augment AI reviewer is correct! The new function does not behave the same for memory-based and file-based reads. For in-memory reads it returns the number of bytes that remain to be read, while for files it returns only the bytes currently in the buffer, not the bytes remaining to be read from the file itself.

File reader incorrectly rejects valid string reads

High Severity

The avro_max_read function for file I/O returns bytes_available(file_reader), which only reflects bytes currently buffered (up to 4096), not total remaining file data. When read_string uses this to validate string length, it incorrectly rejects valid strings from files when str_len exceeds the current buffer contents, even though avro_read_file can read more by refilling from the file. For file readers, the function should return -1 to skip the validation check.

Additional Locations (1)

Owner Author

value:useful; category:bug; feedback:The Bugbot AI reviewer is correct! The new function does not behave the same for memory-based and file-based reads. For in-memory reads it returns the number of bytes that remain to be read, while for files it returns only the bytes currently in the buffer, not the bytes remaining to be read from the file itself.

}
return -1;
}
Comment on lines +277 to +288

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Search for all usages of avro_max_read
rg -n "avro_max_read" --type=c

Repository: martin-augment/avro

Length of output: 268


🏁 Script executed:

# Check the implementation of bytes_available
rg -n -B5 -A10 "bytes_available" lang/c/src/io.c

Repository: martin-augment/avro

Length of output: 2994


🏁 Script executed:

# Look at read_string implementation in encoding_binary.c
rg -n -B3 -A15 "read_string" lang/c/src/encoding_binary.c

Repository: martin-augment/avro

Length of output: 1265


🏁 Script executed:

# Check file reader structure and implementation
rg -n "_avro_reader_file_t" --type=c

Repository: martin-augment/avro

Length of output: 902


🏁 Script executed:

# Check file reader initialization and buffer state
rg -n -B2 -A8 "avro_reader_new_file" lang/c/src/io.c

Repository: martin-augment/avro

Length of output: 45


🏁 Script executed:

# Verify the file reader buffer initialization and typical usage patterns
rg -n -B5 -A10 "reader->buffer" lang/c/src/io.c | head -60

Repository: martin-augment/avro

Length of output: 1627


File reader returns only buffered bytes, not total remaining bytes.

For file-based readers, bytes_available(file_reader) returns only the bytes currently in the internal buffer (reader->end - reader->cur), not the remaining bytes in the file. This creates asymmetric behavior:

  • Memory readers: returns total remaining bytes (mem_reader->len - mem_reader->read)
  • File readers: returns only buffered bytes (potentially zero if buffer is empty)

When read_string in encoding_binary.c (lines 194-197) uses this for bounds checking, it incorrectly rejects valid reads from file-based readers whenever the requested string length exceeds the bytes currently buffered, even though avro_read_file has logic to refill the buffer from the file and fulfill such reads.

Note: The error message in read_string says "mem io", indicating the check was intended for memory-based IO only, but it applies equally to file-based readers.

🤖 Prompt for AI Agents
In `@lang/c/src/io.c` around lines 277 - 288, avro_max_read currently returns only
buffered bytes for file readers (via bytes_available), causing read_string
bounds checks to fail for valid reads; change the file branch in avro_max_read
(is_file_io / avro_reader_to_file) to compute total remaining bytes in the
underlying file instead of just the buffer by using the file's size and current
absolute read position (e.g., use fseeko/ftello or equivalent on
file_reader->file to get file_size and current_offset and return file_size -
current_offset + bytes_available(file_reader)), so callers like read_string in
encoding_binary.c get the true remaining byte count.
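The alternative proposed in this prompt, returning the true remaining byte count, might look like the sketch below. `remaining_file_bytes` is a hypothetical helper; it uses standard `ftell`/`fseek`, whose `long` offsets can overflow for files larger than 2 GiB where `long` is 32 bits (which is why the prompt mentions POSIX `fseeko`/`ftello`), and it returns `-1` for streams that cannot be sized, such as pipes, so the caller can skip the check.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: bytes left in fp plus bytes already pulled into a
 * reader-level buffer. Returns -1 if the stream cannot be sized (e.g. a
 * pipe). The original stream position is restored before returning. */
static int64_t remaining_file_bytes(FILE *fp, int64_t buffered)
{
    long pos = ftell(fp);
    if (pos < 0) {
        return -1;
    }
    if (fseek(fp, 0, SEEK_END) != 0) {
        return -1;
    }
    long end = ftell(fp);
    /* Always seek back first, then check whether the size was obtained. */
    if (fseek(fp, pos, SEEK_SET) != 0 || end < 0) {
        return -1;
    }
    return (int64_t) (end - pos) + buffered;
}
```

This performs two extra seeks per call; if it runs on every string read, caching the file size once when the reader is created may be cheaper.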

Owner Author

value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct! The new function does not behave the same for memory-based and file-based reads. For in-memory reads it returns the number of bytes that remain to be read, while for files it returns only the bytes currently in the buffer, not the bytes remaining to be read from the file itself.



static int avro_skip_memory(struct _avro_reader_memory_t *reader, int64_t len)
{
if (len > 0) {