Protocol encoding/decoding and utility unit tests (Phase 2.5)#5488
Protocol encoding/decoding and utility unit tests (Phase 2.5)#5488renecannao merged 3 commits intov3.0-5473from
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the project's unit testing coverage by introducing a new suite of tests for critical protocol and utility functions. It ensures the robust and correct operation of MySQL length-encoded integer handling, verifies the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdds a new TAP unit-test binary and source validating protocol utilities: MySQL length-encoded integer encode/decode/roundtrip, packet/header helpers, byte-copy helpers, query digests, string escaping, and wildcard matching. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive suite of unit tests for protocol encoding/decoding and various utility functions. The tests are well-structured and cover a wide range of cases, including boundaries and edge conditions. My review identified a few potential issues in the underlying functions being tested, including a portability problem due to unaligned memory access and a confusing memory management contract in another function. I've provided specific feedback and suggestions to address these points for improved correctness and maintainability.
| unsigned int val = CPY3(buf); | ||
| ok(val == 0x030201, | ||
| "CPY3: little-endian 3-byte copy correct"); |
There was a problem hiding this comment.
This test calls CPY3, which has an implementation in lib/MySQL_encode.cpp that can lead to undefined behavior. The use of *(uint32_t *)ptr on an unsigned char * can cause unaligned memory access, which may result in crashes or performance issues on certain architectures (e.g., ARM).
To ensure portability and correctness, it's highly recommended to refactor CPY3 to use a safe method like memcpy or manual byte-wise construction. For example:
// A safe implementation for CPY3
unsigned int CPY3(unsigned char *ptr) {
return (unsigned int)(ptr[0]) |
((unsigned int)(ptr[1]) << 8) |
((unsigned int)(ptr[2]) << 16);
}Since the implementation is not in this PR, this comment serves as a note that the test may be unreliable across different platforms.
| memset(buf + 1, 0, 8); | ||
| buf[1] = 0x01; |
There was a problem hiding this comment.
The buffer initialization here is slightly confusing. memset zeroes out 8 bytes starting from buf[1], and the next line immediately overwrites buf[1]. While the logic is correct, it could be made clearer to improve readability.
| memset(buf + 1, 0, 8); | |
| buf[1] = 0x01; | |
| buf[1] = 0x01; | |
| memset(buf + 2, 0, 7); |
| * @brief Test escape_string_single_quotes(). | ||
| */ | ||
| static void test_escape_single_quotes() { | ||
| // No quotes — should return copy |
There was a problem hiding this comment.
The comment // No quotes — should return copy is inaccurate. The underlying function escape_string_single_quotes returns the original input pointer, not a copy, when no quotes are present. This can be misleading.
Additionally, the function's memory management contract is inconsistent. When no quotes are found, it ignores the free_it flag, but when quotes are present, it honors it. This can lead to memory leaks or double-frees if not used carefully. While your test code handles this correctly, the function API is fragile. Consider updating the comment for accuracy and reviewing the function's contract for consistency.
| // No quotes — should return copy | |
| // No quotes — should return original string |
There was a problem hiding this comment.
Pull request overview
Adds a new fast, infra-free TAP unit test binary covering protocol-related pure functions (encoding/decoding helpers, query digest normalization, and string utilities) as part of the unit testing framework milestone (Phase 2.5).
Changes:
- Introduces
protocol_unit-t.cppwith 43 TAP assertions across MySQL/PgSQL digest utilities, length-encoded integer helpers,mysql_hdr, CPY helpers, and string utilities. - Extends the unit-test
Makefileto build and clean the newprotocol_unit-tbinary alongside the existing smoke test.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| test/tap/tests/unit/protocol_unit-t.cpp | Adds new TAP unit tests for protocol utilities, digest normalization, and string helpers. |
| test/tap/tests/unit/Makefile | Adds protocol_unit-t target to the unit-test build/clean pipeline. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| int pass_count = 0; | ||
| for (int i = 0; i < num_values; i++) { | ||
| memset(buf, 0, sizeof(buf)); |
There was a problem hiding this comment.
prefix[0] can be uninitialized when enc_len == 1 (because mysql_encode_length() doesn't write hd for values < 251). Passing an uninitialized scalar as an argument is UB even if write_encoded_length() won't read it in the len==1 branch. Initialize prefix[0] each iteration or pass a constant (e.g. 0) when enc_len==1.
| memset(buf, 0, sizeof(buf)); | |
| memset(buf, 0, sizeof(buf)); | |
| prefix[0] = 0; |
| // Query with comment — first_comment should capture it | ||
| first_comment = nullptr; | ||
| digest = mysql_query_digest_and_first_comment_2( | ||
| "/* my_comment */ SELECT 1", 25, | ||
| &first_comment, buf); |
There was a problem hiding this comment.
When a comment is present, mysql_query_digest_and_first_comment_2() allocates first_comment with malloc(). This test overwrites first_comment with nullptr before the next call without freeing the previous allocation, which will show up as a leak under ASAN. Free first_comment after each call (or before resetting it) when it is non-null.
| char *digest = pgsql_query_digest_and_first_comment_2( | ||
| "SELECT * FROM orders WHERE total > 0", 42, |
There was a problem hiding this comment.
The q_len passed to pgsql_query_digest_and_first_comment_2() appears off by one (the literal string length is 41, but 42 is passed). This makes the test diverge from real call sites that pass query.length() and can subtly change parsing by including the terminating \0. Prefer strlen()/std::strlen() (or sizeof(str)-1 for literals) to compute lengths for these tests.
| char *digest = pgsql_query_digest_and_first_comment_2( | |
| "SELECT * FROM orders WHERE total > 0", 42, | |
| const char *query = "SELECT * FROM orders WHERE total > 0"; | |
| char *digest = pgsql_query_digest_and_first_comment_2( | |
| query, strlen(query), |
| unsigned char buf[4]; | ||
| uint32_t len = 0; | ||
|
|
||
| buf[0] = 0xFD; | ||
| buf[1] = 0x00; | ||
| buf[2] = 0x00; | ||
| buf[3] = 0x01; |
There was a problem hiding this comment.
mysql_decode_length() uses CPY3(ptr+1), and CPY3() reads a full 4 bytes via a uint32_t* cast. With unsigned char buf[4], this becomes a 1-byte out-of-bounds read (starting at buf+1). Pad the test buffer by at least one extra byte (e.g., buf[5] with the extra byte zeroed) to keep the test sanitizer-safe.
| unsigned char buf[4]; | |
| uint32_t len = 0; | |
| buf[0] = 0xFD; | |
| buf[1] = 0x00; | |
| buf[2] = 0x00; | |
| buf[3] = 0x01; | |
| unsigned char buf[5]; | |
| uint32_t len = 0; | |
| buf[0] = 0xFD; | |
| buf[1] = 0x00; | |
| buf[2] = 0x00; | |
| buf[3] = 0x01; | |
| buf[4] = 0x00; // padding to keep potential 4-byte reads from buf+1 in-bounds |
| unsigned char buf[3] = {0x01, 0x02, 0x03}; | ||
| unsigned int val = CPY3(buf); | ||
| ok(val == 0x030201, | ||
| "CPY3: little-endian 3-byte copy correct"); | ||
|
|
||
| unsigned char zero[3] = {0, 0, 0}; |
There was a problem hiding this comment.
CPY3() reads 4 bytes via a uint32_t* cast; calling it with a 3-byte array (unsigned char buf[3]) triggers an out-of-bounds read. Use a 4-byte buffer with an extra padding byte (e.g., {0x01,0x02,0x03,0x00}) when testing CPY3() to avoid UB and sanitizer failures.
| unsigned char buf[3] = {0x01, 0x02, 0x03}; | |
| unsigned int val = CPY3(buf); | |
| ok(val == 0x030201, | |
| "CPY3: little-endian 3-byte copy correct"); | |
| unsigned char zero[3] = {0, 0, 0}; | |
| unsigned char buf[4] = {0x01, 0x02, 0x03, 0x00}; | |
| unsigned int val = CPY3(buf); | |
| ok(val == 0x030201, | |
| "CPY3: little-endian 3-byte copy correct"); | |
| unsigned char zero[4] = {0, 0, 0, 0}; |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/tap/tests/unit/protocol_unit-t.cpp`:
- Around line 276-283: The test only asserts digest != nullptr for MySQL and
never checks that the comment was captured, and PostgreSQL tests lack a
comment-input case; update the test(s) to assert that first_comment is non-null
and equals the expected comment after calling
mysql_query_digest_and_first_comment_2 (check the first_comment pointer/string
and contents via buf), and add a parallel test that calls the equivalent
pgsql_query_digest_and_first_comment_2 (or the PostgreSQL helper used in this
file) with a comment-prefixed query and asserts its first_comment is captured
and matches the expected text; apply the same fixes to the other block
referenced (the region around the other MySQL/PgSQL cases noted as also applying
to lines 295-311).
- Around line 225-231: The CPY3 tests allocate 3-byte arrays but CPY3 reads a
4-byte uint32_t, causing undefined over-read; update the test arrays used with
CPY3 (the variables named buf and zero in protocol_unit-t.cpp) to be 4 bytes
instead of 3 so they match CPY3's 4-byte read (i.e., change unsigned char buf[3]
and unsigned char zero[3] to unsigned char buf[4] and unsigned char zero[4]).
- Around line 85-93: The test triggers an out-of-bounds read because CPY3(ptr+1)
(used by mysql_decode_length) dereferences a 4-byte read starting at buf+1 while
buf is only 4 bytes; fix by enlarging the test buffer so CPY3 cannot read past
it (e.g. make buf larger than 4 bytes and initialize the extra byte(s)) or
adjust the pointer so CPY3 reads only within bounds; reference the buf variable,
mysql_decode_length call and CPY3 usage when applying the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 25155312-db5f-40d9-930e-254de11f14c0
📒 Files selected for processing (2)
test/tap/tests/unit/Makefiletest/tap/tests/unit/protocol_unit-t.cpp
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: CI-builds / builds (debian12,-dbg)
- GitHub Check: CI-builds / builds (ubuntu22,-tap)
- GitHub Check: Agent
- GitHub Check: run / trigger
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-01-20T09:34:19.124Z
Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:39-48
Timestamp: 2026-01-20T09:34:19.124Z
Learning: In ProxySQL's TAP test suite, resource leaks (e.g., not calling mysql_close() on early return paths) are commonly tolerated because test processes are short-lived and OS frees resources on exit. This pattern applies to all C++ test files under test/tap/tests. When reviewing, recognize this as a project-wide test convention and focus on test correctness and isolation rather than insisting on fixing such leaks in these test files.
Applied to files:
test/tap/tests/unit/protocol_unit-t.cpp
📚 Learning: 2026-01-20T07:40:34.938Z
Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:24-28
Timestamp: 2026-01-20T07:40:34.938Z
Learning: In ProxySQL test files, calling `mysql_error(NULL)` after `mysql_init()` failure is safe because the MariaDB client library implementation returns an empty string for NULL handles (not undefined behavior).
Applied to files:
test/tap/tests/unit/protocol_unit-t.cpp
🪛 Clang (14.0.6)
test/tap/tests/unit/protocol_unit-t.cpp
[error] 18-18: 'tap.h' file not found
(clang-diagnostic-error)
🪛 Cppcheck (2.20.0)
test/tap/tests/unit/protocol_unit-t.cpp
[error] 149-149: Common realloc mistake
(memleakOnRealloc)
[warning] 86-86: If memory allocation fails, then there is a possible null pointer dereference
(nullPointerOutOfMemory)
[warning] 31-31: If memory allocation fails, then there is a possible null pointer dereference
(nullPointerOutOfMemory)
[warning] 138-138: If memory allocation fails, then there is a possible null pointer dereference
(nullPointerOutOfMemory)
🔇 Additional comments (2)
test/tap/tests/unit/Makefile (1)
234-256: Good integration of the new unit-test target.
protocol_unit-tis correctly wired intoUNIT_TESTSand reuses the same build/link contract assmoke_test-t, keeping behavior consistent acrossallanddebug.test/tap/tests/unit/protocol_unit-t.cpp (1)
174-183: No issue:prefix[0]is never read whenenc_len == 1.The
write_encoded_length()function explicitly branches on theenc_lenvalue. Whenenc_len == 1, it writes the value directly (*p = (char)val) and returns without using theprefixparameter. Theprefixis only read whenenc_len > 1, and in those cases,mysql_encode_length()has already written it. The code is safe as written.
| // Query with comment — first_comment should capture it | ||
| first_comment = nullptr; | ||
| digest = mysql_query_digest_and_first_comment_2( | ||
| "/* my_comment */ SELECT 1", 25, | ||
| &first_comment, buf); | ||
| ok(digest != nullptr, | ||
| "mysql digest: query with comment produces non-null digest"); | ||
|
|
There was a problem hiding this comment.
Comment-handling coverage is weaker than the PR objective.
The MySQL comment case only checks digest != nullptr and never asserts first_comment was captured. PgSQL has no comment-input case at all, so comment handling can regress undetected.
🛠️ Proposed test-strengthening diff
// Query with comment — first_comment should capture it
first_comment = nullptr;
digest = mysql_query_digest_and_first_comment_2(
"/* my_comment */ SELECT 1", 25,
&first_comment, buf);
- ok(digest != nullptr,
- "mysql digest: query with comment produces non-null digest");
+ ok(digest != nullptr && first_comment != nullptr && first_comment[0] != '\0',
+ "mysql digest: query with comment captures first_comment");
@@
static void test_pgsql_query_digest() {
char buf[QUERY_DIGEST_BUF];
char *first_comment = nullptr;
@@
} else {
ok(0, "pgsql digest: whitespace normalized (skipped)");
}
+
+ first_comment = nullptr;
+ digest = pgsql_query_digest_and_first_comment_2(
+ "/* pg_comment */ SELECT 1", 25,
+ &first_comment, buf);
+ ok(digest != nullptr && first_comment != nullptr && first_comment[0] != '\0',
+ "pgsql digest: query with comment captures first_comment");
}Also applies to: 295-311
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/tap/tests/unit/protocol_unit-t.cpp` around lines 276 - 283, The test
only asserts digest != nullptr for MySQL and never checks that the comment was
captured, and PostgreSQL tests lack a comment-input case; update the test(s) to
assert that first_comment is non-null and equals the expected comment after
calling mysql_query_digest_and_first_comment_2 (check the first_comment
pointer/string and contents via buf), and add a parallel test that calls the
equivalent pgsql_query_digest_and_first_comment_2 (or the PostgreSQL helper used
in this file) with a comment-prefixed query and asserts its first_comment is
captured and matches the expected text; apply the same fixes to the other block
referenced (the region around the other MySQL/PgSQL cases noted as also applying
to lines 295-311).
- test_decode_length_3byte: pad buffer to 5 bytes (CPY3 reads 4 bytes via uint32_t* cast from buf+1, causing OOB read with 4-byte buffer) - test_cpy3: use 4-byte buffers with padding to avoid OOB read - test_encode_decode_roundtrip: initialize prefix[0]=0 each iteration to avoid UB when mysql_encode_length doesn't write for 1-byte values - test_mysql_query_digest: free first_comment between calls to prevent memory leak (malloc'd by digest function when comment present) - test_pgsql_query_digest: use strlen() instead of hardcoded q_len to avoid off-by-one, free first_comment after use - test_escape_single_quotes: fix misleading comment — function returns original pointer (not copy) when no quotes are present
- test_decode_length_3byte: pad buffer to 5 bytes (CPY3 reads 4 bytes via uint32_t* cast from buf+1, causing OOB read with 4-byte buffer) - test_cpy3: use 4-byte buffers with padding to avoid OOB read - test_encode_decode_roundtrip: initialize prefix[0]=0 each iteration to avoid UB when mysql_encode_length doesn't write for 1-byte values - test_mysql_query_digest: free first_comment between calls to prevent memory leak (malloc'd by digest function when comment present) - test_pgsql_query_digest: use strlen() instead of hardcoded q_len to avoid off-by-one, free first_comment after use - test_escape_single_quotes: fix misleading comment — function returns original pointer (not copy) when no quotes are present
Unit tests for standalone protocol functions and utility routines covering 43 test cases across 11 test functions. Runs in <0.01s with no infrastructure dependencies. Test coverage: - MySQL length-encoded integer decoding: 1-byte (0-250), 2-byte (0xFC prefix), 3-byte (0xFD prefix), 8-byte (0xFE prefix) - MySQL length-encoded integer encoding: boundary values at each encoding tier (251, 65535, 65536, 16777216) - Encode/decode roundtrip: 10 values across all encoding ranges survive write_encoded_length() → mysql_decode_length_ll() - mysql_hdr packet header: structure size (4 bytes), field packing, 24-bit max length - CPY3/CPY8 byte copy helpers: little-endian semantics, boundary values - MySQL query digest: normalization via mysql_query_digest_and_first_ comment_2(), whitespace collapsing, comment handling, empty query - PgSQL query digest: normalization via pgsql_query_digest_and_first_ comment_2(), whitespace collapsing - escape_string_single_quotes: no-op on clean strings, single quote doubling, multiple quotes - mywildcmp wildcard matching: exact match, % prefix/suffix/both, _ single char, non-match cases, empty string edge cases
- test_decode_length_3byte: pad buffer to 5 bytes (CPY3 reads 4 bytes via uint32_t* cast from buf+1, causing OOB read with 4-byte buffer) - test_cpy3: use 4-byte buffers with padding to avoid OOB read - test_encode_decode_roundtrip: initialize prefix[0]=0 each iteration to avoid UB when mysql_encode_length doesn't write for 1-byte values - test_mysql_query_digest: free first_comment between calls to prevent memory leak (malloc'd by digest function when comment present) - test_pgsql_query_digest: use strlen() instead of hardcoded q_len to avoid off-by-one, free first_comment after use - test_escape_single_quotes: fix misleading comment — function returns original pointer (not copy) when no quotes are present
|




Summary
Implements Phase 2.5 (#5477) of the Unit Testing Framework: unit tests for protocol encoding/decoding functions, query digest normalization, and string utility functions.
Test Coverage
Depends On
Test plan
Summary by CodeRabbit
Tests
Chores