Range read block on rw conflict#229
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds two new ELoQ MySQL tests (range_read_block_on_write_lock and range_split_deadlock_abort), their .opt and .result artifacts, and bumps the data_substrate submodule pointer; tests simulate range-split locking, fault injection (DEAD_LOCK_ABORT), and verify blocking/unblocking behavior and split completion. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client
participant Conn as Connection
participant SE as StorageEngine
participant LM as LockManager
participant CP as Checkpointer
participant FI as FaultInjector
Client->>Conn: issue WRITE that triggers range-split prepare
Conn->>SE: start range-split phase 8 (acquire write-intent lock)
FI->>SE: inject DEAD_LOCK_ABORT at phase 8
Client->>Conn: spawn concurrent READs
Conn->>SE: READs attempt access — blocked by write-intent lock
CP->>SE: checkpoint advances (drives split path)
SE->>LM: handle DEAD_LOCK_ABORT → DowngradeLock (write→read)
LM->>SE: allow retry of phase 8
SE->>Conn: complete split and unblock reads
Conn->>Client: READs return results
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test`:
- Line 80: The test currently issues unbounded queries "select c0, c1, c2 from
t1 order by c0" (the --send at the first occurrence and the subsequent
reap/re-execute of "select c0, c1, c2 from t1"), which dumps ~2048 rows twice
into the .result file; replace those full-result queries with "select count(*)
from t1" to verify completion without dumping rows, or alternatively surround
the original sends/reaps with "--disable_result_log" / "--enable_result_log" so
the large outputs are not recorded; update both the initial --send statement and
the reap/re-verify re-execution occurrences to use one of these two approaches.
- Around line 130-135: The wait condition stored in $wait_condition is too
permissive because the "(info IS NULL and time >= 1)" branch can match unrelated
idle/daemon connections; tighten it so we only consider connections that
actually reference t1 or are in a lock state. Edit the SQL in $wait_condition to
remove the generic "info IS NULL and time >= 1" clause and instead use just
"(state like "%lock%") OR (info like "select%from t1%")" so only lock-holding or
t1-related queries are considered by the wait condition.
In `@storage/eloq/mysql-test/mono_basic/t/range_split_deadlock_abort.test`:
- Around line 65-70: The wait condition in $wait_condition is too broad because
`(info IS NULL and time >= 1)` matches idle connections (including the test's
own default connection); update the predicate in the select from
information_schema.processlist to exclude the current connection and only target
other idle sessions — e.g. replace `(info IS NULL and time >= 1)` with `(info IS
NULL and time >= 1 and id != CONNECTION_ID())` (keep the existing `info like
"select%from t1%"` check intact) so the count(*) = 0 check no longer blocks on
unrelated idle connections; locate and modify the $wait_condition definition to
apply this change.
🧹 Nitpick comments (5)
storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test (3)
42-42: Hardcoded--sleepdurations make the test slow and potentially flaky.The test accumulates
--sleep 5 + --sleep 2 + --sleep 2 + --sleep 10 + --sleep 5 = 24seconds of fixed waits. On fast machines these are wasted time; on slow CI machines they may be insufficient. Where possible, preferwait_conditionloops (which already exist elsewhere in this test) over fixed sleeps. At minimum, the--sleep 10on line 83 seems excessive for verifying that reads are blocked — a shorter sleep followed by thewait_conditionon line 93 should suffice.Also applies to: 50-50, 54-58, 83-83
67-70: Wait condition in Test Case 1 is overly broad.The condition
state like "%lock%" or info like "select count(*) from t1%"matches any connection with "lock" in its state (which could match unrelated connections) OR any connection running the matching query (which doesn't necessarily mean it's blocked). Consider combining the conditions with AND and also filtering by connection ID to be more precise.Proposed tighter condition
let $wait_condition= select count(*) >= 1 from information_schema.processlist - where (state like "%lock%" or info like "select count(*) from t1%") + where info like "select count(*) from t1%" + and state like "%lock%" and id != CONNECTION_ID();
93-99: Same broad condition pattern in Test Case 2.Similar to Test Case 1, using OR between
state like "%lock%"and the info matches weakens the assertion. This should require both a matching query and a lock-wait state to reliably confirm the reads are blocked. Also, the count threshold (>= 3) includes conn_read1 from Test Case 1 — confirm that conn_read1's query is still pending at this point (it should be, since the debug point is still active).storage/eloq/mysql-test/mono_basic/t/range_split_deadlock_abort.test (2)
51-54: 15 seconds of fixed sleeps with no condition-based gating.Same concern as the sibling test —
--sleep 5+--sleep 10is slow on fast machines and fragile on slow CI. Await_conditionpolling for evidence that the split reached phase 8 (e.g., checking a status variable or processlist state) would be more robust.
36-36: For consistency with the sibling test, use explicit column list instead ofSELECT *.The
insert into t1 select * from t1;pattern works correctly in the Eloq engine, but the sibling test (range_read_block_on_write_lock.test) uses an explicit column list for better clarity:insert into t1 (c1, c2, c3, c4) select c1, c2, c3, c4 from t1;Consider adopting the same pattern here for consistency across the test suite.
storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test
Show resolved
Hide resolved
| let $wait_condition= | ||
| select count(*) = 0 from information_schema.processlist | ||
| where ((state like "%lock%") or | ||
| (info IS NULL and time >= 1)) | ||
| and (info like "select%from t1%" or info IS NULL); | ||
| --source include/wait_condition.inc |
There was a problem hiding this comment.
Wait condition in Test Case 5 may match unrelated idle connections.
The predicate (info IS NULL and time >= 1) combined with (info like "select%from t1%" or info IS NULL) will match any idle connection with NULL info and time >= 1, including daemon threads or the default connection itself (excluded only by the lock-state branch, not the NULL-info branch). This can cause the wait_condition to never reach count(*) = 0, leading to a timeout and false test failure.
Proposed fix: tighten the condition to only match t1-related queries
let $wait_condition=
select count(*) = 0 from information_schema.processlist
- where ((state like "%lock%") or
- (info IS NULL and time >= 1))
- and (info like "select%from t1%" or info IS NULL);
+ where info like "select%from t1%"
+ and id != CONNECTION_ID();📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let $wait_condition= | |
| select count(*) = 0 from information_schema.processlist | |
| where ((state like "%lock%") or | |
| (info IS NULL and time >= 1)) | |
| and (info like "select%from t1%" or info IS NULL); | |
| --source include/wait_condition.inc | |
| let $wait_condition= | |
| select count(*) = 0 from information_schema.processlist | |
| where info like "select%from t1%" | |
| and id != CONNECTION_ID(); | |
| --source include/wait_condition.inc |
🤖 Prompt for AI Agents
In `@storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test`
around lines 130 - 135, The wait condition stored in $wait_condition is too
permissive because the "(info IS NULL and time >= 1)" branch can match unrelated
idle/daemon connections; tighten it so we only consider connections that
actually reference t1 or are in a lock state. Edit the SQL in $wait_condition to
remove the generic "info IS NULL and time >= 1" clause and instead use just
"(state like "%lock%") OR (info like "select%from t1%")" so only lock-holding or
t1-related queries are considered by the wait condition.
63506e3 to
679932b
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test`:
- Around line 1-6: This test uses debug sync points via debug_dbug, so add
sourcing of have_debug.inc at the top of the test to skip on non-debug builds:
include/have_debug.inc should be sourced before any debug_dbug usage (i.e.,
before the first debug_dbug call in this file), and consider adding the same
include to the shared eloq include files so all tests that call debug_dbug will
skip when not built with debug support.
🧹 Nitpick comments (3)
storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test (3)
42-42: Total of ~24 seconds of hard sleeps makes this test slow and potentially flaky.The test accumulates
--sleep 5 + 2 + 2 + 10 + 5 = 24seconds of hard sleeps. The 10-second sleep at line 83 is particularly aggressive. While some sleep is unavoidable for debug-sync based tests, consider whether thewait_conditionchecks could replace some of these sleeps, or whether shorter sleeps suffice (especially lines 50 and 83).Also applies to: 50-50, 58-58, 83-83, 106-106
105-105: Remove commented-out debug command.This commented-out
SET SESSION debug_dbugline appears to be a leftover from development/debugging. If it's intentionally kept as documentation, add a comment explaining why; otherwise remove it to keep the test clean.SET SESSION debug_dbug="-d,eloq;term_SplitFlushOp_CommitAcquireAllWriteOp_Continue;node_id=-1"; -# SET SESSION debug_dbug="+d,eloq;at_once;node_id=-1;action=NOTIFY_CHECKPOINTER"; --sleep 5
120-125: Unnecessary full table re-scan at line 123.After
--reapcollects the result from the--sendon line 80, line 123 re-executesselect c0, c1, c2 from t1(without ORDER BY), scanning all ~2048 rows again for no verification purpose since result logging is disabled. Consider removing it or replacing with a lightweight check if needed.--connection conn_read3 --disable_result_log --reap -select c0, c1, c2 from t1; --enable_result_log commit;
| --source include/have_eloq.inc | ||
| set @@default_storage_engine= eloq; | ||
|
|
||
| --disable_warnings | ||
| DROP TABLE IF EXISTS t1; | ||
| --enable_warnings |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find tests in the eloq directory that use debug_dbug
echo "=== Tests using debug_dbug ==="
rg -l 'debug_dbug' storage/eloq/mysql-test/mono_basic/t/ 2>/dev/null | sort
echo ""
echo "=== Tests sourcing have_debug.inc ==="
rg -l 'have_debug' storage/eloq/mysql-test/mono_basic/t/ 2>/dev/null | sort
echo ""
echo "=== Tests using both debug_dbug AND have_debug ==="
rg -l 'debug_dbug' storage/eloq/mysql-test/mono_basic/t/ 2>/dev/null | while read f; do
if rg -q 'have_debug' "$f" 2>/dev/null; then
echo "$f"
fi
done
echo ""
echo "=== Tests using debug_dbug but NOT have_debug ==="
rg -l 'debug_dbug' storage/eloq/mysql-test/mono_basic/t/ 2>/dev/null | while read f; do
if ! rg -q 'have_debug' "$f" 2>/dev/null; then
echo "$f"
fi
doneRepository: eloqdata/eloqsql
Length of output: 2890
🏁 Script executed:
#!/bin/bash
# Check if have_debug.inc exists anywhere in the repo
echo "=== Searching for have_debug.inc ==="
find . -name "have_debug.inc" 2>/dev/null
echo ""
echo "=== Checking eloq includes directory ==="
ls -la storage/eloq/mysql-test/mono_basic/include/ 2>/dev/null || echo "Include dir not found"
echo ""
echo "=== All .inc files in eloq includes ==="
ls storage/eloq/mysql-test/mono_basic/include/*.inc 2>/dev/null | head -20
echo ""
echo "=== Checking if have_debug.inc exists in main mysql test suite ==="
find . -path "*/mysql-test/include/have_debug.inc" 2>/dev/nullRepository: eloqdata/eloqsql
Length of output: 837
🏁 Script executed:
#!/bin/bash
echo "=== Content of have_debug.inc ==="
cat mysql-test/include/have_debug.inc
echo ""
echo "=== Content of have_eloq.inc ==="
cat storage/eloq/mysql-test/mono_basic/include/have_eloq.inc
echo ""
echo "=== First few lines of an eloq test file for comparison ==="
head -10 storage/eloq/mysql-test/mono_basic/t/alter_table_recovery.testRepository: eloqdata/eloqsql
Length of output: 657
🏁 Script executed:
#!/bin/bash
echo "=== Check if eloq tests can source ../../../../../mysql-test/include/have_debug.inc ==="
head -5 storage/eloq/mysql-test/mono_basic/t/alter_table_recovery.test | grep -E 'source|include'
echo ""
echo "=== Check how many eloq tests source long_test.inc ==="
rg -l 'long_test.inc' storage/eloq/mysql-test/mono_basic/t/ | wc -l
echo ""
echo "=== All eloq tests and their includes ==="
for f in storage/eloq/mysql-test/mono_basic/t/*.test; do
echo "=== $(basename $f) ==="
head -10 "$f" | grep -E '^\-\-source'
done | head -50Repository: eloqdata/eloqsql
Length of output: 1918
Source include/have_debug.inc to skip on non-debug builds.
This test uses debug_dbug (lines 20, 40, 104) which requires a debug build. Standard MySQL tests that rely on debug sync points source have_debug.inc to skip gracefully on non-debug builds. Without this, the test may fail unexpectedly if the debug sync points don't fire.
--source include/have_eloq.inc
+--source include/have_debug.inc
set @@default_storage_engine= eloq;Note: This pattern applies to all 20 debug_dbug-using tests in the eloq suite, not just this one. Consider adding have_debug.inc to the shared eloq include files for consistency.
🤖 Prompt for AI Agents
In `@storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test`
around lines 1 - 6, This test uses debug sync points via debug_dbug, so add
sourcing of have_debug.inc at the top of the test to skip on non-debug builds:
include/have_debug.inc should be sourced before any debug_dbug usage (i.e.,
before the first debug_dbug call in this file), and consider adding the same
include to the shared eloq include files so all tests that call debug_dbug will
skip when not built with debug support.
4aef7f8 to
481ab8a
Compare
Summary by CodeRabbit
Tests
Chores