Range read block on rw conflict by githubzilla · Pull Request #229 · eloqdata/eloqsql

githubzilla · 2026-02-08T07:14:06Z

Summary by CodeRabbit

Tests
- Added a test for range-split deadlock-abort handling, including write-intent downgrade and retry verification.
- Added a test validating that reads block while a write lock is held during range-splits and complete after release.
- Added test options to stabilize checkpoint timing.
Chores
- Updated tracked submodule pointer (no functional API changes).

coderabbitai · 2026-02-08T07:15:18Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

Adds two new ELoQ MySQL tests (range_read_block_on_write_lock and range_split_deadlock_abort), their .opt and .result artifacts, and bumps the data_substrate submodule pointer; tests simulate range-split locking, fault injection (DEAD_LOCK_ABORT), and verify blocking/unblocking behavior and split completion.

Changes

Cohort / File(s)	Summary
Range Read Block on Write Lock `storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test`, `storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.opt`, `storage/eloq/mysql-test/mono_basic/r/range_read_block_on_write_lock.result`	New test, option, and expected-result files. Seeds data to force a range split, injects a debug pause during prepare while holding a write-intent lock, spawns concurrent readers to assert they block, then releases the lock and verifies reads complete. Option adds `--checkpointer_interval=86400`.
Range Split Deadlock Abort `storage/eloq/mysql-test/mono_basic/t/range_split_deadlock_abort.test`, `storage/eloq/mysql-test/mono_basic/t/range_split_deadlock_abort.opt`, `storage/eloq/mysql-test/mono_basic/r/range_split_deadlock_abort.result`	New test, option, and result files. Injects DEAD_LOCK_ABORT at range-split phase 8, asserts DowngradeLock and retry behavior, waits for checkpoint-driven split progress, verifies data accessibility and no blocked queries. Option adds `--checkpointer_interval=86400`.
Submodule bump `data_substrate`	Updated submodule commit pointer; no code changes in this diff.

Sequence Diagram(s)

sequenceDiagram
participant Client as Client
participant Conn as Connection
participant SE as StorageEngine
participant LM as LockManager
participant CP as Checkpointer
participant FI as FaultInjector

Client->>Conn: issue WRITE that triggers range-split prepare
Conn->>SE: start range-split phase 8 (acquire write-intent lock)
FI->>SE: inject DEAD_LOCK_ABORT at phase 8
Client->>Conn: spawn concurrent READs
Conn->>SE: READs attempt access — blocked by write-intent lock
CP->>SE: checkpoint advances (drives split path)
SE->>LM: handle DEAD_LOCK_ABORT → DowngradeLock (write→read)
LM->>SE: allow retry of phase 8
SE->>Conn: complete split and unblock reads
Conn->>Client: READs return results

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Update subm for fix assert #188: Modifies the same MySQL test files and the data_substrate pointer; likely directly related.
Range read block range lock if RW conflict #180: Edits/adds the same ELoQ MySQL tests (range_read_block_on_write_lock.* and range_split_deadlock_abort.*).

Suggested reviewers

liunyl
thweetkomputer

Poem

🐇 I hopped through rows and nudged phase eight,
Locks held tight while readers learned to wait,
A fault then blinked and nudged a softer hand,
The split retried and queries cleared the land,
I twitched my whiskers — tests passed, all went great!

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Range read block on rw conflict' directly summarizes the main objective of the pull request. The changes add new test scripts to verify range read blocking behavior when there are read-write conflicts during range split operations.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🤖 Fix all issues with AI agents

In `@storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test`:
- Line 80: The test currently issues unbounded queries "select c0, c1, c2 from
t1 order by c0" (the --send at the first occurrence and the subsequent
reap/re-execute of "select c0, c1, c2 from t1"), which dumps ~2048 rows twice
into the .result file; replace those full-result queries with "select count(*)
from t1" to verify completion without dumping rows, or alternatively surround
the original sends/reaps with "--disable_result_log" / "--enable_result_log" so
the large outputs are not recorded; update both the initial --send statement and
the reap/re-verify re-execution occurrences to use one of these two approaches.
- Around line 130-135: The wait condition stored in $wait_condition is too
permissive because the "(info IS NULL and time >= 1)" branch can match unrelated
idle/daemon connections; tighten it so we only consider connections that
actually reference t1 or are in a lock state. Edit the SQL in $wait_condition to
remove the generic "info IS NULL and time >= 1" clause and instead use just
"(state like "%lock%") OR (info like "select%from t1%")" so only lock-holding or
t1-related queries are considered by the wait condition.

In `@storage/eloq/mysql-test/mono_basic/t/range_split_deadlock_abort.test`:
- Around line 65-70: The wait condition in $wait_condition is too broad because
`(info IS NULL and time >= 1)` matches idle connections (including the test's
own default connection); update the predicate in the select from
information_schema.processlist to exclude the current connection and only target
other idle sessions — e.g. replace `(info IS NULL and time >= 1)` with `(info IS
NULL and time >= 1 and id != CONNECTION_ID())` (keep the existing `info like
"select%from t1%"` check intact) so the count(*) = 0 check no longer blocks on
unrelated idle connections; locate and modify the $wait_condition definition to
apply this change.

🧹 Nitpick comments (5)

storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test (3)
42-42: Hardcoded --sleep durations make the test slow and potentially flaky.

The test accumulates --sleep 5 + --sleep 2 + --sleep 2 + --sleep 10 + --sleep 5 = 24 seconds of fixed waits. On fast machines these are wasted time; on slow CI machines they may be insufficient. Where possible, prefer wait_condition loops (which already exist elsewhere in this test) over fixed sleeps. At minimum, the --sleep 10 on line 83 seems excessive for verifying that reads are blocked — a shorter sleep followed by the wait_condition on line 93 should suffice.

Also applies to: 50-50, 54-58, 83-83

67-70: Wait condition in Test Case 1 is overly broad.

The condition state like "%lock%" or info like "select count(*) from t1%" matches any connection with "lock" in its state (which could match unrelated connections) OR any connection running the matching query (which doesn't necessarily mean it's blocked). Consider combining the conditions with AND and also filtering by connection ID to be more precise.
Proposed tighter condition
 let $wait_condition=
     select count(*) >= 1 from information_schema.processlist
-    where (state like "%lock%" or info like "select count(*) from t1%")
+    where info like "select count(*) from t1%"
+    and state like "%lock%"
     and id != CONNECTION_ID();
93-99: Same broad condition pattern in Test Case 2.

Similar to Test Case 1, using OR between state like "%lock%" and the info matches weakens the assertion. This should require both a matching query and a lock-wait state to reliably confirm the reads are blocked. Also, the count threshold (>= 3) includes conn_read1 from Test Case 1 — confirm that conn_read1's query is still pending at this point (it should be, since the debug point is still active).
storage/eloq/mysql-test/mono_basic/t/range_split_deadlock_abort.test (2)
51-54: 15 seconds of fixed sleeps with no condition-based gating.

Same concern as the sibling test — --sleep 5 + --sleep 10 is slow on fast machines and fragile on slow CI. A wait_condition polling for evidence that the split reached phase 8 (e.g., checking a status variable or processlist state) would be more robust.

36-36: For consistency with the sibling test, use explicit column list instead of SELECT *.

The insert into t1 select * from t1; pattern works correctly in the Eloq engine, but the sibling test (range_read_block_on_write_lock.test) uses an explicit column list for better clarity:
insert into t1 (c1, c2, c3, c4) select c1, c2, c3, c4 from t1;
Consider adopting the same pattern here for consistency across the test suite.

storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test

coderabbitai · 2026-02-08T07:18:41Z

storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test

+let $wait_condition=
+    select count(*) = 0 from information_schema.processlist
+    where ((state like "%lock%") or
+           (info IS NULL and time >= 1))
+    and (info like "select%from t1%" or info IS NULL);
+--source include/wait_condition.inc


⚠️ Potential issue | 🟠 Major

Wait condition in Test Case 5 may match unrelated idle connections.

The predicate (info IS NULL and time >= 1) combined with (info like "select%from t1%" or info IS NULL) will match any idle connection with NULL info and time >= 1, including daemon threads or the default connection itself (excluded only by the lock-state branch, not the NULL-info branch). This can cause the wait_condition to never reach count(*) = 0, leading to a timeout and false test failure.

Proposed fix: tighten the condition to only match t1-related queries

let $wait_condition= select count(*) = 0 from information_schema.processlist - where ((state like "%lock%") or - (info IS NULL and time >= 1)) - and (info like "select%from t1%" or info IS NULL); + where info like "select%from t1%" + and id != CONNECTION_ID();

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

let $wait_condition=

select count(*) = 0 from information_schema.processlist

where ((state like "%lock%") or

(info IS NULL and time >= 1))

and (info like "select%from t1%" or info IS NULL);

--source include/wait_condition.inc

let $wait_condition=

select count(*) = 0 from information_schema.processlist

where info like "select%from t1%"

and id != CONNECTION_ID();

--source include/wait_condition.inc

🤖 Prompt for AI Agents

In `@storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test` around lines 130 - 135, The wait condition stored in $wait_condition is too permissive because the "(info IS NULL and time >= 1)" branch can match unrelated idle/daemon connections; tighten it so we only consider connections that actually reference t1 or are in a lock state. Edit the SQL in $wait_condition to remove the generic "info IS NULL and time >= 1" clause and instead use just "(state like "%lock%") OR (info like "select%from t1%")" so only lock-holding or t1-related queries are considered by the wait condition.

storage/eloq/mysql-test/mono_basic/t/range_split_deadlock_abort.test

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test`:
- Around line 1-6: This test uses debug sync points via debug_dbug, so add
sourcing of have_debug.inc at the top of the test to skip on non-debug builds:
include/have_debug.inc should be sourced before any debug_dbug usage (i.e.,
before the first debug_dbug call in this file), and consider adding the same
include to the shared eloq include files so all tests that call debug_dbug will
skip when not built with debug support.

🧹 Nitpick comments (3)

storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test (3)
42-42: Total of ~24 seconds of hard sleeps makes this test slow and potentially flaky.

The test accumulates --sleep 5 + 2 + 2 + 10 + 5 = 24 seconds of hard sleeps. The 10-second sleep at line 83 is particularly aggressive. While some sleep is unavoidable for debug-sync based tests, consider whether the wait_condition checks could replace some of these sleeps, or whether shorter sleeps suffice (especially lines 50 and 83).

Also applies to: 50-50, 58-58, 83-83, 106-106

105-105: Remove commented-out debug command.

This commented-out SET SESSION debug_dbug line appears to be a leftover from development/debugging. If it's intentionally kept as documentation, add a comment explaining why; otherwise remove it to keep the test clean.
 SET SESSION debug_dbug="-d,eloq;term_SplitFlushOp_CommitAcquireAllWriteOp_Continue;node_id=-1";
-# SET SESSION debug_dbug="+d,eloq;at_once;node_id=-1;action=NOTIFY_CHECKPOINTER";
 --sleep 5
120-125: Unnecessary full table re-scan at line 123.

After --reap collects the result from the --send on line 80, line 123 re-executes select c0, c1, c2 from t1 (without ORDER BY), scanning all ~2048 rows again for no verification purpose since result logging is disabled. Consider removing it or replacing with a lightweight check if needed.
 --connection conn_read3
 --disable_result_log
 --reap
-select c0, c1, c2 from t1;
 --enable_result_log
 commit;

coderabbitai · 2026-02-09T05:01:40Z

storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test

+--source include/have_eloq.inc
+set @@default_storage_engine= eloq;
+
+--disable_warnings
+DROP TABLE IF EXISTS t1;
+--enable_warnings


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Find tests in the eloq directory that use debug_dbug echo "=== Tests using debug_dbug ===" rg -l 'debug_dbug' storage/eloq/mysql-test/mono_basic/t/ 2>/dev/null | sort echo "" echo "=== Tests sourcing have_debug.inc ===" rg -l 'have_debug' storage/eloq/mysql-test/mono_basic/t/ 2>/dev/null | sort echo "" echo "=== Tests using both debug_dbug AND have_debug ===" rg -l 'debug_dbug' storage/eloq/mysql-test/mono_basic/t/ 2>/dev/null | while read f; do if rg -q 'have_debug' "$f" 2>/dev/null; then echo "$f" fi done echo "" echo "=== Tests using debug_dbug but NOT have_debug ===" rg -l 'debug_dbug' storage/eloq/mysql-test/mono_basic/t/ 2>/dev/null | while read f; do if ! rg -q 'have_debug' "$f" 2>/dev/null; then echo "$f" fi done

Repository: eloqdata/eloqsql

Length of output: 2890

🏁 Script executed:

#!/bin/bash # Check if have_debug.inc exists anywhere in the repo echo "=== Searching for have_debug.inc ===" find . -name "have_debug.inc" 2>/dev/null echo "" echo "=== Checking eloq includes directory ===" ls -la storage/eloq/mysql-test/mono_basic/include/ 2>/dev/null || echo "Include dir not found" echo "" echo "=== All .inc files in eloq includes ===" ls storage/eloq/mysql-test/mono_basic/include/*.inc 2>/dev/null | head -20 echo "" echo "=== Checking if have_debug.inc exists in main mysql test suite ===" find . -path "*/mysql-test/include/have_debug.inc" 2>/dev/null

Repository: eloqdata/eloqsql

Length of output: 837

🏁 Script executed:

#!/bin/bash echo "=== Content of have_debug.inc ===" cat mysql-test/include/have_debug.inc echo "" echo "=== Content of have_eloq.inc ===" cat storage/eloq/mysql-test/mono_basic/include/have_eloq.inc echo "" echo "=== First few lines of an eloq test file for comparison ===" head -10 storage/eloq/mysql-test/mono_basic/t/alter_table_recovery.test

Repository: eloqdata/eloqsql

Length of output: 657

🏁 Script executed:

#!/bin/bash echo "=== Check if eloq tests can source ../../../../../mysql-test/include/have_debug.inc ===" head -5 storage/eloq/mysql-test/mono_basic/t/alter_table_recovery.test | grep -E 'source|include' echo "" echo "=== Check how many eloq tests source long_test.inc ===" rg -l 'long_test.inc' storage/eloq/mysql-test/mono_basic/t/ | wc -l echo "" echo "=== All eloq tests and their includes ===" for f in storage/eloq/mysql-test/mono_basic/t/*.test; do echo "=== $(basename $f) ===" head -10 "$f" | grep -E '^\-\-source' done | head -50

Repository: eloqdata/eloqsql

Length of output: 1918

Source include/have_debug.inc to skip on non-debug builds.

This test uses debug_dbug (lines 20, 40, 104) which requires a debug build. Standard MySQL tests that rely on debug sync points source have_debug.inc to skip gracefully on non-debug builds. Without this, the test may fail unexpectedly if the debug sync points don't fire.

--source include/have_eloq.inc +--source include/have_debug.inc set @@default_storage_engine= eloq;

Note: This pattern applies to all 20 debug_dbug-using tests in the eloq suite, not just this one. Consider adding have_debug.inc to the shared eloq include files for consistency.

🤖 Prompt for AI Agents

In `@storage/eloq/mysql-test/mono_basic/t/range_read_block_on_write_lock.test` around lines 1 - 6, This test uses debug sync points via debug_dbug, so add sourcing of have_debug.inc at the top of the test to skip on non-debug builds: include/have_debug.inc should be sourced before any debug_dbug usage (i.e., before the first debug_dbug call in this file), and consider adding the same include to the shared eloq include files so all tests that call debug_dbug will skip when not built with debug support.

githubzilla added 10 commits December 5, 2025 14:08

Update data_substrate

d8bac9f

Merge remote-tracking branch 'upstream/eloq-10.6.10' into eloq-10.6.10

363d781

Merge remote-tracking branch 'upstream/eloq-10.6.10' into eloq-10.6.10

9e5d1e8

Merge remote-tracking branch 'upstream/eloq-10.6.10' into eloq-10.6.10

c4245ee

Merge remote-tracking branch 'upstream/eloq-10.6.10' into eloq-10.6.10

74850e0

Merge remote-tracking branch 'upstream/eloq-10.6.10' into eloq-10.6.10

04fd052

Merge remote-tracking branch 'upstream/eloq-10.6.10' into eloq-10.6.10

e685b42

Merge remote-tracking branch 'upstream/eloq-10.6.10' into eloq-10.6.10

266e81a

Merge remote-tracking branch 'upstream/eloq-10.6.10' into eloq-10.6.10

5c9ddd5

Merge remote-tracking branch 'upstream/eloq-10.6.10' into eloq-10.6.10

58ccd01

githubzilla added the trigger-ci label Feb 8, 2026

coderabbitai bot reviewed Feb 8, 2026

View reviewed changes

githubzilla added 2 commits February 9, 2026 11:27

Merge remote-tracking branch 'upstream/eloq-10.6.10' into eloq-10.6.10

7c0f1b0

chore: update data_substrate: extend log retention

4274ba8

githubzilla force-pushed the range_read_block_on_rw_conflict branch from 63506e3 to 679932b Compare February 9, 2026 05:00

githubzilla requested a review from liunyl February 9, 2026 05:00

githubzilla mentioned this pull request Feb 9, 2026

[Bug]: Verify if range split acquire all lock can be aborted eloqdata/tx_service#201

Closed

coderabbitai bot reviewed Feb 9, 2026

View reviewed changes

liunyl approved these changes Feb 9, 2026

View reviewed changes

githubzilla added 7 commits February 11, 2026 12:37

Merge remote-tracking branch 'upstream/eloq-10.6.10' into eloq-10.6.10

f8c540b

Add range_read_block_on_write_lock test

175b5ef

Disable auto ckpt when testing

7868873

Add range_split_deadlock_abort.test

58cfe41

Update range_read_block_on_write_lock

cdacfdb

Update data_substrate

d5d5fb8

Update test case reflecting comment

481ab8a

githubzilla force-pushed the range_read_block_on_rw_conflict branch from 4aef7f8 to 481ab8a Compare February 11, 2026 05:30

githubzilla added 2 commits February 11, 2026 07:03

Update data_substrate

0e1a41c

Update data_substrate

3c313aa

Update data_substrate

d9d39ab

githubzilla merged commit a06a46c into eloqdata:eloq-10.6.10 Feb 11, 2026
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Range read block on rw conflict#229

Range read block on rw conflict#229
githubzilla merged 22 commits intoeloqdata:eloq-10.6.10from
githubzilla:range_read_block_on_rw_conflict

githubzilla commented Feb 8, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Feb 8, 2026 •

edited

Loading

Reviews paused

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot Feb 8, 2026

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Feb 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

githubzilla commented Feb 8, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Feb 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Feb 8, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 9, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

githubzilla commented Feb 8, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 8, 2026 •

edited

Loading