Skip to content

Fix datasyncscan paused key with debug log#332

Merged
yi-xmu merged 1 commit intomainfrom
fix_paused_key
Jan 4, 2026
Merged

Fix datasyncscan paused key with debug log#332
yi-xmu merged 1 commit intomainfrom
fix_paused_key

Conversation

@yi-xmu
Copy link
Collaborator

@yi-xmu yi-xmu commented Jan 2, 2026

The paused key should be the end key of slice if it paused on the end of a slice.

Summary by CodeRabbit

Bug Fixes

  • Added bounds checking to prevent potential access violations when advancing through data slices during scan operations.
  • Refined resumption logic for data scans across partitions to correctly position pause points at slice boundaries.

✏️ Tip: You can customize this high-level summary in your review settings.

The paused key should be the end key of slice if it paused on the end of a slice.

remove debug log
@yi-xmu yi-xmu self-assigned this Jan 2, 2026
@coderabbitai
Copy link

coderabbitai bot commented Jan 2, 2026

Walkthrough

The PR adds bounds checking via assertion when advancing to the next slice in RangePartitionDataSyncScanCc and refines the pause-position calculation logic by conditionally setting the next pause key based on whether the current slice has completed.

Changes

Cohort / File(s) Summary
Slice Advancement Bounds Guard
tx_service/include/cc/cc_request.h
Added assertion after incrementing curr_slice_idx to verify it remains within bounds of slice_coordinator_.pinned_slices_ before accessing the next slice element.
Pause Position Calculation
tx_service/include/cc/template_cc_map.h
Refined logic in TemplateCcMap::Execute for RangePartitionDataSyncScanCc: when computing next_pause_key after slice completion, conditionally set to slice_end_key if key_it == slice_end_it, otherwise clone key_it->first as before.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • liunyl
  • MrGuin

Poem

🐰 Through slices we hop with precision and care,
Checking our bounds before leaping in air,
When each slice completes, the pause key knows best,
Where to resume scanning with confidence blessed—
A rabbit's delight in this sync-scan quest! 🎉

Pre-merge checks and finishing touches

❌ Failed checks (3 warnings)
Check name Status Explanation Resolution
Title check ⚠️ Warning The title 'Fix datasyncscan paused key with debug log' partially relates to the changeset but is misleading regarding debug log removal which isn't documented. Revise the title to focus on the primary change: 'Fix datasyncscan paused key at slice boundary' or similar, removing the reference to debug log if it's not a substantial part of the fix.
Description check ⚠️ Warning The pull request description is minimal and lacks required template sections including tests, documentation, issue references, and test suite verification. Complete the description template by adding sections for tests, documentation references, issue links (using 'fixes eloqdb/tx_service#issue_id'), RFC references if applicable, and confirming test suite results.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix_paused_key

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
tx_service/include/cc/template_cc_map.h (1)

5737-5741: Pause key semantics at slice boundary look correct

The new conditional correctly distinguishes between pausing mid‑slice vs pausing exactly at the slice end, and sets the pause key to slice_end_key in the latter case, which matches the stated behavior and avoids mis-resuming across slice boundaries. The invariants around key_it, slice_end_it, and IsLastBatch() remain consistent.

As a tiny readability tweak, you could collapse the ternary into a single source pointer before cloning:

const KeyT* pause_from =
    (key_it != slice_end_it) ? key_it->first : slice_end_key;
next_pause_key = pause_from->CloneTxKey();

but the current form is functionally fine.

tx_service/include/cc/cc_request.h (1)

4299-4304: Assertion correctly guards slice index invariant; consider making bound check non‑debug as well

The new assert(curr_slice_idx < slice_coordinator_.pinned_slices_.size()); is a good invariant check here and will surface any mismatch between pause_pos_ and pinned_slices_ during testing.

If you want to be fully defensive in release builds too, you could fold the bounds check into the loop condition (e.g., guard with curr_slice_idx + 1 < pinned_slices_.size() before increment/index) so an unexpected pause key can’t cause an out‑of‑bounds access even when assert is compiled out. Not urgent, but worth considering if this path is hard to exercise in debug.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 42f15b9 and b6ea9c6.

📒 Files selected for processing (2)
  • tx_service/include/cc/cc_request.h
  • tx_service/include/cc/template_cc_map.h
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: liunyl
Repo: eloqdata/tx_service PR: 149
File: include/cc/cc_request.h:1876-1927
Timestamp: 2025-10-20T04:30:07.884Z
Learning: ScanNextBatchCc in include/cc/cc_request.h is used only for hash-partition scans; range-partition scans are handled by ScanSliceCc.
Learnt from: lokax
Repo: eloqdata/tx_service PR: 149
File: src/remote/cc_stream_receiver.cpp:1066-1075
Timestamp: 2025-10-21T06:46:53.700Z
Learning: In src/remote/cc_stream_receiver.cpp, for ScanNextRequest handling, BucketIds() on RemoteScanNextBatch should never be empty—this is an expected invariant of the scan protocol.
📚 Learning: 2025-10-20T04:30:07.884Z
Learnt from: liunyl
Repo: eloqdata/tx_service PR: 149
File: include/cc/cc_request.h:1876-1927
Timestamp: 2025-10-20T04:30:07.884Z
Learning: ScanNextBatchCc in include/cc/cc_request.h is used only for hash-partition scans; range-partition scans are handled by ScanSliceCc.

Applied to files:

  • tx_service/include/cc/cc_request.h
  • tx_service/include/cc/template_cc_map.h
📚 Learning: 2025-10-21T06:46:53.700Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 149
File: src/remote/cc_stream_receiver.cpp:1066-1075
Timestamp: 2025-10-21T06:46:53.700Z
Learning: In src/remote/cc_stream_receiver.cpp, for ScanNextRequest handling, BucketIds() on RemoteScanNextBatch should never be empty—this is an expected invariant of the scan protocol.

Applied to files:

  • tx_service/include/cc/cc_request.h
📚 Learning: 2025-11-11T07:10:40.346Z
Learnt from: lzxddz
Repo: eloqdata/tx_service PR: 199
File: include/cc/local_cc_shards.h:233-234
Timestamp: 2025-11-11T07:10:40.346Z
Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.

Applied to files:

  • tx_service/include/cc/cc_request.h
📚 Learning: 2025-12-02T10:43:27.431Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 254
File: tx_service/src/cc/local_cc_shards.cpp:2949-3188
Timestamp: 2025-12-02T10:43:27.431Z
Learning: In tx_service/src/cc/local_cc_shards.cpp, whenever TryPinNodeGroupData is used, only call Sharder::Instance().UnpinNodeGroupData(node_group) if the recorded term is >= 0 (i.e., pin succeeded). Example: LocalCcShards::PostProcessFlushTaskEntries guards the unpin with `if (term >= 0)`.

Applied to files:

  • tx_service/include/cc/cc_request.h
📚 Learning: 2025-10-09T03:56:58.811Z
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.

Applied to files:

  • tx_service/include/cc/template_cc_map.h

@yi-xmu
Copy link
Collaborator Author

yi-xmu commented Jan 4, 2026

@yi-xmu yi-xmu requested a review from liunyl January 4, 2026 02:29
@yi-xmu yi-xmu merged commit 6cea0a9 into main Jan 4, 2026
4 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Jan 5, 2026
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants