Conversation
1. Only split a slice when its post-ckpt size is greater than both the slice's current size and the slice upper bound.
2. For non-snapshot reads, there is no need to return deleted keys from the data store.
Walkthrough

This pull request enhances snapshot-read handling and batch-end key management in the data store service client closure, adds ownership tracking for batch start keys in the fill-store-slice request structure, and refines slice-splitting conditions and iterator-rebinding logic in the transaction service's template map and local shard implementations.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @store_handler/data_store_service_client_closure.cpp:
- Around line 1342-1352: The code sets fill_store_slice_req->kv_start_key_ to a
string_view of key.Data() and then moves key into AddDataItem, creating a
use-after-move; fix by creating a separate owner TxKey (e.g., owner_key via
catalog_factory->CreateTxKey) when i == items_size - 1 &&
scan_next_closure->ItemsSize() == 1000, assign
fill_store_slice_req->kv_start_key_ = std::string_view(owner_key.Data(),
owner_key.Size()) and store ownership in
fill_store_slice_req->kv_start_key_owner_ (move owner_key), then proceed to call
AddDataItem(std::move(key), ...); this mirrors the deleted-record path and
prevents the dangling string_view.
🧹 Nitpick comments (3)
tx_service/include/cc/cc_req_misc.h (1)
553-554: Document the ownership relationship between kv_start_key_owner_ and kv_start_key_.

The new field kv_start_key_owner_ appears to provide backing storage for the kv_start_key_ view. Without documentation, the relationship between these fields and their intended usage pattern may not be clear to future maintainers. Consider adding a brief comment explaining that kv_start_key_owner_ holds the key data while kv_start_key_ provides a non-owning view into it.

📝 Suggested documentation
```diff
 public:
     // These variables only be used in DataStoreHandler
     const std::string *kv_table_name_{nullptr};
+    // Owns the batch start key data; kv_start_key_ provides a view into this.
     TxKey kv_start_key_owner_;
     std::string_view kv_start_key_;
```

tx_service/include/cc/template_cc_map.h (1)
5424-5429: Harden check_split_slice() against the UINT64_MAX sentinel and avoid repeated PostCkptSize() calls.

The updated condition matches the PR objective, but the guard is currently debug-only (assert). In release builds, a sentinel UINT64_MAX would make the predicate trivially true and could trigger pathological splitting.

Proposed tweak

```diff
-        assert(slice->PostCkptSize() != UINT64_MAX);
-        // Only need to split the slice when the post ckpt size of the slice
-        // is greater than the current size and greater than the slice upper
-        // bound.
-        return (slice->PostCkptSize() > StoreSlice::slice_upper_bound &&
-                slice->PostCkptSize() > slice->Size());
+        const uint64_t post_ckpt_size = slice->PostCkptSize();
+        if (post_ckpt_size == UINT64_MAX)
+        {
+            return false;
+        }
+        // Split only when post-ckpt size exceeds both the slice upper bound
+        // and the current slice size.
+        return (post_ckpt_size > StoreSlice::slice_upper_bound &&
+                post_ckpt_size > slice->Size());
```
tx_service/src/cc/local_cc_shards.cpp (1)

5392-5414: Confirm intended behavior: "oversized but not growing" slices will no longer split.

The new guard slice_post_ckpt_size <= curr_slice->Size() matches the PR objective (split only when post-ckpt size exceeds both the upper bound and the current slice size). One implication is that a slice already above StoreSlice::slice_upper_bound but not increasing won't be split anymore; please confirm that's desired for long-lived/legacy oversized slices.

Proposed small clarity refactor

```diff
-        if (slice_post_ckpt_size == UINT64_MAX ||
-            slice_post_ckpt_size <= StoreSlice::slice_upper_bound ||
-            slice_post_ckpt_size <= curr_slice->Size())
+        const bool should_split =
+            slice_post_ckpt_size != UINT64_MAX &&
+            slice_post_ckpt_size > StoreSlice::slice_upper_bound &&
+            slice_post_ckpt_size > curr_slice->Size();
+        if (!should_split)
         {
             // Case 1: There is no unpersisted data in the current slice, so no
             // need to split the slice. Only to migrate this data from the old
             // range to the new range.
             // Case 2: The current slice does not need to be split.
             assert(!status.paused_slice_);
             // Move to the next slice
             flush_record_it = slice_end_it;
             continue;
         }
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- store_handler/data_store_service_client_closure.cpp
- store_handler/eloq_data_store_service/eloqstore
- tx_service/include/cc/cc_req_misc.h
- tx_service/include/cc/template_cc_map.h
- tx_service/src/cc/local_cc_shards.cpp
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-10-21T06:46:53.700Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 149
File: src/remote/cc_stream_receiver.cpp:1066-1075
Timestamp: 2025-10-21T06:46:53.700Z
Learning: In src/remote/cc_stream_receiver.cpp, for ScanNextRequest handling, BucketIds() on RemoteScanNextBatch should never be empty—this is an expected invariant of the scan protocol.
Applied to files:
store_handler/data_store_service_client_closure.cpp
📚 Learning: 2025-11-11T07:10:40.346Z
Learnt from: lzxddz
Repo: eloqdata/tx_service PR: 199
File: include/cc/local_cc_shards.h:233-234
Timestamp: 2025-11-11T07:10:40.346Z
Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.
Applied to files:
tx_service/src/cc/local_cc_shards.cpp
📚 Learning: 2025-10-09T03:56:58.811Z
Learnt from: thweetkomputer
Repo: eloqdata/tx_service PR: 150
File: include/cc/local_cc_shards.h:626-631
Timestamp: 2025-10-09T03:56:58.811Z
Learning: For the LocalCcShards class in include/cc/local_cc_shards.h: Writer locks (unique_lock) should continue using the original meta_data_mux_ (std::shared_mutex) rather than fast_meta_data_mux_ (FastMetaDataMutex) at this stage. Only reader locks may use the FastMetaDataMutex wrapper.
Applied to files:
tx_service/src/cc/local_cc_shards.cpp
📚 Learning: 2025-12-02T10:43:27.431Z
Learnt from: lokax
Repo: eloqdata/tx_service PR: 254
File: tx_service/src/cc/local_cc_shards.cpp:2949-3188
Timestamp: 2025-12-02T10:43:27.431Z
Learning: In tx_service/src/cc/local_cc_shards.cpp, whenever TryPinNodeGroupData is used, only call Sharder::Instance().UnpinNodeGroupData(node_group) if the recorded term is >= 0 (i.e., pin succeeded). Example: LocalCcShards::PostProcessFlushTaskEntries guards the unpin with `if (term >= 0)`.
Applied to files:
tx_service/src/cc/local_cc_shards.cpp
🧬 Code graph analysis (1)
tx_service/include/cc/cc_req_misc.h (1)
tx_service/include/range_record.h (1)
TxKey (256-262)
🔇 Additional comments (4)
tx_service/include/cc/template_cc_map.h (1)
11727-11735: Good fix: rebind absl::btree_map iterators after erase() in MergePages().

Given absl::btree_map iterator invalidation semantics, refinding merged_page after erasing discarded_page_it avoids use-after-invalidation issues.
store_handler/eloq_data_store_service/eloqstore (1)

1-1: Submodule update contains only test debugging changes, not the intended bug fixes mentioned in PR objectives.

The submodule pointer update from a94159ab298a... to 67c86296b89a3... includes only 2 commits:

- b8ecc272: "skip manifest checksum for test"
- 67c8629: "do not append manifest" (disables manifest appending via a `false &&` condition in write_task.cpp)

These changes focus on manifest handling and appear to be debugging/test modifications, not the slice-splitting logic fixes or non-snapshot read bug fixes stated in the PR objectives. This mismatch between the submodule content and PR goals should be clarified: either the PR description needs updating or the submodule pointer should reference commits containing the intended fixes.
store_handler/data_store_service_client_closure.cpp (2)
1279-1293: LGTM: deferred record creation and snapshot timestamp extraction.

The changes correctly defer TxRecord creation and read the snapshot timestamp to gate deleted-record handling. This aligns with the PR objective of not returning deleted keys for non-snapshot reads.
1310-1333: LGTM: correctly skips visible deleted records and preserves batch-key ownership.

The logic properly distinguishes between:
- Non-snapshot reads: always skip deleted records
- Snapshot reads: skip deleted records only if they're visible (committed before snapshot), keeping those needing archive lookup
The batch-end key ownership is correctly preserved via kv_start_key_owner_.
```cpp
txservice::TxKey key =
    catalog_factory->CreateTxKey(key_str.data(), key_str.size());
if (i == items_size - 1 && scan_next_closure->ItemsSize() == 1000)
{
    // Record the last key of the current batch as the start key of the
    // next batch.
    fill_store_slice_req->kv_start_key_ =
        std::string_view(key.Data(), key.Size());
}

fill_store_slice_req->AddDataItem(
    std::move(key), std::move(record), ts, is_deleted);
```
Use-after-move: kv_start_key_ becomes a dangling reference after key is moved.
When this is the last item in a 1000-item batch, kv_start_key_ is set to point to key.Data(), but then key is moved into AddDataItem. The string_view will reference invalid memory when used in ScanNext (line 1362).
The deleted-record path (lines 1326-1330) correctly preserves ownership via kv_start_key_owner_. The same pattern should be applied here.
🔧 Proposed fix

```diff
     txservice::TxKey key =
         catalog_factory->CreateTxKey(key_str.data(), key_str.size());
     if (i == items_size - 1 && scan_next_closure->ItemsSize() == 1000)
     {
         // Record the last key of the current batch as the start key of the
         // next batch.
         fill_store_slice_req->kv_start_key_ =
             std::string_view(key.Data(), key.Size());
+        fill_store_slice_req->kv_start_key_owner_ = key.Clone();
     }
     fill_store_slice_req->AddDataItem(
         std::move(key), std::move(record), ts, is_deleted);
```

Note: This assumes TxKey has a Clone() method or copy constructor. If not, you may need to create a separate key for ownership:
```cpp
if (i == items_size - 1 && scan_next_closure->ItemsSize() == 1000)
{
    txservice::TxKey owner_key =
        catalog_factory->CreateTxKey(key_str.data(), key_str.size());
    fill_store_slice_req->kv_start_key_ =
        std::string_view(owner_key.Data(), owner_key.Size());
    fill_store_slice_req->kv_start_key_owner_ = std::move(owner_key);
}
```
Fix slice split