Update datamigration for rangepartition by yi-xmu · Pull Request #462 · eloqdata/tx_service

yi-xmu · 2026-03-18T03:41:50Z

Maintain the range size in ccmap during bucket migration.

Remove the range size info from the ccmap from old bucket owner node group.
Specifically, after kickout range data for migration, the range size information for the corresponding range ID is deleted.
Update the forward logic for the keys written during the "double-write" phase.

Summary by CodeRabbit

New Features
- Read-only access to an entry's partition identifier exposed via public API.
Bug Fixes
- Improved tracking of keys during range splits and migrations.
- Migration-clean now removes obsolete range-size metadata and consistently cleans migrated partitions.
- Better handling of forward/target tracking for moved data to avoid stale state.
Chores
- Consolidated table-type resolution and streamlined related control flow and logging.

1. Adapt kickoutdata operation with new key sharding logic for range partition 2. Fix the execution flow: after processing the range partition table, the hash partition table needs to be checked. 3. Simplify code using a unified interface.

This reverts commit 866931f.

coderabbitai · 2026-03-18T03:42:05Z

Walkthrough

Adds a public accessor to expose KickoutCcEntryCc partition id, tracks keys on migrating/splitting ranges in write entries, removes range-size metadata during migration-clean paths in TemplateCcMap, and updates transaction/kickout flows to record forward targets and unify table-type resolution.

Changes

Cohort / File(s)	Summary
CC Request Interface `tx_service/include/cc/cc_request.h`	Added `int32_t GetPartitionId() const` to `KickoutCcEntryCc` to expose the internal `range_id_`.
CC Map / Range-size Cleanup `tx_service/include/cc/template_cc_map.h`	Added `void RemoveRangeSize(uint32_t partition_id)` and wired calls to remove range-size metadata from kickout/migration-clean paths.
Write-set Entry Tracking `tx_service/include/read_write_entry.h`	Added `bool on_dirty_range_` to `WriteSetEntry` to mark keys living in splitting/migrating ranges (default `false`).
Local CC Handler `tx_service/src/cc/local_cc_handler.cpp`	Treat `CleanRangeDataForMigration` the same as `CleanRangeData` for core assignment and local routing of kickout requests.
Transaction Operation Logic `tx_service/src/tx_operation.cpp`	Record range dirty state into `write_entry.on_dirty_range_`, capture forward-address range id via a new pointer when creating forward entries, and unify `TableName::Type` derivation across forward/kickout/migration flows with related control-flow adjustments.

Sequence Diagram(s)

mermaid
sequenceDiagram
autonumber
participant TxOp as TxOperation
participant LocalCC as LocalCcHandler
participant CCMap as TemplateCcMap
participant Storage as Storage/Shards

TxOp->>LocalCC: request kickout/forward (includes forward_addr, partition id)
LocalCC->>LocalCC: map CleanType (treat migration as range clean)
LocalCC->>CCMap: complete KickoutCcEntryCc (calls GetPartitionId)
Note right of CCMap: if migration-clean -> RemoveRangeSize(partition_id)
CCMap-->>LocalCC: completion ack
LocalCC->>Storage: finalize deletion/migration of data
TxOp->>TxOp: set write_entry.on_dirty_range_ / store forward-entry range id

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

[Feature]: Update data migration with new key sharding logic for range partition. #457: Matches removal of range-size metadata during migration kickout and new RemoveRangeSize callsite.

Possibly related PRs

Range sizes definition #438: Introduced range_sizes_ APIs that RemoveRangeSize manipulates.
Shard keys on one core for range partition #454: Related range-partitioning changes that consume KickoutCcEntryCc partition info.
Move key-count and dirty-key tracking from TemplateCcMap to CcShard #382: Related TemplateCcMap changes touching dirty/size hooks and shard stats.

Suggested labels

bug

Suggested reviewers

liunyl
lokax

Poem

🐇 I hopped through partitions, light and spry,
I nudged a range and watched it fly.
Sizes cleared and forwards set,
Dirty flags mark paths well-met,
Hooray — migrations tidy, bye-bye! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title broadly references the main change (updating data migration for range partitioning), but lacks specificity about the key improvements like range size cleanup.
Description check	✅ Passed	The description clearly explains the two main objectives (range size cleanup and forward logic updates) and provides concrete implementation details, but does not complete the provided template checklist items.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch update_datamigration_for_rangepartition

📝 Coding Plan

Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tx_service/include/cc/template_cc_map.h`:
- Around line 6986-6993: The code casts req.GetPartitionId() (int32_t) to
uint32_t and calls RemoveRangeSize when CleanType::CleanRangeDataForMigration,
but only asserts partition_id != INT32_MAX which is disabled in release builds;
add a runtime bounds check to ensure partition_id is non-negative (and not
INT32_MAX if needed) before the static_cast and RemoveRangeSize call to prevent
negative values wrapping to huge uint32_t values. Locate the block handling
CleanType::CleanRangeDataForMigration around req.GetPartitionId(), validate
partition_id >= 0 (and still guard against INT32_MAX if required), and only call
RemoveRangeSize(static_cast<uint32_t>(partition_id)) when the check passes; if
the check fails, log or handle the error consistently with existing patterns in
local_cc_shards.h and cc_request.h.

In `@tx_service/src/tx_operation.cpp`:
- Around line 7993-8013: The code can fall through from the range-table
exhaustion path into the hash-partitioned kickout code with
kickout_hash_partitioned_tbl_it_ still at begin() or equal to
hash_partitioned_tables_snapshot_.cend(), which skips the first hash table or
can advance cend() when the snapshot is empty; before entering the hash kickout
block (the code that checks kickout_hash_partitioned_tbl_it_ and then sets
post_all_bucket_lock_op_ or populates kickout_data_op_), add the same
dropped-table handoff handling used for the range loop: if the range-loop
exhausted tables, update/advance kickout_hash_partitioned_tbl_it_ safely
(checking hash_partitioned_tables_snapshot_.empty()), and if it equals cend()
set post_all_bucket_lock_op_.write_type_ = PostWriteType::PostCommit and call
ForwardToSubOperation(txm, &post_all_bucket_lock_op_) then return; otherwise set
kickout_data_op_.node_group_, table_name_, start_key_, end_key_, bucket_ids_,
clean_type_ and call ForwardToSubOperation(txm, &kickout_data_op_) as currently
done.
- Around line 7993-8013: The error path currently assumes kickout_range_tbl_it_
is valid and logs kickout_range_tbl_it_->first, but when handling
hash-partitioned tables the code sets kickout_data_op_.clean_type_ =
CleanType::CleanBucketData and kickout_range_tbl_it_ is equal to
ranges_in_bucket_snapshot_.cend(), causing undefined behavior on retry; update
the error handling in the retry/error branch to branch on
kickout_data_op_.clean_type_ (or otherwise check kickout_range_tbl_it_ !=
ranges_in_bucket_snapshot_.cend()) and, for CleanType::CleanBucketData use
kickout_data_op_.table_name_ for logging and diagnostics (or log both where
available) instead of dereferencing kickout_range_tbl_it_->first so hash-table
retries no longer dereference cend.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2c22b6fd-c987-4eb1-aa7d-08c8e7b60dbc

📥 Commits

Reviewing files that changed from the base of the PR and between 1a4729b and 94a5786.

📒 Files selected for processing (5)

tx_service/include/cc/cc_request.h
tx_service/include/cc/template_cc_map.h
tx_service/include/read_write_entry.h
tx_service/src/cc/local_cc_handler.cpp
tx_service/src/tx_operation.cpp

coderabbitai · 2026-03-18T03:57:23Z

tx_service/include/cc/template_cc_map.h

+            if (req.GetCleanType() == CleanType::CleanRangeDataForMigration)
+            {
+                // For data migration, we need to delete the range size info
+                // for the range that has been migrated to other node group.
+                int32_t partition_id = req.GetPartitionId();
+                assert(partition_id != INT32_MAX);
+                RemoveRangeSize(static_cast<uint32_t>(partition_id));
+            }


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect KickoutCcEntryCc partition-id type/default/sentinels. rg -n -C4 '\bclass\s+KickoutCcEntryCc\b|\brange_id_\b|GetPartitionId\s*\(|SetPartitionId\s*\(' tx_service/include/cc/cc_request.h # Inspect where migration-clean requests set/read the partition id. rg -n -C3 'CleanRangeDataForMigration|SetPartitionId\s*\(|GetPartitionId\s*\(' tx_service -g '!**/build/**' # Confirm all RemoveRangeSize callsites and casts. rg -n -C3 'RemoveRangeSize\s*\(' tx_service/include/cc/template_cc_map.h tx_service -g '!**/build/**'

Repository: eloqdata/tx_service

Length of output: 14477

🏁 Script executed:

#!/bin/bash set -euo pipefail # Find where range_id is set for CleanRangeDataForMigration requests rg -n -B5 -A5 'clean_type.*CleanRangeDataForMigration|CleanRangeDataForMigration.*range_id' tx_service/src -g '!**/build/**' # Check the Reset method that sets range_id for KickoutCcEntryCc rg -n -B3 -A10 'void Reset.*range_id|range_id_\s*=\s*range_id' tx_service/include/cc/cc_request.h | head -60 # Check RemoveRangeSize implementation to confirm it uses uint32_t as key rg -n -A15 'void RemoveRangeSize\(uint32_t' tx_service/include/cc/template_cc_map.h # Check if there are other casts from int32_t to uint32_t in similar cleanup paths rg -n 'static_cast<uint32_t>' tx_service/include/cc/template_cc_map.h | grep -i 'partition\|range'

Repository: eloqdata/tx_service

Length of output: 5027

🏁 Script executed:

#!/bin/bash set -euo pipefail # Find where CleanRangeDataForMigration requests are created (from tx_operation.cpp hint) rg -n -B10 -A10 'CleanRangeDataForMigration.*=' tx_service/src/tx_operation.cpp | head -80 # Look for where range_id is populated for migration requests rg -n -B5 'range_id.*=' tx_service/src/tx_operation.cpp | grep -A5 -B5 'kickout\|migration' # Check function signature for CreateKickoutCcEntry / similar to see range_id source rg -n 'CreateKickoutCcEntry|PrepareCcRequest' tx_service/src -g '!**/build/**' | head -20

Repository: eloqdata/tx_service

Length of output: 45

🏁 Script executed:

#!/bin/bash set -euo pipefail # Look at tx_operation.cpp around the migration cleanup code sed -n '7880,7950p' tx_service/src/tx_operation.cpp # Search for where range_id is passed to Reset() in migration context rg -n 'Reset.*range_id|range_id.*Reset' tx_service/src/tx_operation.cpp # Look for migration-related functions that might create requests rg -n 'MigrateRange|migration.*range|range.*migration' tx_service/src/tx_operation.cpp | head -20

Repository: eloqdata/tx_service

Length of output: 3167

🏁 Script executed:

#!/bin/bash set -euo pipefail # Find declaration of ranges_in_bucket_snapshot_ rg -n 'ranges_in_bucket_snapshot_' tx_service/src/tx_operation.cpp | head -10 # Look at the type and how it's populated rg -n -B3 -A3 'ranges_in_bucket_snapshot_.*=' tx_service/src/tx_operation.cpp | head -40 # Check GetTableRangeKeys to understand what range IDs are returned rg -n 'GetTableRangeKeys' tx_service/include -g '!**/build/**' | head -5

Repository: eloqdata/tx_service

Length of output: 705

🏁 Script executed:

#!/bin/bash set -euo pipefail # Find the declaration of ranges_in_bucket_snapshot_ in class definition rg -n 'ranges_in_bucket_snapshot_' tx_service/src/tx_operation.h tx_service/include -g '!**/build/**' # Look at the try_emplace context to see what's being inserted sed -n '7605,7620p' tx_service/src/tx_operation.cpp # Look at what populates the container values sed -n '7620,7640p' tx_service/src/tx_operation.cpp

Repository: eloqdata/tx_service

Length of output: 224

🏁 Script executed:

#!/bin/bash set -euo pipefail # Get declaration context around line 1504 sed -n '1495,1515p' tx_service/include/tx_operation.h # Look at where ranges_in_bucket_snapshot_ is populated with range IDs sed -n '7605,7635p' tx_service/src/tx_operation.cpp

Repository: eloqdata/tx_service

Length of output: 2167

🏁 Script executed:

#!/bin/bash set -euo pipefail # Check if there are existing checks for negative partition/range IDs in similar code paths rg -n 'partition.*>= 0|range_id.*>= 0|if.*>=.*0.*partition|if.*>=.*0.*range' tx_service/include/cc -g '!**/build/**' # Check all places where partition_id is cast to uint32_t to see if any validate rg -n -B2 'static_cast<uint32_t>\(.*partition' tx_service/include/cc/template_cc_map.h # Look at UpdateRangeSize calls to see if they have bounds checking rg -n -B5 'UpdateRangeSize\(static_cast<uint32_t>' tx_service/include/cc/template_cc_map.h | head -40

Repository: eloqdata/tx_service

Length of output: 2146

Add runtime bounds check before casting partition_id to uint32_t.

At lines 6990–6992, the code directly casts GetPartitionId() (int32_t) to uint32_t without checking for negative values. The assertion at line 6991 only guards against INT32_MAX and is disabled in release builds. If a negative value reaches this path (through upstream corruption or a bug), it will wrap to a large positive value when cast, erasing the wrong entry from range_sizes_ and leaving stale range-size metadata.

This pattern already exists defensively in the codebase (e.g., local_cc_shards.h:1192, cc_request.h:7836). Add a runtime check consistent with established practice:

Suggested fix

if (req.GetCleanType() == CleanType::CleanRangeDataForMigration) { // For data migration, we need to delete the range size info // for the range that has been migrated to other node group. int32_t partition_id = req.GetPartitionId(); - assert(partition_id != INT32_MAX); - RemoveRangeSize(static_cast<uint32_t>(partition_id)); + if (partition_id >= 0 && partition_id != INT32_MAX) + { + RemoveRangeSize(static_cast<uint32_t>(partition_id)); + } + else + { + LOG(ERROR) << "Invalid partition_id in migration cleanup: " + << partition_id; + } }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if (req.GetCleanType() == CleanType::CleanRangeDataForMigration)

{

// For data migration, we need to delete the range size info

// for the range that has been migrated to other node group.

int32_t partition_id = req.GetPartitionId();

assert(partition_id != INT32_MAX);

RemoveRangeSize(static_cast<uint32_t>(partition_id));

}

if (req.GetCleanType() == CleanType::CleanRangeDataForMigration)

{

// For data migration, we need to delete the range size info

// for the range that has been migrated to other node group.

int32_t partition_id = req.GetPartitionId();

if (partition_id >= 0 && partition_id != INT32_MAX)

{

RemoveRangeSize(static_cast<uint32_t>(partition_id));

}

else

{

LOG(ERROR) << "Invalid partition_id in migration cleanup: "

<< partition_id;

}

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tx_service/include/cc/template_cc_map.h` around lines 6986 - 6993, The code casts req.GetPartitionId() (int32_t) to uint32_t and calls RemoveRangeSize when CleanType::CleanRangeDataForMigration, but only asserts partition_id != INT32_MAX which is disabled in release builds; add a runtime bounds check to ensure partition_id is non-negative (and not INT32_MAX if needed) before the static_cast and RemoveRangeSize call to prevent negative values wrapping to huge uint32_t values. Locate the block handling CleanType::CleanRangeDataForMigration around req.GetPartitionId(), validate partition_id >= 0 (and still guard against INT32_MAX if required), and only call RemoveRangeSize(static_cast<uint32_t>(partition_id)) when the check passes; if the check fails, log or handle the error consistently with existing patterns in local_cc_shards.h and cc_request.h.

The caller will guarantee that this value is non-negative.

@yi-xmu Thanks for the clarification. If the caller guarantees partition_id is non-negative, I’m good to drop the runtime check.

Optional tiny follow-up for maintainability: tighten the existing debug assertion to reflect the invariant and leave a brief comment. For example:

// Caller guarantees non-negative partition_id for migration cleanups. assert(partition_id >= 0 && partition_id != INT32_MAX); RemoveRangeSize(static_cast<uint32_t>(partition_id));

If you prefer, I can prep a small patch to do just this (assert + comment) and nothing else.

✏️ Learnings added

Learnt from: yi-xmu Repo: eloqdata/tx_service PR: 462 File: tx_service/include/cc/template_cc_map.h:6986-6993 Timestamp: 2026-03-18T04:01:54.203Z Learning: In eloqdata/tx_service, during CleanRangeDataForMigration handling in TemplateCcMap::Execute(KickoutCcEntryCc &req) (tx_service/include/cc/template_cc_map.h), the caller guarantees KickoutCcEntryCc::GetPartitionId() returns a non-negative partition_id; negative IDs will not reach RemoveRangeSize().

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

🧠 Learnings used

Learnt from: yi-xmu Repo: eloqdata/tx_service PR: 438 File: tx_service/include/cc/template_cc_map.h:11969-11981 Timestamp: 2026-03-12T05:24:16.445Z Learning: In eloqdata/tx_service, TemplateCcMap::UpdateRangeSize(uint32_t,int32_t,bool) in tx_service/include/cc/template_cc_map.h must return true only once when the loaded, non-dirty range size first crosses StoreRange::range_max_size; it must latch the internal need_split flag as true until ResetRangeStatus(partition_id) clears it, and must not overwrite the latched flag back to false on later calls.

Learnt from: liunyl Repo: eloqdata/tx_service PR: 149 File: include/cc/cc_request.h:1876-1927 Timestamp: 2025-10-20T04:30:07.884Z Learning: ScanNextBatchCc in include/cc/cc_request.h is used only for hash-partition scans; range-partition scans are handled by ScanSliceCc.

Learnt from: yi-xmu Repo: eloqdata/tx_service PR: 454 File: tx_service/include/cc/object_cc_map.h:1574-1575 Timestamp: 2026-03-16T02:32:29.159Z Learning: In `tx_service/include/cc/object_cc_map.h`, the `flags_str` field unpacked from entry tuples in `Execute(UploadBatchCc &req)` is intentionally unused. `ObjectCcMap` is non-range-partitioned (`TemplateCcMap<KeyT, ValueT, false, false>`), so `range_size_flags` carried in `flags_str` have no meaning in this context. The pause/resume logic does not need to track a flags offset for this class.

Learnt from: lzxddz Repo: eloqdata/tx_service PR: 199 File: include/cc/local_cc_shards.h:233-234 Timestamp: 2025-11-11T07:10:40.346Z Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.

Learnt from: lokax Repo: eloqdata/tx_service PR: 254 File: tx_service/src/cc/local_cc_shards.cpp:2949-3188 Timestamp: 2025-12-02T10:43:27.431Z Learning: In tx_service/src/cc/local_cc_shards.cpp, whenever TryPinNodeGroupData is used, only call Sharder::Instance().UnpinNodeGroupData(node_group) if the recorded term is >= 0 (i.e., pin succeeded). Example: LocalCcShards::PostProcessFlushTaskEntries guards the unpin with `if (term >= 0)`.

Learnt from: lokax Repo: eloqdata/tx_service PR: 149 File: src/remote/cc_stream_receiver.cpp:1066-1075 Timestamp: 2025-10-21T06:46:53.700Z Learning: In src/remote/cc_stream_receiver.cpp, for ScanNextRequest handling, BucketIds() on RemoteScanNextBatch should never be empty—this is an expected invariant of the scan protocol.

Learnt from: yi-xmu Repo: eloqdata/tx_service PR: 454 File: tx_service/include/data_sync_task.h:185-189 Timestamp: 2026-03-16T02:36:20.407Z Learning: In eloqdata/tx_service, within a `DataSyncTask` (tx_service/include/data_sync_task.h), the member `id_` represents the partition ID of the current partition for range-partitioned tables (and core_idx for hash-partitioned tables). It is the same identifier used as the key in `range_sizes_` by `UpdateRangeSize()` and `ResetRangeStatus()`. Do not confuse `id_` with a separate "range_id" — in the range partition context, `id_` IS the partition_id.

tx_service/src/tx_operation.cpp

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

tx_service/src/tx_operation.cpp (1)

7975-7981: ⚠️ Potential issue | 🔴 Critical

Guard hash-path retry logging from dereferencing ranges_in_bucket_snapshot_.cend().

At Line 7979, error logging always dereferences kickout_range_tbl_it_->first. On hash-table kickout retries, kickout_range_tbl_it_ can be ranges_in_bucket_snapshot_.cend(), causing undefined behavior before retry.

Suggested fix

         if (kickout_data_op_.hd_result_.IsError())
         {
+            std::string_view failed_table_name_sv = "<unknown>";
+            if (kickout_data_op_.table_name_ != nullptr)
+            {
+                failed_table_name_sv = kickout_data_op_.table_name_->StringView();
+            }
             LOG(ERROR) << "Data migration: fail to kickout range data"
                        << ", table name "
-                       << kickout_range_tbl_it_->first.StringView()
+                       << failed_table_name_sv
                        << ", tx_number:" << txm->TxNumber()
                        << ", keep retrying";
             RetrySubOperation(txm, &kickout_data_op_);
             return;
         }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tx_service/src/tx_operation.cpp` around lines 7975 - 7981, The error log
unconditionally dereferences kickout_range_tbl_it_->first which may be equal to
ranges_in_bucket_snapshot_.cend() during hash-table kickout retries; update the
block that checks kickout_data_op_.hd_result_.IsError() to first test whether
kickout_range_tbl_it_ != ranges_in_bucket_snapshot_.cend() and only append
kickout_range_tbl_it_->first.StringView() when valid, otherwise log a safe
placeholder (e.g. "<no-range-iterator>") or omit the table name; keep other
context like txm->TxNumber() and the retry message intact so no dereference
occurs when the iterator is cend().

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tx_service/include/cc/template_cc_map.h`:
- Around line 6986-6993: The assertion in the CleanRangeDataForMigration branch
is too strict and disallows a valid partition_id == 0; update the check where
you read partition_id from req.GetPartitionId() (inside the
CleanType::CleanRangeDataForMigration block) to allow zero by asserting
partition_id >= 0 && partition_id != INT32_MAX, then continue to call
RemoveRangeSize(static_cast<uint32_t>(partition_id)) as before; ensure you only
change the assert condition (not the cast or RemoveRangeSize call) so negative
values and INT32_MAX remain guarded.

In `@tx_service/src/tx_operation.cpp`:
- Around line 735-746: The code captures a raw pointer new_bucket_range_id_ptr
to an element in write_entry.forward_addr_ after a try_emplace, but subsequent
try_emplace calls (same map) can rehash and invalidate that pointer; change the
logic to store the map key ((new_bucket_ng << 10) | residual) instead of a raw
pointer and, wherever new_bucket_range_id_ptr is later dereferenced (uses around
the later try_emplace sites and at lines noted), perform a fresh lookup in
write_entry.forward_addr_ (e.g., find(key) or operator[] if creating) to obtain
the element (or its .first) before reading/writing it; update all uses that
currently rely on new_bucket_range_id_ptr (and similar pointer usages) to do a
key-based re-lookup to ensure safety across rehashes.

---

Duplicate comments:
In `@tx_service/src/tx_operation.cpp`:
- Around line 7975-7981: The error log unconditionally dereferences
kickout_range_tbl_it_->first which may be equal to
ranges_in_bucket_snapshot_.cend() during hash-table kickout retries; update the
block that checks kickout_data_op_.hd_result_.IsError() to first test whether
kickout_range_tbl_it_ != ranges_in_bucket_snapshot_.cend() and only append
kickout_range_tbl_it_->first.StringView() when valid, otherwise log a safe
placeholder (e.g. "<no-range-iterator>") or omit the table name; keep other
context like txm->TxNumber() and the retry message intact so no dereference
occurs when the iterator is cend().

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fd7c2774-e355-49bf-8f7c-f2e99e0b2bb3

📥 Commits

Reviewing files that changed from the base of the PR and between 94a5786 and ab86fd5.

📒 Files selected for processing (2)

tx_service/include/cc/template_cc_map.h
tx_service/src/tx_operation.cpp

coderabbitai · 2026-03-18T04:21:50Z

tx_service/include/cc/template_cc_map.h

+            if (req.GetCleanType() == CleanType::CleanRangeDataForMigration)
+            {
+                // For data migration, we need to delete the range size info
+                // for the range that has been migrated to other node group.
+                int32_t partition_id = req.GetPartitionId();
+                assert(partition_id > 0 && partition_id != INT32_MAX);
+                RemoveRangeSize(static_cast<uint32_t>(partition_id));
+            }


⚠️ Potential issue | 🟡 Minor

Allow partition 0 in the migration cleanup assertion.

At Line 6991, assert(partition_id > 0 && partition_id != INT32_MAX); is stricter than the established invariant and can trip debug builds for a valid partition_id == 0.

Suggested fix

- assert(partition_id > 0 && partition_id != INT32_MAX); + assert(partition_id >= 0 && partition_id != INT32_MAX); RemoveRangeSize(static_cast<uint32_t>(partition_id));

Based on learnings: during CleanRangeDataForMigration, the caller guarantees KickoutCcEntryCc::GetPartitionId() is non-negative.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tx_service/include/cc/template_cc_map.h` around lines 6986 - 6993, The assertion in the CleanRangeDataForMigration branch is too strict and disallows a valid partition_id == 0; update the check where you read partition_id from req.GetPartitionId() (inside the CleanType::CleanRangeDataForMigration block) to allow zero by asserting partition_id >= 0 && partition_id != INT32_MAX, then continue to call RemoveRangeSize(static_cast<uint32_t>(partition_id)) as before; ensure you only change the assert condition (not the cast or RemoveRangeSize call) so negative values and INT32_MAX remain guarded.

@yi-xmu Got it—leaving the assert as-is. If the invariant ever changes and a debug hit shows partition_id can be 0 during migration cleanup, we can relax it to >= 0 at that time.

🧠 Learnings used

Learnt from: yi-xmu Repo: eloqdata/tx_service PR: 462 File: tx_service/include/cc/template_cc_map.h:6986-6993 Timestamp: 2026-03-18T04:01:54.203Z Learning: In eloqdata/tx_service, during CleanRangeDataForMigration handling in TemplateCcMap::Execute(KickoutCcEntryCc &req) (tx_service/include/cc/template_cc_map.h), the caller guarantees KickoutCcEntryCc::GetPartitionId() returns a non-negative partition_id; negative IDs will not reach RemoveRangeSize().

Learnt from: yi-xmu Repo: eloqdata/tx_service PR: 438 File: tx_service/include/cc/template_cc_map.h:11969-11981 Timestamp: 2026-03-12T05:24:16.445Z Learning: In eloqdata/tx_service, TemplateCcMap::UpdateRangeSize(uint32_t,int32_t,bool) in tx_service/include/cc/template_cc_map.h must return true only once when the loaded, non-dirty range size first crosses StoreRange::range_max_size; it must latch the internal need_split flag as true until ResetRangeStatus(partition_id) clears it, and must not overwrite the latched flag back to false on later calls.

Learnt from: liunyl Repo: eloqdata/tx_service PR: 149 File: include/cc/cc_request.h:1876-1927 Timestamp: 2025-10-20T04:30:07.884Z Learning: ScanNextBatchCc in include/cc/cc_request.h is used only for hash-partition scans; range-partition scans are handled by ScanSliceCc.

Learnt from: yi-xmu Repo: eloqdata/tx_service PR: 454 File: tx_service/include/cc/object_cc_map.h:1574-1575 Timestamp: 2026-03-16T02:32:29.159Z Learning: In `tx_service/include/cc/object_cc_map.h`, the `flags_str` field unpacked from entry tuples in `Execute(UploadBatchCc &req)` is intentionally unused. `ObjectCcMap` is non-range-partitioned (`TemplateCcMap<KeyT, ValueT, false, false>`), so `range_size_flags` carried in `flags_str` have no meaning in this context. The pause/resume logic does not need to track a flags offset for this class.

Learnt from: lzxddz Repo: eloqdata/tx_service PR: 199 File: include/cc/local_cc_shards.h:233-234 Timestamp: 2025-11-11T07:10:40.346Z Learning: In the LocalCcShards class in include/cc/local_cc_shards.h, the EnqueueCcRequest methods use `shard_code & 0x3FF` followed by `% cc_shards_.size()` to distribute work across processor cores for load balancing. This is intentional and separate from partition ID calculation. The 0x3FF mask creates a consistent distribution range (0-1023) before modulo by actual core count.

Learnt from: lokax Repo: eloqdata/tx_service PR: 149 File: src/remote/cc_stream_receiver.cpp:1066-1075 Timestamp: 2025-10-21T06:46:53.700Z Learning: In src/remote/cc_stream_receiver.cpp, for ScanNextRequest handling, BucketIds() on RemoteScanNextBatch should never be empty—this is an expected invariant of the scan protocol.

Learnt from: lokax Repo: eloqdata/tx_service PR: 254 File: tx_service/src/cc/local_cc_shards.cpp:2949-3188 Timestamp: 2025-12-02T10:43:27.431Z Learning: In tx_service/src/cc/local_cc_shards.cpp, whenever TryPinNodeGroupData is used, only call Sharder::Instance().UnpinNodeGroupData(node_group) if the recorded term is >= 0 (i.e., pin succeeded). Example: LocalCcShards::PostProcessFlushTaskEntries guards the unpin with `if (term >= 0)`.

tx_service/src/tx_operation.cpp

update comment

coderabbitai

🧹 Nitpick comments (1)

tx_service/src/tx_operation.cpp (1)
7894-7899: Extract the range-table reconstruction into one helper.

The TableName::Type(...) + kickout_range_table_ rebuild now lives in four branches. Pulling that into a helper would keep the retry/drop paths aligned and make future table-type changes one edit instead of four.

Also applies to: 7927-7934, 8022-8029, 8077-8084
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/src/tx_operation.cpp` around lines 7894 - 7899, Several branches
duplicate the TableName::Type(...) + kickout_range_table_ reconstruction (used
where kickout_range_table_ is set from kickout_range_tbl_it_->first); extract
that logic into a single helper (e.g., BuildTableNameFromSlice or
ResolveTableName) that accepts the key/slice or iterator key (the value passed
to TableName::Type and StringView()) and returns a TableName constructed with
the resolved Type and Engine; replace the four duplicated blocks (the current
uses of TableName::Type(...) and kickout_range_table_ assignment plus the
similar code at the other noted locations) with calls to this helper so
retry/drop paths share the same construction logic.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tx_service/src/tx_operation.cpp`:
- Around line 7894-7899: Several branches duplicate the TableName::Type(...) +
kickout_range_table_ reconstruction (used where kickout_range_table_ is set from
kickout_range_tbl_it_->first); extract that logic into a single helper (e.g.,
BuildTableNameFromSlice or ResolveTableName) that accepts the key/slice or
iterator key (the value passed to TableName::Type and StringView()) and returns
a TableName constructed with the resolved Type and Engine; replace the four
duplicated blocks (the current uses of TableName::Type(...) and
kickout_range_table_ assignment plus the similar code at the other noted
locations) with calls to this helper so retry/drop paths share the same
construction logic.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 31bcdb08-830c-47ab-9013-f94e9474e773

📥 Commits

Reviewing files that changed from the base of the PR and between ab86fd5 and 48e5817.

📒 Files selected for processing (2)

tx_service/include/cc/template_cc_map.h
tx_service/src/tx_operation.cpp

🚧 Files skipped from review as they are similar to previous changes (1)

tx_service/include/cc/template_cc_map.h

yi-xmu added 5 commits March 18, 2026 11:15

Update data migration opration forward

6b5c40e

1. Adapt kickoutdata operation with new key sharding logic for range partition 2. Fix the execution flow: after processing the range partition table, the hash partition table needs to be checked. 3. Simplify code using a unified interface.

Remove range size info during kickoutdata for data migration

769953f

Load range size after the data migration finished.

d686176

Revert "Load range size after the data migration finished."

92f2a35

This reverts commit 866931f.

update forward logic

94a5786

yi-xmu self-assigned this Mar 18, 2026

yi-xmu requested a review from liunyl March 18, 2026 03:42

yi-xmu linked an issue Mar 18, 2026 that may be closed by this pull request

[Feature]: Update data migration with new key sharding logic for range partition. #457

Open

4 tasks

coderabbitai bot reviewed Mar 18, 2026

View reviewed changes

update comment

48e5817

update comment

yi-xmu force-pushed the update_datamigration_for_rangepartition branch from ab86fd5 to 48e5817 Compare March 18, 2026 04:45

coderabbitai bot reviewed Mar 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update datamigration for rangepartition#462

Update datamigration for rangepartition#462
yi-xmu wants to merge 6 commits intomainfrom
update_datamigration_for_rangepartition

yi-xmu commented Mar 18, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Mar 18, 2026 •

edited

Loading

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Mar 18, 2026 •

edited

Loading

Uh oh!

yi-xmu Mar 18, 2026

Uh oh!

coderabbitai bot Mar 18, 2026

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Mar 18, 2026 •

edited

Loading

Uh oh!

yi-xmu Mar 18, 2026

Uh oh!

coderabbitai bot Mar 18, 2026

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

yi-xmu commented Mar 18, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yi-xmu Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yi-xmu Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

yi-xmu commented Mar 18, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 18, 2026 •

edited

Loading

coderabbitai bot Mar 18, 2026 •

edited

Loading

coderabbitai bot Mar 18, 2026 •

edited

Loading