Conversation
Walkthrough

Implements shard-aware routing and a per-shard DataShard lifecycle: threads shard_id through client APIs, closures, RPCs, protos, scanners, and internal requests; adds bucket_infos/InitBucketsInfo and GetShardIdByPartitionId; converts partition_id to int32; replaces the single global datastore with a per-shard DataShard array; downgrades several assert/fatal paths to warnings/no-ops.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant DSC as DataStoreServiceClient
    participant CM as ClusterManager
    participant Node as RemoteNode
    participant DSS as DataStoreService
    participant Shard as DataShard
    Client->>DSC: Read(table, partition_id, key)
    DSC->>DSC: shard_id = GetShardIdByPartitionId(partition_id)
    DSC->>CM: resolve owner for shard_id
    alt owner is local
        DSC->>DSS: Read(table, partition_id, shard_id, key)
        DSS->>Shard: data_store_->Read(..., shard_id, ...)
        Shard-->>DSS: result
        DSS-->>DSC: result
    else owner is remote
        DSC->>Node: RPC Read(..., shard_id, ...)
        Node-->>DSC: response
    end
    DSC-->>Client: return result
```

```mermaid
sequenceDiagram
    participant Main
    participant DSS as DataStoreService
    participant Shard as DataShard[*]
    Main->>DSS: StartService(create_db_if_missing)
    DSS->>DSS: for each shard_id in GetAllDataShards()
    DSS->>Shard: allocate/open DataShard(shard_id)
    Shard-->>DSS: shard_status -> Open/ReadWrite
    DSS-->>Main: Start complete
```
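The startup flow in the second diagram can be sketched in plain C++. Everything here is an illustrative stand-in, not the real classes: the actual DataShard and status transitions live in eloq_data_store_service/data_store_service.h, and the mapping from create_db_if_missing to a status is simplified.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the review's types.
enum class DSShardStatus { Closed, Open, ReadWrite };

struct DataShard
{
    uint32_t shard_id{0};
    DSShardStatus status{DSShardStatus::Closed};
};

class DataStoreService
{
public:
    explicit DataStoreService(std::vector<uint32_t> shard_ids)
        : shard_ids_(std::move(shard_ids))
    {
    }

    // Mirrors the diagram: iterate all local shards, allocate/open each one.
    bool StartService(bool create_db_if_missing)
    {
        for (uint32_t shard_id : shard_ids_)  // GetAllDataShards()
        {
            auto shard = std::make_unique<DataShard>();
            shard->shard_id = shard_id;
            // Simplified status transition for illustration only.
            shard->status = create_db_if_missing ? DSShardStatus::ReadWrite
                                                 : DSShardStatus::Open;
            shards_.emplace(shard_id, std::move(shard));
        }
        return true;
    }

    // Unknown shards report Closed, matching the defensive accessors reviewed below.
    DSShardStatus StatusOf(uint32_t shard_id) const
    {
        auto it = shards_.find(shard_id);
        return it == shards_.end() ? DSShardStatus::Closed : it->second->status;
    }

private:
    std::vector<uint32_t> shard_ids_;
    std::map<uint32_t, std::unique_ptr<DataShard>> shards_;
};
```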
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 13
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
data_store_service_client.cpp (4)
316-318: Don’t crash in the ScheduleTimerTasks stub

Asserting false here will kill the process if the stub is ever exercised. Prefer a no-op or a TODO log.

```diff
- LOG(ERROR) << "ScheduleTimerTasks not implemented";
- assert(false);
+ LOG(WARNING) << "ScheduleTimerTasks not implemented (noop)";
```
2624-2624: Remove assert(false) in FetchArchives or guard behind a feature flag

This function asserts and then proceeds with fully-formed logic. The assert will crash in debug builds.

```diff
- assert(false);
+ // TODO: validate behavior and remove when complete; leaving enabled.
```
2734-2734: Remove assert(false) in FetchVisibleArchive

Same pattern; the code below is functional. Don’t crash.

```diff
- assert(false);
+ // TODO: keep functional path; no-op assert removed.
```
3070-3117: CloseDataStore uses hard-coded 0; should close the ng_id passed

Lines 3081-3082 use CloseDataStore(0); this breaks multi-shard. Use CloseDataStore(ng_id).

```diff
- // Now, only support one shard.
- data_store_service_->CloseDataStore(0);
+ data_store_service_->CloseDataStore(ng_id);
```

eloq_data_store_service/internal_request.h (1)
754-763: Pass shard_id into PrepareShardingError

PrepareShardingError now expects a shard identifier, but this code is still forwarding the partition id. On a multi-shard node the partition id can easily exceed the number of local shards, so this ends up indexing data_shards_ with the wrong value (or throwing) and returns sharding metadata for the wrong shard. Please pass the closure’s shard_id instead, and mirror the same fix in the other PrepareShardingError call sites in this file (ReadLocalRequest::SetFinish, ScanRpcRequest::SetFinish, and ScanLocalRequest::SetFinish).

```diff
- ds_service_->PrepareShardingError(req_->partition_id(),
+ ds_service_->PrepareShardingError(req_->shard_id(),
                                    resp_->mutable_result());
```
🧹 Nitpick comments (5)
data_store_service_client.cpp (1)
3178-3234: InitBucketsInfo: avoid srand/rand and global RNG side effects
- srand/rand mutate global state and aren’t thread-safe.
- Prefer per-call mt19937_64 seeded by ng, use uniform_int_distribution<uint16_t>. Also avoid modulo bias.
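The suggested replacement can be exercised standalone. This sketch assumes a total_range_buckets of 16384 (the real constant lives in txservice and may differ); the deterministic per-ng seed keeps the bucket assignment reproducible, and uniform_int_distribution avoids the modulo bias of rand() %.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <random>
#include <vector>

// Assumed value for illustration; the real txservice::total_range_buckets may differ.
constexpr uint16_t kTotalRangeBuckets = 16384;

// Assigns 64 distinct bucket ids to each node group, deterministically per ng.
std::map<uint16_t, uint32_t> AssignBuckets(const std::vector<uint32_t> &node_groups)
{
    std::map<uint16_t, uint32_t> rand_num_to_ng;
    for (uint32_t ng : node_groups)
    {
        // Per-call engine seeded by ng: no global RNG state, thread-safe,
        // and the same ng always produces the same bucket set.
        std::mt19937_64 rng(static_cast<uint64_t>(ng));
        std::uniform_int_distribution<uint16_t> dist(0, kTotalRangeBuckets - 1);
        size_t generated = 0;
        while (generated < 64)
        {
            uint16_t r = dist(rng);
            // emplace fails if the bucket is already owned; keep drawing.
            if (rand_num_to_ng.emplace(r, ng).second)
            {
                ++generated;
            }
        }
    }
    return rand_num_to_ng;
}
```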
```diff
- for (auto ng : node_groups)
- {
-     srand(ng);
-     size_t generated = 0;
-     while (generated < 64)
-     {
-         uint16_t rand_num = rand() % txservice::total_range_buckets;
+ for (auto ng : node_groups) {
+     std::mt19937_64 rng(static_cast<uint64_t>(ng));
+     std::uniform_int_distribution<uint16_t> dist(0, txservice::total_range_buckets - 1);
+     size_t generated = 0;
+     while (generated < 64) {
+         uint16_t rand_num = dist(rng);
          if (rand_num_to_ng.find(rand_num) == rand_num_to_ng.end())
          {
              generated++;
              rand_num_to_ng.emplace(rand_num, ng);
          }
```

eloq_data_store_service/rocksdb_data_store.cpp (1)
61-67: Destructor guard relies on pointer nulling

Given the change above, the destructor won’t re-enter Shutdown. Without nulling, the guard stays true and Shutdown will run again.
No code change beyond the previous diff is needed.
eloq_data_store_service/rocksdb_cloud_data_store.cpp (1)
155-163: Avoid const_cast; mutate the member directly

cloud_config_ is a non-const member. Use it directly and simplify the path normalization; ensure idempotence if constructed multiple times.

```diff
- auto *cloud_config_ptr =
-     const_cast<EloqDS::RocksDBCloudConfig *>(&cloud_config_);
- assert(!cloud_config_ptr->object_path_.empty());
- if (cloud_config_ptr->object_path_.back() != '/')
+ assert(!cloud_config_.object_path_.empty());
+ if (cloud_config_.object_path_.back() != '/')
  {
-     cloud_config_ptr->object_path_.append("/");
+     cloud_config_.object_path_.append("/");
  }
- cloud_config_ptr->object_path_.append("ds_" + std::to_string(shard_id));
+ cloud_config_.object_path_.append("ds_" + std::to_string(shard_id));
```

eloq_data_store_service/data_store_service.h (2)
714-714: Replace magic number with a named constant

The shard count 1000 is hardcoded. It should be a named constant for maintainability and clarity:

```diff
+ static constexpr uint32_t MAX_SHARD_COUNT = 1000;
+
- std::array<DataShard, 1000> data_shards_;
+ std::array<DataShard, MAX_SHARD_COUNT> data_shards_;
```

This makes the architecture more flexible for future changes and clarifies the design intent.
683-690: Remove commented-out code before merging

The old single-shard member declarations (lines 683-689) are commented out. They should be removed entirely rather than left as comments, as they add noise and can cause confusion.

```diff
- // uint32_t shard_id_{UINT32_MAX};
- // std::unique_ptr<DataStore> data_store_{nullptr};
- // std::atomic<DSShardStatus> shard_status_{DSShardStatus::Closed};
- // std::atomic<uint64_t> ongoing_write_requests_{0};
-
- // // scan iterator cache
- // TTLWrapperCache scan_iter_cache_;
-
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
- data_store_service_client.cpp (90 hunks)
- data_store_service_client.h (14 hunks)
- data_store_service_client_closure.cpp (17 hunks)
- data_store_service_client_closure.h (34 hunks)
- data_store_service_scanner.cpp (3 hunks)
- data_store_service_scanner.h (4 hunks)
- eloq_data_store_service/CMakeLists.txt (0 hunks)
- eloq_data_store_service/data_store_service.cpp (44 hunks)
- eloq_data_store_service/data_store_service.h (13 hunks)
- eloq_data_store_service/data_store_service_util.h (1 hunks)
- eloq_data_store_service/ds_request.proto (5 hunks)
- eloq_data_store_service/internal_request.h (55 hunks)
- eloq_data_store_service/main.cpp (1 hunks)
- eloq_data_store_service/rocksdb_cloud_data_store.cpp (6 hunks)
- eloq_data_store_service/rocksdb_data_store.cpp (1 hunks)
- eloq_data_store_service/rocksdb_data_store_common.cpp (2 hunks)
- eloq_data_store_service/rocksdb_data_store_common.h (0 hunks)
💤 Files with no reviewable changes (2)
- eloq_data_store_service/rocksdb_data_store_common.h
- eloq_data_store_service/CMakeLists.txt
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-17T06:36:11.137Z
Learnt from: liunyl
Repo: eloqdata/store_handler PR: 80
File: rocksdb_handler.cpp:1226-1335
Timestamp: 2025-09-17T06:36:11.137Z
Learning: In the EloqData store_handler codebase, the maximum bucket ID is guaranteed to be smaller than UINT16_MAX (65535), so bucket_id + 1 operations do not risk integer overflow when used in FetchBucketData methods.
Applied to files:
- data_store_service_client_closure.cpp
- data_store_service_client.h
- data_store_service_client.cpp
📚 Learning: 2025-09-17T11:08:35.322Z
Learnt from: liunyl
Repo: eloqdata/store_handler PR: 83
File: data_store_service_client_closure.cpp:408-448
Timestamp: 2025-09-17T11:08:35.322Z
Learning: In the DataStoreServiceClient batch write system, the global coordinator (SyncPutAllData) manages concurrency at the partition level, not at the individual batch level. The global unfinished_request_cnt_ is only decremented when an entire partition is fully completed or failed, not after each individual batch within a partition.
Applied to files:
- data_store_service_client_closure.cpp
- eloq_data_store_service/data_store_service.cpp
- data_store_service_client.cpp
🧬 Code graph analysis (10)
eloq_data_store_service/rocksdb_data_store.cpp (1)
eloq_data_store_service/data_store_service_util.h (1)
shard_id_(123-123)
data_store_service_scanner.cpp (1)
data_store_service_scanner.h (1)
data_shard_id_(266-266)
data_store_service_scanner.h (1)
eloq_data_store_service/internal_request.h (9)
string_view(106-109), string_view(126-129), string_view(273-276), string_view(606-614), string_view(621-629), string_view(818-826), partition_id_(647-647), partition_id_(869-869), partition_id_(1528-1528)
eloq_data_store_service/rocksdb_cloud_data_store.cpp (1)
eloq_data_store_service/data_store_service_util.h (1)
shard_id_(123-123)
eloq_data_store_service/data_store_service.h (2)
eloq_data_store_service/data_store_service.cpp (14)
StartService(245-314), StartService(245-245), Read(400-433), Read(400-403), Read(435-478), Read(435-443), ScanClose(996-1029), ScanClose(996-999), ScanClose(1031-1063), ScanClose(1031-1036), CloseDataStore(1846-1891), CloseDataStore(1846-1846), OpenDataStore(1893-1929), OpenDataStore(1893-1893)
eloq_data_store_service/data_store_service_config.cpp (6)
PrepareShardingError(904-948), PrepareShardingError(904-905), IsOwnerOfShard(285-293), IsOwnerOfShard(285-285), IsOwnerOfShard(690-701), IsOwnerOfShard(690-691)
eloq_data_store_service/internal_request.h (1)
data_store_service_client_closure.h (31)
Reset(81-85), Reset(144-153), Reset(210-216), Reset(269-274), Reset(385-392), Reset(600-619), Reset(933-948), Reset(1154-1177), Reset(1393-1408), Reset(1598-1624), Reset(1627-1658), Reset(1660-1694), Reset(1933-2251), Reset(2328-2347), Reset(2606-2612), Reset(3002-3010), vector(1078-1081), vector(1121-1128), vector(2253-2256), string(804-809), string(2173-2178), string_view(311-368), string_view(752-755), string_view(767-770), string_view(792-797), string_view(1291-1294), string_view(1533-1536), string_view(2133-2136), string_view(2138-2141), string_view(2241-2246), string_view(2453-2456)
data_store_service_client_closure.h (2)
eloq_data_store_service/internal_request.h (13)
partition_id_(647-647), partition_id_(869-869), partition_id_(1528-1528), string_view(106-109), string_view(126-129), string_view(273-276), string_view(606-614), string_view(621-629), string_view(818-826), string_view(988-996), string_view(1123-1131), string_view(1425-1433), string_view(1440-1448)
eloq_data_store_service/data_store_service_util.h (1)
shard_id_(123-123)
eloq_data_store_service/data_store_service.cpp (3)
eloq_data_store_service/data_store_service.h (4)
IsOwnerOfShard(618-623), PrepareShardingError(525-532), IncreaseWriteReqCount(606-610), DecreaseWriteReqCount(612-616)
data_store_service_client.cpp (6)
Read(1151-1160), Read(1151-1156), Read(3574-3590), Read(3574-3579), ScanNext(3766-3799), ScanNext(3766-3780)
eloq_data_store_service/rocksdb_data_store_common.cpp (9)
Read(416-477), Read(416-416), key(832-832), start_key(367-367), start_key(679-679), end_key(382-382), end_key(684-684), ScanNext(735-1045), ScanNext(735-735)
data_store_service_client.h (2)
data_store_service_client.cpp (20)
FindFreeNodeIndex(3313-3332), FindFreeNodeIndex(3313-3313), InitBucketsInfo(3178-3234), InitBucketsInfo(3178-3182), OnLeaderStart(3032-3047), OnLeaderStart(3032-3033), OnLeaderStop(3049-3061), OnLeaderStop(3049-3049), OnStartFollowing(3070-3117), OnStartFollowing(3070-3073), GetShardIdByPartitionId(3145-3162), GetShardIdByPartitionId(3145-3146), IsLocalShard(3135-3143), IsLocalShard(3135-3135), GetOwnerNodeIndexOfShard(3253-3258), GetOwnerNodeIndexOfShard(3253-3254), GetAllDataShards(3164-3176), GetAllDataShards(3164-3164), UpdateOwnerNodeIndexOfShard(3260-3311), UpdateOwnerNodeIndexOfShard(3260-3261)
eloq_data_store_service/data_store_service_config.cpp (2)
GetShardIdByPartitionId(802-807), GetShardIdByPartitionId(802-803)
data_store_service_client.cpp (3)
data_store_service_client.h (3)
TxPort2DssPort(126-129), need_bootstrap_(683-816), KvPartitionIdOf(621-626)
eloq_data_store_service/data_store_service.cpp (8)
BatchWriteRecords(854-902), BatchWriteRecords(854-858), BatchWriteRecords(1124-1189), BatchWriteRecords(1124-1137), Read(400-433), Read(400-403), Read(435-478), Read(435-443)
data_store_service_client_closure.cpp (2)
PartitionBatchCallback(493-538), PartitionBatchCallback(493-496)
🪛 Clang (14.0.6)
eloq_data_store_service/data_store_service_util.h
[error] 24-24: 'atomic' file not found
(clang-diagnostic-error)
🔇 Additional comments (21)
eloq_data_store_service/data_store_service_util.h (1)
24-29: Static analysis false positive: ignore the 'atomic' error

The static analysis error about 'atomic' file not found is a false positive. <atomic> is a standard C++11 header and should compile correctly in the target environment. The new includes support the multi-shard data structures introduced in this PR.

data_store_service_client_closure.h (3)
602-614: LGTM! Type changes and shard_id propagation are consistent

The updated signature correctly changes partition_id to int32_t and adds uint32_t shard_id, aligning with the shard-aware architecture. Initializing key_ directly from the parameter simplifies key handling.
1850-1858: LGTM! Accessor methods follow consistent patterns

The PartitionId() and ShardId() accessor methods are well-defined and return the correct types, enabling proper shard routing throughout the system.
631-633: Sentinel values are intentionally inconsistent across different modules but adequately scoped

The concern about consistency has merit, but the codebase shows intentional design differences rather than errors:

- data_store_service_client_closure.h uses INT32_MAX as a sentinel for the "uninitialized/reset" state in closure classes (lines 631, 1605, 1913)
- eloq_data_store_service/internal_request.h and data_store_service_scanner.h use 0 as the sentinel value

These are separate modules with different semantics, not shared classes. The shard_id_ sentinel (UINT32_MAX) is consistently checked in assertions throughout the codebase (e.g., data_store_service.cpp lines 326, 1089, 1097, 1105, etc.), validating that sentinel-aware code paths exist.

However, partition_id_ sentinels lack comparable defensive checks: there are no comparisons or validations against INT32_MAX or 0 in the codebase. This represents a minor defensive-programming gap rather than an outright inconsistency.

data_store_service_client.h (4)
90-102: LGTM! Shard initialization logic is sound

The initialization correctly:

- Populates the dss_shards_ array with node indices for each shard
- Builds the dss_shard_ids_ set from the cluster manager
- Calls InitBucketsInfo to establish bucket-to-shard mappings

This properly sets up the multi-shard architecture.
387-395: Leader callback signatures correctly updated for node-group awareness

The updated signatures for OnLeaderStart, OnLeaderStop, and OnStartFollowing now accept ng_id (node group ID) parameters, enabling proper per-shard leadership management in the multi-shard architecture.
624-625: Verify hash consistency for system-table partition mapping

The KvPartitionIdOf method uses std::hash<std::string_view> to compute partition IDs for system tables. Ensure that:
- The hash function produces consistent results across different nodes
- The hash distribution is uniform to avoid hotspots
Based on learnings
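The cross-node concern is real because std::hash is implementation-defined: two nodes built against different standard libraries (or even different versions) may hash the same string differently. Pinning a fixed algorithm removes that dependence. A minimal FNV-1a sketch, purely illustrative and not the project's actual hash:

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// FNV-1a 64-bit: a fixed, portable hash. Same input yields the same output
// on every platform, unlike std::hash<std::string_view>.
constexpr uint64_t Fnv1a64(std::string_view s)
{
    uint64_t h = 0xcbf29ce484222325ULL;  // FNV offset basis
    for (unsigned char c : s)
    {
        h ^= c;
        h *= 0x100000001b3ULL;  // FNV prime
    }
    return h;
}
```

A partition ID could then be derived as `Fnv1a64(table_name) % partition_count`, giving identical routing on every node regardless of toolchain.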
751-755: LGTM! New shard-aware state members are well-structured

The additions of:

- dss_shard_ids_ (set of active shard IDs)
- bucket_infos_ (bucket-to-shard mapping)

provide the necessary infrastructure for multi-shard routing and management. The use of std::unique_ptr for BucketInfo ensures proper ownership semantics.

data_store_service_scanner.cpp (1)
417-419: The code at lines 417-419 correctly handles partition-ID boundaries and produces valid shard IDs

The loop iterates for (uint32_t part_cnt = 0; part_cnt < HASH_PARTITION_COUNT; part_cnt++) and passes static_cast<int32_t>(part_cnt) to GetShardIdByPartitionId. This guarantees the partition ID is always in the range [0, HASH_PARTITION_COUNT). The implementation uses two assertions to validate that the returned bucket ID and shard ID exist in their respective maps, providing runtime safety. All 34 callers throughout the codebase follow similar patterns with documented, computed, or hardcoded valid partition IDs.

data_store_service_client.cpp (4)
3560-3563: Local vs. remote routing: pattern looks consistent

All Read/DeleteRange/ScanNext/ScanClose paths check IsLocalShard and route to the right target with timeouts and owner-index lookup. Good.
Also applies to: 3446-3450, 3498-3504, 3750-3763, 3804-3838, 3870-3894, 3594-3621, 3648-3676
4384-4390: Catalog paths: shard resolution for partition 0 is now consistent

Passing shard_id for partition-0 catalogs eliminates implicit shard-0 assumptions. Looks good.
Also applies to: 4610-4621, 4655-4672
4715-4749: Batch-size accounting improvements

Incrementing batch_size for key sizes prevents oversize batches. This is correct and matches MAX_WRITE_BATCH_SIZE enforcement.
Also applies to: 4761-4787, 4801-4810, 4916-4931
399-404: Per-partition coordinator decrement verified correct

The batch-callback chaining at lines 493-538 ensures OnPartitionCompleted() is called exactly once per partition lifecycle:

- If any batch fails: called at line 507
- If all batches succeed: called at line 536 (when no more batches exist)

The same callback_data (containing one partition_state and the global_coordinator) persists across all batches for a partition, and the callback chains subsequent batches before calling OnPartitionCompleted(). This confirms the global coordinator's unfinished_request_cnt_ decrements only when the partition fully completes or fails, not per batch. The routing logic at lines 399-404 using batch[0].partition_id is correct and properly grouped.

eloq_data_store_service/main.cpp (1)
321-323: Startup wiring LGTM

The StartService signature update is correctly applied and respects bootstrap/single-node mode. No issues spotted.
eloq_data_store_service/rocksdb_cloud_data_store.cpp (2)
370-393: Timing/logging improvements LGTM

The switch to steady_clock and the explicit NewCloudFileSystem timing are good operationally.
Also applies to: 647-671
612-636: Paranoid checks disabled with max_open_files=0: note the operational tradeoff

This is fine for open speed and to avoid LRU churn, but the rationale should be documented in code; it is already noted. No change requested.
eloq_data_store_service/data_store_service.h (5)
628-631: Verify the GetClusterManager visibility change is intentional

The GetClusterManager() method returns a non-const reference and appears to have been made public (indicated by the ~ marker on line 628). This exposes the cluster manager for external modification.

Ensure that:
- This visibility change is intentional and necessary for the multi-shard architecture
- External callers have a legitimate need to access/modify cluster manager state
- Thread-safety is maintained if multiple threads access the cluster manager
If external modification is not required, consider returning a const reference instead:
```cpp
const DataStoreServiceClusterManager &GetClusterManager() const
```
606-616: LGTM: Per-shard write request tracking

The refactoring to per-shard write-request counters is correctly implemented. The use of .at() provides bounds checking, and the atomic operations with appropriate memory ordering are correct.
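The counter pattern can be exercised in isolation. This stand-in (4 shards, hypothetical names) mirrors the Increase/Decrease/.at() shape described above; the real members live in eloq_data_store_service/data_store_service.h:

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-shard write-request counters.
struct ShardCounters
{
    ShardCounters()
    {
        // Explicitly zero each counter (portable across C++ standards).
        for (auto &a : ongoing_writes_)
        {
            a.store(0, std::memory_order_relaxed);
        }
    }

    void Increase(std::size_t shard_id)
    {
        // .at() throws on an out-of-range shard id instead of corrupting memory.
        ongoing_writes_.at(shard_id).fetch_add(1, std::memory_order_relaxed);
    }

    void Decrease(std::size_t shard_id)
    {
        ongoing_writes_.at(shard_id).fetch_sub(1, std::memory_order_release);
    }

    uint64_t Ongoing(std::size_t shard_id) const
    {
        return ongoing_writes_.at(shard_id).load(std::memory_order_acquire);
    }

    std::array<std::atomic<uint64_t>, 4> ongoing_writes_;
};
```

Relaxed ordering is sufficient for the increment itself; the release/acquire pair matters when a drain path reads the counter to decide whether in-flight writes remain.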
585-593: LGTM: Defensive shard status check

The implementation correctly verifies that the shard is initialized (shard_id_ matches the requested shard_id) before returning the actual status, and otherwise returns Closed. This is good defensive programming.
634-644: LGTM: Safe per-shard data store accessor

The private GetDataStore method correctly validates that the shard is initialized before returning the data store pointer, and returns nullptr otherwise. The use of .at() provides bounds checking.
190-190: StartService signature appropriately simplified for the multi-shard architecture

The signature has been narrowed to a single create_db_if_missing parameter, which is appropriate for the new multi-shard architecture where shard-specific configuration is obtained from the cluster manager rather than passed as parameters.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
data_store_service_client.cpp (1)
3069-3116: Inconsistent use of ng_id in OnStartFollowing

Lines 3040 and 3057 correctly use the ng_id parameter when calling OpenDataStore(ng_id) and CloseDataStore(ng_id). However, line 3080 hardcodes 0 instead of using the ng_id parameter. This inconsistency could cause the wrong shard to be closed when following a leader.

Apply this fix:

```diff
  if (data_store_service_ != nullptr)
  {
-     // Now, only support one shard.
-     data_store_service_->CloseDataStore(0);
+     data_store_service_->CloseDataStore(ng_id);
  }
```
♻️ Duplicate comments (5)
data_store_service_client.cpp (5)
313-317: Still unresolved: fatal assert will crash the process

The unconditional assert(false) at line 316 will abort the process if this method is called. This was flagged in past reviews but remains unaddressed.

Replace with graceful error handling:

```diff
  void DataStoreServiceClient::ScheduleTimerTasks()
  {
      LOG(ERROR) << "ScheduleTimerTasks not implemented";
-     assert(false);
+     // TODO: Implement timer-based tasks when needed
  }
```
2616-2624: Still unresolved: fatal assert in FetchArchives

The unconditional assert(false) at line 2623 was flagged in past reviews but remains unaddressed. This will crash the process if called.

Replace with proper error handling:

```diff
  bool DataStoreServiceClient::FetchArchives(
      const txservice::TableName &table_name,
      const txservice::KVCatalogInfo *kv_info,
      const txservice::TxKey &key,
      std::vector<txservice::VersionTxRecord> &archives,
      uint64_t from_ts)
  {
-     assert(false);
-     LOG(INFO) << "FetchArchives: table_name: " << table_name.StringView();
+     LOG(ERROR) << "FetchArchives not fully implemented yet";
+     return false;
```
2724-2734: Still unresolved: fatal assert in FetchVisibleArchive

The unconditional assert(false) at line 2733 was flagged in past reviews but remains unaddressed.

Replace with proper error handling:

```diff
  bool DataStoreServiceClient::FetchVisibleArchive(
      const txservice::TableName &table_name,
      const txservice::KVCatalogInfo *kv_info,
      const txservice::TxKey &key,
      const uint64_t upper_bound_ts,
      txservice::TxRecord &rec,
      txservice::RecordStatus &rec_status,
      uint64_t &commit_ts)
  {
-     assert(false);
-
+     LOG(ERROR) << "FetchVisibleArchive not fully implemented yet";
+     return false;
```
213-273: Still unresolved: redundant check and unguarded map access

Past review flagged two issues that remain unaddressed:

1. Redundant check (lines 257-262): the condition if (leader_id == node_id) is checked twice, once implicitly when calling SwitchShardToReadWrite and again immediately after. Consolidate into a single block.
2. Unguarded map access (line 255): nodes_map.at(leader_id) is called without verifying that leader_id exists in nodes_map. The ng_leaders map is independent of nodes_map, so a leader_id from ng_leaders might not exist in nodes_map. The assert on line 252 comes after this access, so it doesn't prevent the exception.

Apply this fix:

```diff
  // set primary node
  if (ng_leaders.find(ng_id) != ng_leaders.end())
  {
      uint32_t leader_id = ng_leaders.at(ng_id);
-     assert(nodes_map.find(leader_id) != nodes_map.end());
-     if (nodes_map.find(leader_id) != nodes_map.end())
+     auto leader_it = nodes_map.find(leader_id);
+     if (leader_it != nodes_map.end())
      {
-         cluster_manager.UpdatePrimaryNode(ng_id,
-                                           nodes_map.at(leader_id));
-     }
-     if (leader_id == node_id)
-     {
-         contain_this_node = true;
+         cluster_manager.UpdatePrimaryNode(ng_id, leader_it->second);
+         if (leader_id == node_id)
+         {
+             contain_this_node = true;
+             cluster_manager.SwitchShardToReadWrite(ng_id,
+                                                    DSShardStatus::Closed);
+         }
+     }
+     else
+     {
+         LOG(ERROR) << "Leader " << leader_id
+                    << " not found in nodes_map for ng " << ng_id;
+     }
+     if (leader_id == node_id && leader_it == nodes_map.end())
+     {
+         contain_this_node = true;
          cluster_manager.SwitchShardToReadWrite(ng_id,
                                                 DSShardStatus::Closed);
      }
  }
```
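The guarded-lookup idea behind this fix can be exercised standalone. The names below are hypothetical stand-ins; the point is resolving the key once via find() and branching on the iterator instead of calling at() unguarded:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Resolve a leader's node info without risking std::out_of_range.
// Returns nullopt when leader_id is absent from nodes_map, letting the
// caller log and continue instead of crashing.
std::optional<int> ResolveLeader(
    const std::unordered_map<uint32_t, int> &nodes_map, uint32_t leader_id)
{
    auto it = nodes_map.find(leader_id);
    if (it == nodes_map.end())
    {
        return std::nullopt;  // leader unknown here: report, don't throw
    }
    return it->second;  // reuse the iterator: one lookup, not two
}
```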
3144-3167: Still unresolved: unsynchronized access to shared containers

Past review flagged that bucket_infos_ and dss_shard_ids_ are read without synchronization, which can race with topology updates. This concern remains unaddressed.

Add synchronization to prevent races:

```diff
  uint32_t DataStoreServiceClient::GetShardIdByPartitionId(
      int32_t partition_id, bool is_range_partition) const
  {
      uint16_t bucket_id;
      if (is_range_partition)
      {
          bucket_id = txservice::Sharder::MapRangeIdToBucketId(partition_id);
      }
      else
      {
          bucket_id =
              txservice::Sharder::MapHashPartitionIdToBucketId(partition_id);
      }
+     // TODO: Add shared_lock when bucket_infos_mutex_ is introduced
+     // std::shared_lock lk(bucket_infos_mutex_);
      auto it = bucket_infos_.find(bucket_id);
      assert(it != bucket_infos_.end());
      if (it != bucket_infos_.end())
      {
          uint32_t shard_id = bucket_infos_.at(bucket_id)->BucketOwner();
          assert(dss_shard_ids_.find(shard_id) != dss_shard_ids_.end());
          return shard_id;
      }
      return 0;
  }
```

Verify whether bucket_infos_ is modified concurrently:

```bash
#!/bin/bash
# Search for writes to bucket_infos_ to assess race potential
rg -nP '\bbucket_infos_\s*(\[|\.emplace|\.insert|\.erase|\.clear|\s*=)' -C2
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- data_store_service_client.cpp (89 hunks)
- data_store_service_client.h (13 hunks)
- eloq_data_store_service/data_store_service.cpp (44 hunks)
- eloq_data_store_service/data_store_service.h (13 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-17T11:08:35.322Z
Learnt from: liunyl
Repo: eloqdata/store_handler PR: 83
File: data_store_service_client_closure.cpp:408-448
Timestamp: 2025-09-17T11:08:35.322Z
Learning: In the DataStoreServiceClient batch write system, the global coordinator (SyncPutAllData) manages concurrency at the partition level, not at the individual batch level. The global unfinished_request_cnt_ is only decremented when an entire partition is fully completed or failed, not after each individual batch within a partition.
Applied to files:
- eloq_data_store_service/data_store_service.cpp
- data_store_service_client.cpp
📚 Learning: 2025-09-17T06:36:11.137Z
Learnt from: liunyl
Repo: eloqdata/store_handler PR: 80
File: rocksdb_handler.cpp:1226-1335
Timestamp: 2025-09-17T06:36:11.137Z
Learning: In the EloqData store_handler codebase, the maximum bucket ID is guaranteed to be smaller than UINT16_MAX (65535), so bucket_id + 1 operations do not risk integer overflow when used in FetchBucketData methods.
Applied to files:
- data_store_service_client.h
- data_store_service_client.cpp
🧬 Code graph analysis (4)
eloq_data_store_service/data_store_service.h (1)
eloq_data_store_service/data_store_service.cpp (14)
StartService(236-305), StartService(236-236), Read(391-424), Read(391-394), Read(426-469), Read(426-434), ScanClose(987-1020), ScanClose(987-990), ScanClose(1022-1054), ScanClose(1022-1027), CloseDataStore(1837-1882), CloseDataStore(1837-1837), OpenDataStore(1884-1920), OpenDataStore(1884-1884)
eloq_data_store_service/data_store_service.cpp (3)
eloq_data_store_service/data_store_service.h (4)
IsOwnerOfShard(615-625), PrepareShardingError(525-529), IncreaseWriteReqCount(603-607), DecreaseWriteReqCount(609-613)
data_store_service_client.cpp (12)
Read(1150-1159), Read(1150-1155), Read(3580-3596), Read(3580-3585), DeleteRange(3629-3651), DeleteRange(3629-3636), ScanNext(3772-3805), ScanNext(3772-3786), ScanClose(3847-3871), ScanClose(3847-3852), BatchWriteRecords(4172-4207), BatchWriteRecords(4172-4185)
eloq_data_store_service/rocksdb_data_store_common.cpp (15)
Read(416-477), Read(416-416), key(832-832), DeleteRange(339-414), DeleteRange(339-339), start_key(367-367), start_key(679-679), end_key(382-382), end_key(684-684), ScanNext(735-1045), ScanNext(735-735), ScanClose(1047-1071), ScanClose(1047-1047), BatchWriteRecords(479-641), BatchWriteRecords(479-480)
data_store_service_client.h (2)
data_store_service_client.cpp (24)
FindFreeNodeIndex(3319-3338), FindFreeNodeIndex(3319-3319), InitBucketsInfo(3184-3240), InitBucketsInfo(3184-3188), OnLeaderStart(3031-3046), OnLeaderStart(3031-3032), OnLeaderStop(3048-3060), OnLeaderStop(3048-3048), OnStartFollowing(3069-3116), OnStartFollowing(3069-3072), GetShardIdByPartitionId(3144-3167), GetShardIdByPartitionId(3144-3145), IsLocalShard(3134-3142), IsLocalShard(3134-3134), GetOwnerNodeIndexOfShard(3259-3264), GetOwnerNodeIndexOfShard(3259-3260), GetAllDataShards(3169-3182), GetAllDataShards(3169-3169), UpdateOwnerNodeIndexOfShard(3266-3317), UpdateOwnerNodeIndexOfShard(3266-3267), HandleShardingError(3340-3374), HandleShardingError(3340-3341), UpgradeShardVersion(3376-3424), UpgradeShardVersion(3376-3379)
eloq_data_store_service/data_store_service_config.cpp (4)
GetShardIdByPartitionId(802-807), GetShardIdByPartitionId(802-803), HandleShardingError(950-992), HandleShardingError(950-951)
data_store_service_client.cpp (2)
data_store_service_client.h (3)
TxPort2DssPort(126-129), need_bootstrap_(680-804), KvPartitionIdOf(621-626)
data_store_service_client_closure.cpp (2)
PartitionBatchCallback(493-538), PartitionBatchCallback(493-496)
🔇 Additional comments (3)
data_store_service_client.cpp (3)
286-289: LGTM: Early return when bootstrap is not needed

The guard correctly skips table initialization when need_bootstrap_ is false, improving efficiency.
467-492: LGTM: Proper shard-aware routing in PutAll

The method correctly determines the partition type and resolves shard_id before batch writes. The is_range_partitioned flag is properly derived and propagated.
3169-3182: LGTM: Thread-safety issue resolved with mutex

Past review flagged unsynchronized access to dss_shard_ids_. The addition of dss_shard_ids_mutex_ at line 3173 properly addresses this concern. The lock scope is minimal and appropriate.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (7)
data_store_service_client.cpp (5)
313-317: Remove the fatal assert in ScheduleTimerTasks; return gracefully

Asserting false crashes callers. Replace with a no-op plus TODO until implemented.

```diff
  void DataStoreServiceClient::ScheduleTimerTasks()
  {
-     LOG(ERROR) << "ScheduleTimerTasks not implemented";
-     assert(false);
+     DLOG(WARNING) << "ScheduleTimerTasks not implemented yet";
+     // TODO: implement periodic tasks (health checks, cleanup, etc.)
  }
```
1871-1885: DropKvTableAsync: avoid assert(false); make it a documented no-op or implement it

This assert crashes the process if ever called. Either remove the API or implement a safe async path; minimally, return without asserting.

```diff
  void DataStoreServiceClient::DropKvTableAsync(const std::string &kv_table_name)
  {
-     // FIXME(lzx): this function may not be used now.
-     assert(false);
+     DLOG(WARNING) << "DropKvTableAsync is not implemented and will be a no-op for now: "
+                   << kv_table_name;
+     return;
```
3069-3116: OnStartFollowing: CloseDataStore(0) ignores ng_id

Hardcoding shard 0 contradicts multi-shard support. Use ng_id consistently.

```diff
- if (data_store_service_ != nullptr)
- {
-     // Now, only support one shard.
-     data_store_service_->CloseDataStore(0);
- }
+ if (data_store_service_ != nullptr) {
+     data_store_service_->CloseDataStore(ng_id);
+ }
```
3015-3021: RestoreTxCache: avoid assert(false); log and return

Don’t abort the process on an unimplemented path.

```diff
  void DataStoreServiceClient::RestoreTxCache(txservice::NodeGroupId cc_ng_id,
                                              int64_t cc_ng_term)
  {
-     LOG(ERROR) << "RestoreTxCache not implemented";
-     assert(false);
+     DLOG(WARNING) << "RestoreTxCache not implemented for ng_id=" << cc_ng_id
+                   << " term=" << cc_ng_term;
+     return;
  }
```
3580-3627: Add bounds validation in GetOwnerNodeIndexOfShard before use

The review comment identifies a legitimate vulnerability. GetOwnerNodeIndexOfShard currently asserts only that the returned value is not UINT32_MAX, but does not validate that the value is within the valid range [0, 1024) for indexing dss_nodes_.

The evidence shows:

- dss_nodes_ is a fixed-size array of 1024 elements (valid indices 0-1023)
- FindFreeNodeIndex() returns 1024 as an error sentinel when no free slot exists
- If this error value (1024) gets stored in dss_shards_[shard_id], GetOwnerNodeIndexOfShard will return it
- All call sites directly index dss_nodes_[node_index] without bounds checking (lines 2980, 3618, 3674, 3717, 3761, 3836, 3892, 4237)
- Even HandleShardingError (line 3390) loads and directly indexes without validation

The fix should add an upper-bound check in GetOwnerNodeIndexOfShard to ensure the returned value is less than dss_nodes_.size() (1024), or validate at each call site before indexing.

data_store_service_client.h (2)
414-424: Signature/type mismatch will not compile: uint64_t vs size_t

The .cpp defines the SerializeTxRecord helpers with std::vector<size_t>, but the header declares std::vector<uint64_t>. Align both to size_t.

```diff
-    static void SerializeTxRecord(bool is_deleted,
-                                  const txservice::TxRecord *rec,
-                                  std::vector<uint64_t> &record_tmp_mem_area,
-                                  std::vector<std::string_view> &record_parts,
-                                  size_t &write_batch_size);
+    static void SerializeTxRecord(bool is_deleted,
+                                  const txservice::TxRecord *rec,
+                                  std::vector<size_t> &record_tmp_mem_area,
+                                  std::vector<std::string_view> &record_parts,
+                                  size_t &write_batch_size);
 ...
-    static void SerializeTxRecord(const txservice::TxRecord *rec,
-                                  std::vector<uint64_t> &record_tmp_mem_area,
-                                  std::vector<std::string_view> &record_parts,
-                                  size_t &write_batch_size);
+    static void SerializeTxRecord(const txservice::TxRecord *rec,
+                                  std::vector<size_t> &record_tmp_mem_area,
+                                  std::vector<std::string_view> &record_parts,
+                                  size_t &write_batch_size);
```
73-116: Constructor: initialize + bucket map wiring looks good; add bounds check

Before writing dss_shards_[shard_id], assert shard_id < dss_shards_.size() to avoid out-of-bounds access if configs drift.

```diff
-    dss_shards_[shard_id].store(node_idx);
+    assert(shard_id < dss_shards_.size());
+    dss_shards_[shard_id].store(node_idx);
```
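Taken together with the GetOwnerNodeIndexOfShard comment above, the safe-lookup pattern can be sketched as below. The array size, sentinel values, and helper name are illustrative stand-ins taken from the review's description, not the project's actual code:

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>

// Illustrative constants from the review: 1024 node slots, with
// FindFreeNodeIndex() using 1024 as an "array full" error sentinel.
constexpr uint32_t kMaxNodes = 1024;
constexpr uint32_t kInvalidNodeIndex = UINT32_MAX;

std::array<std::atomic<uint32_t>, 1000> dss_shards_{};

// Returns the owner node index for a shard, or kInvalidNodeIndex when the
// shard id is out of range or the stored value cannot index dss_nodes_.
uint32_t GetOwnerNodeIndexOfShardChecked(uint32_t shard_id)
{
    if (shard_id >= dss_shards_.size())
    {
        return kInvalidNodeIndex;
    }
    uint32_t node_index = dss_shards_[shard_id].load(std::memory_order_acquire);
    // Rejects both the UINT32_MAX "unset" value and the 1024 error sentinel.
    if (node_index >= kMaxNodes)
    {
        return kInvalidNodeIndex;
    }
    return node_index;
}
```

Callers can then branch on kInvalidNodeIndex (for example, by refreshing topology) instead of indexing dss_nodes_ out of bounds.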
♻️ Duplicate comments (4)
data_store_service_client.cpp (4)
2623-2669: Remove unconditional assert in FetchArchives and fail gracefully

This crashes in release scenarios invoking the API. Return false with an error path instead of asserting.

```diff
 bool DataStoreServiceClient::FetchArchives(
     const txservice::TableName &table_name,
     const txservice::KVCatalogInfo *kv_info,
     const txservice::TxKey &key,
     std::vector<txservice::VersionTxRecord> &archives,
     uint64_t from_ts)
 {
-    assert(false);
+    // TODO: implement or route to async variant
+    LOG(ERROR) << "FetchArchives (sync) not implemented";
+    return false;
```
2724-2809: Remove unconditional assert in FetchVisibleArchive and fail gracefully

Same as above: avoid crashing production.

```diff
 bool DataStoreServiceClient::FetchVisibleArchive(
     const txservice::TableName &table_name,
     const txservice::KVCatalogInfo *kv_info,
     const txservice::TxKey &key,
     const uint64_t upper_bound_ts,
     txservice::TxRecord &rec,
     txservice::RecordStatus &rec_status,
     uint64_t &commit_ts)
 {
-    assert(false);
+    // TODO: implement or route to async variant
+    LOG(ERROR) << "FetchVisibleArchive (sync) not implemented";
+    return false;
```
3144-3167: GetShardIdByPartitionId: add synchronization and avoid silent 0 fallback
- bucket_infos_ access is unsynchronized; races with topology updates.
- Returning 0 on miss can misroute requests.
Proposed fix:
- Guard bucket_infos_ with a shared lock.
- If bucket not found, return an error or throw/log and avoid defaulting to 0.
```diff
 uint32_t DataStoreServiceClient::GetShardIdByPartitionId(
     int32_t partition_id, bool is_range_partition) const
 {
     uint16_t bucket_id;
     if (is_range_partition)
     {
         bucket_id = txservice::Sharder::MapRangeIdToBucketId(partition_id);
     }
     else
     {
         bucket_id =
             txservice::Sharder::MapHashPartitionIdToBucketId(partition_id);
     }
-    auto it = bucket_infos_.find(bucket_id);
+    // TODO: protect bucket_infos_ with a mutex; see header change to add
+    // bucket_infos_mutex_
+    auto it = bucket_infos_.find(bucket_id);
     assert(it != bucket_infos_.end());
     if (it != bucket_infos_.end())
     {
         uint32_t shard_id = bucket_infos_.at(bucket_id)->BucketOwner();
         assert(dss_shard_ids_.find(shard_id) != dss_shard_ids_.end());
         return shard_id;
     }
-    return 0;
+    LOG(ERROR) << "Bucket not found for partition_id=" << partition_id
+               << " (bucket_id=" << bucket_id << ")";
+    return UINT32_MAX;  // explicit invalid
 }
```
3184-3240: Replace srand/rand with a thread-safe RNG and deterministic sampling

rand()/srand() are not thread-safe and mutate global state. Use a local std::mt19937 seeded by the node group id.

```diff
     std::map<uint16_t, uint32_t> rand_num_to_ng;
-    // use ng id as seed to generate random numbers
+    // use ng id as seed to generate deterministic virtual nodes per group
     for (auto ng : node_groups)
     {
-        srand(ng);
+        std::mt19937 rng(ng);
+        std::uniform_int_distribution<uint16_t> dist(
+            0, txservice::total_range_buckets - 1);
         size_t generated = 0;
         while (generated < 64)
         {
-            uint16_t rand_num = rand() % txservice::total_range_buckets;
+            uint16_t rand_num = dist(rng);
             if (rand_num_to_ng.find(rand_num) == rand_num_to_ng.end())
             {
                 generated++;
                 rand_num_to_ng.emplace(rand_num, ng);
             }
             if (rand_num_to_ng.size() >= txservice::total_range_buckets)
```
🧹 Nitpick comments (8)
data_store_service_client.cpp (4)
111-141: Avoid redundant re-initialization and relying on nodes_[0] as primary
- Use strict version check to skip no-op upgrades when versions are equal.
- Don’t assume nodes_[0] is the owner; prefer the cluster manager’s recorded primary if available.
Suggested tweak:
```diff
-    if (current_version <= cluster_manager.GetTopologyVersion() &&
+    if (current_version < cluster_manager.GetTopologyVersion() &&
         dss_topology_version_.compare_exchange_strong(current_version,
                                                       new_version))
     {
-        // The first node is the owner of shard.
-        assert(group.nodes_.size() > 0);
-        while (!UpgradeShardVersion(group.shard_id_,
-                                    group.version_,
-                                    group.nodes_[0].host_name_,
-                                    group.nodes_[0].port_))
+        assert(!group.nodes_.empty());
+        // TODO: consider using the primary node from cluster_manager
+        // instead of nodes_[0]
+        while (!UpgradeShardVersion(group.shard_id_,
+                                    group.version_,
+                                    group.nodes_[0].host_name_,
+                                    group.nodes_[0].port_))
         {
```
135-147: Fix log typos to aid on-call grepability

"DataStoreServiceCliet" → "DataStoreServiceClient".

```diff
-    LOG(INFO) << "DataStoreServiceCliet UpgradeShardVersion success, "
+    LOG(INFO) << "DataStoreServiceClient UpgradeShardVersion success, "
 ...
-    LOG(INFO)
-        << "DataStoreServiceCliet SetupConfig skipped, current_version:"
+    LOG(INFO)
+        << "DataStoreServiceClient SetupConfig skipped, current_version:"
```
213-272: TxConfigsToDssClusterConfig: good guards; call SetThisNode before switching to ReadWrite

The leader existence check and duplicate condition consolidation look good. SetThisNode should run before switching to ReadWrite so local-node metadata is in place.

```diff
     // set primary node
     if (ng_leaders.find(ng_id) != ng_leaders.end())
     {
         uint32_t leader_id = ng_leaders.at(ng_id);
         assert(nodes_map.find(leader_id) != nodes_map.end());
         if (nodes_map.find(leader_id) != nodes_map.end())
         {
             cluster_manager.UpdatePrimaryNode(ng_id, nodes_map.at(leader_id));
         }
         if (leader_id == node_id)
         {
-            contain_this_node = true;
-            cluster_manager.SwitchShardToReadWrite(ng_id,
-                                                   DSShardStatus::Closed);
+            contain_this_node = true;
         }
     }
     // set this node
     if (contain_this_node && nodes_map.find(node_id) != nodes_map.end())
     {
         auto &this_node_config = nodes_map.at(node_id);
         cluster_manager.SetThisNode(this_node_config.host_name_,
                                     this_node_config.port_);
+        // if we are leader, flip to RW after local-node metadata is set
+        if (ng_leaders.count(ng_id) && ng_leaders.at(ng_id) == node_id)
+        {
+            cluster_manager.SwitchShardToReadWrite(ng_id,
+                                                   DSShardStatus::Closed);
+        }
     }
```
3169-3182: GetAllDataShards: thread-safe copy is good; preserve deterministic order

Looks correct; iterating a std::set keeps ascending order, which is useful for fan-out. Consider documenting the ordering guarantee.
data_store_service_client.h (4)
640-646: Comment param mismatch: update docstring

The comment says "@param partition_id" but the method takes shard_id. Update it to avoid confusing maintainers.

479-481: Expose GetShardIdByPartitionId publicly: OK; document the invalid-return policy

Document that it may return UINT32_MAX on failure (after the .cpp change) and that callers must handle that by refreshing topology.

652-661: UpdateShardOwner declared but not defined in this PR

If unused, consider removing it to reduce surface area; otherwise add an implementation.

736-741: Node vs shard array sizes

dss_nodes_ has 1024 elements while dss_shards_ has 1000; document the rationale and enforce that owner indices stored in dss_shards_ are < dss_nodes_.size().
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- data_store_service_client.cpp (89 hunks)
- data_store_service_client.h (13 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-17T11:08:35.322Z
Learnt from: liunyl
Repo: eloqdata/store_handler PR: 83
File: data_store_service_client_closure.cpp:408-448
Timestamp: 2025-09-17T11:08:35.322Z
Learning: In the DataStoreServiceClient batch write system, the global coordinator (SyncPutAllData) manages concurrency at the partition level, not at the individual batch level. The global unfinished_request_cnt_ is only decremented when an entire partition is fully completed or failed, not after each individual batch within a partition.
Applied to files:
data_store_service_client.cpp
📚 Learning: 2025-09-17T06:36:11.137Z
Learnt from: liunyl
Repo: eloqdata/store_handler PR: 80
File: rocksdb_handler.cpp:1226-1335
Timestamp: 2025-09-17T06:36:11.137Z
Learning: In the EloqData store_handler codebase, the maximum bucket ID is guaranteed to be smaller than UINT16_MAX (65535), so bucket_id + 1 operations do not risk integer overflow when used in FetchBucketData methods.
Applied to files:
data_store_service_client.cppdata_store_service_client.h
🧬 Code graph analysis (2)
data_store_service_client.cpp (2)
data_store_service_client.h (3)
TxPort2DssPort(126-129)need_bootstrap_(680-804)KvPartitionIdOf(621-626)data_store_service_client_closure.cpp (2)
PartitionBatchCallback(493-538)PartitionBatchCallback(493-496)
data_store_service_client.h (1)
data_store_service_client.cpp (22)
FindFreeNodeIndex(3319-3338)FindFreeNodeIndex(3319-3319)InitBucketsInfo(3184-3240)InitBucketsInfo(3184-3188)OnLeaderStart(3031-3046)OnLeaderStart(3031-3032)OnLeaderStop(3048-3060)OnLeaderStop(3048-3048)OnStartFollowing(3069-3116)OnStartFollowing(3069-3072)GetShardIdByPartitionId(3144-3167)GetShardIdByPartitionId(3144-3145)IsLocalShard(3134-3142)IsLocalShard(3134-3134)GetOwnerNodeIndexOfShard(3259-3264)GetOwnerNodeIndexOfShard(3259-3260)GetAllDataShards(3169-3182)GetAllDataShards(3169-3169)UpdateOwnerNodeIndexOfShard(3266-3317)UpdateOwnerNodeIndexOfShard(3266-3267)UpgradeShardVersion(3376-3424)UpgradeShardVersion(3376-3379)
🔇 Additional comments (10)
data_store_service_client.cpp (3)
286-289: Early-return when not bootstrapping looks good

Avoids unnecessary init work on secondaries/clients. No issues spotted.

655-668: Shard routing for FetchCurrentTableStatistics: LGTM

Correctly derives the kv partition and resolves shard_id. No changes needed.

1001-1023: Shard routing for FetchTableRanges: LGTM

Consistent partition→shard resolution and RPC path. No issues.
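The routing pattern these call sites share — partition id to bucket to owning shard — can be sketched as follows. The modulo hashing and bucket count are illustrative stand-ins for the project's txservice::Sharder logic, not its real constants:

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Illustrative bucket count; the real value lives in txservice::Sharder.
constexpr uint16_t kTotalBuckets = 16384;

// partition -> bucket, mirroring the review's description of the mapping
// performed inside GetShardIdByPartitionId.
uint16_t MapPartitionIdToBucketId(int32_t partition_id)
{
    return static_cast<uint16_t>(partition_id % kTotalBuckets);
}

// bucket -> owning shard, via a bucket-ownership table; returns UINT32_MAX
// when the bucket has no recorded owner.
uint32_t GetShardIdByPartitionIdSketch(
    const std::map<uint16_t, uint32_t> &bucket_owner, int32_t partition_id)
{
    uint16_t bucket_id = MapPartitionIdToBucketId(partition_id);
    auto it = bucket_owner.find(bucket_id);
    return it != bucket_owner.end() ? it->second : UINT32_MAX;
}
```

Once the shard id is known, the client checks whether the shard is local and either dispatches directly or issues an RPC to the owner node, as in the sequence diagram above.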
data_store_service_client.h (7)
736-744: Shard owner index bounds and capacity validation
- dss_shards_ has fixed size 1000; enforce bounds wherever shard_id is used.
- Validate configured shard_count against 1000 at startup.
Minimal guard:
```diff
-    std::array<std::atomic<uint32_t>, 1000> dss_shards_;
+    std::array<std::atomic<uint32_t>, 1000> dss_shards_;  // MAX_SHARDS = 1000
```

And ensure callers assert shard_id < dss_shards_.size() (see cpp comments). Consider introducing a constexpr MAX_SHARDS and validating config on load.
387-395: Leadership hooks: interface looks right; ensure implementations avoid hardcoded shard 0

See the cpp comment recommending CloseDataStore(ng_id) in OnStartFollowing.

620-626: KvPartitionIdOf(table): LGTM

Deterministic mapping via std::hash and Sharder is consistent with multi-shard routing.

124-137: TxPort2DssPort offset: confirm contract

The port +7 mapping is unchanged; ensure the data store service listens on this offset in deployments to avoid misroutes.
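For clarity, the fixed-offset contract described here amounts to a one-line mapping (the +7 constant follows the review; the function body is a sketch, not the header's actual code):

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the tx-port -> DSS-port contract: the data store service is
// expected to listen at a fixed +7 offset from the tx-service port.
constexpr uint16_t kDssPortOffset = 7;

constexpr uint16_t TxPort2DssPort(uint16_t tx_port)
{
    return static_cast<uint16_t>(tx_port + kDssPortOffset);
}
```

Deployments that place the DSS listener anywhere else will see every shard-routed RPC miss, which is why the comment asks to confirm the contract.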
493-506: Read/BatchWrite/Scan signatures now shard-aware: LGTM

Public → internal signatures align with cpp usage. No issues spotted.

566-590: ScanNext/ScanClose shard_id propagation: LGTM

Interfaces align with remote/local routing logic.

344-363: FetchVisibleArchive/FetchArchives async variants OK

Public async APIs exist; ensure the sync counterparts don't assert (see cpp diffs).
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
data_store_service_client.cpp (1)
3069-3081: Close the shard you're actually following

After making the client shard-aware, OnStartFollowing must release the shard identified by ng_id. The remaining CloseDataStore(0) call still assumes a single shard and will leave the real shard open (and incorrectly tear down shard 0) whenever ng_id != 0. That breaks follower transitions for every non-zero shard. Switch the call to use the passed-in ng_id.

```diff
-    // Now, only support one shard.
-    data_store_service_->CloseDataStore(0);
+    data_store_service_->CloseDataStore(ng_id);
```
♻️ Duplicate comments (1)
eloq_data_store_service/data_store_service.h (1)
615-624: Treat read-only shards as owned

Restricting IsOwnerOfShard to ReadWrite reintroduces the regression we discussed earlier: followers (ReadOnly) stop serving reads, and shutdown paths bail out with REQUESTED_NODE_NOT_OWNER. Keep the shard-id guard, but consider both ReadWrite and ReadOnly as owned states.

```diff
-    return ds_ref.shard_status_.load(std::memory_order_acquire) ==
-           DSShardStatus::ReadWrite;
+    const auto status =
+        ds_ref.shard_status_.load(std::memory_order_acquire);
+    return status == DSShardStatus::ReadWrite ||
+           status == DSShardStatus::ReadOnly;
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- data_store_service_client.cpp (89 hunks)
- eloq_data_store_service/data_store_service.h (13 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
eloq_data_store_service/data_store_service.h (3)
eloq_data_store_service/data_store_service.cpp (14)
StartService(236-305)StartService(236-236)Read(391-424)Read(391-394)Read(426-469)Read(426-434)ScanClose(987-1020)ScanClose(987-990)ScanClose(1022-1054)ScanClose(1022-1027)CloseDataStore(1837-1882)CloseDataStore(1837-1837)OpenDataStore(1884-1920)OpenDataStore(1884-1884)eloq_data_store_service/data_store_service_config.cpp (6)
PrepareShardingError(904-948)PrepareShardingError(904-905)IsOwnerOfShard(285-293)IsOwnerOfShard(285-285)IsOwnerOfShard(690-701)IsOwnerOfShard(690-691)eloq_data_store_service/data_store.h (1)
DataStore(42-143)
data_store_service_client.cpp (3)
data_store_service_client.h (3)
TxPort2DssPort(126-129)need_bootstrap_(680-804)KvPartitionIdOf(621-626)eloq_data_store_service/data_store_service.cpp (8)
BatchWriteRecords(845-893)BatchWriteRecords(845-849)BatchWriteRecords(1115-1180)BatchWriteRecords(1115-1128)Read(391-424)Read(391-394)Read(426-469)Read(426-434)data_store_service_client_closure.cpp (2)
PartitionBatchCallback(493-538)PartitionBatchCallback(493-496)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
data_store_service_client.cpp (1)
3007-3020: Use ng_id instead of hardcoded 0 for consistency

Line 3018 hardcodes CloseDataStore(0) while the method receives ng_id as a parameter. For consistency with OnLeaderStart and OnLeaderStop (which correctly use ng_id), and to support the multi-shard design, this should be CloseDataStore(ng_id).

Apply this diff:

```diff
     if (data_store_service_ != nullptr)
     {
-        // Now, only support one shard.
-        data_store_service_->CloseDataStore(0);
+        data_store_service_->CloseDataStore(ng_id);
     }
```
🧹 Nitpick comments (2)
eloq_data_store_service/data_store_service.h (2)
682-691: Reset shard_id_ in ShutDown for lifecycle consistency

CloseDataStore (data_store_service.cpp line 1837) checks ds_ref.shard_id_ == UINT32_MAX to detect unassigned shards. For consistency, reset shard_id_ to UINT32_MAX after shutdown.

Apply this diff:

```diff
     void ShutDown()
     {
         if (data_store_ != nullptr)
         {
             shard_status_.store(DSShardStatus::Closed,
                                 std::memory_order_release);
             data_store_->Shutdown();
             scan_iter_cache_.Clear();
+            shard_id_ = UINT32_MAX;
         }
     }
```
626-629: Fix formatting: remove the extra space before the return type

Line 626 has an extra space before DataStoreServiceClusterManager.

Apply this diff:

```diff
-     DataStoreServiceClusterManager &GetClusterManager()
+    DataStoreServiceClusterManager &GetClusterManager()
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- data_store_service_client.cpp (89 hunks)
- data_store_service_client.h (13 hunks)
- eloq_data_store_service/data_store_service.h (13 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
eloq_data_store_service/data_store_service.h (4)
eloq_data_store_service/data_store_service.cpp (14)
StartService(236-305)StartService(236-236)Read(391-424)Read(391-394)Read(426-469)Read(426-434)ScanClose(987-1020)ScanClose(987-990)ScanClose(1022-1054)ScanClose(1022-1027)CloseDataStore(1837-1882)CloseDataStore(1837-1837)OpenDataStore(1884-1920)OpenDataStore(1884-1884)data_store_service_client.cpp (6)
Read(1088-1097)Read(1088-1093)Read(3504-3520)Read(3504-3509)ScanClose(3771-3795)ScanClose(3771-3776)eloq_data_store_service/data_store_service_config.cpp (6)
PrepareShardingError(904-948)PrepareShardingError(904-905)IsOwnerOfShard(285-293)IsOwnerOfShard(285-285)IsOwnerOfShard(690-701)IsOwnerOfShard(690-691)eloq_data_store_service/data_store.h (1)
DataStore(42-143)
data_store_service_client.h (1)
data_store_service_client.cpp (24)
FindFreeNodeIndex(3243-3262)FindFreeNodeIndex(3243-3243)InitBucketsInfo(3122-3181)InitBucketsInfo(3122-3126)OnLeaderStart(2969-2984)OnLeaderStart(2969-2970)OnLeaderStop(2986-2998)OnLeaderStop(2986-2986)OnStartFollowing(3007-3054)OnStartFollowing(3007-3010)GetShardIdByPartitionId(3082-3105)GetShardIdByPartitionId(3082-3083)IsLocalShard(3072-3080)IsLocalShard(3072-3072)GetOwnerNodeIndexOfShard(3183-3188)GetOwnerNodeIndexOfShard(3183-3184)GetAllDataShards(3107-3120)GetAllDataShards(3107-3107)UpdateOwnerNodeIndexOfShard(3190-3241)UpdateOwnerNodeIndexOfShard(3190-3191)HandleShardingError(3264-3298)HandleShardingError(3264-3265)UpgradeShardVersion(3300-3348)UpgradeShardVersion(3300-3303)
data_store_service_client.cpp (4)
data_store_service_client.h (4)
TxPort2DssPort(132-135)need_bootstrap_(686-810)KvPartitionIdOf(627-632)Connect(152-157)eloq_data_store_service/data_store_service.cpp (16)
BatchWriteRecords(845-893)BatchWriteRecords(845-849)BatchWriteRecords(1115-1180)BatchWriteRecords(1115-1128)Read(391-424)Read(391-394)Read(426-469)Read(426-434)ScanNext(895-951)ScanNext(895-910)ScanNext(953-985)ScanNext(953-956)DeleteRange(561-608)DeleteRange(561-565)DeleteRange(610-663)DeleteRange(610-617)eloq_data_store_service/rocksdb_data_store_common.cpp (9)
BatchWriteRecords(479-641)BatchWriteRecords(479-480)key(832-832)Read(416-477)Read(416-416)ScanNext(735-1045)ScanNext(735-735)DeleteRange(339-414)DeleteRange(339-339)data_store_service_client_closure.cpp (2)
PartitionBatchCallback(493-538)PartitionBatchCallback(493-496)
🔇 Additional comments (8)
data_store_service_client.h (2)
92-97: Good: Constructor validates shard_id bounds

The bounds check ensures shard_id doesn't exceed the hardcoded array size during initialization, addressing past concerns about the 1000-element limit.

137-142: LGTM: Signature updated for shard-aware configuration

Adding node_id and ng_leaders parameters enables per-shard ownership resolution during cluster configuration, consistent with the multi-shard architecture.

data_store_service_client.cpp (6)
224-227: Early return logic looks correct

The early return when !need_bootstrap_ appropriately skips initialization that's not required in shard-aware configurations.
3082-3105: Consider the fallback return value

If bucket_infos_.find(bucket_id) returns end(), the assert at line 3097 fires, but in release builds the function would return 0 (line 3104). If 0 is not a valid shard_id, or if this masks errors, consider returning an error status or throwing instead.

However, if the assert guarantees that this path never executes in correct code, and 0 is a safe sentinel, this is acceptable.
3122-3181: Consistent hashing implementation looks correct

The deterministic virtual node generation using std::mt19937 seeded with the node group ID ensures stable bucket assignments across restarts. The 64 virtual nodes per group provide good distribution.
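A minimal sketch of the seeded virtual-node placement described here (the bucket count is an illustrative stand-in for total_range_buckets; the 64-nodes-per-group figure follows the review):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <random>
#include <vector>

constexpr uint16_t kTotalBuckets = 16384;  // stand-in for total_range_buckets

// Deterministically place 64 virtual nodes per node group on the bucket
// ring. Seeding std::mt19937 with the group id makes the layout stable
// across restarts and independent of global RNG state (unlike srand/rand).
std::map<uint16_t, uint32_t> PlaceVirtualNodes(
    const std::vector<uint32_t> &node_groups)
{
    std::map<uint16_t, uint32_t> vnode_to_group;
    for (uint32_t ng : node_groups)
    {
        std::mt19937 rng(ng);
        std::uniform_int_distribution<uint16_t> dist(0, kTotalBuckets - 1);
        size_t generated = 0;
        while (generated < 64 && vnode_to_group.size() < kTotalBuckets)
        {
            uint16_t pos = dist(rng);
            // Collisions (position already taken) don't count; keep drawing
            // until this group owns 64 distinct positions.
            if (vnode_to_group.emplace(pos, ng).second)
            {
                ++generated;
            }
        }
    }
    return vnode_to_group;
}
```

Each bucket can then be assigned to the group owning the nearest virtual node on the ring, so adding or removing a group only moves a bounded fraction of buckets.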
2368-2406: Partition-to-shard mapping in CopyBaseToArchive is correct

The logic correctly determines is_range_partitioned once (line 2368) and consistently uses it for all partition-to-shard mappings, properly handling both hash- and range-partitioned tables.
3107-3120: Thread-safe shard enumeration

The shared lock correctly protects concurrent access to dss_shard_ids_, and returning a vector copy ensures callers can iterate safely without holding the lock.
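The copy-under-shared-lock pattern this comment approves can be sketched as below (class and member names are illustrative, mirroring the review's description rather than the real client):

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <set>
#include <shared_mutex>
#include <vector>

// Readers take a shared lock only long enough to copy the set, then
// iterate lock-free on their private snapshot.
class ShardRegistry
{
public:
    void AddShard(uint32_t shard_id)
    {
        std::unique_lock<std::shared_mutex> lk(mutex_);
        shard_ids_.insert(shard_id);
    }

    // Returns a snapshot; std::set iteration keeps ascending order, which
    // gives callers a deterministic fan-out order.
    std::vector<uint32_t> GetAllDataShards() const
    {
        std::shared_lock<std::shared_mutex> lk(mutex_);
        return {shard_ids_.begin(), shard_ids_.end()};
    }

private:
    mutable std::shared_mutex mutex_;
    std::set<uint32_t> shard_ids_;
};
```

The snapshot costs one allocation per call, but it avoids holding the lock across the (potentially slow) per-shard fan-out, which is usually the right trade-off.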
2143-2146: No issues found; the double partition mapping is intentional and correct

The code follows a consistent pattern: HashArchiveKey computes a partition ID in archive key space, then KvPartitionIdOf(partition_id, false) maps that to the KV partition space. This is confirmed by the second overload signature int32_t KvPartitionIdOf(int32_t key_partition, bool is_range_partition = true), which accepts a partition ID and transforms it. The same pattern is used throughout the codebase (lines 2570-2572, 2681-2683, 2778-2781, 2816-2818), confirming this is the intended design.
Actionable comments posted: 0
♻️ Duplicate comments (1)
eloq_data_store_service/data_store_service.h (1)
615-620: Critical: Add shard_id verification to IsOwnerOfShard

IsOwnerOfShard only checks shard_status_ != Closed but does not verify ds_ref.shard_id_ == shard_id. This is inconsistent with FetchDSShardStatus (lines 584-589), which does check shard_id equality. Without this check, an uninitialized shard slot (where shard_id_ is still UINT32_MAX from line 691) or a mismatched shard could be treated as owned if its status happens to be non-Closed.

Apply this diff to add the missing check:

```diff
     bool IsOwnerOfShard(uint32_t shard_id) const
     {
         const auto &ds_ref = data_shards_.at(shard_id);
-        return ds_ref.shard_status_.load(std::memory_order_acquire) !=
-               DSShardStatus::Closed;
+        return ds_ref.shard_id_ == shard_id &&
+               ds_ref.shard_status_.load(std::memory_order_acquire) !=
+                   DSShardStatus::Closed;
     }
```
🧹 Nitpick comments (1)
eloq_data_store_service/data_store_service.h (1)
680-689: Consider resetting shard_id_ in ShutDown for consistency

The ShutDown() method sets shard_status_ to Closed and clears resources, but does not reset shard_id_ back to UINT32_MAX. While not strictly necessary if status checks are sufficient, this creates an inconsistency with initialization (line 691), where shard_id_ starts at UINT32_MAX. Code like GetDataStore (line 633) checks shard_id_ equality to determine whether a shard is properly initialized, so resetting it after shutdown would make the lifecycle clearer.

Apply this diff if you want to maintain consistency:

```diff
     void ShutDown()
     {
         if (data_store_ != nullptr)
         {
             shard_status_.store(DSShardStatus::Closed,
                                 std::memory_order_release);
             data_store_->Shutdown();
             scan_iter_cache_.Clear();
+            shard_id_ = UINT32_MAX;
         }
     }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- eloq_data_store_service/data_store_service.cpp (45 hunks)
- eloq_data_store_service/data_store_service.h (14 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
eloq_data_store_service/data_store_service.h (4)
eloq_data_store_service/data_store_service.cpp (10)
StartService(242-310)StartService(242-242)Read(392-425)Read(392-395)Read(427-470)Read(427-435)ScanClose(988-1021)ScanClose(988-991)ScanClose(1023-1055)ScanClose(1023-1028)eloq_data_store_service/rocksdb_data_store_common.cpp (5)
Read(416-477)Read(416-416)key(832-832)ScanClose(1047-1071)ScanClose(1047-1047)data_store_service_client.cpp (6)
Read(1088-1097)Read(1088-1093)Read(3504-3520)Read(3504-3509)ScanClose(3771-3795)ScanClose(3771-3776)eloq_data_store_service/data_store.h (1)
DataStore(42-143)
eloq_data_store_service/data_store_service.cpp (4)
eloq_data_store_service/data_store_service.h (4)
IsOwnerOfShard(615-620)PrepareShardingError(525-529)IncreaseWriteReqCount(603-607)DecreaseWriteReqCount(609-613)eloq_data_store_service/data_store_service_config.cpp (6)
IsOwnerOfShard(285-293)IsOwnerOfShard(285-285)IsOwnerOfShard(690-701)IsOwnerOfShard(690-691)PrepareShardingError(904-948)PrepareShardingError(904-905)eloq_data_store_service/rocksdb_data_store_common.cpp (13)
Read(416-477)Read(416-416)key(832-832)DeleteRange(339-414)DeleteRange(339-339)start_key(367-367)start_key(679-679)end_key(382-382)end_key(684-684)ScanNext(735-1045)ScanNext(735-735)ScanClose(1047-1071)ScanClose(1047-1047)data_store_service_client.cpp (10)
Read(1088-1097)Read(1088-1093)Read(3504-3520)Read(3504-3509)DeleteRange(3553-3575)DeleteRange(3553-3560)ScanNext(3696-3729)ScanNext(3696-3710)ScanClose(3771-3795)ScanClose(3771-3776)
🔇 Additional comments (11)
eloq_data_store_service/data_store_service.h (3)
190-190: LGTM: StartService signature simplified for the per-shard architecture

The signature change correctly removes per-call parameters, delegating shard ownership queries to cluster_manager_. The implementation properly initializes all owned shards.
582-613: LGTM: Per-shard accessors correctly implemented

FetchDSShardStatus, IncreaseWriteReqCount, and DecreaseWriteReqCount properly use per-shard references with appropriate atomic memory ordering. Note that FetchDSShardStatus correctly verifies shard_id_ equality before returning the status.
631-641: LGTM: GetDataStore properly validates shard_id

The method correctly verifies shard_id_ equality before returning the data store pointer, ensuring the shard slot is properly initialized.

eloq_data_store_service/data_store_service.cpp (8)
216-221: LGTM: Per-shard initialization correctly implemented

The constructor properly initializes all shard slots with their respective IDs and Closed status, establishing a clean initial state for the per-shard architecture.

236-238: LGTM: Destructor properly cleans up all shards

The destructor correctly iterates through all shard slots and invokes ShutDown() to ensure proper cleanup.
242-310: LGTM: StartService correctly implements per-shard initialization

The method properly:
- Queries cluster_manager for owned shards
- Uses atomic compare_exchange for safe state transitions (Closed → Starting → final mode)
- Handles both ReadOnly and ReadWrite initial modes
- Provides appropriate error handling and logging
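The Closed → Starting → final-mode handoff listed above can be sketched with a single atomic per shard (the enum is an illustrative stand-in for DSShardStatus; the real service opens the underlying store between the two transitions):

```cpp
#include <atomic>
#include <cassert>

// Illustrative stand-in for the service's DSShardStatus enum.
enum class ShardStatus { Closed, Starting, ReadOnly, ReadWrite };

// Attempt the Closed -> Starting transition; exactly one caller wins.
// The winner finishes initialization and publishes the final mode.
bool TryStartShard(std::atomic<ShardStatus> &status, ShardStatus final_mode)
{
    ShardStatus expected = ShardStatus::Closed;
    if (!status.compare_exchange_strong(expected,
                                        ShardStatus::Starting,
                                        std::memory_order_acq_rel))
    {
        return false;  // someone else is (or already was) starting this shard
    }
    // ... open the underlying data store here ...
    status.store(final_mode, std::memory_order_release);
    return true;
}
```

The compare_exchange makes concurrent StartService calls safe without a mutex: losers observe a non-Closed state and back off.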
312-390: LGTM: ConnectAndStartDataStore handles the per-shard lifecycle correctly

The method properly:
- Uses atomic compare_exchange with spin-wait for concurrent initialization attempts
- Handles both new data store creation and existing data store reinitialization
- Synchronizes with cluster_manager for shard status updates
- Uses appropriate atomic memory ordering
392-425: RPC handlers correctly implement the per-shard dispatch pattern

All RPC handlers follow a consistent, correct pattern:

- Extract shard_id from the request
- Check ownership via IsOwnerOfShard
- Increment the write request counter for write operations
- Verify shard status (ReadOnly/ReadWrite for reads; ReadWrite only for writes)
- Dispatch to the per-shard data_store_

Note: these rely on IsOwnerOfShard, which has a critical bug flagged in the header file review.

Also applies to: 472-517, 562-609
1076-1109: LGTM: Scan iterator cache operations correctly use the per-shard cache

All scan iterator cache methods properly delegate to the per-shard scan_iter_cache_ member, maintaining session isolation across shards.
1831-1904: LGTM: CloseDataStore and OpenDataStore implement a proper per-shard lifecycle with good observability

Both methods correctly:

- Access per-shard state via ds_ref
- Implement progressive state transitions (CloseDataStore: ReadWrite → ReadOnly → Closed)
- Provide detailed logging with shard_id and status for debugging
- Measure and log timing for OpenDataStore operations

Note: CloseDataStore relies on IsOwnerOfShard, which has a critical bug flagged in the header file review.
2281-2365: LGTM: Shard state transition methods correctly implement an atomic state machine

All three transition methods properly:

- Use per-shard ds_ref references
- Employ atomic compare_exchange for safe concurrent transitions
- Handle idempotent cases (already in target state)
- Synchronize with cluster_manager_ for cluster-wide consistency

Notably, SwitchReadWriteToReadOnly correctly waits for ongoing write requests to complete before finalizing the transition.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
eloq_data_store_service/rocksdb_data_store.cpp (1)
69-83: Null `data_store_service_` in base class `Shutdown()` or enforce contract. The issue is real. The destructor at lines 60-67 checks `data_store_service_ != nullptr` and calls `Shutdown()` if true, indicating double-shutdown safety is intended. However:

- Base class `Shutdown()` (rocksdb_data_store_common.cpp:291) nulls `query_worker_pool_` but not `data_store_service_`
- Derived class `Shutdown()` (rocksdb_data_store.cpp:69-83) also does not null `data_store_service_`
- If external code deletes `data_store_service_` but this object isn't destroyed, the pointer remains non-null
- When the destructor runs, it calls `Shutdown()` again, which calls `data_store_service_->ForceEraseScanIters()` on a dangling pointer, a use-after-free

Either null `data_store_service_` in the base class after calling `ForceEraseScanIters()`, or document that the service must outlive the data store and that `Shutdown()` cannot be called if the service is destroyed externally.

eloq_data_store_service/internal_request.h (2)
PrepareShardingErrornow expects a shard id, but both the RPC and local Read completion paths still passpartition_id. When the node isn’t the owner this will emit the wrong sharding metadata (likely returning “shard 123” instead of the real shard), breaking client re-routing. Switch these calls to use the request’s shard id.- ds_service_->PrepareShardingError(req_->partition_id(), + ds_service_->PrepareShardingError(req_->shard_id(), resp_->mutable_result()); … - ds_service_->PrepareShardingError(partition_id_, result_); + ds_service_->PrepareShardingError(shard_id_, result_);Also applies to: 855-863
1324-1333: Fix shard_id propagation in Scan error handlingSame issue here: the Scan RPC/local paths still pass
partition_idintoPrepareShardingError, so clients receive the wrong shard metadata on ownership changes. Use the associated shard id instead.- ds_service_->PrepareShardingError(req_->partition_id(), + ds_service_->PrepareShardingError(req_->shard_id(), resp_->mutable_result()); … - ds_service_->PrepareShardingError(partition_id_, result_); + ds_service_->PrepareShardingError(shard_id_, result_);Also applies to: 1513-1522
♻️ Duplicate comments (2)
eloq_data_store_service/data_store_service.h (1)
615-625: Critical: IsOwnerOfShard missing shard_id validation. Unlike `FetchDSShardStatus` (line 584) and `GetDataStore` (line 638), `IsOwnerOfShard` does not validate that `ds_ref.shard_id_ == shard_id`. This inconsistency could cause it to return `true` for the wrong shard if slots are reused or not properly initialized. The prior review comment explicitly stated: "keep the existing pattern of reading shard_id first and then loading shard_status_" and "return false for non-matching shard_id." This validation is missing.

Apply this diff to add the missing validation:

```diff
 bool IsOwnerOfShard(uint32_t shard_id) const
 {
     const auto &ds_ref = data_shards_.at(shard_id);
     DLOG(INFO) << "====IsOwnerOfShard shard_id:" << shard_id
                << " res:"
                << (int) (ds_ref.shard_status_.load(
                       std::memory_order_acquire) != DSShardStatus::Closed);
-    return ds_ref.shard_status_.load(std::memory_order_acquire) !=
-           DSShardStatus::Closed;
+    return ds_ref.shard_id_ == shard_id &&
+           ds_ref.shard_status_.load(std::memory_order_acquire) !=
+               DSShardStatus::Closed;
 }
```

data_store_service_client.cpp (1)
bool IsOwnerOfShard(uint32_t shard_id) const { const auto &ds_ref = data_shards_.at(shard_id); DLOG(INFO) << "====IsOwnerOfShard shard_id:" << shard_id << " res:" << (int) (ds_ref.shard_status_.load( std::memory_order_acquire) != DSShardStatus::Closed); - return ds_ref.shard_status_.load(std::memory_order_acquire) != - DSShardStatus::Closed; + return ds_ref.shard_id_ == shard_id && + ds_ref.shard_status_.load(std::memory_order_acquire) != + DSShardStatus::Closed; }data_store_service_client.cpp (1)
3985-4001: Replace assert(false) with proper error handling.Lines 3995-4001 contain an
elsebranch withassert(false)(line 3997) followed by fallback logic. This pattern creates inconsistent behavior:
- Debug builds: Process crashes if
Sequences::table_name_is not hash-partitioned.- Release builds: Code continues with fallback
kv_partition_id, potentially masking a configuration error.If the
Sequencestable must be hash-partitioned (as the comment implies), replace the assert and fallback with:else { - assert(false); - // Use the init partition id. - kv_partition_id = KvPartitionIdOf(txservice::Sequences::table_name_); - data_shard_id = GetShardIdByPartitionId(kv_partition_id, false); + LOG(ERROR) << "Sequences table must be hash-partitioned, but is not"; + return false; }This ensures callers can handle the error gracefully.
🧹 Nitpick comments (1)
eloq_data_store_service/data_store_service.h (1)
676-705: Consider resetting shard_id_ in ShutDown() for consistency. The documentation and cleanup logic are good. However, `shard_id_` is not reset to `UINT32_MAX` in `ShutDown()`, even though other methods like `FetchDSShardStatus` and `GetDataStore` check for this sentinel value. For defensive coding and consistency with the pattern established elsewhere, consider adding:

```diff
 void ShutDown()
 {
     if (data_store_ != nullptr)
     {
         data_store_->Shutdown();
         data_store_ = nullptr;
     }
     if (scan_iter_cache_ != nullptr)
     {
         scan_iter_cache_->Clear();
         scan_iter_cache_ = nullptr;
     }
+    shard_id_ = UINT32_MAX;
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
- data_store_service_client.cpp (90 hunks)
- data_store_service_client.h (12 hunks)
- data_store_service_client_closure.cpp (17 hunks)
- data_store_service_client_closure.h (34 hunks)
- data_store_service_scanner.cpp (3 hunks)
- data_store_service_scanner.h (4 hunks)
- eloq_data_store_service/CMakeLists.txt (0 hunks)
- eloq_data_store_service/data_store_service.cpp (45 hunks)
- eloq_data_store_service/data_store_service.h (14 hunks)
- eloq_data_store_service/data_store_service_util.h (1 hunks)
- eloq_data_store_service/ds_request.proto (5 hunks)
- eloq_data_store_service/internal_request.h (55 hunks)
- eloq_data_store_service/main.cpp (1 hunks)
- eloq_data_store_service/rocksdb_cloud_data_store.cpp (5 hunks)
- eloq_data_store_service/rocksdb_data_store.cpp (1 hunks)
- eloq_data_store_service/rocksdb_data_store_common.cpp (3 hunks)
- eloq_data_store_service/rocksdb_data_store_common.h (1 hunks)
💤 Files with no reviewable changes (1)
- eloq_data_store_service/CMakeLists.txt
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-17T06:36:11.137Z
Learnt from: liunyl
Repo: eloqdata/store_handler PR: 80
File: rocksdb_handler.cpp:1226-1335
Timestamp: 2025-09-17T06:36:11.137Z
Learning: In the EloqData store_handler codebase, the maximum bucket ID is guaranteed to be smaller than UINT16_MAX (65535), so bucket_id + 1 operations do not risk integer overflow when used in FetchBucketData methods.
Applied to files:
- data_store_service_client_closure.cpp
- data_store_service_client.h
- data_store_service_client.cpp
📚 Learning: 2025-09-17T11:08:35.322Z
Learnt from: liunyl
Repo: eloqdata/store_handler PR: 83
File: data_store_service_client_closure.cpp:408-448
Timestamp: 2025-09-17T11:08:35.322Z
Learning: In the DataStoreServiceClient batch write system, the global coordinator (SyncPutAllData) manages concurrency at the partition level, not at the individual batch level. The global unfinished_request_cnt_ is only decremented when an entire partition is fully completed or failed, not after each individual batch within a partition.
Applied to files:
- data_store_service_client_closure.cpp
- eloq_data_store_service/data_store_service.cpp
- data_store_service_client.cpp
🧬 Code graph analysis (13)
data_store_service_client_closure.cpp (1)
data_store_service_client.cpp (1)
callback_data(2575-2581)
eloq_data_store_service/rocksdb_data_store.cpp (2)
eloq_data_store_service/rocksdb_cloud_data_store.cpp (2)
Shutdown(173-194)Shutdown(173-173)eloq_data_store_service/rocksdb_data_store_common.cpp (2)
Shutdown(283-298)Shutdown(283-283)
data_store_service_scanner.h (1)
eloq_data_store_service/internal_request.h (9)
string_view(106-109)string_view(126-129)string_view(273-276)string_view(606-614)string_view(621-629)string_view(818-826)partition_id_(647-647)partition_id_(869-869)partition_id_(1528-1528)
eloq_data_store_service/rocksdb_data_store_common.h (4)
eloq_data_store_service/rocksdb_cloud_data_store.cpp (2)
Shutdown(173-194)Shutdown(173-173)eloq_data_store_service/rocksdb_data_store.cpp (2)
Shutdown(69-83)Shutdown(69-69)eloq_data_store_service/rocksdb_data_store_common.cpp (2)
Shutdown(283-298)Shutdown(283-283)eloq_data_store_service/thread_worker_pool.cpp (2)
Shutdown(91-106)Shutdown(91-91)
data_store_service_scanner.cpp (1)
data_store_service_scanner.h (1)
data_shard_id_(266-266)
eloq_data_store_service/data_store_service.cpp (3)
eloq_data_store_service/data_store_service.h (4)
IsOwnerOfShard(615-625)PrepareShardingError(525-529)IncreaseWriteReqCount(603-607)DecreaseWriteReqCount(609-613)eloq_data_store_service/data_store_service_config.cpp (6)
IsOwnerOfShard(285-293)IsOwnerOfShard(285-285)IsOwnerOfShard(690-701)IsOwnerOfShard(690-691)PrepareShardingError(904-948)PrepareShardingError(904-905)eloq_data_store_service/rocksdb_data_store_common.cpp (3)
key(849-849)ScanNext(752-1062)ScanNext(752-752)
eloq_data_store_service/rocksdb_data_store_common.cpp (5)
eloq_data_store_service/rocksdb_cloud_data_store.cpp (2)
Shutdown(173-194)Shutdown(173-173)eloq_data_store_service/rocksdb_data_store.cpp (2)
Shutdown(69-83)Shutdown(69-69)rocksdb_handler.cpp (4)
Shutdown(2323-2358)Shutdown(2323-2323)Shutdown(3404-3436)Shutdown(3404-3404)eloq_data_store_service/thread_worker_pool.cpp (2)
Shutdown(91-106)Shutdown(91-91)eloq_data_store_service/data_store_service_util.h (1)
shard_id_(123-123)
eloq_data_store_service/internal_request.h (2)
eloq_data_store_service/data_store_service_config.h (1)
uint32_t(107-110)data_store_service_client_closure.h (31)
Reset(81-85)Reset(144-153)Reset(210-216)Reset(269-274)Reset(385-392)Reset(600-619)Reset(933-948)Reset(1154-1177)Reset(1393-1408)Reset(1598-1624)Reset(1627-1658)Reset(1660-1694)Reset(1933-2251)Reset(2328-2347)Reset(2606-2612)Reset(3002-3010)vector(1078-1081)vector(1121-1128)vector(2253-2256)string(804-809)string(2173-2178)string_view(311-368)string_view(752-755)string_view(767-770)string_view(792-797)string_view(1291-1294)string_view(1533-1536)string_view(2133-2136)string_view(2138-2141)string_view(2241-2246)string_view(2453-2456)
data_store_service_client_closure.h (2)
eloq_data_store_service/internal_request.h (13)
partition_id_(647-647)partition_id_(869-869)partition_id_(1528-1528)string_view(106-109)string_view(126-129)string_view(273-276)string_view(606-614)string_view(621-629)string_view(818-826)string_view(988-996)string_view(1123-1131)string_view(1425-1433)string_view(1440-1448)eloq_data_store_service/data_store_service_util.h (1)
shard_id_(123-123)
data_store_service_client.h (1)
data_store_service_client.cpp (24)
FindFreeNodeIndex(3250-3269)FindFreeNodeIndex(3250-3250)InitBucketsInfo(3129-3188)InitBucketsInfo(3129-3133)OnLeaderStart(2969-2986)OnLeaderStart(2969-2970)OnLeaderStop(2988-3002)OnLeaderStop(2988-2988)OnStartFollowing(3011-3061)OnStartFollowing(3011-3015)GetShardIdByPartitionId(3089-3112)GetShardIdByPartitionId(3089-3090)IsLocalShard(3079-3087)IsLocalShard(3079-3079)GetOwnerNodeIndexOfShard(3190-3195)GetOwnerNodeIndexOfShard(3190-3191)GetAllDataShards(3114-3127)GetAllDataShards(3114-3114)UpdateOwnerNodeIndexOfShard(3197-3248)UpdateOwnerNodeIndexOfShard(3197-3198)HandleShardingError(3271-3305)HandleShardingError(3271-3272)UpgradeShardVersion(3307-3355)UpgradeShardVersion(3307-3310)
eloq_data_store_service/rocksdb_cloud_data_store.cpp (3)
eloq_data_store_service/rocksdb_cloud_data_store.h (1)
RocksDBCloudDataStore(38-161)eloq_data_store_service/rocksdb_data_store.cpp (2)
Shutdown(69-83)Shutdown(69-69)eloq_data_store_service/rocksdb_data_store_common.cpp (2)
Shutdown(283-298)Shutdown(283-283)
eloq_data_store_service/data_store_service.h (4)
eloq_data_store_service/data_store_service.cpp (12)
StartService(242-312)StartService(242-242)Read(399-432)Read(399-402)Read(434-477)Read(434-442)ScanClose(995-1028)ScanClose(995-998)ScanClose(1030-1062)ScanClose(1030-1035)TTLWrapperCache(72-77)TTLWrapperCache(79-88)data_store_service_client.cpp (6)
Read(1088-1097)Read(1088-1093)Read(3511-3527)Read(3511-3516)ScanClose(3778-3802)ScanClose(3778-3783)eloq_data_store_service/data_store_service_config.cpp (6)
PrepareShardingError(904-948)PrepareShardingError(904-905)IsOwnerOfShard(285-293)IsOwnerOfShard(285-285)IsOwnerOfShard(690-701)IsOwnerOfShard(690-691)eloq_data_store_service/data_store.h (1)
DataStore(42-143)
data_store_service_client.cpp (3)
data_store_service_client.h (3)
TxPort2DssPort(134-137)KvPartitionIdOf(629-634)Connect(154-159)eloq_data_store_service/data_store_service.cpp (16)
BatchWriteRecords(853-901)BatchWriteRecords(853-857)BatchWriteRecords(1118-1183)BatchWriteRecords(1118-1131)Read(399-432)Read(399-402)Read(434-477)Read(434-442)ScanNext(903-959)ScanNext(903-918)ScanNext(961-993)ScanNext(961-964)DeleteRange(569-616)DeleteRange(569-573)DeleteRange(618-671)DeleteRange(618-625)data_store_service_client_closure.cpp (2)
PartitionBatchCallback(493-538)PartitionBatchCallback(493-496)
🪛 Clang (14.0.6)
eloq_data_store_service/data_store_service_util.h
[error] 24-24: 'atomic' file not found
(clang-diagnostic-error)
🔇 Additional comments (28)
data_store_service_client_closure.cpp (2)
516-521: LGTM: Shard ID propagation in batch write flow. Extracting `data_shard_id` from the closure and passing it to `BatchWriteRecords` correctly propagates shard context through the partition batch callback.
72-73: Key() method confirmed to return std::string_view. Verification at data_store_service_client_closure.h:767-770 confirms `ReadClosure::Key()` returns `const std::string_view`. The code change from `Key().back()` to `Key()` is correct given this return type.

data_store_service_scanner.cpp (1)
417-419: LGTM: Per-partition shard routing correctly implemented. The scanner now computes `data_shard_id` per partition using `GetShardIdByPartitionId` and passes it to `Reset`. This aligns with the broader shard-aware architecture.

data_store_service_client.h (1)
502-506: Approve: Shard-aware Read signature. The updated `Read` signature now correctly accepts `int32_t partition_id` and `uint32_t shard_id`, aligning with the per-shard routing introduced across the PR.

eloq_data_store_service/data_store_service_util.h (1)
24-31: LGTM: Standard library includes for shard-aware containers. The new includes (`atomic`, `map`, `memory`, `set`, `unordered_map`) support per-shard data structures (e.g., `bucket_infos_`, `dss_shard_ids_`) introduced across the PR. The static analysis error about `<atomic>` not found is a false positive due to the analysis environment lacking system headers.

eloq_data_store_service/main.cpp (1)
321-322: LGTM: StartService API simplified to single parameter. The change from `StartService(FLAGS_bootstrap || is_single_node, 0, 0)` to `StartService(FLAGS_bootstrap || is_single_node)` aligns with the per-shard initialization refactoring mentioned in the PR summary.

eloq_data_store_service/rocksdb_data_store_common.h (1)
165-165: LGTM: Shutdown method exposed for per-shard cleanup. Adding `void Shutdown() override` enables centralized cleanup of `query_worker_pool_` and scanner state, as shown in the implementation (rocksdb_data_store_common.cpp:282-297). This supports the per-shard lifecycle refactoring.

eloq_data_store_service/ds_request.proto (1)
95-95: No breaking change detected; review comment is incorrect. The current code shows `bytes key_str = 3;` as a single (non-repeated) field in both locations (lines 95 and 109). There is no evidence in the codebase that this field was ever `repeated bytes`. The repetition in `FetchRecordsRequest` occurs at the message container level (`repeated key keys`), not at the `key_str` field level. The review comment appears to mischaracterize the proto structure or assumes a prior state that does not exist in the current codebase.

Likely an incorrect or invalid review comment.
eloq_data_store_service/data_store_service.h (9)
190-190: LGTM: StartService signature simplified for multi-shard support. The signature change from `StartService(bool, uint32_t, uint32_t)` to `StartService(bool)` is correct. The implementation now dynamically discovers shards via `cluster_manager_.GetShardsForThisNode()`, eliminating the need for explicit shard parameters.
220-228: LGTM: Read signature updated for per-shard routing. The method signature correctly reflects the shard-aware architecture:

- `partition_id` changed to `int32_t` (consistent with prior review resolutions)
- `shard_id` parameter added for explicit shard routing
- Changes align with the implementation in data_store_service.cpp lines 433-476
256-268: LGTM: BatchWriteRecords signature updated for shard routing. The `shard_id` parameter addition at line 258 is consistent with the per-shard routing architecture applied to other methods.
320-327: LGTM: DeleteRange, ScanNext, and ScanClose signatures updated consistently. All three methods correctly adopt the per-shard routing pattern with `int32_t partition_id` and `uint32_t shard_id` parameters, consistent with the Read and BatchWriteRecords changes.

Also applies to: 403-417, 440-445
525-529: LGTM: PrepareShardingError correctly delegates to cluster manager. The delegation to `cluster_manager_.PrepareShardingError(shard_id, result)` properly resolves the critical issue flagged in prior reviews where topology routing information was lost. The cluster manager's implementation populates all necessary routing metadata for client failover.
582-590: LGTM: FetchDSShardStatus properly validates shard initialization. The method correctly checks `ds_ref.shard_id_ == shard_id` before returning the shard status, ensuring the shard slot is initialized for the requested shard_id.
603-613: LGTM: Write request counters operate directly on shard slots. These methods directly access `data_shards_.at(shard_id).ongoing_write_requests_` without validating `shard_id_`. This is acceptable since:
- They're called in hot paths after ownership validation
- Atomic operations are safe even on uninitialized slots (default value 0)
- Performance matters here
636-646: LGTM: GetDataStore properly validates shard initialization. The method correctly validates `data_shards_.at(shard_id).shard_id_ == shard_id` before returning the data store pointer, consistent with the defensive pattern in `FetchDSShardStatus`.
707-707: LGTM: Per-shard data architecture established. The `std::array<DataShard, 1000> data_shards_` provides fixed-capacity shard storage with bounds-checked access via `.at()`. The limit of 1000 shards is a reasonable upper bound for the system's scale.
499-510: LGTM: ReadBaseForArchiveCallbackData updated consistently. The `AddResult` method and `partition_id_` member correctly use `int32_t`, consistent with the partition_id type change applied throughout the codebase.

Also applies to: 532-532
600-899: LGTM: ReadClosure updated consistently for per-shard routing. The closure correctly integrates shard-aware changes:

- `int32_t partition_id_` with `INT32_MAX` sentinel (line 631)
- `uint32_t shard_id_` with `UINT32_MAX` sentinel (line 632)
- `std::string_view key_` simplifies key storage (line 633)
- `PrepareRequest` properly populates `shard_id` and `key_str` (lines 660, 662)
- Accessors return correct types (lines 757-770)
- Error handling updates owner index with `shard_id_` (line 688)

All changes are consistent with the per-shard routing architecture.
1154-1366: LGTM: DeleteRangeClosure updated consistently. The closure follows the same pattern as `ReadClosure`:

- `int32_t partition_id` and `uint32_t shard_id` parameters in `Reset` (lines 1156-1157)
- `PrepareRequest` populates `shard_id` in the RPC request (line 1200)
- Error handling propagates `shard_id_` (line 1228)
- Accessors added (lines 1296-1304)
1598-1892: LGTM: BatchWriteRecordsClosure updated consistently. Both `Reset` overloads correctly handle the new parameters:

- Lines 1629-1630: first overload takes `int32_t partition_id, uint32_t shard_id`
- Lines 1662-1663: second overload takes the same types
- `Clear()` properly resets sentinels (lines 1605-1606)
- `PrepareRequest` populates `shard_id` (line 1777)
- Accessors provided (lines 1850-1858)
1901-2294: LGTM: ScanNextClosure updated consistently. The closure integrates shard-aware changes following the established pattern:

- `Reset` takes `int32_t partition_id, uint32_t shard_id` (lines 1936-1937)
- Sentinels properly set in `Clear()` (lines 1913-1914)
- RPC request populated with `shard_id` (line 2008)
- Accessors provided (lines 2123-2131)
2930-2958: LGTM: FetchArchivesCallbackData updated consistently. The constructor parameter and member both use `int32_t partition_id` (lines 2931, 2949), consistent with the codebase-wide type change.

data_store_service_client.cpp (5)
222-242: LGTM: Bootstrap guard improves efficiency. The `need_bootstrap_` check prevents unnecessary initialization attempts when the client is already bootstrapped.
3114-3127: LGTM: Thread-safe shard enumeration. Correctly uses `shared_lock` to protect read access to `dss_shard_ids_`, addressing the concurrency concern raised in earlier reviews. The returned vector preserves sorted order from the underlying `std::set`.
3129-3188: LGTM: Thread-safe and deterministic bucket initialization. The use of `std::mt19937` with node-group-ID seeding ensures:
- Thread-safe random generation (no shared state).
- Deterministic bucket-to-shard mapping across cluster nodes.
The virtual-node consistent-hashing logic correctly distributes buckets, and the early break (lines 3157-3163) guards against excessive node groups.
2969-3061: LGTM: Shard lifecycle management is consistent. The `bind_data_shard_with_ng_` flag correctly controls whether shard open/close operations are performed. The three callbacks (`OnLeaderStart`, `OnLeaderStop`, `OnStartFollowing`) consistently:

- Guard shard operations behind the flag.
- Pass `ng_id` to `OpenDataStore`/`CloseDataStore`.
- Maintain symmetry (open on leader start, close on leader stop/follower start).
3089-3112: Verify Sharder contract and shard ID 0 validity. The review identifies a real pattern concern, but resolution depends on external context:

Assert-then-check redundancy depends on Sharder guarantees: the `InitBucketsInfo` method populates `bucket_infos_` with ALL bucket IDs in range `[0, txservice::total_range_buckets)`. If `Sharder::MapRangeIdToBucketId` and `Sharder::MapHashPartitionIdToBucketId` always return valid IDs within that range, the assert at line 3104 should never fire and the subsequent `if` check (line 3105) becomes dead code. However, if Sharder can return out-of-range bucket IDs, the assert would fail in debug builds while line 3111's fallback silently returns `0` in release builds, masking the error.

Returning 0 as an error sentinel is problematic: the code returns `0` when a bucket is not found, but there's no indication whether `0` is a valid shard ID or an error sentinel. Callers use the returned value directly without special handling for `0`, suggesting they cannot distinguish between a valid shard `0` and an error condition.

To confirm the safety of this pattern:
- Verify that `txservice::Sharder` guarantees returned bucket IDs are always `< txservice::total_range_buckets`
- Confirm whether shard ID `0` is a valid assignment or should be reserved as an error code
- Document the intended behavior when a bucket lookup fails (crash on invalid input vs. graceful degradation)
```cpp
if (shard_id >= dss_shards_.size())
{
    LOG(FATAL) << "Shard id " << shard_id
               << " is out of range, should expand the hard-coded "
                  "dss_shards_ size.";
}
```
Add bounds validation for shard_id to prevent array overrun.
The constructor now checks shard_id >= dss_shards_.size() and logs a fatal error, but this only covers initialization. Runtime accesses at lines 659 and elsewhere (e.g., GetOwnerNodeIndexOfShard, UpdateOwnerNodeIndexOfShard) directly index dss_shards_[shard_id] without bounds checks. A shard_id ≥ 1000 from a malicious/buggy caller triggers undefined behavior.
Add validation before array access:
```cpp
uint32_t GetOwnerNodeIndexOfShard(uint32_t shard_id) const
{
    if (shard_id >= dss_shards_.size())
    {
        LOG(ERROR) << "shard_id " << shard_id << " exceeds array bounds";
        return UINT32_MAX;
    }
    assert(dss_shards_[shard_id].load(std::memory_order_acquire) != UINT32_MAX);
    return dss_shards_[shard_id].load(std::memory_order_acquire);
}
```

Apply similar checks in `UpdateOwnerNodeIndexOfShard`, `UpgradeShardVersion`, and other methods that index `dss_shards_`.
🤖 Prompt for AI Agents
In data_store_service_client.h around lines 94 to 99 and throughout the file
(notably methods that index dss_shards_ such as GetOwnerNodeIndexOfShard,
UpdateOwnerNodeIndexOfShard, UpgradeShardVersion and any other direct
dss_shards_[shard_id] accesses), add explicit bounds validation before accessing
dss_shards_: check if shard_id >= dss_shards_.size(), log an ERROR with the
offending shard_id, and return a sentinel (e.g., UINT32_MAX) for getter-style
methods; for mutators, log and early-return or return a failure code as
appropriate; after the bounds check you may keep the existing assert and
original load/store with the proper memory order. Ensure every direct
dss_shards_[shard_id] access is guarded this way to prevent out-of-range
indexing.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (11)
data_store_service_client.cpp (5)
3022-3026: Wrong shard closed on follower start; closes shard 0 instead of ng_id. Calling `CloseDataStore(0)` will close the wrong shard when ng_id ≠ 0. Use `ng_id` (the DSS shard id) to avoid data unavailability.
```diff
- // Now, only support one shard.
- data_store_service_->CloseDataStore(0);
+ // Close the shard corresponding to this node group.
+ data_store_service_->CloseDataStore(ng_id);
```
136-147: Fix typos in log/comments for clarity. Minor nits: “Cliet” → “Client”; “Binded” → “Bound”; “swith” → “switch”.
```diff
- LOG(INFO) << "DataStoreServiceCliet UpgradeShardVersion success, "
+ LOG(INFO) << "DataStoreServiceClient UpgradeShardVersion success, "
...
- LOG(INFO)
-     << "DataStoreServiceCliet SetupConfig skipped, current_version:"
+ LOG(INFO)
+     << "DataStoreServiceClient SetupConfig skipped, current_version:"
...
- // Binded data store shard with ng.
+ // Bound data store shard with ng.
...
- // swith to read only in case of data store status is read write
+ // switch to read-only in case data store status is read-write
```

Also applies to: 2979-2981, 2995-2999
3191-3195: Add shard_id bounds check to prevent array overrun. Directly indexing `dss_shards_` without validating `shard_id` risks OOB access. Return a sentinel on invalid input.
```diff
 uint32_t DataStoreServiceClient::GetOwnerNodeIndexOfShard(
     uint32_t shard_id) const
 {
-    assert(dss_shards_[shard_id].load(std::memory_order_acquire) != UINT32_MAX);
-    return dss_shards_[shard_id].load(std::memory_order_acquire);
+    if (shard_id >= dss_shards_.size())
+    {
+        LOG(ERROR) << "shard_id " << shard_id << " exceeds dss_shards_ bounds";
+        return UINT32_MAX;
+    }
+    auto idx = dss_shards_[shard_id].load(std::memory_order_acquire);
+    assert(idx != UINT32_MAX);
+    return idx;
 }
```
3312-3319: Return failure for invalid shard_id; avoid assert+FATAL path. If shard_id is out of range, log and return false so callers can handle it gracefully.
```diff
- if (shard_id >= dss_shards_.size())
- {
-     assert(false);
-     // Now only support one shard.
-     LOG(FATAL) << "Shard id not found, shard_id: " << shard_id;
-     return true;
- }
+ if (shard_id >= dss_shards_.size())
+ {
+     LOG(ERROR) << "Shard id not found, shard_id: " << shard_id;
+     return false;
+ }
```
4195-4245: Header/implementation mismatch: record_tmp_mem_area uses size_t in .cpp but uint64_t in header. The two SerializeTxRecord overloads here take `std::vector<size_t>&`, but the header declares `std::vector<uint64_t>&`. This will not compile. Align the header to `size_t` (or switch the implementation to `uint64_t` consistently). Suggest updating the header to `size_t` to match the current implementation.
Proposed header fix shown in data_store_service_client.h review.
data_store_service_client.h (1)
416-431: Make SerializeTxRecord signatures consistent with implementation (size_t vs uint64_t). The implementation uses `std::vector<size_t>&` for record_tmp_mem_area. The header declares `std::vector<uint64_t>&`, causing a compile error. Align to `size_t`.
```diff
- static void SerializeTxRecord(bool is_deleted,
-                               const txservice::TxRecord *rec,
-                               std::vector<uint64_t> &record_tmp_mem_area,
-                               std::vector<std::string_view> &record_parts,
-                               size_t &write_batch_size);
+ static void SerializeTxRecord(bool is_deleted,
+                               const txservice::TxRecord *rec,
+                               std::vector<size_t> &record_tmp_mem_area,
+                               std::vector<std::string_view> &record_parts,
+                               size_t &write_batch_size);
- static void SerializeTxRecord(const txservice::TxRecord *rec,
-                               std::vector<uint64_t> &record_tmp_mem_area,
-                               std::vector<std::string_view> &record_parts,
-                               size_t &write_batch_size);
+ static void SerializeTxRecord(const txservice::TxRecord *rec,
+                               std::vector<size_t> &record_tmp_mem_area,
+                               std::vector<std::string_view> &record_parts,
+                               size_t &write_batch_size);
```

eloq_data_store_service/internal_request.h (2)
754-763: Pass shard_id, not partition_id, to PrepareShardingError. The method signature expects `uint32_t shard_id`, but these SetFinish() implementations call it with `partition_id`. Four occurrences need fixes:

```diff
- ds_service_->PrepareShardingError(req_->partition_id(),
+ ds_service_->PrepareShardingError(req_->shard_id(),
                                    resp_->mutable_result());
- ds_service_->PrepareShardingError(partition_id_, result_);
+ ds_service_->PrepareShardingError(shard_id_, result_);
```

Applies to: lines 759, 858, 1327, 1518
523-531: Update GetPartitionId() and GetShardId() overrides from uint32_t to int32_t throughout the file. Verification confirms multiple GetPartitionId() and GetShardId() overrides still return `uint32_t` when they should return `int32_t` to match the interface contract:
- GetPartitionId() at lines 523, 611, 729, 828, 1235, 1430 use uint32_t (should be int32_t)
- GetShardId() at lines 121, 268, 383, 450, 528, 616, 734, 833, 927, 993, 1062, 1128, 1240, 1435, 1594, 1669 use uint32_t (should be int32_t)
Update all these method signatures to return `int32_t`.

data_store_service_scanner.h (1)
235-238: Fix return type: align GetPartitionId() with int32_t member.

`partition_id_` is `int32_t`, but the getter returns `uint32_t`. This implicit conversion can corrupt negative values and break comparisons. Change the getter to return `int32_t`.

```diff
- uint32_t GetPartitionId()
+ int32_t GetPartitionId()
  {
      return partition_id_;
  }
```

eloq_data_store_service/rocksdb_cloud_data_store.cpp (1)
238-251: Fix unconditional TLS verification disablement in S3 client configuration. Disabling TLS verification (`config.verifySSL = false;`) weakens transport security and enables man-in-the-middle attacks. The AWS SDK defaults to `verifySSL = true`; this setting should not override that default.

Affected locations:
- eloq_data_store_service/rocksdb_cloud_data_store.cpp:250
- eloq_data_store_service/rocksdb_cloud_dump.cpp:218

Fix: remove the line `config.verifySSL = false;` entirely, or explicitly set it to `true` to respect the AWS SDK default:

```diff
- // Disable SSL verification for HTTPS
- config.verifySSL = false;
+ // Enable SSL verification (AWS SDK default)
+ config.verifySSL = true;
```

If certificate verification must be made configurable in the future, add a `disable_tls_verify_` field to the `RocksDBCloudConfig` structure and gate the disablement behind that configuration flag. For now, keep verification enabled by default.
602-619: Fix std::string_view lifetime issue in ReadClosure by storing owned strings.The
key_member is stored as non-owningstd::string_viewbut must remain valid throughout the async operation. The publicRead()API accepts temporarystd::string_viewparameters and passes them to the closure, creating a dangling reference risk if the caller's strings go out of scope before the closure completes.Change
std::string_view key_tostd::string key_(line ~892) and update the assignment at line 614:key_ = std::string(key);. Apply the same fix totable_name_and other non-owningstd::string_viewmembers stored in closures.
♻️ Duplicate comments (4)
eloq_data_store_service/ds_request.proto (1)
141-146: Proto wire-compat break: don't renumber existing fields

`skip_wal`/`items` shifted (3→4 and 4→5). This breaks compatibility. Add `shard_id` using a new field number; keep existing numbers.

```diff
 message BatchWriteRecordsRequest
 {
     message Item
     {
         bytes key = 1;
         bytes value = 2;
         WriteOpType op_type = 3;
         uint64 ts = 4;
         uint64 ttl = 5;
     }
     string kv_table_name = 1;
     int32 partition_id = 2;
-    uint32 shard_id = 3;
-    bool skip_wal = 4;
-    repeated Item items = 5;
+    bool skip_wal = 3;        // preserve original number
+    repeated Item items = 4;  // preserve original number
+    uint32 shard_id = 6;      // new field number
 }
```
648-653: Fix @param name to match signature

Doc says partition_id; function takes shard_id.

```diff
- * @param partition_id
+ * @param shard_id
```
173-194: Null out service pointer after shutdown to prevent re-entry/UAF.

After `RocksDBDataStoreCommon::Shutdown()` (which clears the worker pool), also set `data_store_service_ = nullptr` locally. This mirrors the guidance for the base RocksDB store and avoids accidental reuse.

```diff
 void RocksDBCloudDataStore::Shutdown()
 {
     RocksDBDataStoreCommon::Shutdown();
+    data_store_service_ = nullptr;
@@
     if (db_ != nullptr)
     {
```
155-163: Stop mutating a const member; normalize object_path before initializing cloud_config_.

Writing via const_cast on a const member is undefined behaviour. Compute a normalized path/copy up front and avoid mutating cloud_config_.

Example fix in this file (plus a small header change to add normalized_object_path_ and use it):

```diff
@@
-      cloud_config_(cloud_config),
+      cloud_config_(cloud_config),
       cloud_fs_(),
       cloud_env_(nullptr),
       db_(nullptr)
 {
-    auto *cloud_config_ptr =
-        const_cast<EloqDS::RocksDBCloudConfig *>(&cloud_config_);
-    assert(!cloud_config_ptr->object_path_.empty());
-    if (cloud_config_ptr->object_path_.back() != '/')
-    {
-        cloud_config_ptr->object_path_.append("/");
-    }
-    cloud_config_ptr->object_path_.append("ds_" + std::to_string(shard_id));
+    std::string obj_path = cloud_config.object_path_;
+    assert(!obj_path.empty());
+    if (obj_path.back() != '/') obj_path.push_back('/');
+    normalized_object_path_ = std::move(obj_path) + "ds_" + std::to_string(shard_id);
 }
```

And use the normalized path when wiring cfs options:

```diff
@@
-    cfs_options_.src_bucket.SetObjectPath(cloud_config_.object_path_);
+    cfs_options_.src_bucket.SetObjectPath(normalized_object_path_);
@@
-    cfs_options_.dest_bucket.SetObjectPath(cloud_config_.object_path_);
+    cfs_options_.dest_bucket.SetObjectPath(normalized_object_path_);
```

Add a private member in rocksdb_cloud_data_store.h:

```diff
-    const EloqDS::RocksDBCloudConfig cloud_config_;
+    const EloqDS::RocksDBCloudConfig cloud_config_;
+    std::string normalized_object_path_;
```
🧹 Nitpick comments (4)
data_store_service_client.cpp (1)
251-255: Avoid assert(false) in production paths

ScheduleTimerTasks is a virtual API; asserting unconditionally may crash debug builds. Prefer returning after logging.

```diff
 void DataStoreServiceClient::ScheduleTimerTasks()
 {
-    LOG(ERROR) << "ScheduleTimerTasks not implemented";
-    assert(false);
+    LOG(ERROR) << "ScheduleTimerTasks not implemented";
+    return;
 }
```

data_store_service_client.h (1)
747-750: Update outdated comment about shard support

The comment says "Now only support one shard," but the code supports many shards (arrays sized 1000). Refresh the comment to avoid confusion.

```diff
-    // Now only support one shard. dss_shards_ caches the index in dss_nodes_ of
-    // shard owner.
+    // dss_shards_[shard_id] caches the index in dss_nodes_ of the shard owner.
```

eloq_data_store_service/data_store_service.cpp (1)
1014-1019: Remove asserts in RPC error paths.

`assert(false);` inside handlers will crash the process on reachable error states; return structured errors instead.

```diff
-    if (shard_status != DSShardStatus::ReadWrite &&
-        shard_status != DSShardStatus::ReadOnly)
-    {
-        assert(false);
+    if (shard_status != DSShardStatus::ReadWrite &&
+        shard_status != DSShardStatus::ReadOnly)
+    {
```
eloq_data_store_service/data_store_service.h (1)
680-694: Optional hygiene: reset shard_id_ on shutdown.

Not required for correctness here, but resetting `shard_id_` to `UINT32_MAX` in `DataShard::ShutDown()` avoids stale identity and aligns with guard patterns elsewhere.

```diff
     void ShutDown()
     {
         if (data_store_ != nullptr)
         {
             data_store_->Shutdown();
             data_store_ = nullptr;
         }
         if (scan_iter_cache_ != nullptr)
         {
             scan_iter_cache_->Clear();
             scan_iter_cache_ = nullptr;
         }
+        shard_id_ = UINT32_MAX;
     }
```

Also applies to: 695-700
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
- data_store_service_client.cpp (90 hunks)
- data_store_service_client.h (12 hunks)
- data_store_service_client_closure.cpp (17 hunks)
- data_store_service_client_closure.h (34 hunks)
- data_store_service_scanner.cpp (3 hunks)
- data_store_service_scanner.h (4 hunks)
- eloq_data_store_service/CMakeLists.txt (0 hunks)
- eloq_data_store_service/data_store_service.cpp (45 hunks)
- eloq_data_store_service/data_store_service.h (14 hunks)
- eloq_data_store_service/data_store_service_util.h (1 hunks)
- eloq_data_store_service/ds_request.proto (5 hunks)
- eloq_data_store_service/internal_request.h (55 hunks)
- eloq_data_store_service/main.cpp (1 hunks)
- eloq_data_store_service/rocksdb_cloud_data_store.cpp (5 hunks)
- eloq_data_store_service/rocksdb_data_store.cpp (1 hunks)
- eloq_data_store_service/rocksdb_data_store_common.cpp (3 hunks)
- eloq_data_store_service/rocksdb_data_store_common.h (1 hunks)
💤 Files with no reviewable changes (1)
- eloq_data_store_service/CMakeLists.txt
🚧 Files skipped from review as they are similar to previous changes (3)
- eloq_data_store_service/rocksdb_data_store.cpp
- eloq_data_store_service/rocksdb_data_store_common.h
- data_store_service_scanner.cpp
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-17T11:08:35.322Z
Learnt from: liunyl
Repo: eloqdata/store_handler PR: 83
File: data_store_service_client_closure.cpp:408-448
Timestamp: 2025-09-17T11:08:35.322Z
Learning: In the DataStoreServiceClient batch write system, the global coordinator (SyncPutAllData) manages concurrency at the partition level, not at the individual batch level. The global unfinished_request_cnt_ is only decremented when an entire partition is fully completed or failed, not after each individual batch within a partition.
Applied to files:
eloq_data_store_service/data_store_service.cppdata_store_service_client_closure.cppdata_store_service_client.cpp
📚 Learning: 2025-09-17T06:36:11.137Z
Learnt from: liunyl
Repo: eloqdata/store_handler PR: 80
File: rocksdb_handler.cpp:1226-1335
Timestamp: 2025-09-17T06:36:11.137Z
Learning: In the EloqData store_handler codebase, the maximum bucket ID is guaranteed to be smaller than UINT16_MAX (65535), so bucket_id + 1 operations do not risk integer overflow when used in FetchBucketData methods.
Applied to files:
data_store_service_client_closure.cppdata_store_service_client.cppdata_store_service_client.h
🧬 Code graph analysis (10)
data_store_service_scanner.h (1)
eloq_data_store_service/internal_request.h (9)
string_view(106-109)string_view(126-129)string_view(273-276)string_view(606-614)string_view(621-629)string_view(818-826)partition_id_(647-647)partition_id_(869-869)partition_id_(1528-1528)
eloq_data_store_service/rocksdb_cloud_data_store.cpp (3)
eloq_data_store_service/rocksdb_cloud_data_store.h (1)
RocksDBCloudDataStore(38-161)eloq_data_store_service/rocksdb_data_store.cpp (2)
Shutdown(69-83)Shutdown(69-69)eloq_data_store_service/rocksdb_data_store_common.cpp (2)
Shutdown(283-298)Shutdown(283-283)
eloq_data_store_service/rocksdb_data_store_common.cpp (4)
eloq_data_store_service/rocksdb_cloud_data_store.cpp (2)
Shutdown(173-194)Shutdown(173-173)eloq_data_store_service/rocksdb_data_store.cpp (2)
Shutdown(69-83)Shutdown(69-69)eloq_data_store_service/thread_worker_pool.cpp (2)
Shutdown(91-106)Shutdown(91-91)eloq_data_store_service/data_store_service_util.h (1)
shard_id_(123-123)
eloq_data_store_service/data_store_service.cpp (3)
eloq_data_store_service/data_store_service.h (4)
IsOwnerOfShard(615-620)PrepareShardingError(525-529)IncreaseWriteReqCount(603-607)DecreaseWriteReqCount(609-613)data_store_service_client.cpp (12)
Read(1088-1097)Read(1088-1093)Read(3511-3527)Read(3511-3516)DeleteRange(3560-3582)DeleteRange(3560-3567)ScanNext(3703-3736)ScanNext(3703-3717)ScanClose(3778-3802)ScanClose(3778-3783)BatchWriteRecords(4103-4138)BatchWriteRecords(4103-4116)eloq_data_store_service/rocksdb_data_store_common.cpp (15)
Read(433-494)Read(433-433)key(849-849)DeleteRange(356-431)DeleteRange(356-356)start_key(384-384)start_key(696-696)end_key(399-399)end_key(701-701)ScanNext(752-1062)ScanNext(752-752)ScanClose(1064-1088)ScanClose(1064-1064)BatchWriteRecords(496-658)BatchWriteRecords(496-497)
data_store_service_client_closure.cpp (1)
data_store_service_client.cpp (1)
callback_data(2575-2581)
data_store_service_client_closure.h (2)
eloq_data_store_service/internal_request.h (13)
partition_id_(647-647)partition_id_(869-869)partition_id_(1528-1528)string_view(106-109)string_view(126-129)string_view(273-276)string_view(606-614)string_view(621-629)string_view(818-826)string_view(988-996)string_view(1123-1131)string_view(1425-1433)string_view(1440-1448)eloq_data_store_service/data_store_service_util.h (1)
shard_id_(123-123)
data_store_service_client.cpp (3)
data_store_service_client.h (3)
need_bootstrap_(688-859)KvPartitionIdOf(629-634)bind_data_shard_with_ng_(689-813)eloq_data_store_service/data_store_service.cpp (16)
BatchWriteRecords(853-901)BatchWriteRecords(853-857)BatchWriteRecords(1118-1183)BatchWriteRecords(1118-1131)Read(399-432)Read(399-402)Read(434-477)Read(434-442)ScanNext(903-959)ScanNext(903-918)ScanNext(961-993)ScanNext(961-964)DeleteRange(569-616)DeleteRange(569-573)DeleteRange(618-671)DeleteRange(618-625)data_store_service_client_closure.cpp (2)
PartitionBatchCallback(493-538)PartitionBatchCallback(493-496)
data_store_service_client.h (3)
eloq_data_store_service/data_store_service_config.h (2)
DataStoreServiceClusterManager(317-513)uint32_t(107-110)data_store_service_client.cpp (24)
FindFreeNodeIndex(3250-3269)FindFreeNodeIndex(3250-3250)InitBucketsInfo(3129-3188)InitBucketsInfo(3129-3133)OnLeaderStart(2969-2986)OnLeaderStart(2969-2970)OnLeaderStop(2988-3002)OnLeaderStop(2988-2988)OnStartFollowing(3011-3061)OnStartFollowing(3011-3015)GetShardIdByPartitionId(3089-3112)GetShardIdByPartitionId(3089-3090)IsLocalShard(3079-3087)IsLocalShard(3079-3079)GetOwnerNodeIndexOfShard(3190-3195)GetOwnerNodeIndexOfShard(3190-3191)GetAllDataShards(3114-3127)GetAllDataShards(3114-3114)UpdateOwnerNodeIndexOfShard(3197-3248)UpdateOwnerNodeIndexOfShard(3197-3198)HandleShardingError(3271-3305)HandleShardingError(3271-3272)UpgradeShardVersion(3307-3355)UpgradeShardVersion(3307-3310)eloq_data_store_service/data_store_service_config.cpp (4)
GetShardIdByPartitionId(802-807)GetShardIdByPartitionId(802-803)HandleShardingError(950-992)HandleShardingError(950-951)
eloq_data_store_service/internal_request.h (2)
eloq_data_store_service/data_store_service_config.h (1)
uint32_t(107-110)data_store_service_client_closure.h (31)
Reset(81-85)Reset(144-153)Reset(210-216)Reset(269-274)Reset(385-392)Reset(600-619)Reset(933-948)Reset(1154-1177)Reset(1393-1408)Reset(1598-1624)Reset(1627-1658)Reset(1660-1694)Reset(1933-2251)Reset(2328-2347)Reset(2606-2612)Reset(3002-3010)vector(1078-1081)vector(1121-1128)vector(2253-2256)string(804-809)string(2173-2178)string_view(311-368)string_view(752-755)string_view(767-770)string_view(792-797)string_view(1291-1294)string_view(1533-1536)string_view(2133-2136)string_view(2138-2141)string_view(2241-2246)string_view(2453-2456)
eloq_data_store_service/data_store_service.h (4)
eloq_data_store_service/data_store_service.cpp (12)
StartService(242-312)StartService(242-242)Read(399-432)Read(399-402)Read(434-477)Read(434-442)ScanClose(995-1028)ScanClose(995-998)ScanClose(1030-1062)ScanClose(1030-1035)TTLWrapperCache(72-77)TTLWrapperCache(79-88)data_store_service_client.cpp (6)
Read(1088-1097)Read(1088-1093)Read(3511-3527)Read(3511-3516)ScanClose(3778-3802)ScanClose(3778-3783)eloq_data_store_service/data_store_service_config.cpp (6)
PrepareShardingError(904-948)PrepareShardingError(904-905)IsOwnerOfShard(285-293)IsOwnerOfShard(285-285)IsOwnerOfShard(690-701)IsOwnerOfShard(690-691)eloq_data_store_service/data_store.h (1)
DataStore(42-143)
🪛 Clang (14.0.6)
eloq_data_store_service/data_store_service_util.h
[error] 24-24: 'atomic' file not found
(clang-diagnostic-error)
🔇 Additional comments (16)
eloq_data_store_service/data_store_service_util.h (1)
24-29: Clarify transitive include pattern and verify atomic compilation issue.

The newly added includes are used by dependent files (`data_store_service_scanner.h`, `data_store_service.h`, `data_store_service_config.h`) but through transitive includes rather than direct usage in `data_store_service_util.h` itself. Specifically:

- `data_store_service_scanner.h` uses `<atomic>` and appears to rely on util.h to provide it
- `data_store_service.h` uses `<atomic>`, `<map>`, `<set>`, and `<unordered_map>`, but already includes some of these directly

While this transitive include pattern makes the includes functional, it violates the principle that each header file should directly include only what it uses. Each dependent file should include what it directly needs rather than relying on a util header to provide them.

Additionally, the static analysis error reporting `<atomic>` not found should be investigated; this is a standard C++ header and should be available.

Recommendations:

- Remove these includes from `data_store_service_util.h` and have each dependent file include what it directly uses
- Verify whether the `<atomic>` compilation error is a real blocker or a false positive in your build environment

data_store_service_client_closure.cpp (5)
72-73: LGTM: Consistent key extraction refactoring.

The change from `Key().back()` to `Key()` is applied consistently across all callbacks, indicating a systematic refactoring of the key handling approach.
265-265: LGTM: Simplified key handling.

Using `key_str` directly from the scan result aligns with the key handling refactor throughout the file.
278-287: LGTM: Correct shard-aware routing implementation.

The code correctly derives the shard ID from the bucket→partition→shard mapping and propagates it to the continuation scan. This aligns with the multi-shard support objectives.

516-521: LGTM: Shard context preserved across batch continuations.

The code correctly extracts and propagates the shard ID from the closure to the next batch write operation, maintaining shard affinity throughout the batch processing flow.

619-619: LGTM: Comprehensive shard context propagation.

Shard IDs are consistently extracted from closures and propagated to all downstream Read/ScanNext operations across multiple callbacks, ensuring proper shard affinity throughout operation chains.
Also applies to: 688-688, 821-821, 919-919, 1016-1016, 1134-1134, 1241-1241, 1310-1310, 1419-1419, 1445-1445
eloq_data_store_service/rocksdb_data_store_common.cpp (2)
283-298: LGTM: Clean per-shard shutdown implementation.

The method properly cleans up resources with defensive null checks and allows for DataStore reuse by nullifying the worker pool (which will be re-created in Initialize()). The ForceEraseScanIters call ensures scan iterators are cleaned up for this specific shard.

444-455: LGTM: Cleaner BuildKey API.

The refactoring to extract the key first and then pass it directly to BuildKey (instead of passing the entire ReadRequest) results in a cleaner, more explicit API with better separation of concerns.
eloq_data_store_service/main.cpp (1)
321-323: StartService signature change verified as consistent

The signature update has been properly applied: the header declaration, implementation, and call site in main.cpp all align with the new `bool create_db_if_missing` parameter. No additional call sites or inconsistencies detected.
499-510: LGTM: Partition ID type change is consistent.

The change from `uint32_t` to `int32_t` for partition_id aligns with the broader shard-aware refactoring and is consistent between the method signature and member variable.

Also applies to: 532-532
660-662: LGTM: Shard ID propagation is correct.

The shard_id is properly set in the RPC request (line 660) and correctly used in retry logic (line 688). The direct usage of `shard_id_` instead of deriving it from `partition_id_` is the correct approach for per-shard routing.

Also applies to: 688-688
1156-1177: LGTM: DeleteRangeClosure properly updated for shard awareness.

The closure correctly:

- Accepts both `partition_id` (int32_t) and `shard_id` (uint32_t) in Reset()
- Populates `shard_id` in the RPC request
- Uses `shard_id_` directly in retry logic
- Exposes public accessors for both IDs
Also applies to: 1200-1200, 1228-1228, 1296-1304
1605-1606: LGTM: BatchWriteRecordsClosure shard-aware updates are complete.

All Reset() overloads properly accept and store both `partition_id` (int32_t) and `shard_id` (uint32_t). The RPC request population and retry logic correctly use `shard_id_`. Public accessors are properly exposed.

Also applies to: 1629-1694, 1777-1777, 1850-1858
1933-1982: LGTM: ScanNextClosure correctly implements shard-aware routing.

The closure properly:

- Accepts and stores both `partition_id` (int32_t) and `shard_id` (uint32_t)
- Sets `shard_id` in the RPC request
- Uses `shard_id_` in owner node updates during retry
- Provides public accessors with correct return types
Also applies to: 2008-2008, 2055-2055, 2123-2131
2930-2949: LGTM: FetchArchivesCallbackData partition_id type updated consistently.

The constructor parameter and member variable are both changed to `int32_t`, maintaining consistency with the shard-aware refactoring.
933-977: LGTM: FlushDataClosure correctly handles multi-shard operations.

The closure uses a vector of `shard_ids_` and processes them sequentially. The RPC request correctly sets the current shard_id (line 974), and retry logic properly uses it for owner node updates (lines 998-999).

Also applies to: 998-999
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
data_store_service_client.cpp (1)
2969-3060: Fix misleading comment in OnLeaderStop.

The comment at line 2995 says "swith to read only in case of data store status is read write," but `CloseDataStore()` actually performs a full close transition (ReadWrite → ReadOnly → Closed). The shard lifecycle logic is sound and correctly binds ng_id to shard operations when `bind_data_shard_with_ng_` is enabled. The 1:1 ng_id/shard_id mapping is implicit and validated by the assert at line 3034 in `OnStartFollowing`.

Update the line 2995 comment to accurately describe the full close operation:
// Close the data store shard (transitions ReadWrite → ReadOnly → Closed)
🧹 Nitpick comments (2)
data_store_service_client.cpp (2)
3101-3108: Redundant check after assertion.

Line 3102 asserts that the bucket exists, then line 3103 checks the same condition. Since assertions are enabled in debug builds and the bucket_infos_ map is guaranteed to contain valid bucket_ids, either remove the assert (if you want graceful error handling) or remove the if check (if the invariant is guaranteed).

Current pattern:

```cpp
auto it = bucket_infos_.find(bucket_id);
assert(it != bucket_infos_.end());  // Will abort in debug if false
if (it != bucket_infos_.end())      // Redundant given the assert
{
    ...
}
```

Apply this diff to remove the redundant check:

```diff
     auto it = bucket_infos_.find(bucket_id);
     assert(it != bucket_infos_.end());
-    if (it != bucket_infos_.end())
-    {
-        uint32_t shard_id = it->second->BucketOwner();
-        assert(dss_shard_ids_.find(shard_id) != dss_shard_ids_.end());
-        return shard_id;
-    }
+    uint32_t shard_id = it->second->BucketOwner();
+    assert(dss_shard_ids_.find(shard_id) != dss_shard_ids_.end());
+    return shard_id;
-
-    LOG(ERROR) << "Bucket not found for partition_id=" << partition_id
-               << " (bucket_id=" << bucket_id << ")";
-    return UINT32_MAX;
```
3193-3197: Unreachable return after LOG(FATAL).
`LOG(FATAL)` terminates the process, making the subsequent `return UINT32_MAX` unreachable. Either change to `LOG(ERROR)` if you want to return an error value, or remove the return statement if process termination is intended.

If graceful error handling is desired:

```diff
 if (shard_id >= dss_shards_.size())
 {
-    LOG(FATAL) << "shard_id " << shard_id << " exceeds array bounds";
+    LOG(ERROR) << "shard_id " << shard_id << " exceeds array bounds";
     return UINT32_MAX;
 }
```

Or if process termination is intended, remove the unreachable return:

```diff
 if (shard_id >= dss_shards_.size())
 {
     LOG(FATAL) << "shard_id " << shard_id << " exceeds array bounds";
-    return UINT32_MAX;
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- data_store_service_client.cpp (89 hunks)
- eloq_data_store_service/internal_request.h (55 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- eloq_data_store_service/internal_request.h
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-17T06:36:11.137Z
Learnt from: liunyl
Repo: eloqdata/store_handler PR: 80
File: rocksdb_handler.cpp:1226-1335
Timestamp: 2025-09-17T06:36:11.137Z
Learning: In the EloqData store_handler codebase, the maximum bucket ID is guaranteed to be smaller than UINT16_MAX (65535), so bucket_id + 1 operations do not risk integer overflow when used in FetchBucketData methods.
Applied to files:
data_store_service_client.cpp
📚 Learning: 2025-09-17T11:08:35.322Z
Learnt from: liunyl
Repo: eloqdata/store_handler PR: 83
File: data_store_service_client_closure.cpp:408-448
Timestamp: 2025-09-17T11:08:35.322Z
Learning: In the DataStoreServiceClient batch write system, the global coordinator (SyncPutAllData) manages concurrency at the partition level, not at the individual batch level. The global unfinished_request_cnt_ is only decremented when an entire partition is fully completed or failed, not after each individual batch within a partition.
Applied to files:
data_store_service_client.cpp
🧬 Code graph analysis (1)
data_store_service_client.cpp (2)
data_store_service_client.h (5)
TxPort2DssPort(134-137)need_bootstrap_(688-859)KvPartitionIdOf(629-634)bind_data_shard_with_ng_(689-813)Connect(154-159)data_store_service_client_closure.cpp (2)
PartitionBatchCallback(493-538)PartitionBatchCallback(493-496)
🔇 Additional comments (5)
data_store_service_client.cpp (5)
3129-3188: LGTM: Thread-safe deterministic bucket assignment.

The implementation correctly uses `std::mt19937` with deterministic seeding to generate consistent bucket-to-shard mappings. The consistent hashing approach with 64 virtual nodes per shard provides good load distribution, and the collision handling ensures each virtual node position is unique.
3989-4003: LGTM: Proper error handling for Sequences table.

Good improvement over the previous `assert(false)`. The code now gracefully returns false with a clear error message if the Sequences table is not hash-partitioned, allowing callers to handle the failure appropriately.
2368-2368: LGTM: Consistent shard routing in archive copy operation.

The `CopyBaseToArchive` method correctly propagates the `is_range_partitioned` flag when computing both `kv_part_id` and `shard_id` for reading base table records. The concurrent read management with `MAX_FLYING_READ_COUNT` throttling prevents overwhelming the system while copying data.

Also applies to: 2402-2415
3114-3127: LGTM: Thread-safe shard enumeration with proper locking.

The method now correctly acquires a `shared_lock` before iterating over `dss_shard_ids_`, addressing the concurrency concern from the previous review. The lock is held only during the copy operation, and the returned vector provides a consistent snapshot of active shard IDs.
419-420: Verified `is_range_partition` usage across all `GetShardIdByPartitionId` call sites.

Based on an analysis of all 42 calls to `GetShardIdByPartitionId`, no critical issues were found. The `is_range_partition` parameter is consistently and correctly used across all call sites:

Catalog operations (partition_id=0): All 10 calls correctly pass `false` (partition 0 is always hash-partitioned)

- Lines 1543, 1589, 1643, 1700, 1748, 1773, 4323, 4548, 4595, and one additional

Catalog/KvPartitionIdOf pattern: All 14 calls using `KvPartitionIdOf(ccm_table_name)` correctly pass `false`

- Lines 597, 627, 669, 819, 940, 988, 1385, 3843, 3896, 4052, and others

Range partition operations: All 2 explicit calls correctly pass `true`

- Line 1052: `KvPartitionIdOf(partition_id, true)` with `GetShardIdByPartitionId(..., true)` ✓
- Line 1148: `KvPartitionIdOf(range_partition_id, true)` with `GetShardIdByPartitionId(..., true)` ✓

Schema-derived operations: All 4 calls correctly derive from `!IsHashPartitioned()`

- Lines 3387-3390, 3500-3503: Match pattern for both KvPartitionIdOf and GetShardIdByPartitionId calls
- Lines 420, 2406: Compute consistency correctly

Hash archive/KvPartitionIdOf(table_name) implicit false: All 10+ calls correctly pass `false`

- Lines 2572, 2683, 2779, 2781, 2818, 3997, 4325, and others

The parameter consistently aligns with the partition type being passed: catalog operations and hash partitions use `false`, while range partitions use `true`. The pattern is deterministic and guards against routing errors.
Force-pushed: 33a62c9 to deb72ed (Compare)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
data_store_service_client.cpp (1)
2540-2751: Deprecated archive APIs now log instead of asserting but still perform work
`FetchArchives(...)` and `FetchVisibleArchive(...)` (the table-schema-based variants) now log warnings that they "should not be used (noop)" instead of asserting, but they still execute full ScanNext/Decode logic and return real data. If these paths are truly deprecated in favor of ccmap, consider either:
- Returning early after the warning, or
- Updating the comments/log messages to reflect that they are deprecated but still functional.
eloq_data_store_service/rocksdb_cloud_data_store_factory.h (1)
48-84: Guard against empty `object_path_` before calling `.back()`

Calling `std::string::back()` on an empty string is undefined behavior in C++17. At lines 50–53, the code calls `object_path_.back()` without checking whether the string is empty:

```cpp
if (shard_cloud_config.object_path_.back() != '/')
{
    shard_cloud_config.object_path_.append("/");
}
```

Even if current configurations always provide a non-empty path, this is a brittle assumption. Add a guard:

```diff
- if (shard_cloud_config.object_path_.back() != '/')
- {
-     shard_cloud_config.object_path_.append("/");
- }
+ if (!shard_cloud_config.object_path_.empty() &&
+     shard_cloud_config.object_path_.back() != '/')
+ {
+     shard_cloud_config.object_path_.push_back('/');
+ }
+ else if (shard_cloud_config.object_path_.empty())
+ {
+     shard_cloud_config.object_path_ = "/";
+ }
```
🧹 Nitpick comments (1)
data_store_service_client.cpp (1)
32-83: Topology setup and bootstrap gating now behave sanely

Including `<random>`, replacing the old fatal `assert(false)` in `SetupConfig()` with version-checked, idempotent shard-upgrade logic, and short-circuiting `Connect()` when `need_bootstrap_` is false collectively fix prior crash-on-start behavior and make topology upgrades retryable. The version compare uses `<=`, so you may see redundant UpgradeShardVersion calls at equal version, but those are cheap and guarded; if you want to avoid duplicate logs, switch the condition to `< new_version`.

Also applies to: 109-149, 222-242
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- data_store_service_client.cpp (101 hunks)
- eloq_data_store_service/internal_request.h (55 hunks)
- eloq_data_store_service/rocksdb_cloud_data_store.cpp (5 hunks)
- eloq_data_store_service/rocksdb_cloud_data_store_factory.h (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-17T06:36:11.137Z
Learnt from: liunyl
Repo: eloqdata/store_handler PR: 80
File: rocksdb_handler.cpp:1226-1335
Timestamp: 2025-09-17T06:36:11.137Z
Learning: In the EloqData store_handler codebase, the maximum bucket ID is guaranteed to be smaller than UINT16_MAX (65535), so bucket_id + 1 operations do not risk integer overflow when used in FetchBucketData methods.
Applied to files:
data_store_service_client.cpp
📚 Learning: 2025-09-17T11:08:35.322Z
Learnt from: liunyl
Repo: eloqdata/store_handler PR: 83
File: data_store_service_client_closure.cpp:408-448
Timestamp: 2025-09-17T11:08:35.322Z
Learning: In the DataStoreServiceClient batch write system, the global coordinator (SyncPutAllData) manages concurrency at the partition level, not at the individual batch level. The global unfinished_request_cnt_ is only decremented when an entire partition is fully completed or failed, not after each individual batch within a partition.
Applied to files:
data_store_service_client.cpp
🧬 Code graph analysis (4)
eloq_data_store_service/rocksdb_cloud_data_store.cpp (3)
eloq_data_store_service/rocksdb_data_store.cpp (2)
Shutdown(69-83)Shutdown(69-69)eloq_data_store_service/rocksdb_data_store_common.cpp (2)
Shutdown(283-298)Shutdown(283-283)eloq_data_store_service/thread_worker_pool.cpp (2)
Shutdown(91-106)Shutdown(91-91)
eloq_data_store_service/rocksdb_cloud_data_store_factory.h (2)
eloq_data_store_service/rocksdb_cloud_data_store.cpp (2)
RocksDBCloudDataStore(138-155)RocksDBCloudDataStore(157-163)eloq_data_store_service/rocksdb_cloud_data_store.h (1)
RocksDBCloudDataStore(38-161)
eloq_data_store_service/internal_request.h (3)
eloq_data_store_service/data_store_service_config.h (2)
uint32_t(107-110)EloqDS(98-514)data_store_service_client_closure.h (30)
Reset(81-85)Reset(144-153)Reset(210-216)Reset(269-274)Reset(385-392)Reset(600-619)Reset(933-948)Reset(1154-1177)Reset(1393-1408)Reset(1598-1624)Reset(1627-1658)Reset(1660-1694)Reset(1933-2251)Reset(2328-2347)Reset(2606-2612)Reset(3002-3010)string(804-809)string(2173-2178)string_view(311-368)string_view(752-755)string_view(767-770)string_view(792-797)string_view(1291-1294)string_view(1533-1536)string_view(2133-2136)string_view(2138-2141)string_view(2241-2246)string_view(2453-2456)EloqDS(39-42)EloqDS(44-184)eloq_data_store_service/data_store.h (1)
EloqDS(28-145)
data_store_service_client.cpp (2)
data_store_service_client.h (4)
TxPort2DssPort(134-137)need_bootstrap_(688-859)KvPartitionIdOf(629-634)bind_data_shard_with_ng_(689-813)data_store_service_client_closure.cpp (2)
PartitionBatchCallback(493-538)PartitionBatchCallback(493-496)
🔇 Additional comments (20)
eloq_data_store_service/internal_request.h (7)
36-69: Shard-aware write request interfaces look consistent
`WriteRecordsRequest` and its RPC/local implementations now carry `uint32_t shard_id` explicitly and use it in `SetFinish()` via `DecreaseWriteReqCount(...)`. Types (`int32_t` for partition_id, `uint32_t` for shard_id) and initialisation to `UINT32_MAX` are consistent across Clear/Reset paths; no functional issues spotted here.

Also applies to: 70-179
329-469: FlushData shard wiring is coherent
`FlushDataRequest` now exposes `GetShardId()`, with RPC mapping from `req_->shard_id()` and local requests storing `shard_id_` (reset to `UINT32_MAX` in `Clear()`) and using it for `DecreaseWriteReqCount`. This matches the client changes that fan out flushes per shard.
471-654: Partition/signature fixes and shard_id on DeleteRange are correct
`DeleteRangeRequest` now uses `int32_t GetPartitionId()` and adds `GetShardId()`, with both RPC and local implementations returning the proto/local fields and `SetFinish()` decrementing per-shard write counters. This addresses previous signed/unsigned mismatches while keeping the API uniform across read/write/delete.
663-863: ReadRequest and implementations: key/partition/shard accessors align with client usage

The base `ReadRequest` now exposes `GetKey()`, `GetPartitionId()` (int32_t), and `GetShardId()` (uint32_t), with RPC/Local variants mapping cleanly to proto fields and local members. Lifetime of the `std::string_view` members is unchanged from the previous design; shard_id defaults and resets are consistent with other request types.
865-1134: Create/Drop table shard propagation matches new client DropTable/FlushData flows

Adding `GetShardId()` to `CreateTableRequest`/`DropTableRequest` and wiring shard_id through RPC/local variants (including `DecreaseWriteReqCount(shard_id)` in `SetFinish`) is consistent with the per-shard Drop/Flush logic in the client. The use of `UINT32_MAX` as an invalid sentinel is uniform across request types.
1136-1518: ScanRequest: partition_id type and shard_id addition align with sharded scanners
`ScanRequest` now uses `int32_t GetPartitionId()` and adds `GetShardId()`, with RPC/local implementations correctly mapping proto and local state. This matches the new `ScanNext(..., partition_id, shard_id, ...)` client signature and should keep routing correct for both hash and range scans.
1520-1682: Snapshot-for-backup shard plumbing is well-integrated
`CreateSnapshotForBackupRequest`'s new `GetShardId()` (and the RPC/local implementations) allows the client's per-shard snapshot fanout to address the right data store instance and decrement per-shard write counters in `SetFinish`. The default `shard_id_{UINT32_MAX}` and Clear/Reset logic are consistent with other request types.

data_store_service_client.cpp (9)
151-210: TxConfigsToDssClusterConfig: mapping NG configs to shards is robust

The function now builds a `nodes_map` from ng member configs, adds only candidate nodes as shard members, updates the primary from `ng_leaders` (with an existence check), and sets `this_node`/read-write status when the local node is a candidate leader. This addresses earlier concerns about redundant `leader_id == node_id` checks and unguarded `nodes_map.at(...)` access. Behavior looks correct given the provided cluster-manager API.
591-686: Metadata/catalog/statistics operations correctly include shard_id
`FetchTableCatalog`, `Fetch(Current)TableStatistics`, `UpsertTableStatistics`, and the associated `DeleteRange` cleanup now compute `kv_partition_id` via `KvPartitionIdOf` and derive `data_shard_id` via `GetShardIdByPartitionId(..., /*is_range=*/false)`. All subsequent Read/Scan/BatchWrite/DeleteRange calls pass the same partition/shard pair, aligning metadata access with the new sharding scheme.

Also applies to: 742-921
1318-1337: CopyBaseToArchive: new shard computations and logging are reasonable
`CopyBaseToArchive` now derives `kv_part_id` and `shard_id` per base record using the is_range_partitioned flag, and uses those to call the sharded `Read(...)`. The extra `LOG(WARNING)` when seeing `EloqKv` for MVCC is helpful, and the batch_size accounting (key+value sizes) correctly feeds into the constructed `FlushTaskEntry`. Flow control (MAX_FLYING_READ_COUNT and condition variable) remains unchanged.

Also applies to: 2352-2537
2855-2903: Shard-scoped snapshot and lifecycle hooks use the new IsLocalShard/AllDataShards helpers correctly
`CreateSnapshotForBackup` now calls `GetAllDataShards()` and fans out snapshot creation per shard, with `CreateSnapshotForBackupInternal` dispatching local vs remote based on `IsLocalShard(shard_id)` and using `GetOwnerNodeIndexOfShard` for RPC routing. Leader/follower transitions (`OnLeaderStart`, `OnLeaderStop`, `OnStartFollowing`) respect `bind_data_shard_with_ng_` and open/close the appropriate data store shard via `ng_id`. This is aligned with the per-shard `DataStore` constructor changes.

Also applies to: 2969-3063, 3080-3088
3193-3203: GetOwnerNodeIndexOfShard: new FATAL guard is appropriate for out-of-range idsThe added bounds check (
shard_id >= dss_shards_.size()) withLOG(FATAL)prevents undefined access todss_shards_in release builds, which is critical for RPC routing. This is an improvement over relying solely on assertions.
3363-3402: Core Read/Delete/Scan/BatchWrite call paths are cleanly shard-awareThe main data-plane methods (
FetchRecord,FetchSnapshot,Read,DeleteRange,ScanNext,ScanClose,BatchWriteRecordsand their *_Internal variants) consistently:
- Compute
kv_partition_idfrom logicalpartition_idwith the appropriateis_range_partitionflag.- Derive
shard_idviaGetShardIdByPartitionId.- Use
IsLocalShard(shard_id)to choose between localdata_store_service_calls and RPC viaGetOwnerNodeIndexOfShard(shard_id)anddss_nodes_[node_index].Channel().This wiring matches the updated closure/request interfaces and should give correct per-shard routing across all operations.
Also applies to: 3485-3515, 3517-3533, 3535-3564, 3566-3620, 3710-3742, 3744-3782, 3784-3838
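The three steps above can be sketched as a single dispatch helper (stubbed shard-resolution functions standing in for the client's cluster-manager logic; names and the 4-shard layout are assumptions for illustration):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stub: the real client maps partition -> bucket -> shard via bucket_infos_.
static uint32_t GetShardIdByPartitionId(int32_t kv_partition_id, bool /*is_range*/)
{
    return static_cast<uint32_t>(kv_partition_id) % 4;  // assume 4 shards
}

// Stub: the real client checks shard ownership via its cluster manager.
static bool IsLocalShard(uint32_t shard_id)
{
    return shard_id == 0;  // assume this node owns shard 0 only
}

// Returns a description of how a read for this partition would be routed.
std::string RouteRead(int32_t partition_id, bool is_range_partition)
{
    int32_t kv_partition_id = partition_id;  // KvPartitionIdOf(...) in the client
    uint32_t shard_id = GetShardIdByPartitionId(kv_partition_id, is_range_partition);
    if (IsLocalShard(shard_id))
    {
        return "local:shard=" + std::to_string(shard_id);  // direct service call
    }
    return "rpc:shard=" + std::to_string(shard_id);  // RPC to the owner node
}
```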
3840-3999: Catalog/range/statistics maintenance updated to use shard_id consistently
`InitTableRanges`, `DeleteTableRanges`, `InitTableLastRangePartitionId`, `DeleteTableStatistics`, `UpsertCatalog`, and `DeleteCatalog` all now:

- Compute a KV partition id (often 0 for global catalogs).
- Use `GetShardIdByPartitionId(partition_id or kv_partition_id, false)` to choose the data shard.
- Pass both partition and shard into `BatchWriteRecords`/`DeleteRange`.

That keeps system catalog and statistics maintenance aligned with the general sharding scheme.
Also applies to: 4051-4105, 4107-4615
286-471: PutAll: partition→shard routing is consistent for hash and range tables

All `BatchWriteRecords` call sites throughout `data_store_service_client.cpp` pass both `partition_id` and `shard_id` parameters. Hash and range partitions are routed consistently via `KvPartitionIdOf(..., false/true)` → partition id, and `GetShardIdByPartitionId(partition_id, is_range_partitioned)` → shard id. The same partition/shard pair threads through `BatchWriteRecords` and subsequent callbacks.
934-1166: Range tables: shard routing pattern verified as coherent

All five functions consistently apply the shard routing pattern:

- System table operations (FetchTableRanges, FetchRangeSlices, UpdateRangeSlices): use `KvPartitionIdOf(table_name)` with `GetShardIdByPartitionId(..., false)`, routing via `MapHashPartitionIdToBucketId`.
- Range partition operations (DeleteOutOfRangeData, LoadRangeSlice): use `KvPartitionIdOf(partition_id, true)` with `GetShardIdByPartitionId(..., true)`, routing via `MapRangeIdToBucketId`.
- Caller (UpsertRanges): delegates to UpdateRangeSlices, maintaining consistency.

The `GetShardIdByPartitionId` implementation correctly interprets the `is_range_partition` flag to select the appropriate mapping function, and both `KvPartitionIdOf` overloads properly forward their inputs for consistent downstream routing.

eloq_data_store_service/rocksdb_cloud_data_store.cpp (4)
159-159: LGTM: Destructor condition simplified consistently.

The removal of `data_store_service_` from the shutdown condition aligns with the pattern in `RocksDBDataStore` (see rocksdb_data_store.cpp:68-82). The service pointer is not owned by this class and doesn't require cleanup in the destructor.
167-185: Shutdown flow properly refactored with parent delegation.

The call to `RocksDBDataStoreCommon::Shutdown()` at line 167 correctly delegates `query_worker_pool_` cleanup (which includes nulling) and the `ForceEraseScanIters` invocation. The subsequent db-specific cleanup properly nulls all owned resources under lock.

Note: The parent's `Shutdown()` doesn't null `data_store_service_` after calling `ForceEraseScanIters`. While `data_store_service_` isn't deleted (it's not owned), nulling it would provide extra defense against use-after-free if `Shutdown()` is called multiple times or the object is reused.

Based on learnings
355-377: Good timing instrumentation improvement.

Using `steady_clock` for the NewCloudFileSystem timing measurement is the correct choice: it's monotonic and unaffected by system clock adjustments. The `LOG(INFO)` ensures timing is visible in production for observability.
632-655: Consistent timing improvements for DB open operation.

The switch to `steady_clock` and `LOG(INFO)` is consistent with the NewCloudFileSystem timing changes. This provides reliable timing measurements unaffected by system clock changes and ensures visibility in production logs.
Actionable comments posted: 1
♻️ Duplicate comments (1)
data_store_service_client.cpp (1)
3131-3190: Add precondition check for empty node_groups to prevent undefined behavior.

If `node_groups` is empty, `rand_num_to_ng` remains empty after the generation loop. Lines 3180-3182 will then dereference `rand_num_to_ng.begin()->second` on an empty map when `it == rand_num_to_ng.end()`, causing undefined behavior.

Apply this diff to add the missing guard:

```diff
 void DataStoreServiceClient::InitBucketsInfo(
     const std::set<uint32_t> &node_groups,
     uint64_t version,
     std::unordered_map<uint16_t, std::unique_ptr<txservice::BucketInfo>>
         &ng_bucket_infos)
 {
+    if (node_groups.empty())
+    {
+        LOG(ERROR) << "InitBucketsInfo called with empty node_groups";
+        ng_bucket_infos.clear();
+        return;
+    }
+
     ng_bucket_infos.clear();
     std::map<uint16_t, uint32_t> rand_num_to_ng;
```

Based on learnings (bucket_infos_ immutability was clarified in PR #80).
🧹 Nitpick comments (3)
data_store_service_client.cpp (3)
1809-1821: Clarify deprecation status or remove implementation.
`DropKvTableAsync` logs that it "should not be used (noop)" but continues to execute the full async drop logic. If this method is truly deprecated and shouldn't be called:
- Remove the implementation body and return early after the warning, or
- Remove the method entirely
If it may still be invoked, the warning is misleading.
```diff
 void DataStoreServiceClient::DropKvTableAsync(const std::string &kv_table_name)
 {
-    // FIXME(lzx): this function may not be used now, delete it.
     LOG(WARNING) << "DropKvTableAsync should not be used (noop)";
-
-    AsyncDropTableCallbackData *callback_data =
-        new AsyncDropTableCallbackData();
-    callback_data->kv_table_name_ = kv_table_name;
-    DropTable(std::string_view(callback_data->kv_table_name_.data(),
-                               callback_data->kv_table_name_.size()),
-              callback_data,
-              &AsyncDropTableCallback);
+    return;  // Early return if truly deprecated
 }
```

Also applies to: 2554-2644, 2664-2751
2459-2488: Consolidate redundant EloqKv MVCC warnings.

Lines 2462 and 2486-2487 both log the same warning "EloqKv engine not support mvcc feature" within the same conditional branch. This creates redundant log spam. Consider consolidating to a single warning, or removing the code path if MVCC with EloqKv is truly unsupported and it should not execute.

```diff
 if (table_name.Engine() == txservice::TableEngine::EloqKv)
 {
-    // mvcc is not used for EloqKV
     LOG(WARNING) << "EloqKv engine not support mvcc feature";
     txservice::TxObject *tx_object =
         static_cast<txservice::TxObject *>(record.get());
     record = tx_object->DeserializeObject(val.data(), offset);
 }
 // ... later in same branch ...
 if (table_name.Engine() == txservice::TableEngine::EloqKv)
 {
-    // should not be here
-    LOG(WARNING)
-        << "EloqKv engine not support mvcc feature";
     ref.SetNonVersionedPayload(record.get());
 }
```

Also applies to: 2631, 2742
2972-3062: Document the ng_id-to-shard_id mapping assumption.
`OnLeaderStart`, `OnLeaderStop`, and `OnStartFollowing` now accept `ng_id` parameters and directly treat them as `dss_shard_id` (line 3031). This assumes a 1:1 mapping between node groups and data store shards. Consider adding a comment documenting this invariant, especially at line 3031 where the assignment occurs.

```diff
 // Treat leader_node_id as dss_leader_node_id
 uint32_t dss_leader_node_id = leader_node_id;
+// In multi-shard mode, node group ID maps 1:1 to data store shard ID
 uint32_t dss_shard_id = ng_id;
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
data_store_service_client.cpp (101 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-17T06:36:11.137Z
Learnt from: liunyl
Repo: eloqdata/store_handler PR: 80
File: rocksdb_handler.cpp:1226-1335
Timestamp: 2025-09-17T06:36:11.137Z
Learning: In the EloqData store_handler codebase, the maximum bucket ID is guaranteed to be smaller than UINT16_MAX (65535), so bucket_id + 1 operations do not risk integer overflow when used in FetchBucketData methods.
Applied to files:
data_store_service_client.cpp
📚 Learning: 2025-09-17T11:08:35.322Z
Learnt from: liunyl
Repo: eloqdata/store_handler PR: 83
File: data_store_service_client_closure.cpp:408-448
Timestamp: 2025-09-17T11:08:35.322Z
Learning: In the DataStoreServiceClient batch write system, the global coordinator (SyncPutAllData) manages concurrency at the partition level, not at the individual batch level. The global unfinished_request_cnt_ is only decremented when an entire partition is fully completed or failed, not after each individual batch within a partition.
Applied to files:
data_store_service_client.cpp
🧬 Code graph analysis (1)
data_store_service_client.cpp (2)
data_store_service_client.h (4)
- `TxPort2DssPort` (134-137), `need_bootstrap_` (688-859), `KvPartitionIdOf` (629-634), `bind_data_shard_with_ng_` (689-813)

data_store_service_client_closure.cpp (2)
- `PartitionBatchCallback` (493-538), `PartitionBatchCallback` (493-496)
🔇 Additional comments (1)
data_store_service_client.cpp (1)
3089-3114: LGTM: Shard routing logic is correct and handles edge cases.
`GetShardIdByPartitionId` correctly maps partition IDs to shard IDs through bucket ownership. The method:

- Distinguishes range vs. hash partitions (lines 3093-3101)
- Maps partition → bucket → shard using the `bucket_infos_` map
- Returns `UINT32_MAX` with error logging for missing buckets (lines 3111-3113)
- Uses asserts for debug verification while handling errors gracefully in release builds

The consistent propagation of `shard_id` throughout all read/write/scan operations (lines 418-419, 596, 625-626, etc.) correctly routes requests to the appropriate shards.

Based on learnings (bucket_infos_ is immutable after initialization per PR #80).
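A simplified model of the partition → bucket → shard lookup with `UINT32_MAX` as the error value (the bucket-mapping arithmetic below is a stand-in; the real client uses `MapRangeIdToBucketId`/`MapHashPartitionIdToBucketId` and `BucketInfo` objects):

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Simplified partition -> bucket -> shard lookup; bucket_owner stands in
// for the client's bucket_infos_ map of BucketInfo ownership.
uint32_t LookupShard(const std::unordered_map<uint16_t, uint32_t> &bucket_owner,
                     int32_t partition_id,
                     bool is_range_partition)
{
    // Stand-in bucket mapping: range and hash partitions map differently.
    uint16_t bucket_id = static_cast<uint16_t>(
        (is_range_partition ? partition_id * 2 : partition_id) % 16);
    auto it = bucket_owner.find(bucket_id);
    if (it == bucket_owner.end())
    {
        return UINT32_MAX;  // error value instead of asserting
    }
    return it->second;  // shard that owns this bucket
}
```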
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
data_store_service_client.cpp (3)
3320-3329: Replace LOG(FATAL) with error handling for missing shard.
`LOG(FATAL)` at line 3327 will crash the process if shard_id is not found. The return statement is unreachable. Consider using `LOG(ERROR)` and returning `false` to indicate failure, allowing callers to handle the error appropriately.

```diff
 if (shard_id >= dss_shards_.size())
 {
-    LOG(FATAL) << "Shard id not found, shard_id: " << shard_id;
-    return true;
+    LOG(ERROR) << "Shard id not found, shard_id: " << shard_id;
+    return false;
 }
```
2554-2563: Confirm warnings are contradictory: functions actively used but marked "should not be used."

The review comment is valid. Both `FetchArchives` (line 2554) and `FetchVisibleArchive` (line 2664) log warnings stating they "should not be used" and "(noop)", yet both functions execute full implementations and are actively called:

- `FetchArchives` is invoked at line 3392 when `only_fetch_archives_` is true
- `FetchVisibleArchive` is invoked at line 3505 under the same condition
- Both are also called from `data_store_service_client_closure.cpp`
- Both have corresponding implementations in handler classes (DynamoHandler, BigTableHandler, RocksDBHandler)

The "(noop)" label is misleading: these functions perform substantial work (encoding, scanning, deserialization). The TODO at line 2673 ("Remove this function if not needed") reinforces that this inconsistency was known.

Recommendation: Either remove the warnings and clarify that these functions handle archive-only fetch paths, or add early returns after the warnings if they're truly deprecated.
3312-3318: Implement proper topology change handling instead of LOG(FATAL) crash.

Line 3315 `LOG(FATAL)` crashes the process when cluster topology undergoes a full node group change. The else branch should implement topology update logic similar to `DataStoreServiceClusterManager::HandleShardingError` (lines 972-990 in eloq_data_store_service/data_store_service_config.cpp), which parses the new cluster configuration and applies it via an `Update()` call. Callers in the closure handlers (multiple locations) expect the function to return normally so they can proceed with retry logic, not terminate the process.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
data_store_service_client.cpp (101 hunks)
🔇 Additional comments (5)
data_store_service_client.cpp (5)
3139-3144: Good defensive programming: empty node_groups check.

The precondition check for empty `node_groups` is good defensive programming, even though a past review comment from the developer stated it should never be empty. This guard prevents undefined behavior if the invariant is violated and provides a clear error message.
3153-3174: LGTM: Thread-safe deterministic RNG.

The use of `std::mt19937` with deterministic seeding (line 3154) and `std::uniform_int_distribution` (lines 3155-3156) properly addresses the thread-safety concern from previous reviews. The local RNG instance per node group ensures no shared state, and deterministic seeding guarantees consistent bucket-to-shard mapping across restarts.
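A sketch of the deterministic virtual-node placement this enables (illustrative names; the real `InitBucketsInfo` places 64 virtual nodes per node group and assigns `BucketInfo` ownership):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <random>
#include <set>

// Each node group seeds its own mt19937 with its id, so the ring layout is
// identical on every restart; no shared RNG state between callers.
std::map<uint32_t, uint16_t> BuildRing(const std::set<uint16_t> &node_groups,
                                       int virtual_nodes_per_ng)
{
    std::map<uint32_t, uint16_t> ring;  // ring position -> node group
    for (uint16_t ng : node_groups)
    {
        std::mt19937 rng(ng);  // deterministic per-node-group seed
        std::uniform_int_distribution<uint32_t> dist;
        for (int i = 0; i < virtual_nodes_per_ng; ++i)
        {
            ring.emplace(dist(rng), ng);
        }
    }
    return ring;
}

// A bucket is owned by the first virtual node at or after its hash position,
// wrapping to the ring's first entry (standard consistent hashing).
uint16_t OwnerOf(const std::map<uint32_t, uint16_t> &ring, uint32_t bucket_hash)
{
    auto it = ring.lower_bound(bucket_hash);
    return it == ring.end() ? ring.begin()->second : it->second;
}
```

Because `BuildRing` is a pure function of the node-group set, two calls (or two process restarts) always produce the same bucket-to-shard assignment.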
3117-3130: LGTM: Thread-safe access with shared_lock.

The addition of `std::shared_lock<std::shared_mutex>` (line 3121) properly protects concurrent read access to `dss_shard_ids_`, addressing the thread-safety concern from previous reviews. The lock is held only during the copy operation, keeping the critical section minimal.
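The copy-under-shared-lock pattern reviewed here can be sketched as follows (a minimal model, not the actual client class):

```cpp
#include <cstdint>
#include <mutex>
#include <set>
#include <shared_mutex>
#include <vector>

// Readers take a shared_lock only long enough to snapshot the set into a
// vector; writers take a unique_lock. Mirrors the GetAllDataShards pattern.
class ShardIdRegistry
{
public:
    void Add(uint32_t shard_id)
    {
        std::unique_lock<std::shared_mutex> lk(mutex_);
        shard_ids_.insert(shard_id);
    }

    std::vector<uint32_t> GetAllDataShards() const
    {
        std::shared_lock<std::shared_mutex> lk(mutex_);  // concurrent readers OK
        return {shard_ids_.begin(), shard_ids_.end()};   // preserves set order
    }

private:
    mutable std::shared_mutex mutex_;
    std::set<uint32_t> shard_ids_;
};
```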
3998-4011: LGTM: Proper error handling replaces assert.

The conditional check for hash partitioning (lines 4001-4011) with error logging and `return false` properly addresses the previous review concern about `assert(false)`. The code now fails gracefully if the Sequences table is unexpectedly not hash-partitioned, allowing callers to handle the error.
2973-3063: LGTM: Shard lifecycle management correctly gated.

The `bind_data_shard_with_ng_` flag checks (lines 2976-2979, 2994-2997, 3020-3023) properly gate shard-specific operations. When binding is disabled, the methods return early, avoiding unnecessary work. The logic for opening/closing data stores and updating cluster manager state on leadership changes looks correct.
Actionable comments posted: 0
🧹 Nitpick comments (1)
data_store_service_client.cpp (1)
2561-2563: FetchArchives/FetchVisibleArchive marked as deprecated but still functional.

Both methods log warnings indicating they "should not be used because all archive versions are fetched from ccmap (noop)," yet still implement full functionality with shard routing. Consider removing the implementations if they are truly unused, or remove the warning if they remain necessary.
Also applies to: 2674-2676
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
data_store_service_client.cpp (101 hunks)
eloq_data_store_service/rocksdb_cloud_data_store_factory.h (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
eloq_data_store_service/rocksdb_cloud_data_store_factory.h (2)
eloq_data_store_service/rocksdb_cloud_data_store.cpp (2)
- `RocksDBCloudDataStore` (138-155), `RocksDBCloudDataStore` (157-163)

eloq_data_store_service/rocksdb_cloud_data_store.h (1)
- `RocksDBCloudDataStore` (38-161)
data_store_service_client.cpp (2)
data_store_service_client.h (5)
- `TxPort2DssPort` (134-137), `need_bootstrap_` (688-859), `KvPartitionIdOf` (629-634), `bind_data_shard_with_ng_` (689-813), `Connect` (154-159)

data_store_service_client_closure.cpp (2)
- `PartitionBatchCallback` (493-538), `PartitionBatchCallback` (493-496)
🔇 Additional comments (15)
eloq_data_store_service/rocksdb_cloud_data_store_factory.h (1)
54-68: LGTM! Per-shard object path construction is correct.

The per-shard cloud config setup properly handles empty paths, trailing slashes, and constructs unique shard paths. The constructor call matches the updated signature with shard_id and data_store_service parameters.
data_store_service_client.cpp (14)
32-32: LGTM! Required header for thread-safe RNG.

The `<random>` header is necessary for `std::mt19937` and `std::uniform_int_distribution` used in `InitBucketsInfo` for thread-safe random number generation.
151-211: LGTM! TxConfigsToDssClusterConfig correctly refactored for shard topology.

The method now properly:

- Guards `nodes_map.at(leader_id)` access with an existence check (lines 189-190)
- Consolidates the duplicate `leader_id == node_id` checks into a single block (lines 195-200)
- Sets up the cluster manager with shard membership, primary nodes, and this node's configuration
224-227: LGTM! Bootstrap guard prevents unnecessary initialization.

The `need_bootstrap_` check allows Connect to skip initialization when bootstrap is not required, which is appropriate for shard-aware configurations.
251-254: LGTM! Replaced fatal assert with warning.

The method now logs a warning and continues instead of crashing, consistent with the PR's pattern of converting fatal asserts to graceful no-ops.
404-429: LGTM! PutAll correctly routes batches to shards.

The partition-level batching properly computes `shard_id` using `GetShardIdByPartitionId` with the appropriate `is_range_partitioned` flag, ensuring correct shard routing for each partition's writes.
596-604: LGTM! Consistent shard routing across all metadata and data operations.

All methods now correctly:

- Compute `shard_id` via `GetShardIdByPartitionId(partition_id, is_range_partition)`
- Pass `shard_id` to underlying Read/Scan/BatchWrite/DeleteRange operations
- Distinguish hash vs. range partitioning appropriately
The shard routing pattern is consistent and correctly propagated throughout the codebase.
Also applies to: 625-634, 667-675, 938-960, 986-1005, 1050-1061, 1146-1163, 818-823, 842-843, 872-873, 906-907, 1384-1385, 1407-1408, 1447-1448, 1542-1549, 1588-1589, 1642-1643, 1699-1700, 1747-1754, 1772-1773, 2367-2413, 2142-2145, 2192-2193, 2211-2211, 2288-2289
2973-3063: LGTM! Leadership event handlers now support per-shard operations.

The methods correctly:

- Accept an `ng_id` parameter for shard identification
- Guard shard operations with the `bind_data_shard_with_ng_` flag
- Open/close data stores per shard during leadership transitions
- Update cluster topology and refresh config in `OnStartFollowing`
3090-3115: LGTM! GetShardIdByPartitionId correctly maps partitions to shards.

The method:

- Correctly computes `bucket_id` for hash vs. range partitions using the appropriate mapper functions
- Looks up the shard owner via `bucket_infos_` (immutable after init, no lock needed per developer clarification)
- Returns `UINT32_MAX` with error logging on a missing bucket instead of asserting

Based on learnings: bucket_infos_ is immutable after initialization.
3117-3130: LGTM! GetAllDataShards properly synchronized.

The method correctly acquires a shared lock on `dss_shard_ids_mutex_` before reading and copying `dss_shard_ids_`, ensuring thread-safe access. The returned vector preserves the set's sorted order.
3132-3197: LGTM! InitBucketsInfo uses thread-safe RNG with proper error handling.

The method correctly:

- Guards against empty `node_groups` with an early return and error log
- Uses thread-safe `std::mt19937` seeded per node group for deterministic virtual node placement
- Implements consistent hashing with 64 virtual nodes per node group
- Assigns buckets to the first virtual node position on the ring

The RNG is now thread-safe and deterministic, fixing the previous `rand()`/`srand()` issue.
3199-3209: LGTM! GetOwnerNodeIndexOfShard handles bounds errors gracefully.

The method now uses `LOG(ERROR)` and returns `UINT32_MAX` instead of `LOG(FATAL)` when `shard_id` exceeds bounds, allowing callers to handle the error gracefully.
3998-4011: LGTM! InitTableLastRangePartitionId handles non-hash-partitioned Sequences gracefully.

The method now returns `false` with an error log instead of asserting when the Sequences table is not hash-partitioned, allowing callers to handle the error appropriately.
3541-3570: LGTM! Internal routing methods correctly dispatch to local or remote shards.

All internal routing methods (`ReadInternal`, `DeleteRangeInternal`, `FlushDataInternal`, `DropTableInternal`, `ScanNextInternal`, `ScanCloseInternal`, `BatchWriteRecordsInternal`) consistently:

- Check `IsLocalShard(shard_id)` to determine routing
- Pass `shard_id` to local `data_store_service_` calls
- Set up RPC with the correct node index for remote calls
- Propagate `shard_id` through all code paths

Also applies to: 3596-3626, 3642-3669, 3686-3713, 3750-3788, 3816-3844, 4150-4189
3395-3407: LGTM! All public APIs and utility methods correctly implement shard routing.

The remaining methods properly:

- Compute `shard_id` for their operations using `GetShardIdByPartitionId`
- Pass `shard_id` through to internal routing functions
- Handle both local and remote shard access
- Apply shard routing to catalog, range, statistics, and table operations

The shard-aware refactoring is complete and consistent across the entire client.

Also applies to: 3440-3488, 3490-3521, 3523-3539, 3572-3594, 3628-3640, 3715-3748, 3790-3814, 3846-3898, 3900-3951, 4057-4111, 4113-4148, 4270-4356, 4539-4587, 4589-4632
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
eloq_data_store_service/rocksdb_data_store_common.cpp (1)
274-282: Clarify `query_worker_pool_` lifecycle and align the Shutdown comment with actual behavior

`Initialize()` now either creates the pool or calls `query_worker_pool_->Initialize()` when it already exists, while `Shutdown()` only stops the pool via `query_worker_pool_->Shutdown()` and leaves the unique_ptr non-null. The in-code comment in `Shutdown()` says the pool "will be re-created in Initialize()", but in practice it's being reinitialized, not re-created, which can be confusing when reasoning about reuse and `SubmitWork` after `Shutdown()`.

Since `ThreadWorkerPool::Shutdown()` (see thread_worker_pool.cpp, lines 103-121) joins threads and clears `workers_` but keeps the object alive, the intended contract seems to be:

- No calls that enqueue work between `Shutdown()` and the next `Initialize()`.
- On reuse, `Initialize()` must be called to recreate worker threads.

I'd suggest at least aligning the comment with the actual behavior, and optionally documenting this contract in the header so misuse is harder:

```diff
     query_worker_pool_->Shutdown();
-    // If the data store be reused, query_worker_pool_ will be re-created in
-    // Initialize().
+    // If the data store is reused, query_worker_pool_ will be reinitialized
+    // in Initialize().
```

Optionally, you could also add an assertion or debug check inside `ThreadWorkerPool::SubmitWork`, or at the start of operations here, to fail fast if someone tries to submit work after `Shutdown()` but before `Initialize()`.

Also applies to: 287-301
eloq_data_store_service/data_store_service.h (1)
631-641: Align GetDataStore with shard_id bounds and ownership check

`GetDataStore` should ideally mirror the safety checks in `IsOwnerOfShard`/`FetchDSShardStatus`:

- Validate `shard_id < data_shards_.size()`
- Optionally verify `shard_id_ == shard_id` before returning the store

For example:

```diff
 DataStore *GetDataStore(uint32_t shard_id)
 {
-    if (data_shards_.at(shard_id).shard_id_ == shard_id)
-    {
-        return data_shards_.at(shard_id).data_store_.get();
-    }
-    else
-    {
-        return nullptr;
-    }
+    if (shard_id >= data_shards_.size())
+    {
+        return nullptr;
+    }
+    auto &ds_ref = data_shards_[shard_id];
+    if (ds_ref.shard_id_ == shard_id)
+    {
+        return ds_ref.data_store_.get();
+    }
+    return nullptr;
 }
```

This keeps ownership and bounds handling consistent and avoids accidental out-of-range access.
data_store_service_client.cpp (1)
2459-2487: Clarify EloqKv handling in MVCC archive code paths.

Multiple `LOG(WARNING)` statements (lines 2462, 2486-2487, 2631, 2742) indicate that the EloqKv engine does not support MVCC features, yet the code contains logic to handle EloqKv tables in archive operations. If EloqKv truly doesn't support MVCC and these paths should never be reached, consider either:
- Adding assertions or returning errors early when EloqKv tables attempt MVCC operations
- Removing the EloqKv-specific handling if it's truly dead code
- Adding comments explaining why this fallback logic exists
This will make the intended behavior clearer and prevent confusion during future maintenance.
♻️ Duplicate comments (1)
data_store_service_client.cpp (1)
3104-3110: Replace debug-only asserts with runtime error checks.

The `assert` statements at lines 3105 and 3109 will crash the process in debug builds if a bucket is not found or shard_id is missing from `dss_shard_ids_`. This is inconsistent with the error handling at lines 3112-3114 that logs and returns `UINT32_MAX`. Replace the asserts with runtime checks that log errors and fall through to the existing error path, ensuring consistent behavior across debug and release builds.

```diff
-    auto it = bucket_infos_.find(bucket_id);
-    assert(it != bucket_infos_.end());
-    if (it != bucket_infos_.end())
+    auto it = bucket_infos_.find(bucket_id);
+    if (it != bucket_infos_.end())
     {
         uint32_t shard_id = it->second->BucketOwner();
-        assert(dss_shard_ids_.find(shard_id) != dss_shard_ids_.end());
+        if (dss_shard_ids_.find(shard_id) == dss_shard_ids_.end())
+        {
+            LOG(ERROR) << "Shard not found for shard_id=" << shard_id;
+            return UINT32_MAX;
+        }
         return shard_id;
     }
```
🧹 Nitpick comments (12)
eloq_data_store_service/rocksdb_data_store_common.cpp (1)
1150-1161: Optional: reserve enough space inBuildKeyto avoid reallocations
BuildKeycurrently reservestable_name.size() + 2 + key.size(), which underestimates capacity by not accounting for the decimal representation ofpartition_id. This is harmless but can force an extra reallocation in hot paths.You could cheaply tighten this up by materializing the partition-id string once and using its size in the reservation:
```diff
 std::string RocksDBDataStoreCommon::BuildKey(const std::string_view table_name,
                                              uint32_t partition_id,
                                              const std::string_view key)
 {
     std::string tmp_key;
-    tmp_key.reserve(table_name.size() + 2 + key.size());
+    std::string partition_str = std::to_string(partition_id);
+    tmp_key.reserve(table_name.size() + 2 + partition_str.size() + key.size());
     tmp_key.append(table_name);
     tmp_key.append(KEY_SEPARATOR);
-    tmp_key.append(std::to_string(partition_id));
+    tmp_key.append(partition_str);
     tmp_key.append(KEY_SEPARATOR);
     tmp_key.append(key);
     return tmp_key;
 }
```

Pure micro-optimization; feel free to defer if not latency-sensitive.
eloq_data_store_service/thread_worker_pool.cpp (3)
28-44: Ctor calling Initialize() constrains lifecycle; clarify or decouple
The ctor unconditionally calls Initialize(), so threads are always started on construction, even though Initialize() is also exposed publicly for reuse. That's workable, but it means:
- ThreadWorkerPool cannot be constructed in a "cold" state and then explicitly Initialize()d; it's always hot.
- The public Initialize() is effectively only for post-Shutdown() reuse.

If the intent is to allow lazy startup, consider either:
- Removing the ctor call and letting owners call Initialize() explicitly, or
- Documenting that the public Initialize() is only for reinitialization after Shutdown().
Initialize()is only for reinitialization afterShutdown().
34-84: Initialize() reuse semantics are okay but busy-wait loop is wastefulThe guarded init logic (
while (shutdown_indicator_) sleep; if (workers_.empty()) spawn) correctly avoids racing withShutdown()and prevents double-spawn. However, the spin-sleep loop is a minor inefficiency on contended shutdown/start paths.If you want to avoid even brief busy-waiting, consider a small helper cv or reusing
work_queue_cv_to signal completion of shutdown instead of polling.
104-122: Callers must ensure Shutdown() is invoked before destruction
Shutdown()now cleanly joins workers, clearsworkers_, and resetsshutdown_indicator_so the pool can be reused. But the destructor is= defaultand does not callShutdown(), so destroying a pool with live threads remains undefined.Given this, either:
- Make it explicit in comments that owners are responsible for calling
Shutdown()before destruction, or- Consider invoking
Shutdown()in the destructor (defensively guarded) to enforce RAII semantics.eloq_data_store_service/data_store_service.h (3)
603-613: Guard write request counters against invalid shard_id
IncreaseWriteReqCount/DecreaseWriteReqCount also use data_shards_.at(shard_id) with no bounds check. Once you add a central shard-id guard (see previous comment), these should reuse it or at least early-return on an invalid shard_id to avoid undefined behavior.
You might also want to use memory_order_relaxed here unless some other code relies on the release semantics; they are just counters.
615-620: Verify IsOwnerOfShard semantics across all DSShardStatus values
IsOwnerOfShard now returns true for any status that is not Closed:

```cpp
const auto &ds_ref = data_shards_.at(shard_id);
return ds_ref.shard_status_.load(std::memory_order_acquire) !=
       DSShardStatus::Closed;
```

This means transient states such as Starting (and any future non-Closed states) are treated as "owned". Call sites like Read/Scan* then separately gate on ReadOnly/ReadWrite, so behavior becomes:
- non-Closed but not open → “owned but DB_NOT_OPEN”
- Closed → “not owner → REQUESTED_NODE_NOT_OWNER”
If that's intentional, it's fine; if not, consider tightening to explicit allowed states (e.g., ReadWrite and ReadOnly) and/or also checking shard_id_ for extra safety, similar to GetDataStore.
671-704: DataShard::ShutDown lifecycle looks reasonable but document expectations
The new DataShard struct cleanly encapsulates per-shard state and provides a ShutDown() that:
- Calls
data_store_->Shutdown() without nulling data_store_
scan_iter_cache_This is fine, but the comment about not nulling
data_store_hints at concurrency assumptions (reads may still use the pointer). It would help future maintainers if you:
- Document that callers must ensure no new reads are admitted once
ShutDown() is entered, and
shard_status_is updated toClosedelsewhere as part of the shutdown protocol.No functional issues as-is, but the invariants are subtle.
data_store_service_client.h (4)
73-124: Ctor wiring for shard owners and bucket infos looks consistent
The updated ctor:
- Accepts
is_bootstrap and bind_data_shard_with_ng flags and stores them in need_bootstrap_/bind_data_shard_with_ng_.
- Builds
dss_shards_ from cluster_manager.GetAllShards() with a hard bound check (LOG(FATAL) if shard_id >= dss_shards_.size()).
- Populates
dss_shard_ids_ and initializes bucket_infos_ via InitBucketsInfo.
- Registers a config-update listener if
data_store_service_ is present.

This sets up the shard→node index mapping and bucket mapping coherently. Just ensure callers only pass a
DataStoreService* whose lifetime strictly outlives the client.
487-489: GetShardIdByPartitionId helper is well-aligned with bucket_infos_
The new public helper:
```cpp
uint32_t GetShardIdByPartitionId(int32_t partition_id,
                                 bool is_range_partition) const;
```

matches the cpp implementation that:
- Maps
partition_idto a bucket id viaSharder,- Looks up
bucket_infos_,- Returns the owning shard id.
It’s a clean abstraction for callers and pairs well with
GetAllDataShards(). Just keep in mind that the implementation currently uses assert for bucket_infos_ and dss_shard_ids_ sanity; if you ever expect invalid configs in production, you may want non-assert error handling too.
670-676: UpdateShardOwner declaration is clear but document locking expectations
```cpp
void UpdateShardOwner(uint32_t shard_id, uint32_t node_id);
```

is a useful helper for adjusting ownership without exposing the underlying arrays.
Given that
dss_shard_ids_ is now guarded by dss_shard_ids_mutex_ and dss_shards_ is shared across threads, it would be good to document in the header that UpdateShardOwner is responsible for doing any necessary locking internally (as is already done for other helpers in the cpp).
747-756: dss_shards_, dss_shard_ids_, and bucket_infos_ layout is sensible
The shard/bucket metadata fields:
```cpp
std::array<std::atomic<uint32_t>, 1000> dss_shards_;
std::shared_mutex dss_shard_ids_mutex_;
std::set<uint32_t> dss_shard_ids_;
std::unordered_map<uint16_t, std::unique_ptr<txservice::BucketInfo>>
    bucket_infos_;
```

mirror the per-shard layout on the service side and give you:
- Fast shard→node index lookup (
dss_shards_)- A thread-safe set of known local shard ids (
dss_shard_ids_)- A stable mapping from bucket id to bucket info (
bucket_infos_)Note: the comment above
dss_shards_ still says "Now only support one shard"; consider updating it to describe the 0-999 shard range and its relation to the shard_count/topology config.
data_store_service_client.cpp (1)
2554-2643: Consider removing or documenting deprecated archive fetch methods.
Both FetchArchives (lines 2554-2643) and FetchVisibleArchive (lines 2664-2751) log warnings stating they "should not be used because all archive versions are fetched from ccmap" and contain TODO comments about potential removal. If these methods are truly obsolete, consider removing them to reduce maintenance burden and code complexity. If they might be needed for backward compatibility or specific edge cases, document when and why they should be used instead of the preferred approach.
Also applies to: 2664-2751
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- data_store_service_client.cpp (102 hunks)
- data_store_service_client.h (12 hunks)
- eloq_data_store_service/data_store_service.h (14 hunks)
- eloq_data_store_service/rocksdb_data_store_common.cpp (3 hunks)
- eloq_data_store_service/thread_worker_pool.cpp (2 hunks)
- eloq_data_store_service/thread_worker_pool.h (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-17T06:36:11.137Z
Learnt from: liunyl
Repo: eloqdata/store_handler PR: 80
File: rocksdb_handler.cpp:1226-1335
Timestamp: 2025-09-17T06:36:11.137Z
Learning: In the EloqData store_handler codebase, the maximum bucket ID is guaranteed to be smaller than UINT16_MAX (65535), so bucket_id + 1 operations do not risk integer overflow when used in FetchBucketData methods.
Applied to files:
data_store_service_client.h, data_store_service_client.cpp
📚 Learning: 2025-09-17T11:08:35.322Z
Learnt from: liunyl
Repo: eloqdata/store_handler PR: 83
File: data_store_service_client_closure.cpp:408-448
Timestamp: 2025-09-17T11:08:35.322Z
Learning: In the DataStoreServiceClient batch write system, the global coordinator (SyncPutAllData) manages concurrency at the partition level, not at the individual batch level. The global unfinished_request_cnt_ is only decremented when an entire partition is fully completed or failed, not after each individual batch within a partition.
Applied to files:
data_store_service_client.cpp
🧬 Code graph analysis (6)
eloq_data_store_service/thread_worker_pool.h (2)
eloq_data_store_service/rocksdb_data_store_common.cpp (2)
Initialize(163-285)Initialize(163-163)eloq_data_store_service/thread_worker_pool.cpp (2)
Initialize(34-84)Initialize(34-34)
eloq_data_store_service/rocksdb_data_store_common.cpp (3)
eloq_data_store_service/thread_worker_pool.cpp (2)
Shutdown(104-122)Shutdown(104-104)eloq_data_store_service/rocksdb_data_store.cpp (2)
Shutdown(69-83)Shutdown(69-69)eloq_data_store_service/rocksdb_cloud_data_store.cpp (2)
Shutdown (165-186), Shutdown (165-165)
eloq_data_store_service/thread_worker_pool.cpp (1)
eloq_data_store_service/rocksdb_data_store_common.cpp (2)
Initialize (163-285), Initialize (163-163)
data_store_service_client.h (4)
eloq_data_store_service/data_store_service_config.h (2)
DataStoreServiceClusterManager(317-513)uint32_t(107-110)eloq_data_store_service/data_store_service.h (1)
DataStoreService(180-694)data_store_service_client.cpp (24)
FindFreeNodeIndex(3269-3288)FindFreeNodeIndex(3269-3269)InitBucketsInfo(3132-3197)InitBucketsInfo(3132-3136)OnLeaderStart(2973-2990)OnLeaderStart(2973-2974)OnLeaderStop(2992-3005)OnLeaderStop(2992-2992)OnStartFollowing(3014-3063)OnStartFollowing(3014-3018)GetShardIdByPartitionId(3090-3115)GetShardIdByPartitionId(3090-3091)IsLocalShard(3080-3088)IsLocalShard(3080-3080)GetOwnerNodeIndexOfShard(3199-3209)GetOwnerNodeIndexOfShard(3199-3200)GetAllDataShards(3117-3130)GetAllDataShards(3117-3117)UpdateOwnerNodeIndexOfShard(3211-3267)UpdateOwnerNodeIndexOfShard(3211-3212)HandleShardingError(3290-3323)HandleShardingError(3290-3291)UpgradeShardVersion(3325-3372)UpgradeShardVersion(3325-3328)eloq_data_store_service/data_store_service_config.cpp (4)
GetShardIdByPartitionId(802-807)GetShardIdByPartitionId(802-803)HandleShardingError(950-992)HandleShardingError(950-951)
eloq_data_store_service/data_store_service.h (4)
eloq_data_store_service/data_store_service.cpp (12)
StartService(242-312)StartService(242-242)Read(399-432)Read(399-402)Read(434-477)Read(434-442)ScanClose(995-1028)ScanClose(995-998)ScanClose(1030-1062)ScanClose(1030-1035)TTLWrapperCache(72-77)TTLWrapperCache(79-88)eloq_data_store_service/rocksdb_data_store_common.cpp (5)
Read(436-497)Read(436-436)key(852-852)ScanClose(1067-1091)ScanClose(1067-1067)eloq_data_store_service/data_store_service_config.cpp (6)
PrepareShardingError(904-948)PrepareShardingError(904-905)IsOwnerOfShard(285-293)IsOwnerOfShard(285-285)IsOwnerOfShard(690-701)IsOwnerOfShard(690-691)eloq_data_store_service/data_store.h (1)
DataStore(42-143)
data_store_service_client.cpp (2)
data_store_service_client.h (4)
TxPort2DssPort(134-137)need_bootstrap_(688-859)KvPartitionIdOf(629-634)bind_data_shard_with_ng_(689-813)data_store_service_client_closure.cpp (2)
PartitionBatchCallback(493-538)PartitionBatchCallback(493-496)
🔇 Additional comments (15)
eloq_data_store_service/rocksdb_data_store_common.cpp (1)
445-462: Read path change to use req->GetKey() + BuildKey(table_name, partition_id, key) looks sound
The new read path:
- Extracts
table_name, partition_id, and key from ReadRequest.
- Builds the RocksDB key via
BuildKey(table_name, partition_id, key).- Uses the resulting
key_str for the db->Get call and error logging.

This is consistent with how
DeleteRange and ScanNext construct keys using the same BuildKey/BuildKeyPrefix helpers, and the lifetime of table_name/key is safe because PoolableGuard req keeps the request alive for the duration of the lambda.
No issues from a correctness or lifetime standpoint.
eloq_data_store_service/thread_worker_pool.h (1)
36-58: Public Initialize() API looks consistent with implementation
Exposing
Initialize() publicly matches the new reuse pattern in RocksDBDataStoreCommon::Initialize() and the cpp implementation, which is idempotent (checks workers_.empty()). No issues at the declaration level.
eloq_data_store_service/data_store_service.h (1)
521-529: Delegating PrepareShardingError to cluster_manager_ is the right fix
Forwarding
PrepareShardingError to cluster_manager_.PrepareShardingError restores the rich topology information (new primary, cluster config, shard version) needed for clients to reroute correctly. This addresses the earlier simplified implementation that broke client-side routing.
data_store_service_client.h (11)
139-144: TxConfigsToDssClusterConfig parameterization is clear
The updated signature for
TxConfigsToDssClusterConfig (explicit uint32_t node_id and NG leader map) makes the conversion from tx-service configs to DSS cluster config more explicit. No issues at the declaration level.
395-403: Leader/follower hooks correctly coordinate with DataStoreService
The new overrides:
- OnLeaderStart/OnLeaderStop call OpenDataStore(ng_id)/CloseDataStore(ng_id) when bind_data_shard_with_ng_ is true and data_store_service_ is non-null, then (for leader start) Connect().
- OnStartFollowing closes the shard locally, updates cluster-manager primary and shard version, then calls SetupConfig and Connect().

This wiring cleanly ties tx-service leadership changes to per-shard data store lifecycle when co-located. The control flow and conditions look sound.
501-507: Shard-aware Read() signature is consistent with DataStoreService
The internal
Read now takes both partition_id and shard_id:

```cpp
void Read(const std::string_view kv_table_name,
          const int32_t partition_id,
          const uint32_t shard_id,
          const std::string_view key,
          void *callback_data,
          DataStoreCallback callback);
```

This aligns with the service side (
DataStoreService::Read overload with partition_id + shard_id) and with the sharding config helpers in DataStoreServiceClusterManager. No issues with the interface.
510-525: BatchWriteRecords gains shard_id for proper routing
Similarly, adding
uint32_t shard_id to the internal BatchWriteRecords signature matches the public service API and allows correct local-vs-remote routing per shard. The rest of the parameter list is unchanged, so call sites can be updated mechanically.
554-561: DeleteRange internal API now shard-aware and matches service
The extra
uint32_t shard_id parameter on the internal DeleteRange aligns with the shard-aware service declaration and makes routing consistent with reads and writes. No functional concerns here.
574-588: ScanNext internal path correctly includes shard_id
Adding
shard_id to the internal ScanNext helper mirrors the public RPC and service overloads and is necessary for per-shard scan routing. The rest of the parameters (session, search conditions, batch size) are unchanged.
592-599: ScanClose internal path also properly carries shard_id
ScanClose now includes int32_t partition_id and uint32_t shard_id, matching DataStoreService::ScanClose and ensuring scans are closed against the correct shard's iterator cache.
627-634: KvPartitionIdOf hashing change is reasonable
Using
std::hash<std::string_view>() on the table name's string_view and feeding that into MapKeyHashToHashPartitionId is a straightforward way to derive a stable hash partition id for system tables:

```cpp
auto hash_code = std::hash<std::string_view>()(sv);
return txservice::Sharder::MapKeyHashToHashPartitionId(hash_code);
```

Given that range/hash partitioning elsewhere already uses the sharder, this looks consistent.
648-653: IsLocalShard wrapper is thin and consistent
IsLocalShard(uint32_t shard_id) simply delegates to data_store_service_->IsOwnerOfShard(shard_id) when co-located, returning false otherwise. This is a minimal, clear abstraction over the service-side ownership check.
659-668: Shard metadata helpers and InitBucketsInfo declaration match cpp usage
The trio:
```cpp
uint32_t GetOwnerNodeIndexOfShard(uint32_t shard_id) const;
std::vector<uint32_t> GetAllDataShards();
void InitBucketsInfo(
    const std::set<uint32_t> &node_groups,
    uint64_t version,
    std::unordered_map<uint16_t, std::unique_ptr<txservice::BucketInfo>>
        &ng_bucket_infos);
```

matches the cpp implementations where:
- GetOwnerNodeIndexOfShard checks shard_id against dss_shards_.size() and returns UINT32_MAX on overflow.
- GetAllDataShards uses dss_shard_ids_mutex_ for thread-safe iteration.
- InitBucketsInfo populates bucket_infos_ deterministically from node groups.
dss_shard_ids_are addressed via the shared mutex.
685-690: New flags need_bootstrap_ and bind_data_shard_with_ng_ are well-scoped
Storing
need_bootstrap_ and bind_data_shard_with_ng_ as simple booleans tied to ctor parameters is straightforward and keeps bootstrap/binding policy local to the client. Combined with the leadership callbacks, they cleanly gate shard-NG coupling behavior.
data_store_service_client.cpp (1)
3132-3197: LGTM: Thread-safe and deterministic bucket assignment.
The InitBucketsInfo implementation correctly uses std::mt19937 for thread-safe, deterministic random number generation, handles the empty node_groups case gracefully, and implements consistent hashing with virtual nodes. The deterministic behavior (same node group ID produces same bucket assignments) is crucial for multi-node consistency.
| DSShardStatus FetchDSShardStatus(uint32_t shard_id) | ||
| { | ||
| if (shard_id_ == shard_id) | ||
| if (data_shards_.at(shard_id).shard_id_ == shard_id) | ||
| { | ||
| return shard_status_; | ||
| return data_shards_.at(shard_id).shard_status_.load( | ||
| std::memory_order_acquire); | ||
| } | ||
| return DSShardStatus::Closed; | ||
| } |
Add shard_id bounds check in FetchDSShardStatus to avoid out-of-range access
FetchDSShardStatus uses data_shards_.at(shard_id) without validating shard_id against data_shards_.size(). A malformed or misrouted request with shard_id >= 1000 will hit std::array::at out-of-range, which is at best a process abort.
Consider guarding all inline helpers that index data_shards_:
DSShardStatus FetchDSShardStatus(uint32_t shard_id)
{
- if (data_shards_.at(shard_id).shard_id_ == shard_id)
+ if (shard_id >= data_shards_.size())
+ {
+ return DSShardStatus::Closed;
+ }
+
+ if (data_shards_[shard_id].shard_id_ == shard_id)
{
- return data_shards_.at(shard_id).shard_status_.load(
- std::memory_order_acquire);
+ return data_shards_[shard_id].shard_status_.load(
+ std::memory_order_acquire);
}
return DSShardStatus::Closed;
}

Apply the same pattern to IncreaseWriteReqCount, DecreaseWriteReqCount, IsOwnerOfShard, and GetDataStore to harden all shard-indexed accesses.
🤖 Prompt for AI Agents
In eloq_data_store_service/data_store_service.h around lines 582 to 590,
FetchDSShardStatus indexes data_shards_ with at() without bounds checking; add a
guard that verifies shard_id < data_shards_.size() before accessing
data_shards_.at(shard_id) and return DSShardStatus::Closed if out of range, then
use the same defensive pattern in IncreaseWriteReqCount, DecreaseWriteReqCount,
IsOwnerOfShard, and GetDataStore to validate shard_id and avoid out-of-range
accesses (perform size check first, only then call at() and proceed with
existing logic).
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
eloq_data_store_service/data_store_service.cpp (3)
404-432: Consider adding bounds validation for shard_id.
Throughout the RPC handlers, shard_id is extracted from requests and used directly with data_shards_.at(shard_id) (lines 414, 451, 495, 586, etc.). The at() method throws std::out_of_range if the index is invalid. While IsOwnerOfShard(shard_id) provides some validation, consider adding explicit bounds checking before dereferencing to prevent potential exceptions from malformed requests or configuration mismatches.

```cpp
if (shard_id >= data_shards_.size())
{
    brpc::ClosureGuard done_guard(done);
    auto *result = response->mutable_result();
    result->set_error_code(
        ::EloqDS::remote::DataStoreError::INVALID_SHARD_ID);
    result->set_error_msg("Invalid shard ID");
    return;
}
```
1010-1020: Remove assert in production error path.
Line 1014 contains assert(false) in an error path that can be reached if the shard status is neither ReadWrite nor ReadOnly. This will crash the service in any build where asserts are enabled. The error condition is already being handled correctly by returning an error response.

```diff
 if (shard_status != DSShardStatus::ReadWrite &&
     shard_status != DSShardStatus::ReadOnly)
 {
-    assert(false);
     brpc::ClosureGuard done_guard(done);
     auto *result = response->mutable_result();
```
1044-1054: Remove assert in production error path.
Line 1049 contains assert(false) in an error path similar to line 1014. This will crash the service if the shard status is neither ReadWrite nor ReadOnly during a scan close operation.

```diff
 if (shard_status != DSShardStatus::ReadWrite &&
     shard_status != DSShardStatus::ReadOnly)
 {
-    assert(false);
     brpc::ClosureGuard done_guard(done);
     result->set_error_code(::EloqDS::remote::DataStoreError::DB_NOT_OPEN);
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
eloq_data_store_service/data_store_service.cpp (45 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-17T11:08:35.322Z
Learnt from: liunyl
Repo: eloqdata/store_handler PR: 83
File: data_store_service_client_closure.cpp:408-448
Timestamp: 2025-09-17T11:08:35.322Z
Learning: In the DataStoreServiceClient batch write system, the global coordinator (SyncPutAllData) manages concurrency at the partition level, not at the individual batch level. The global unfinished_request_cnt_ is only decremented when an entire partition is fully completed or failed, not after each individual batch within a partition.
Applied to files:
eloq_data_store_service/data_store_service.cpp
🧬 Code graph analysis (1)
eloq_data_store_service/data_store_service.cpp (4)
eloq_data_store_service/data_store_service.h (4)
IsOwnerOfShard(615-620)PrepareShardingError(525-529)IncreaseWriteReqCount(603-607)DecreaseWriteReqCount(609-613)eloq_data_store_service/data_store_service_config.cpp (6)
IsOwnerOfShard(285-293)IsOwnerOfShard(285-285)IsOwnerOfShard(690-701)IsOwnerOfShard(690-691)PrepareShardingError(904-948)PrepareShardingError(904-905)data_store_service_client.cpp (12)
Read(1087-1096)Read(1087-1092)Read(3528-3544)Read(3528-3533)DeleteRange(3577-3599)DeleteRange(3577-3584)ScanNext(3720-3753)ScanNext(3720-3734)ScanClose(3795-3819)ScanClose(3795-3800)BatchWriteRecords(4118-4153)BatchWriteRecords(4118-4131)eloq_data_store_service/rocksdb_data_store_common.cpp (15)
Read(436-497)Read(436-436)key(852-852)DeleteRange(359-434)DeleteRange(359-359)start_key(387-387)start_key(699-699)end_key(402-402)end_key(704-704)ScanNext(755-1065)ScanNext(755-755)ScanClose(1067-1091)ScanClose(1067-1067)BatchWriteRecords(499-661)BatchWriteRecords(499-500)
🔇 Additional comments (6)
eloq_data_store_service/data_store_service.cpp (6)
216-221: LGTM! Per-shard initialization looks correct.The constructor properly initializes each shard in the
data_shards_array with its ID and Closed status, preparing for multi-shard operation.
254-289: LGTM! Per-shard startup logic is sound.The startup flow correctly iterates through owned shards, creates per-shard data stores with appropriate error handling, and initializes per-shard scan iterator caches. The atomic status transitions using compare-exchange ensure thread safety.
375-378: Good defensive check for scan iterator cache.The null check before creating
scan_iter_cache_handles the case where a shard may be reconnected after partial initialization, preventing resource leaks.
1083-1132: LGTM! Scan iterator cache methods properly handle per-shard caches.All scan iterator cache methods consistently check for null cache before dereferencing, providing appropriate defensive behavior for partially initialized shards.
1893-1927: Excellent observability for shard opening operations.The timing measurements and comprehensive logging with shard IDs, status, and duration provide valuable debugging information for troubleshooting multi-shard startup issues.
2311-2330: Proper synchronization for read-write to read-only transition.The implementation correctly uses compare-exchange for atomic status transitions and waits for all in-flight write requests to complete before finalizing the switch to ReadOnly mode, ensuring data consistency.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
eloq_data_store_service/data_store_service.cpp (1)
314-397: Avoid leaving shard_status_ stuck in Starting on data store open failure.
In ConnectAndStartDataStore, all early failure returns (CreateDataStore/Initialize/StartDB failures) are taken after the shard status has been CAS'ed from Closed to Starting, but before it is updated to the target open_mode. On those paths:
- shard_ref.shard_status_ remains Starting.
while (expect_status == DSShardStatus::Starting)loop (Lines 333–338) and spin forever, because nothing ever moves the shard out ofStarting.IsOwnerOfShardwill also treat this shard as owned (non‑Closed), butdata_store_may be null, so the only guard against dereference is the shard_status check in each RPC handler.Reset the status back to
Closed(or some explicit terminal error state) before returningfalseso the shard can be retried or treated cleanly as closed. For example:if (shard_ref.data_store_ == nullptr) { shard_ref.data_store_ = data_store_factory_->CreateDataStore( create_db_if_missing, data_shard_id, this, true); if (shard_ref.data_store_ == nullptr) { LOG(ERROR) << "Failed to create data store"; + DSShardStatus expected = DSShardStatus::Starting; + shard_ref.shard_status_.compare_exchange_strong( + expected, DSShardStatus::Closed, std::memory_order_release); return false; } } else { bool res = shard_ref.data_store_->Initialize(); if (!res) { LOG(ERROR) << "Failed to initialize data store"; + DSShardStatus expected = DSShardStatus::Starting; + shard_ref.shard_status_.compare_exchange_strong( + expected, DSShardStatus::Closed, std::memory_order_release); return false; } res = shard_ref.data_store_->StartDB(); if (!res) { LOG(ERROR) << "Failed to start db instance in data store service"; + DSShardStatus expected = DSShardStatus::Starting; + shard_ref.shard_status_.compare_exchange_strong( + expected, DSShardStatus::Closed, std::memory_order_release); return false; } }You can also factor this into a small helper to keep the pattern consistent across all failure exits.
♻️ Duplicate comments (4)
data_store_service_scanner.h (1)
207-221: Fix partition_id type mismatch in SinglePartitionScanner; new data_shard_id wiring looks good.
Resetnow takesint32_t partition_idanduint32_t data_shard_id, assigns both, andClear()resetsdata_shard_id_along with the rest of the state. That part is consistent with the per-shard scan design.However,
partition_id_is now declared asint32_t, whileGetPartitionId()still returnsuint32_t:uint32_t GetPartitionId() { return partition_id_; } ... int32_t partition_id_;This is inconsistent with the proto (
int32 partition_id) andinternal_request.hgetters, and it can silently reinterpret negative/sentinel values if they ever arise.Update the getter to return
int32_tto match the member and the rest of the stack, and adjust any callers that expect an unsigned type:- uint32_t GetPartitionId() + int32_t GetPartitionId() { return partition_id_; }The
data_shard_id_addition and its reset inClear()look correct.Also applies to: 235-238, 248-259, 264-267
eloq_data_store_service/ds_request.proto (1)
92-97: Avoid renumbering existing proto fields; BatchWriteRecordsRequest likely still breaks wire compatibility.
ReadRequest,DeleteRangeRequest, andScanRequestaddshard_idat new field numbers and switchpartition_idtoint32while keeping the original field numbers. Those changes are generally compatible as long as all code treats partition IDs as non-negative.- In
BatchWriteRecordsRequest, you now have:If earlier releases hadstring kv_table_name = 1; int32 partition_id = 2; uint32 shard_id = 3; bool skip_wal = 4; repeated Item items = 5;skip_wal = 3anditems = 4(which is very likely given the previous review), insertingshard_idat 3 and shifting the existing fields changes their wire numbers and will cause old and new clients/servers to misinterpret each other’s messages (e.g., old peers readingshard_idasskip_wal, or new peers reading oldskip_walasshard_id).To stay wire-compatible, keep the original numbers for existing fields and assign
shard_idan unused number instead, e.g.:string kv_table_name = 1; int32 partition_id = 2; bool skip_wal = 3; // keep original number repeated Item items = 4; // keep original number uint32 shard_id = 5; // or another unused numberPlease confirm the historical numbering from your existing deployed version and adjust accordingly. If you truly intend a breaking change, it should be versioned and coordinated across all dependent services.
Also applies to: 132-146, 161-168, 302-320
eloq_data_store_service/data_store_service.h (1)
582-590: Add shard_id bounds checks before indexing data_shards_All these helpers index
data_shards_with.at(shard_id)without checkingshard_id < data_shards_.size(). A malformed or misrouted request (or a bug in the caller’s shard routing) would turn into astd::out_of_rangeand likely abort the process. This was previously flagged and is still present.Recommend hardening them with a cheap bounds guard and switching to
operator[]once bounds are validated, e.g.:DSShardStatus FetchDSShardStatus(uint32_t shard_id) { - if (data_shards_.at(shard_id).shard_id_ == shard_id) + if (shard_id >= data_shards_.size()) + { + return DSShardStatus::Closed; + } + + const auto &ds_ref = data_shards_[shard_id]; + if (ds_ref.shard_id_ == shard_id) { - return data_shards_.at(shard_id).shard_status_.load( - std::memory_order_acquire); + return ds_ref.shard_status_.load(std::memory_order_acquire); } return DSShardStatus::Closed; } void IncreaseWriteReqCount(uint32_t shard_id) { - data_shards_.at(shard_id).ongoing_write_requests_.fetch_add( - 1, std::memory_order_release); + if (shard_id >= data_shards_.size()) + { + return; + } + data_shards_[shard_id].ongoing_write_requests_.fetch_add( + 1, std::memory_order_release); } void DecreaseWriteReqCount(uint32_t shard_id) { - data_shards_.at(shard_id).ongoing_write_requests_.fetch_sub( - 1, std::memory_order_release); + if (shard_id >= data_shards_.size()) + { + return; + } + data_shards_[shard_id].ongoing_write_requests_.fetch_sub( + 1, std::memory_order_release); } bool IsOwnerOfShard(uint32_t shard_id) const { - const auto &ds_ref = data_shards_.at(shard_id); + if (shard_id >= data_shards_.size()) + { + return false; + } + const auto &ds_ref = data_shards_[shard_id]; return ds_ref.shard_status_.load(std::memory_order_acquire) != DSShardStatus::Closed; } DataStore *GetDataStore(uint32_t shard_id) { - if (data_shards_.at(shard_id).shard_id_ == shard_id) + if (shard_id >= data_shards_.size()) + { + return nullptr; + } + if (data_shards_[shard_id].shard_id_ == shard_id) { - return data_shards_.at(shard_id).data_store_.get(); + return data_shards_[shard_id].data_store_.get(); } else { return nullptr; } }This keeps misrouted shard_ids from turning into crashes and is consistent with the bound checks you've added on the client side.
Also applies to: 603-607, 609-613, 615-620, 631-641
eloq_data_store_service/data_store_service.cpp (1)
1854-1891: Shard close still logs state‑transition failures without enforcing a closed terminal state.
`CloseDataStore` now logs when `SwitchReadWriteToReadOnly` or `SwitchReadOnlyToClosed` fail, but it still treats those errors as a no-op and returns without verifying that `ds_ref.shard_status_` actually reaches `Closed`. This means shards can remain in `ReadWrite` or `ReadOnly` if a transition fails, with no retry or explicit error surfaced to callers.

This behavior (log-only, no fail-fast) was already flagged in previous reviews; consider either:

- enforcing that a caller can detect failure (e.g., return a status), and/or
- adding bounded retries / a final state check to guarantee that `CloseDataStore` either closes the shard or reports failure explicitly.
🧹 Nitpick comments (11)
eloq_data_store_service/rocksdb_data_store_common.cpp (1)
287-301: Consider documenting Shutdown() reuse semantics.

The method shuts down the worker pool but intentionally does not null `query_worker_pool_` or `data_store_service_`, allowing `Initialize()` to reuse the pool. While the inline comment at line 293 hints at this, consider adding a brief doc comment on the method declaration in the header explaining the reuse pattern.

In `rocksdb_data_store_common.h`, add a doc comment:

```cpp
/**
 * @brief Shutdown the data store resources.
 *
 * Shuts down the query worker pool and erases scan iterators.
 * Does not null internal pointers to allow reuse via Initialize().
 */
void Shutdown() override;
```

data_store_service_client.h (2)
632-634: Prefer direct hash computation over temporary.

The `KvPartitionIdOf` method creates a temporary hash and then calls `MapKeyHashToHashPartitionId`. Consider computing the hash inline to avoid the intermediate variable.

```diff
 int32_t KvPartitionIdOf(const txservice::TableName &table) const
 {
-    std::string_view sv = table.StringView();
-    auto hash_code = std::hash<std::string_view>()(sv);
-    return txservice::Sharder::MapKeyHashToHashPartitionId(hash_code);
+    return txservice::Sharder::MapKeyHashToHashPartitionId(
+        std::hash<std::string_view>()(table.StringView()));
 }
```
749-756: Document bucket_infos_ immutability and dss_shard_ids_ protection.

Per past review discussion, `bucket_infos_` size never changes after initialization, so no mutex is needed. However, `dss_shard_ids_` is mutable and protected by `dss_shard_ids_mutex_`. Add a brief comment clarifying the synchronization guarantees.

```diff
 std::array<std::atomic<uint32_t>, 1000> dss_shards_;
 std::atomic<uint64_t> dss_topology_version_{0};
+// Protects dss_shard_ids_ for concurrent access
 std::shared_mutex dss_shard_ids_mutex_;
+// Mutable set of active shard IDs; access guarded by dss_shard_ids_mutex_
 std::set<uint32_t> dss_shard_ids_;
-// key is bucket id, value is bucket info.
+// Immutable after construction; maps bucket_id to BucketInfo; no lock needed
 std::unordered_map<uint16_t, std::unique_ptr<txservice::BucketInfo>>
     bucket_infos_;
```

eloq_data_store_service/thread_worker_pool.cpp (1)
31-84: ThreadWorkerPool lifecycle and re-init look sound; spin-wait could be tightened.

Constructor + `Initialize()` + `Shutdown()` form a consistent lifecycle: workers are started once per generation, block on the queue with a cv, and exit cleanly when `shutdown_indicator_` is set, after which `Shutdown()` joins, clears `workers_`, and resets the flag so the pool can be reused.

Two minor suggestions:

- If `Initialize()` may be called from arbitrary threads, consider documenting that it's single-threaded or guarding it with a mutex/`std::call_once` to avoid accidental concurrent init.
- The `while (shutdown_indicator_)` loop is a busy wait; if you expect long shutdowns, consider replacing it with a condition variable or making `Initialize()` only callable after the previous `Shutdown()` returns.

Also applies to: 104-122
data_store_service_client_closure.cpp (1)
245-267: FetchBucketDataCallback sharding and key/value handling look correct; just ensure AddDataItem matches new key semantics.

- Moving to `std::string key_str`/`value_str` from `ScanNextClosure::GetItem` and pushing them via `AddDataItem(std::move(key_str), std::move(value_str), ts, false)` is consistent; `key_str`/`value_str` are re-assigned each iteration, so moving them is safe.
- TTL filtering (`ttl > 0 && ttl < now`) skips expired entries, matching the previous behavior.
- The partition/shard routing:
  - `MapBucketIdToHashPartitionId(fetch_bucket_data_cc->bucket_id_)`
  - `KvPartitionIdOf(partition_id, false)`
  - `GetShardIdByPartitionId(kv_partition_id, false)`

  then feeding `kv_partition_id` and `data_shard_id` into `ScanNext` is aligned with the new hash-partition + shard mapping. Given the earlier guarantee that bucket_id stays within a small range, the `int32_t` partition_id is safe. Based on learnings.

Just double-check that `FetchBucketDataCc::AddDataItem` is now defined to accept the raw key string (not the previous TxKey/hash-part form) so downstream consumers interpret keys correctly.

Also applies to: 278-285, 285-299
data_store_service_client.cpp (3)
224-227: Align ScheduleTimerTasks documentation with noop implementation
`Connect()`'s `need_bootstrap_` guard and the new `ScheduleTimerTasks()` body (just `LOG(WARNING)` + return) are fine behaviorally, but the comment above `ScheduleTimerTasks` still says "Will assert and log an error if called."

To avoid confusion for future readers, consider updating the docstring to say that the method is currently a noop that logs a warning, and that timer-based tasks are not yet implemented.
Also applies to: 251-254
1808-1813: “should not be used (noop)” logs are misleading for live code paths
`DropKvTableAsync`, `FetchArchives` (both overloads), and `FetchVisibleArchive` now log that they "should not be used" and/or are "noop", but they still perform real KV operations (drop table, scans, archive decoding).

If the intent is to deprecate these APIs and truly make them no-ops, they should return immediately after logging without issuing RPCs/reads. If, instead, you just want to discourage use but keep behavior intact, consider rephrasing the messages, e.g.:

- "DropKvTableAsync is deprecated and may be removed; prefer DropKvTable"
- "FetchArchives is deprecated; archives are normally fetched from ccmap"

to avoid future confusion about whether these paths actually mutate state.
Also applies to: 1960-1962, 2120-2137, 2554-2563, 2629-2632, 2673-2676, 2739-2743
3090-3094: GetShardIdByPartitionId / InitBucketsInfo: document immutability and usage

One minor suggestion for maintainability: since `bucket_infos_` and `dss_shard_ids_` are read without locks in `GetShardIdByPartitionId` and other places, it would help to document the assumption that they are immutable after initialization (and that any future shard scaling must revisit this). A brief comment near their declarations in the header like:

```cpp
// Immutable after initialization; safe to read without locks.
// If shard scaling is introduced, revisit thread-safety for updates.
std::unordered_map<uint16_t, std::unique_ptr<txservice::BucketInfo>>
    bucket_infos_;
std::set<uint32_t> dss_shard_ids_;
```

would make the concurrency model clearer.
Also applies to: 3151-3166
eloq_data_store_service/data_store_service.cpp (2)
217-221: Defensively guard `data_shards_` indexing by shard_id in StartService.

`StartService` indexes `data_shards_` via `data_shards_[shard_id]` for each `shard_id` returned from `cluster_manager_.GetShardsForThisNode()` without any bound check. Every other access here uses `.at()`.

If configuration/topology bugs ever produce a shard id ≥ `data_shards_.size()`, this becomes undefined behavior. Adding an invariant check here would make failures diagnosable and safer:

```diff
 for (uint32_t shard_id : dss_shards)
 {
-    auto &ds_ref = data_shards_[shard_id];
+    assert(shard_id < data_shards_.size());
+    auto &ds_ref = data_shards_[shard_id];
```

or use `.at(shard_id)` to match the rest of the file.

Also applies to: 254-260
903-959: Consider zeroing TTL on local `Read` DB_NOT_OPEN branch for consistency.

In the local `Read(...)` overload, when the shard status is neither `ReadOnly` nor `ReadWrite`, you clear `record` and `ts` but leave `*ttl` untouched before returning `DB_NOT_OPEN`. Downstream callers that read TTL unconditionally could observe a stale value from the previous successful read.

To keep error cases consistent with "not found" semantics (where TTL is set to 0 in the RocksDB path), you might also set `*ttl = 0;` in this branch.

eloq_data_store_service/internal_request.h (1)
329-470: Per-shard GetShardId + DecreaseWriteReqCount pattern is coherent across all write-like requests.

For `FlushData*`, `DeleteRange*`, `CreateTable*`, `DropTable*`, `Scan*` (write-side only for scans that mutate iter cache), and `CreateSnapshotForBackup*`:

- Each base request type now exposes `virtual uint32_t GetShardId() const`.
- RPC implementations read shard id from the proto (`req_->shard_id()`); local implementations store `shard_id_` in `Reset` and clear it in `Clear`.
- `SetFinish` consistently calls `data_store_service_->DecreaseWriteReqCount(...)` with the shard id, aligning cleanly with the server-side `IncreaseWriteReqCount(shard_id)` in the corresponding `DataStoreService` entrypoints.

This design should ensure per-shard write counters are balanced even when multiple shards are active concurrently.

One minor nit (non-blocking): some `SetFinish` implementations call `DecreaseWriteReqCount(req_->shard_id())` instead of `DecreaseWriteReqCount(GetShardId())`. Using the interface everywhere would make the pattern slightly more uniform, but it's functionally equivalent.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
- data_store_service_client.cpp (102 hunks)
- data_store_service_client.h (12 hunks)
- data_store_service_client_closure.cpp (17 hunks)
- data_store_service_client_closure.h (34 hunks)
- data_store_service_scanner.cpp (3 hunks)
- data_store_service_scanner.h (4 hunks)
- eloq_data_store_service/CMakeLists.txt (0 hunks)
- eloq_data_store_service/data_store_service.cpp (45 hunks)
- eloq_data_store_service/data_store_service.h (14 hunks)
- eloq_data_store_service/data_store_service_util.h (1 hunks)
- eloq_data_store_service/ds_request.proto (5 hunks)
- eloq_data_store_service/internal_request.h (55 hunks)
- eloq_data_store_service/main.cpp (1 hunks)
- eloq_data_store_service/rocksdb_cloud_data_store.cpp (5 hunks)
- eloq_data_store_service/rocksdb_cloud_data_store_factory.h (1 hunks)
- eloq_data_store_service/rocksdb_data_store.cpp (1 hunks)
- eloq_data_store_service/rocksdb_data_store_common.cpp (3 hunks)
- eloq_data_store_service/rocksdb_data_store_common.h (1 hunks)
- eloq_data_store_service/thread_worker_pool.cpp (2 hunks)
- eloq_data_store_service/thread_worker_pool.h (1 hunks)
💤 Files with no reviewable changes (1)
- eloq_data_store_service/CMakeLists.txt
🚧 Files skipped from review as they are similar to previous changes (1)
- eloq_data_store_service/rocksdb_cloud_data_store.cpp
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-17T06:36:11.137Z
Learnt from: liunyl
Repo: eloqdata/store_handler PR: 80
File: rocksdb_handler.cpp:1226-1335
Timestamp: 2025-09-17T06:36:11.137Z
Learning: In the EloqData store_handler codebase, the maximum bucket ID is guaranteed to be smaller than UINT16_MAX (65535), so bucket_id + 1 operations do not risk integer overflow when used in FetchBucketData methods.
Applied to files:
- data_store_service_client_closure.cpp
- data_store_service_client.cpp
- data_store_service_client.h
📚 Learning: 2025-09-17T11:08:35.322Z
Learnt from: liunyl
Repo: eloqdata/store_handler PR: 83
File: data_store_service_client_closure.cpp:408-448
Timestamp: 2025-09-17T11:08:35.322Z
Learning: In the DataStoreServiceClient batch write system, the global coordinator (SyncPutAllData) manages concurrency at the partition level, not at the individual batch level. The global unfinished_request_cnt_ is only decremented when an entire partition is fully completed or failed, not after each individual batch within a partition.
Applied to files:
- data_store_service_client_closure.cpp
- eloq_data_store_service/data_store_service.cpp
- data_store_service_client.cpp
🧬 Code graph analysis (14)
eloq_data_store_service/thread_worker_pool.h (2)
eloq_data_store_service/rocksdb_data_store_common.cpp (2)
Initialize(163-285)Initialize(163-163)eloq_data_store_service/thread_worker_pool.cpp (2)
Initialize(34-84)Initialize(34-34)
eloq_data_store_service/thread_worker_pool.cpp (1)
eloq_data_store_service/rocksdb_data_store_common.cpp (2)
Initialize(163-285)Initialize(163-163)
eloq_data_store_service/rocksdb_data_store.cpp (2)
eloq_data_store_service/rocksdb_data_store_common.cpp (2)
Shutdown(287-301)Shutdown(287-287)eloq_data_store_service/rocksdb_cloud_data_store.cpp (2)
Shutdown(165-186)Shutdown(165-165)
data_store_service_scanner.h (1)
eloq_data_store_service/internal_request.h (13)
string_view(106-109)string_view(126-129)string_view(273-276)string_view(606-614)string_view(621-629)string_view(811-819)string_view(975-983)string_view(1110-1118)string_view(1406-1414)string_view(1421-1429)partition_id_(647-647)partition_id_(856-856)partition_id_(1504-1504)
data_store_service_scanner.cpp (1)
data_store_service_scanner.h (1)
data_shard_id_(266-266)
eloq_data_store_service/rocksdb_data_store_common.h (4)
eloq_data_store_service/rocksdb_data_store_common.cpp (2)
Shutdown(287-301)Shutdown(287-287)eloq_data_store_service/thread_worker_pool.cpp (2)
Shutdown(104-122)Shutdown(104-104)eloq_data_store_service/rocksdb_cloud_data_store.cpp (2)
Shutdown(165-186)Shutdown(165-165)eloq_data_store_service/rocksdb_data_store.cpp (2)
Shutdown(69-83)Shutdown(69-69)
eloq_data_store_service/internal_request.h (3)
eloq_data_store_service/data_store_service_config.h (2)
uint32_t(107-110)EloqDS(98-514)data_store_service_client_closure.h (33)
Reset(81-85)Reset(144-153)Reset(210-216)Reset(269-274)Reset(385-392)Reset(600-619)Reset(933-948)Reset(1154-1177)Reset(1393-1408)Reset(1598-1624)Reset(1627-1658)Reset(1660-1694)Reset(1933-2251)Reset(2328-2347)Reset(2606-2612)Reset(3002-3010)vector(1078-1081)vector(1121-1128)vector(2253-2256)string(804-809)string(2173-2178)string_view(311-368)string_view(752-755)string_view(767-770)string_view(792-797)string_view(1291-1294)string_view(1533-1536)string_view(2133-2136)string_view(2138-2141)string_view(2241-2246)string_view(2453-2456)EloqDS(39-42)EloqDS(44-184)eloq_data_store_service/data_store.h (1)
EloqDS(28-145)
eloq_data_store_service/rocksdb_cloud_data_store_factory.h (2)
eloq_data_store_service/rocksdb_cloud_data_store.cpp (2)
RocksDBCloudDataStore(138-155)RocksDBCloudDataStore(157-163)eloq_data_store_service/rocksdb_cloud_data_store.h (1)
RocksDBCloudDataStore(38-161)
eloq_data_store_service/data_store_service.h (6)
eloq_data_store_service/data_store_service.cpp (12)
StartService(242-312)StartService(242-242)Read(399-432)Read(399-402)Read(434-477)Read(434-442)ScanClose(995-1028)ScanClose(995-998)ScanClose(1030-1062)ScanClose(1030-1035)TTLWrapperCache(72-77)TTLWrapperCache(79-88)eloq_data_store_service/rocksdb_data_store_common.cpp (5)
Read(436-497)Read(436-436)key(852-852)ScanClose(1067-1091)ScanClose(1067-1067)data_store_service_client.cpp (6)
Read(1087-1096)Read(1087-1092)Read(3528-3544)Read(3528-3533)ScanClose(3795-3819)ScanClose(3795-3800)eloq_data_store_service/internal_request.h (10)
string_view(106-109)string_view(126-129)string_view(273-276)string_view(606-614)string_view(621-629)string_view(811-819)string_view(975-983)string_view(1110-1118)string_view(1406-1414)string_view(1421-1429)eloq_data_store_service/data_store_service_config.cpp (6)
PrepareShardingError(904-948)PrepareShardingError(904-905)IsOwnerOfShard(285-293)IsOwnerOfShard(285-285)IsOwnerOfShard(690-701)IsOwnerOfShard(690-691)eloq_data_store_service/data_store.h (1)
DataStore(42-143)
eloq_data_store_service/rocksdb_data_store_common.cpp (4)
eloq_data_store_service/thread_worker_pool.cpp (2)
Shutdown(104-122)Shutdown(104-104)eloq_data_store_service/rocksdb_cloud_data_store.cpp (2)
Shutdown(165-186)Shutdown(165-165)eloq_data_store_service/rocksdb_data_store.cpp (2)
Shutdown(69-83)Shutdown(69-69)eloq_data_store_service/data_store_service_util.h (1)
shard_id_(123-123)
eloq_data_store_service/data_store_service.cpp (4)
eloq_data_store_service/data_store_service.h (4)
IsOwnerOfShard(615-620)PrepareShardingError(525-529)IncreaseWriteReqCount(603-607)DecreaseWriteReqCount(609-613)eloq_data_store_service/data_store_service_config.cpp (6)
IsOwnerOfShard(285-293)IsOwnerOfShard(285-285)IsOwnerOfShard(690-701)IsOwnerOfShard(690-691)PrepareShardingError(904-948)PrepareShardingError(904-905)eloq_data_store_service/rocksdb_data_store_common.cpp (11)
Read(436-497)Read(436-436)key(852-852)start_key(387-387)start_key(699-699)end_key(402-402)end_key(704-704)ScanNext(755-1065)ScanNext(755-755)ScanClose(1067-1091)ScanClose(1067-1067)data_store_service_client.cpp (8)
Read(1087-1096)Read(1087-1092)Read(3528-3544)Read(3528-3533)ScanNext(3720-3753)ScanNext(3720-3734)ScanClose(3795-3819)ScanClose(3795-3800)
data_store_service_client_closure.h (2)
eloq_data_store_service/internal_request.h (13)
partition_id_(647-647)partition_id_(856-856)partition_id_(1504-1504)string_view(106-109)string_view(126-129)string_view(273-276)string_view(606-614)string_view(621-629)string_view(811-819)string_view(975-983)string_view(1110-1118)string_view(1406-1414)string_view(1421-1429)eloq_data_store_service/data_store_service_util.h (1)
shard_id_(123-123)
data_store_service_client.cpp (3)
data_store_service_client.h (4)
TxPort2DssPort(134-137)need_bootstrap_(688-859)KvPartitionIdOf(629-634)bind_data_shard_with_ng_(689-813)eloq_data_store_service/data_store_service.cpp (8)
BatchWriteRecords(853-901)BatchWriteRecords(853-857)BatchWriteRecords(1134-1199)BatchWriteRecords(1134-1147)Read(399-432)Read(399-402)Read(434-477)Read(434-442)data_store_service_client_closure.cpp (2)
PartitionBatchCallback(493-538)PartitionBatchCallback(493-496)
data_store_service_client.h (1)
data_store_service_client.cpp (24)
FindFreeNodeIndex(3269-3288)FindFreeNodeIndex(3269-3269)InitBucketsInfo(3132-3197)InitBucketsInfo(3132-3136)OnLeaderStart(2973-2990)OnLeaderStart(2973-2974)OnLeaderStop(2992-3005)OnLeaderStop(2992-2992)OnStartFollowing(3014-3063)OnStartFollowing(3014-3018)GetShardIdByPartitionId(3090-3115)GetShardIdByPartitionId(3090-3091)IsLocalShard(3080-3088)IsLocalShard(3080-3080)GetOwnerNodeIndexOfShard(3199-3209)GetOwnerNodeIndexOfShard(3199-3200)GetAllDataShards(3117-3130)GetAllDataShards(3117-3117)UpdateOwnerNodeIndexOfShard(3211-3267)UpdateOwnerNodeIndexOfShard(3211-3212)HandleShardingError(3290-3323)HandleShardingError(3290-3291)UpgradeShardVersion(3325-3372)UpgradeShardVersion(3325-3328)
🪛 Clang (14.0.6)
eloq_data_store_service/data_store_service_util.h
[error] 24-24: 'atomic' file not found
(clang-diagnostic-error)
🔇 Additional comments (35)
eloq_data_store_service/thread_worker_pool.h (1)
42-42: LGTM!

Adding `Initialize()` enables explicit lifecycle management, allowing the thread pool to be reused after `Shutdown()`. This aligns with the per-shard data store reuse pattern introduced in the broader PR.

eloq_data_store_service/rocksdb_data_store_common.cpp (2)
279-282: LGTM!

The else branch correctly reinitializes an existing worker pool, enabling data store reuse after shutdown. This aligns with the per-shard lifecycle management introduced across the PR.
447-458: LGTM!

The updated Read path now extracts the key via `req->GetKey()` and passes it directly to `BuildKey(table_name, partition_id, key)`. This aligns with the shard-aware proto changes where keys are passed as `key_str` fields rather than embedded in request objects.

data_store_service_client.h (1)
74-86: LGTM!

The constructor parameters `is_bootstrap` and `bind_data_shard_with_ng` correctly initialize the corresponding member variables, enabling flexible shard binding and bootstrap modes for different deployment scenarios.

eloq_data_store_service/rocksdb_data_store.cpp (1)
69-83: LGTM!

The Shutdown sequence correctly delegates to `RocksDBDataStoreCommon::Shutdown()` first (handling worker pool and scan iterator cleanup), then acquires the lock and closes the database. The db_ pointer is properly nulled after deletion, preventing use-after-free.

eloq_data_store_service/main.cpp (1)
321-322: LGTM!

The simplified `StartService(bool)` signature reflects the refactored per-shard initialization flow, where shard-specific setup is now handled internally rather than passed as parameters. This improves encapsulation.

eloq_data_store_service/rocksdb_data_store_common.h (1)
165-165: LGTM!

The `Shutdown()` override enables centralized resource cleanup for query worker pools and scan iterators, supporting the per-shard lifecycle management pattern introduced in this PR.

data_store_service_scanner.cpp (3)
64-67: LGTM!

The `ScanNext` call now correctly propagates `data_shard_id_` to route scan requests to the appropriate shard, aligning with the per-shard architecture introduced in this PR.

109-111: LGTM!

The `ScanClose` call now correctly includes `data_shard_id_` for shard-aware session cleanup, consistent with the scan opening path.

417-419: LGTM!

The per-partition shard_id computation correctly calls `GetShardIdByPartitionId` with `is_range_partition=false` for hash partitions, then passes `data_shard_id` to `Reset`. This ensures each partition scanner targets the correct shard.
24-30: No issues found; the atomic header is properly supported.

The build environment is correctly configured with C++17 (eloq_data_store_service/CMakeLists.txt:27) and C++20 (build_eloq_store.cmake:3), both of which fully support the standard `<atomic>` header. The static analysis warning appears to be a false positive, as the code is valid and should compile without issues.

data_store_service_client_closure.cpp (3)
71-82: Key/logging changes to use ReadClosure::Key() are consistent.

Using `read_closure->Key()` instead of `Key().back()` for logging and key_str extraction aligns with the refactored closure API and avoids truncating multi-part keys. Lifetime and ownership are unchanged vs the previous string_view-from-key-part pattern, so error logs (decode failures, key-not-found paths) and `AddResult` calls remain safe.

Also applies to: 96-107, 176-181, 352-355
570-632: ShardId propagation in Scan/Read callbacks is consistent with per-shard routing.

Across callbacks (`FetchAllDatabaseCallback`, `DiscoverAllTableNamesCallback`, `FetchTableRangesCallback`, `FetchRangeSlicesCallback`, `FetchTableStatsCallback`, `LoadRangeSliceCallback`, `FetchArchivesCallback`, `FetchRecordArchivesCallback`):

- Using `scan_next_closure->ShardId()` (or `read_closure->ShardId()` in the range-slice flows) when issuing follow-up `ScanNext`/`Read` calls preserves the shard affinity of the original request across pagination and multi-step reads.
- `SessionId()`/`kv_session_id_` handling is unchanged, so scan sessions remain intact.
- The new `kv_start_key_` assignment via `std::string_view(key.Data(), key.Size())` in `LoadRangeSliceCallback` simply copies the bytes into a `std::string` and does not introduce lifetime issues.

This wiring matches the new `shard_id` field in `ScanRequest`/`ReadRequest` and looks correct end-to-end.

Also applies to: 819-833, 912-923, 1132-1145, 1239-1252, 1308-1321, 1417-1430, 1443-1456
493-533: Verification confirms the cast is architecturally safe, but the review's refactoring suggestions remain valid.

The investigation found that `PartitionBatchCallback` is only ever invoked from `BatchWriteRecordsClosure::Run()` (line 1754), which calls the callback with `this` (the closure object) as the second parameter. Since both call sites (cpp:427 and cpp:529) register the callback exclusively with `BatchWriteRecords`, which always creates a `BatchWriteRecordsClosure`, the static cast to `BatchWriteRecordsClosure*` is guaranteed to be valid in the current architecture.

However, the review's concerns and suggestions remain legitimate:

- The cast lacks null-checking or defensive validation
- The tight coupling to `BatchWriteRecordsClosure` could become a maintenance burden if the callback mechanism is refactored
- Storing `data_shard_id` in `PartitionCallbackData` instead would decouple the callback from the concrete closure type, improving architectural resilience
54-66: Per-shard cloud object_path_ is wired correctly; confirm the new default for empty paths is acceptable.
- Copying `cloud_config_` to `shard_cloud_config` and appending `"ds_<shard_id>"` (after normalizing the trailing slash) cleanly isolates each shard's RocksDB instance under its own prefix.
- The `RocksDBCloudDataStore` constructor call matches the updated signature (cloud_config, config, create_if_missing, tx_enable_cache_replacement, shard_id, data_store_service).

One thing to verify: previously, an empty `object_path_` likely meant "use the bucket root/prefix directly". Now it is coerced to `"rocksdb_cloud_object_path/ds_<shard_id>"`. If any existing deployments rely on an empty `object_path_` mapping to the old layout, this change will move data into a different prefix. If that's intentional for the multi-shard rollout, great; otherwise, you may want to preserve the old behavior for empty paths and require explicit configuration for the new layout.

Also applies to: 67-75
data_store_service_scanner.h (1)
284-285: Using txservice::total_hash_partitions as HASH_PARTITION_COUNT is a good centralization.

Replacing local or conditional definitions with `txservice::total_hash_partitions` ensures the hash-partition scanner stays in sync with the global partition-count configuration and removes the previous compile-time branching.

eloq_data_store_service/data_store_service.h (3)
190-228: Shard-aware RPC/local APIs look consistent

The added `shard_id` parameter and `int32_t` `partition_id` across `Read`, `BatchWriteRecords`, `DeleteRange`, `ScanNext`, and `ScanClose` match the implementations in `data_store_service.cpp` and the client-side call sites. Routing now consistently uses `(partition_id, shard_id)` and aligns with `internal_request.h` and `rocksdb_data_store_common.cpp` usage. No issues here.

Also applies to: 256-268, 320-327, 403-417, 440-445
525-529: Good: PrepareShardingError now preserves full topology info

Delegating `PrepareShardingError` to `cluster_manager_.PrepareShardingError` restores the richer error contract (error type, new cluster config/primary, versioning) instead of a generic message. This keeps client-side sharding/failover logic working.
678-701: Verify shard_id_ initialization for DataShard helpers

`DataShard::shard_id_` defaults to `UINT32_MAX` and is used as a guard in both `FetchDSShardStatus` and `GetDataStore` (`== shard_id` check). In `StartService` you initialize `data_store_`, `scan_iter_cache_`, and `shard_status_`, but the snippet provided doesn't show `shard_id_` being set there.

If `shard_id_` is only assigned in other open paths (e.g., `OpenDataStore`/`ConnectAndStartDataStore`), shards created during `StartService` might still appear "unbound" to these helpers even though `data_store_` is valid. Please double-check that every path that successfully opens a shard sets `shard_id_` consistently, or relax the equality check if it's no longer needed as an ownership guard.

Also applies to: 703-703
data_store_service_client.cpp (9)
151-211: TxConfigsToDssClusterConfig: per-node topology mapping looks correct

The refactored `TxConfigsToDssClusterConfig` builds `nodes_map` once, adds only candidate nodes as shard members, updates the primary via `ng_leaders`, and sets `ThisNode` + switches the shard to `ReadWrite` when this node is leader. This is a clean separation of concerns and avoids redundant `leader_id == node_id` checks from the earlier version.

One small nit: you call `nodes_map.find(leader_id)` twice; caching the iterator after the assert would avoid the duplicate lookup, but this is purely cosmetic.
335-341: PutAll: shard routing via GetShardIdByPartitionId – confirm partition_id semantics

`PutAll` now routes each partition's batches using:

- `partition_state->partition_id` set from `KvPartitionIdOf(...)` for both hash and range tables, and
- `GetShardIdByPartitionId(partition_state->partition_id, is_range_partitioned)` where `is_range_partitioned = !table_name.IsHashPartitioned()`.

This is structurally sound, but correctness hinges on `KvPartitionIdOf(partition_id, /*is_range=*/true/false)` returning the same partition id flavor (hash vs range) that `txservice::Sharder::Map{HashPartition,RangeId}ToBucketId` expect. If `KvPartitionIdOf` already normalizes to the "KV-partition id" used everywhere else, then this is fine; if it mixes range ids with hash-partition ids, you could misroute to the wrong shard.

Please double-check the `KvPartitionIdOf` overloads against the `Sharder` mapping to ensure the combination `(partition_state->partition_id, is_range_partitioned)` always lands on the intended shard.

Also applies to: 349-375, 404-420
592-605: Shard-aware metadata and maintenance calls are wired consistently

All the catalog/statistics/range/DB maintenance paths now consistently:

- compute a KV partition via `KvPartitionIdOf(...)`, and
- derive `shard_id` via `GetShardIdByPartitionId(kv_partition_id, /*is_range=*/false)` (or `true` where appropriate),
- then pass both into `Read`, `ScanNext`, `BatchWriteRecords`, or `DeleteRange`.

This matches the server-side `DataStoreService` signatures and the new bucket-based shard mapping semantics. I don't see mismatches between `is_range_partition` flags and the table's partitioning mode in these call sites.
DataStoreServicesignatures and the new bucket-based shard mapping semantics. I don’t see mismatches betweenis_range_partitionflags and the table’s partitioning mode in these call sites.Also applies to: 619-635, 650-686, 816-821, 870-873, 903-911, 934-949, 973-1005, 1050-1055, 1133-1153, 1384-1408, 1445-1455, 1510-1516, 1618-1661, 1676-1719, 1736-1755, 1760-1787, 1859-1862
2973-2990: Leader/follower hooks: shard–node-group binding assumption
`OnLeaderStart`, `OnLeaderStop`, and `OnStartFollowing` now conditionally open/close shards on the local `DataStoreService` when `bind_data_shard_with_ng_` is true, and they treat `ng_id` as the `data_shard_id` (DSS shard id) throughout. They also update the cluster manager's primary node and shard version and then call `SetupConfig` + `Connect()`.

This is a clean and explicit coupling, but it assumes a 1:1 mapping between node-group id and data shard id. Please confirm that this invariant is guaranteed by the control plane; if it ever changes (e.g., multiple DSS shards per node group), these hooks will need to derive the proper shard set instead of using `ng_id` directly.
ng_iddirectly.Also applies to: 2992-3003, 3014-3063
3080-3085: IsLocalShard is a good abstraction over server-side ownership
`IsLocalShard` simply delegates to `data_store_service_->IsOwnerOfShard(shard_id)`, centralizing shard ownership semantics in the server. All local-vs-remote routing branches (Read/DeleteRange/ScanNext/ScanClose/FlushData/DropTable/CreateSnapshotForBackup) use this helper consistently, which keeps the client side free of duplicated status logic.

No issues here.
3090-3115: Bucket-based shard mapping & InitBucketsInfo: invariants look sound

The new shard lookup:

- maps `partition_id` to a `bucket_id` via `txservice::Sharder::MapRangeIdToBucketId` or `MapHashPartitionIdToBucketId` depending on `is_range_partition`, and
- uses `bucket_infos_[bucket_id]->BucketOwner()` to pick a shard, validating that the shard id exists in `dss_shard_ids_`.

`InitBucketsInfo`:

- rejects empty `node_groups` early,
- generates up to 64 virtual node positions per node group using local `std::mt19937` RNGs (no shared global state), and
- fills `ng_bucket_infos` so that every `bucket_id` in `[0, total_range_buckets)` is assigned to some group, wrapping to the first virtual node when iterating past the last random position.

Given the learnings that `bucket_id` is guaranteed `< UINT16_MAX`, the use of `uint16_t` here is safe, and the ring-walk logic matches the comment ("first virtual node larger than bucket_id" with wrap-around).

A couple of minor notes:

- `GetShardIdByPartitionId` returns `UINT32_MAX` on a missing bucket; callers that propagate this as a `shard_id` should treat it as an error instead of blindly routing (see separate comment on `GetOwnerNodeIndexOfShard`).
- The comment near `dss_shard_ids_` and `bucket_infos_` could explicitly state that these structures are immutable after initialization, justifying unsynchronized reads outside `GetAllDataShards()`.

Overall, the design looks correct.
Also applies to: 3117-3130, 3132-3197
4003-4016: InitTableLastRangePartitionId: safer handling of Sequences partitioning

The updated `InitTableLastRangePartitionId` now:

- computes `kv_partition_id`/`data_shard_id` using `GetShardIdByPartitionId` only if `Sequences::table_name_` is hash-partitioned, and
- logs an error and returns `false` otherwise.

This removes the previous unconditional `assert(false)` and unreachable code, which is a clear improvement for robustness. The write to the sequences KV table then uses `(kv_partition_id, data_shard_id)` consistently.

Looks good.
Also applies to: 4030-4041
3548-3560: Shard-aware local/remote routing in data path looks correct

In `ReadInternal`, `DeleteRangeInternal`, `ScanNextInternal`, `ScanCloseInternal`, and `BatchWriteRecordsInternal` you now route based on `IsLocalShard(shard_id)` and thread `shard_id` through to either:

- the local `DataStoreService` (`Read`, `DeleteRange`, `ScanNext`, `ScanClose`, `BatchWriteRecords`), or
- remote RPCs using `GetOwnerNodeIndexOfShard(shard_id)`.

Combined with the earlier changes to client-facing APIs (always passing both `partition_id` and `shard_id`), this yields a coherent per-shard data path.

Aside from the missing guard on `GetOwnerNodeIndexOfShard` in the remote branches (covered in a separate comment), the routing logic itself looks correct.

Also applies to: 3604-3615, 3758-3777, 3825-3833, 4160-4176
4547-4573: Catalog upsert/delete now shard-aware without changing semantics

`UpsertCatalog` and `DeleteCatalog` now compute `shard_id` via `GetShardIdByPartitionId(partition_id, false)` (with `partition_id` fixed at 0), and pass it alongside the partition into `BatchWriteRecords`. This matches how other catalog operations (e.g., `FetchTable`, `DiscoverAllTableNames`) derive shard ids and doesn't alter the logical behavior: catalogs are still effectively stored in a single shard.

No issues here.
Also applies to: 4597-4624
eloq_data_store_service/data_store_service.cpp (1)
1893-1927: Per-shard OpenDataStore logging looks good.

The new `OpenDataStore` helper cleanly gates on `ds_ref.shard_status_ == Closed`, delegates to `ConnectAndStartDataStore`, and logs success/failure with shard id, current status, and duration. This should be very helpful for diagnosing shard-level open latency.

eloq_data_store_service/internal_request.h (1)
36-68: Shard-aware write request interface and accounting look consistent.

Introducing `GetShardId()` on `WriteRecordsRequest` and plumbing it through `WriteRecordsRpcRequest` and `WriteRecordsLocalRequest` (with a stored `shard_id_`) gives a clean way to:

- route writes by shard in the store, and
- call `DecreaseWriteReqCount(shard_id)` in `SetFinish` to mirror the per-shard `IncreaseWriteReqCount(shard_id)` in `DataStoreService`.

The Clear/Reset implementations correctly initialize `shard_id_` (to `UINT32_MAX` and then to the real ID) and all overrides use the proto `shard_id` field or the stored `shard_id_` consistently.

Also applies to: 70-179, 181-327
data_store_service_client_closure.h (5)
439-537: ReadClosure's shard-aware routing and accessors look correct.

The `ReadClosure` changes:

- Add explicit `partition_id_` (int32_t) and `shard_id_` (uint32_t) members, reset to sentinel values in `Clear()`.
- Populate both fields in `Reset(...)`, and use them in `PrepareRequest` to set `partition_id` and `shard_id` on the proto.
- Use `shard_id_` when calling `UpdateOwnerNodeIndexOfShard` on RPC failures and sharding errors.
- Expose `PartitionId()`, `ShardId()`, and `Key()` accessors, which align with how other closures surface this context.

This keeps the client's routing state per-shard and per-partition, and correctly ties retry logic to the shard id.
Also applies to: 592-899
1130-1366: DeleteRangeClosure now correctly carries shard_id through request and retry paths.

For `DeleteRangeClosure`:

- `Reset(...)` stores both `partition_id_` (int32_t) and `shard_id_` (uint32_t).
- `PrepareRequest` sets both `partition_id` and `shard_id` on the outgoing proto.
- On RPC failure, `Run()` uses `shard_id_` with `UpdateOwnerNodeIndexOfShard`, matching the new per-shard owner mapping.
- New accessors `PartitionId()` and `ShardId()` expose this state to callers.

The Clear/reset logic keeps `shard_id_` in a known sentinel state when the closure returns to the pool, which is good for debugging.
1585-1892: BatchWriteRecordsClosure's per-shard state is wired consistently.

`BatchWriteRecordsClosure` now:

- Tracks `partition_id_` as int32_t and `shard_id_` as uint32_t, reset to sentinel values in `Clear()`/`Reset()`.
- Accepts `shard_id` in both overloads of `Reset(...)` and stores it.
- Sets `partition_id` and `shard_id` on the proto in `PrepareRequest`.
- Uses `shard_id_` when calling `UpdateOwnerNodeIndexOfShard` on RPC failures.
- Exposes `PartitionId()` and `ShardId()` const accessors for external use.

This matches the server-side expectation that batch writes are routed by shard id and aligns retry/sharding updates with the correct shard.
1894-2294: ScanNextClosure correctly threads shard_id and partition_id through scan requests.

In `ScanNextClosure`:

- `partition_id_` is now int32_t and `shard_id_` is uint32_t, both reset in `Clear()`.
- `Reset(...)` stores both, and `PrepareRequest` sets `partition_id` and `shard_id` on the `ScanRequest` proto.
- On RPC failures, `Run()` uses `shard_id_` with `UpdateOwnerNodeIndexOfShard`, so sharding metadata is updated for the correct shard.
- New accessors `PartitionId()` and `ShardId()` provide this context to callers, which is useful for higher-level routing/metrics.

Session id and search-condition handling remain unchanged, so scan semantics should be preserved while becoming shard-aware.
2982-3022: FetchArchivesCallbackData and CreateSnapshotForBackupCallbackData type adjustments align with upstream changes.

`FetchArchivesCallbackData` now stores `partition_id_` as `int32_t`, and `CreateSnapshotForBackupCallbackData` remains unchanged except for minor formatting. This lines up with the int32_t partition id change in internal_request.h and maintains consistent typing across client and server.
```cpp
uint32_t DataStoreServiceClient::GetOwnerNodeIndexOfShard(
    uint32_t shard_id) const
{
    if (shard_id >= dss_shards_.size())
    {
        LOG(ERROR) << "shard_id " << shard_id << " exceeds array bounds";
        return UINT32_MAX;
```
Critical: guard GetOwnerNodeIndexOfShard error paths before indexing dss_nodes_
GetOwnerNodeIndexOfShard and UpdateOwnerNodeIndexOfShard now log an error and return an out-of-band value when shard_id >= dss_shards_.size():
```cpp
if (shard_id >= dss_shards_.size())
{
    LOG(ERROR) << "shard_id " << shard_id << " exceeds array bounds";
    return UINT32_MAX; // or false
}
```

However, all the call sites assume the returned node index is valid and immediately index into dss_nodes_:
- `ReadInternal` (remote path)
- `DeleteRangeInternal`
- `FlushDataInternal`
- `DropTableInternal`
- `ScanNextInternal`
- `ScanCloseInternal`
- `CreateSnapshotForBackupInternal` (remote branch)
If shard_id is ever invalid or if GetShardIdByPartitionId falls back to UINT32_MAX, these paths will end up doing dss_nodes_[UINT32_MAX].Channel(), which is undefined behavior and likely a crash. Previously, the LOG(FATAL) made this fail-fast; after downgrading to LOG(ERROR) you need an explicit guard.
A minimal pattern for each remote branch would be:
```diff
- uint32_t node_index = GetOwnerNodeIndexOfShard(read_closure->ShardId());
+ uint32_t node_index = GetOwnerNodeIndexOfShard(read_closure->ShardId());
+ if (node_index == UINT32_MAX)
+ {
+     // Fill error into result/closure and complete the RPC instead of crashing.
+     read_closure->PrepareError(
+         ::EloqDS::remote::DataStoreError::REQUESTED_NODE_NOT_OWNER,
+         "Invalid shard_id or missing owner node");
+     read_closure->Run();
+     return;
+ }
  read_closure->SetRemoteNodeIndex(node_index);
  auto *channel = dss_nodes_[node_index].Channel();
```

and similarly for the other closures (DeleteRangeClosure, FlushDataClosure, DropTableClosure, ScanNextClosure, CreateSnapshotForBackupClosure).
This preserves your softer error-handling while avoiding out-of-bounds access when the shard topology is inconsistent or the mapping fails.
Also applies to: 3214-3218, 3330-3334, 3366-3367, 3548-3574, 3604-3630, 3662-3673, 3707-3717, 3781-3792, 3837-3848, 2918-2930
🤖 Prompt for AI Agents
In data_store_service_client.cpp around lines 3199-3205 (and similarly at
3214-3218, 3330-3334, 3366-3367, 3548-3574, 3604-3630, 3662-3673, 3707-3717,
3781-3792, 3837-3848, and 2918-2930) the function now returns UINT32_MAX on
invalid shard_id but callers immediately index dss_nodes_ with that value; add
an explicit guard after calling GetOwnerNodeIndexOfShard (and after
GetShardIdByPartitionId where it can return UINT32_MAX) to detect the
out-of-band value, log an error, set the RPC/closure to failed (or invoke the
appropriate error callback), and return early from the remote branch so you
never do dss_nodes_[owner_idx].Channel(); apply this pattern to each listed
closure/remote path to prevent out-of-bounds access while preserving the softer
error handling.
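One way to package this guard so call sites cannot forget it is an `std::optional`-returning lookup helper; `SafeOwnerNodeIndex` and the vector-based shard-to-node map below are hypothetical stand-ins for the real client members, sketched under the assumption that `UINT32_MAX` is the "unmapped" sentinel:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Sketch of the guard pattern: return std::nullopt instead of the
// UINT32_MAX sentinel, so callers are forced to handle the error path
// explicitly before indexing a node array.
std::optional<uint32_t> SafeOwnerNodeIndex(
    const std::vector<uint32_t> &shard_to_node, uint32_t shard_id)
{
    if (shard_id >= shard_to_node.size())
    {
        return std::nullopt;  // out-of-range shard id: fail the RPC, don't crash
    }
    uint32_t node_idx = shard_to_node[shard_id];
    if (node_idx == UINT32_MAX)
    {
        return std::nullopt;  // mapping not established yet
    }
    return node_idx;
}
```

With this shape, the compiler nudges every remote branch into an explicit `has_value()` check before touching the node channel.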
```cpp
for (auto &[shard_id, shard] : all_shards)
{
    if (shard_id >= dss_shards_.size())
    {
        LOG(FATAL) << "Shard id " << shard_id
                   << " is out of range, should expand the hard-coded "
                      "dss_shards_ size.";
    }
    uint32_t node_idx = FindFreeNodeIndex();
    auto &node_ref = dss_nodes_[node_idx];
    node_ref.Reset(shard.nodes_[0].host_name_,
                   shard.nodes_[0].port_,
                   shard.version_);
    dss_shards_[shard_id].store(shard_id);
    dss_shards_[shard_id].store(node_idx);
    dss_shard_ids_.insert(shard_id);
}

// init bucket infos
InitBucketsInfo(dss_shard_ids_, 0, bucket_infos_);
```
🧩 Analysis chain
Good: Fatal error on shard_id bounds violation during construction.
The constructor now validates shard_id < dss_shards_.size() and logs a fatal error if exceeded. This prevents undefined behavior during initialization. Combined with bounds checking in runtime accessors like GetOwnerNodeIndexOfShard (see data_store_service_client.cpp:3198-3208), the array access pattern is safer.
However, as noted in past reviews, ensure config validation limits shard_count to ≤ 1000 during parsing to prevent invalid configurations from reaching this point. Verify that data_store_service_config.cpp enforces this constraint.
🏁 Script executed:
#!/bin/bash
# Verify shard_count validation in config parsing
echo "=== Checking config validation for shard_count bounds ==="
rg -n "shard_count.*1000|shard_count.*>.*999|MAX.*SHARD" eloq_data_store_service/data_store_service_config.cpp -C 3
echo ""
echo "=== Checking for shard_count assignment/validation ==="
ast-grep --pattern 'shard_count_ = $_'

Length of output: 595
I need to examine the actual validation logic around these shard_count assignments to determine if the review comment's concern is valid.
🏁 Script executed:
#!/bin/bash
# Examine context around shard_count assignment from config file
echo "=== Context around shard_count assignment (lines 65-75) ==="
sed -n '65,75p' eloq_data_store_service/data_store_service_config.cpp
echo ""
echo "=== Context around shard_count assignment from shards_map (lines 180-190) ==="
sed -n '180,190p' eloq_data_store_service/data_store_service_config.cpp
echo ""
echo "=== Searching for validation functions or checks on shard_count ==="
rg -n "shard_count" eloq_data_store_service/data_store_service_config.cpp | head -20
echo ""
echo "=== Looking for dss_shards_ initialization to determine its size ==="
rg -n "dss_shards_.*size\|dss_shards_\s*\(" eloq_data_store_service/data_store_service_client.h -C 2

Length of output: 1391
🏁 Script executed:
#!/bin/bash
# Examine context around shard_count assignments
echo "=== Context around config file read (lines 65-75) ==="
sed -n '65,75p' eloq_data_store_service/data_store_service_config.cpp
echo ""
echo "=== Context around shards_map assignment (lines 180-190) ==="
sed -n '180,190p' eloq_data_store_service/data_store_service_config.cpp
echo ""
echo "=== Search for any validation or bounds checks on shard_count ==="
rg -n "shard_count.*[<>]|shard_count.*validate|shard_count.*check" eloq_data_store_service/ -C 2
echo ""
echo "=== Looking for MAX_SHARD or similar constants ==="
rg -n "MAX.*SHARD|SHARD.*MAX|1000" eloq_data_store_service/data_store_service_client.h -B 2 -A 2
echo ""
echo "=== Check dss_shards_ declaration and size ==="
rg -n "dss_shards_\s*\[|std::array.*dss_shards_" eloq_data_store_service/data_store_service_client.h -C 1

Length of output: 1517
I need to locate the correct path for the header file and dss_shards_ declaration. Let me search for the file.
🏁 Script executed:
#!/bin/bash
# Find the correct path for data_store_service_client.h
echo "=== Finding data_store_service_client.h ==="
fd -t f "data_store_service_client.h"
echo ""
echo "=== Finding dss_shards_ declaration ==="
rg -n "dss_shards_\s*\[|std::array.*dss_shards_|dss_shards_\s*;" -C 2
echo ""
echo "=== Looking for array size constants (1000, MAX) ==="
rg -n "1000|MAX_SHARD" . -C 1 | head -30

Length of output: 4792
Config validation for shard_count is missing—add bounds check during parsing.
Verification confirms the concern: data_store_service_config.cpp (lines 69-70) reads shard_count from the config file with no upper bounds validation. While the constructor adds a runtime guard (shard_id < dss_shards_.size()), dss_shards_ is fixed at 1000 elements (line 749 of data_store_service_client.h), and invalid configs can be parsed and stored without detection.
Add validation during config parsing to reject shard_count > 1000 before it reaches the client, ensuring invalid configurations fail fast at parse time rather than at runtime.
🤖 Prompt for AI Agents
In data_store_service_config.cpp around lines 69-70, add validation to reject
parsed shard_count values greater than the hard limit (1000) before storing them
in the config: after reading shard_count, check if shard_count == 0 or
shard_count > 1000 and return/log a clear parsing/configuration error (or throw)
so the config load fails fast; update any config parsing return path to
propagate the failure to callers so invalid configs are not accepted and later
cause runtime FATALs in data_store_service_client.h.
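A minimal sketch of the parse-time check suggested above; `ValidateShardCount` and `kMaxShardCount` are hypothetical names introduced for illustration, with the 1000 limit mirroring the fixed `dss_shards_` size noted in the verification:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hard limit assumed to match the fixed-size dss_shards_ array (1000 slots).
constexpr uint32_t kMaxShardCount = 1000;

// Reject invalid shard counts at config-parse time so a bad config fails
// fast instead of triggering a runtime LOG(FATAL) in the client.
bool ValidateShardCount(uint32_t shard_count, std::string &err)
{
    if (shard_count == 0 || shard_count > kMaxShardCount)
    {
        err = "shard_count must be in [1, " + std::to_string(kMaxShardCount) +
              "], got " + std::to_string(shard_count);
        return false;
    }
    return true;
}
```

The config loader would call this right after reading `shard_count` and propagate the failure to its caller so the invalid config is never stored.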
- data store service support multi shard
- fix error on leader changed
- change total_range_buckets and total_hash_partitions to 1024
- get shard_id based on kv_partition_id
- update shards object path on s3, "object_path/ds_0, object_path/ds_1 ..."
- Don't save the bucket_id as the prefix of key in datastore.
- Remove the compile option 'USE_ONE_ELOQDSS_PARTITION_ENABLED'.
- Add log to print open rocksdb cloud time
- Replace srand/rand with std::mt19937 to calculate the distribution of buckets
- fix RocksdbCloudDataStore shutdown
- change TTLWrapperCache to unique_ptr on data_shard
- add arg bind_data_shard_with_ng
- don't set data_store_ to null on shutdown
511ee98 to dab3aa8
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
eloq_data_store_service/data_store_service.cpp (1)
352-385: Race between ConnectAndStartDataStore and ProcessSyncFileCache around is_file_sync_running_

`ConnectAndStartDataStore` waits for file cache sync to finish before opening a shard:

```cpp
while (shard_ref.is_file_sync_running_.load(std::memory_order_relaxed))
{
    bthread_usleep(10000);
}
```

but `ProcessSyncFileCache` only sets this flag after some initial work:

```cpp
// after building directory iterator
ds_ref.is_file_sync_running_.store(true, std::memory_order_release);
for (const auto &entry : dir_ite)
{
    if (ds_ref.shard_status_.load(...) != DSShardStatus::Closed)
    {
        ...
        ds_ref.is_file_sync_running_.store(false, std::memory_order_release);
        ...
        return;
    }
    ...
}
```

A timing window exists where:

- a `SyncFileCache` RPC has been accepted for a Closed shard but `ProcessSyncFileCache` has not yet set `is_file_sync_running_ = true`;
- `ConnectAndStartDataStore` sees `is_file_sync_running_ == false`, skips the wait, and starts opening the RocksDB instance;
- `ProcessSyncFileCache` then starts deleting/downloading SST files in `db_path` concurrently with the DB open.

This can lead to undefined behavior or corruption at the filesystem/DB level.

Consider tightening the contract by:

- Setting `is_file_sync_running_` to `true` before queuing work (in `SyncFileCache`) or at the very start of `ProcessSyncFileCache`, and
- Clearing it in a single RAII guard at all exit paths (success and error), e.g.:

```cpp
void DataStoreService::ProcessSyncFileCache(SyncFileCacheLocalRequest *req,
                                            uint32_t shard_id)
{
    auto &ds_ref = data_shards_.at(shard_id);
    ds_ref.is_file_sync_running_.store(true, std::memory_order_release);
    auto guard = gsl::finally(
        [&ds_ref]
        { ds_ref.is_file_sync_running_.store(false, std::memory_order_release); });
    ...
}
```

and changing `SyncFileCache` to pass `shard_id` explicitly. This ensures `ConnectAndStartDataStore` never races with an in-progress file sync.

Also applies to: 1405-1477
♻️ Duplicate comments (3)
eloq_data_store_service/ds_request.proto (1)
135-149: Fix BatchWriteRecordsRequest field renumbering to preserve wire compatibility.
`BatchWriteRecordsRequest` currently renumbers existing fields:

- `skip_wal`: 3 → 4
- `items`: 4 → 5
- New `shard_id`: 3

Changing field numbers of existing fields breaks compatibility with any deployed code that still uses the old schema. Protobuf field numbers are part of the serialization contract and must be stable.

You should keep the original numbers for existing fields and assign `shard_id` a new, unused tag (e.g., 6), for example:

```proto
string kv_table_name = 1;
int32 partition_id = 2;
bool skip_wal = 3;       // keep original tag
repeated Item items = 4; // keep original tag
uint32 shard_id = 6;     // new field number
```

Then wire all new code to use the new `shard_id` field.

data_store_service_scanner.h (1)
235-238: Align GetPartitionId() return type with the int32_t member.
`SinglePartitionScanner` now stores:

```cpp
int32_t partition_id_;
```

but `GetPartitionId()` still returns `uint32_t`. This causes an implicit signed→unsigned conversion on every call and is inconsistent with:

- The proto definitions (`int32 partition_id`), and
- Other interfaces like `internal_request.h` that expose `int32_t partition_id`.

Even though current partition IDs are non-negative and within range, keeping the types aligned avoids subtle bugs and makes the API clearer.

Recommend changing the getter to:

```cpp
int32_t GetPartitionId()
{
    return partition_id_;
}
```

(and updating any callers that depend on an unsigned return type).
Also applies to: 264-267
eloq_data_store_service/data_store_service.h (1)
603-611: Add bounds checks before indexing data_shards_ to avoid out-of-range aborts

The helper methods:

- `FetchDSShardStatus(uint32_t shard_id)`,
- `IncreaseWriteReqCount(uint32_t shard_id)`,
- `DecreaseWriteReqCount(uint32_t shard_id)`,
- `IsOwnerOfShard(uint32_t shard_id) const`,
- `GetDataStore(uint32_t shard_id)`,

all use `data_shards_.at(shard_id)` without validating that `shard_id < data_shards_.size()`. Since many RPC handlers take `shard_id` directly from the request and call these helpers, a malformed or misrouted request with a large shard id will trigger `std::array::at` out-of-range and likely terminate the process.

To harden the service, add bounds checks and treat out-of-range shard ids as "closed / not owner", e.g.:

```cpp
DSShardStatus FetchDSShardStatus(uint32_t shard_id)
{
    if (shard_id >= data_shards_.size())
    {
        return DSShardStatus::Closed;
    }
    const auto &ds_ref = data_shards_[shard_id];
    if (ds_ref.shard_id_ == shard_id)
    {
        return ds_ref.shard_status_.load(std::memory_order_acquire);
    }
    return DSShardStatus::Closed;
}

void IncreaseWriteReqCount(uint32_t shard_id)
{
    if (shard_id >= data_shards_.size())
    {
        return;
    }
    data_shards_[shard_id].ongoing_write_requests_.fetch_add(
        1, std::memory_order_release);
}

// Apply similar guards in DecreaseWriteReqCount, IsOwnerOfShard, and GetDataStore.
```

This preserves existing semantics for valid shard ids while making the service robust against bad input.
Also applies to: 624-634, 636-641, 652-662
🧹 Nitpick comments (6)
eloq_data_store_service/rocksdb_cloud_data_store_factory.h (1)
54-66: Shard-specific object_path_ handling changes behavior; double-check defaults and reporting

The per-shard copy/normalization looks good, but there are two behavioral points worth confirming:

- Line 58: when `cloud_config_.object_path_` is empty, you now force `"rocksdb_cloud_object_path/"` as the base prefix before appending `ds_<shard_id>`. If existing deployments relied on an empty `object_path_` meaning "bucket root" (or some other implicit default), this will redirect new shards into a different S3 subprefix and may not see existing data. If backward compatibility is important, consider either:
  - Keeping the empty-string semantics and only appending `ds_<shard_id>` (e.g., `"ds_0"`, `"ds_1"`, …), or
  - Making the new default prefix an explicit config option rather than hard-coding it here.
- Lines 100-103: `GetS3ObjectPath()` still exposes the unsharded `cloud_config_.object_path_`, while `RocksDBCloudDataStore` instances now use the per-shard `shard_cloud_config.object_path_` (which includes `/ds_<shard_id>`). If any callers assume `GetS3ObjectPath()` reflects the actual DB path, this discrepancy could be confusing. It may be worth either documenting that this returns the base (unsharded) prefix, or updating it to make the sharding semantics explicit where it's used.

Also applies to: 68-68, 100-103
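For illustration, the normalization discussed here can be reduced to a small pure function; `ShardObjectPath` and the empty-base default are assumptions mirrored from this comment, not the factory's actual API:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Build the per-shard object path: ensure a trailing '/' on the base
// prefix, then append "ds_<shard_id>". The fallback default for an empty
// base mirrors the behavior called out in the review (an assumption, not
// a confirmed project decision).
std::string ShardObjectPath(std::string base, uint32_t shard_id)
{
    if (base.empty())
    {
        base = "rocksdb_cloud_object_path";
    }
    if (base.back() != '/')
    {
        base.push_back('/');
    }
    return base + "ds_" + std::to_string(shard_id);
}
```

Isolating the rule like this makes it easy to unit-test the empty-base and trailing-slash cases independently of the factory.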
rocksdb_handler.cpp (1)
2105-2129: ng_id parameter currently unused in RocksDBHandler leader/follower callbacks
`OnLeaderStart(uint32_t ng_id, ...)` and `OnStartFollowing(uint32_t ng_id, ...)` correctly adopt the new interface but ignore `ng_id` in their bodies. If this is only a placeholder for future sharding logic, consider explicitly marking it unused (e.g., `/*ng_id*/` or `[[maybe_unused]]`) to avoid compiler warnings in stricter builds.

Also applies to: 2174-2199
eloq_data_store_service/thread_worker_pool.cpp (1)
28-32: Re-init semantics are reasonable; consider explicit work-queue clearing policy.

The new `Initialize()` plus the updated `Shutdown()` give you a reusable pool object:

- Ctor immediately calls `Initialize()`, so first-use behavior is unchanged.
- `Shutdown()` now joins all workers, clears `workers_`, and resets `shutdown_indicator_` to `false`, allowing `Initialize()` to spin up a fresh set of threads on the same pool instance.

Two points to be explicit about:

- `Initialize()` spin-waits while `shutdown_indicator_` is true. This is fine under the intended usage (no concurrent `Initialize()`/`Shutdown()`), but worth documenting as non-thread-safe for concurrent callers.
- `Shutdown()` does not clear `work_queue_`. Any unprocessed work present when `shutdown_indicator_` is set will remain in the deque, and after a subsequent `Initialize()` those leftovers will be picked up by the new workers. If the intent is "drain then forget" vs. "drain across restarts", you may want to either:
  - explicitly clear `work_queue_` in `Shutdown()` under the same mutex, or
  - add a comment that this cross-restart delivery is intentional.

Given the current usage in `RocksDBDataStoreCommon::Initialize()`, this is more of a behavioral clarification than a bug.

Also applies to: 34-84, 104-122
eloq_data_store_service/rocksdb_data_store_common.cpp (2)
274-283: Query worker pool reuse and datastore Shutdown sequencing look sound

The new reuse path in `Initialize()` (re-initializing an existing `query_worker_pool_` instead of always recreating it) plus the explicit `Shutdown()` that stops the pool and then erases scan iterators align well with the per-shard open/close flow driven by `ConnectAndStartDataStore` and `DataShard::ShutDown`. This should allow clean reopen of a shard without leaking threads or leaving stale iterators behind.

If `ThreadWorkerPool::Initialize()` has any preconditions after `Shutdown()`, it'd be worth documenting them; otherwise this pattern looks fine as-is.

Also applies to: 287-301
436-459: Guard against negative partition ids inReadwhen casting touint32_t
`Read()` now obtains the partition via `int32_t GetPartitionId()` (through the interface) and immediately stores it in a `uint32_t`:

```cpp
uint32_t partition_id = req->GetPartitionId();
```

If a caller ever passes a negative `partition_id`, this will silently wrap to a large unsigned value and produce an unexpected key prefix.

Consider defensively asserting or clamping before the cast, e.g.:

```cpp
int32_t part = req->GetPartitionId();
assert(part >= 0);
uint32_t partition_id = static_cast<uint32_t>(part);
```

or return a `DB_NOT_OPEN` / argument error if the invariant `partition_id >= 0` is not guaranteed elsewhere.

eloq_data_store_service/internal_request.h (1)
471-655: Read/Scan/CreateSnapshot requests now correctly carry shard id; only minor robustness nit

The expanded interfaces:

- `ReadRequest` with `GetKey()`, `GetPartitionId()`, and `GetShardId()`,
- `ScanRequest` with `GetPartitionId()` and `GetShardId()`,
- `CreateSnapshotForBackupRequest` with `GetShardId()`,

and their implementations in the RPC and local variants (`ReadRpcRequest`/`ReadLocalRequest`, `ScanRpcRequest`/`ScanLocalRequest`, `CreateSnapshotForBackupRpcRequest`/`CreateSnapshotForBackupLocalRequest`) are consistent with:

- `ds_request.proto` field usage (`key_str`, `partition_id`, `shard_id`, etc.),
- The datastore APIs (`RocksDBDataStoreCommon::Read`, `ScanNext`, `ScanClose`, snapshot methods),
- The per-shard write-counting logic (snapshot `SetFinish` uses `DecreaseWriteReqCount(shard_id)`).

One small optional improvement: `ScanLocalRequest::GetSearchConditions` assumes `search_conditions_` is non-null, whereas `GetSearchConditionsSize()` already guards on null. To make it self-contained and future-proof, you could add:

```cpp
if (!search_conditions_)
{
    return nullptr;
}
```

at the top of `GetSearchConditions`.

Overall, the shard-aware request interfaces look correct.
Also applies to: 663-863, 865-1134, 1136-1518, 1520-1682
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (22)
- data_store_service_client.cpp (102 hunks)
- data_store_service_client.h (12 hunks)
- data_store_service_client_closure.cpp (17 hunks)
- data_store_service_client_closure.h (34 hunks)
- data_store_service_scanner.cpp (3 hunks)
- data_store_service_scanner.h (4 hunks)
- eloq_data_store_service/CMakeLists.txt (0 hunks)
- eloq_data_store_service/data_store_service.cpp (54 hunks)
- eloq_data_store_service/data_store_service.h (14 hunks)
- eloq_data_store_service/data_store_service_util.h (1 hunks)
- eloq_data_store_service/ds_request.proto (5 hunks)
- eloq_data_store_service/internal_request.h (55 hunks)
- eloq_data_store_service/main.cpp (1 hunks)
- eloq_data_store_service/rocksdb_cloud_data_store.cpp (5 hunks)
- eloq_data_store_service/rocksdb_cloud_data_store_factory.h (1 hunks)
- eloq_data_store_service/rocksdb_data_store.cpp (1 hunks)
- eloq_data_store_service/rocksdb_data_store_common.cpp (3 hunks)
- eloq_data_store_service/rocksdb_data_store_common.h (1 hunks)
- eloq_data_store_service/thread_worker_pool.cpp (2 hunks)
- eloq_data_store_service/thread_worker_pool.h (1 hunks)
- rocksdb_handler.cpp (2 hunks)
- rocksdb_handler.h (1 hunks)
💤 Files with no reviewable changes (1)
- eloq_data_store_service/CMakeLists.txt
🚧 Files skipped from review as they are similar to previous changes (2)
- eloq_data_store_service/rocksdb_cloud_data_store.cpp
- eloq_data_store_service/main.cpp
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-17T06:36:11.137Z
Learnt from: liunyl
Repo: eloqdata/store_handler PR: 80
File: rocksdb_handler.cpp:1226-1335
Timestamp: 2025-09-17T06:36:11.137Z
Learning: In the EloqData store_handler codebase, the maximum bucket ID is guaranteed to be smaller than UINT16_MAX (65535), so bucket_id + 1 operations do not risk integer overflow when used in FetchBucketData methods.
Applied to files:
data_store_service_client_closure.cppdata_store_service_client.hdata_store_service_client.cpp
📚 Learning: 2025-09-17T11:08:35.322Z
Learnt from: liunyl
Repo: eloqdata/store_handler PR: 83
File: data_store_service_client_closure.cpp:408-448
Timestamp: 2025-09-17T11:08:35.322Z
Learning: In the DataStoreServiceClient batch write system, the global coordinator (SyncPutAllData) manages concurrency at the partition level, not at the individual batch level. The global unfinished_request_cnt_ is only decremented when an entire partition is fully completed or failed, not after each individual batch within a partition.
Applied to files:
data_store_service_client_closure.cppeloq_data_store_service/data_store_service.cppdata_store_service_client.cpp
🧬 Code graph analysis (17)
data_store_service_client_closure.cpp (1)
data_store_service_client.cpp (1)
callback_data(2576-2582)
data_store_service_scanner.h (1)
eloq_data_store_service/internal_request.h (9)
string_view(106-109), string_view(126-129), string_view(273-276), string_view(606-614), string_view(621-629), string_view(811-819), partition_id_(647-647), partition_id_(856-856), partition_id_(1504-1504)
eloq_data_store_service/rocksdb_data_store_common.h (5)
eloq_data_store_service/rocksdb_cloud_data_store.cpp (2)
Shutdown(167-188), Shutdown(167-167)

eloq_data_store_service/rocksdb_data_store.cpp (2)

Shutdown(69-83), Shutdown(69-69)

eloq_data_store_service/rocksdb_data_store_common.cpp (2)

Shutdown(287-301), Shutdown(287-287)

eloq_data_store_service/thread_worker_pool.cpp (2)

Shutdown(104-122), Shutdown(104-104)

rocksdb_handler.cpp (4)

Shutdown(2324-2359), Shutdown(2324-2324), Shutdown(3405-3437), Shutdown(3405-3405)
eloq_data_store_service/rocksdb_cloud_data_store_factory.h (2)
eloq_data_store_service/rocksdb_cloud_data_store.cpp (2)
RocksDBCloudDataStore(140-157), RocksDBCloudDataStore(159-165)

eloq_data_store_service/rocksdb_cloud_data_store.h (1)
RocksDBCloudDataStore(38-175)
rocksdb_handler.h (2)
data_store_service_client.cpp (4)
OnLeaderStart(2973-2990), OnLeaderStart(2973-2974), OnStartFollowing(3014-3063), OnStartFollowing(3014-3018)

rocksdb_handler.cpp (4)

OnLeaderStart(2105-2129), OnLeaderStart(2105-2105), OnStartFollowing(2174-2199), OnStartFollowing(2174-2178)
rocksdb_handler.cpp (1)
data_store_service_client.cpp (4)
OnLeaderStart(2973-2990), OnLeaderStart(2973-2974), OnStartFollowing(3014-3063), OnStartFollowing(3014-3018)
eloq_data_store_service/rocksdb_data_store_common.cpp (4)
eloq_data_store_service/rocksdb_cloud_data_store.cpp (2)
Shutdown(167-188), Shutdown(167-167)

eloq_data_store_service/rocksdb_data_store.cpp (2)

Shutdown(69-83), Shutdown(69-69)

eloq_data_store_service/thread_worker_pool.cpp (2)

Shutdown(104-122), Shutdown(104-104)

eloq_data_store_service/data_store_service_util.h (1)
shard_id_(123-123)
eloq_data_store_service/thread_worker_pool.h (2)
eloq_data_store_service/rocksdb_data_store_common.cpp (2)
Initialize(163-285), Initialize(163-163)

eloq_data_store_service/thread_worker_pool.cpp (2)

Initialize(34-84), Initialize(34-34)
eloq_data_store_service/internal_request.h (2)
eloq_data_store_service/data_store_service_config.h (1)
uint32_t(107-110)

data_store_service_client_closure.h (31)

Reset(81-85), Reset(144-153), Reset(210-216), Reset(269-274), Reset(385-392), Reset(600-619), Reset(933-948), Reset(1154-1177), Reset(1393-1408), Reset(1598-1624), Reset(1627-1658), Reset(1660-1694), Reset(1933-2251), Reset(2328-2347), Reset(2606-2612), Reset(3002-3010), vector(1078-1081), vector(1121-1128), vector(2253-2256), string(804-809), string(2173-2178), string_view(311-368), string_view(752-755), string_view(767-770), string_view(792-797), string_view(1291-1294), string_view(1533-1536), string_view(2133-2136), string_view(2138-2141), string_view(2241-2246), string_view(2453-2456)
data_store_service_scanner.cpp (1)
data_store_service_scanner.h (1)
data_shard_id_(266-266)
eloq_data_store_service/data_store_service.cpp (2)
eloq_data_store_service/data_store_service.h (4)
IsOwnerOfShard(636-641), PrepareShardingError(546-550), IncreaseWriteReqCount(624-628), DecreaseWriteReqCount(630-634)

data_store_service_client.cpp (10)

Read(1087-1096), Read(1087-1092), Read(3528-3544), Read(3528-3533), DeleteRange(3577-3599), DeleteRange(3577-3584), ScanNext(3720-3753), ScanNext(3720-3734), ScanClose(3795-3819), ScanClose(3795-3800)
data_store_service_client_closure.h (2)
eloq_data_store_service/internal_request.h (13)
partition_id_(647-647), partition_id_(856-856), partition_id_(1504-1504), string_view(106-109), string_view(126-129), string_view(273-276), string_view(606-614), string_view(621-629), string_view(811-819), string_view(975-983), string_view(1110-1118), string_view(1406-1414), string_view(1421-1429)

eloq_data_store_service/data_store_service_util.h (1)
shard_id_(123-123)
eloq_data_store_service/thread_worker_pool.cpp (1)
eloq_data_store_service/rocksdb_data_store_common.cpp (2)
Initialize(163-285), Initialize(163-163)
eloq_data_store_service/data_store_service.h (5)
eloq_data_store_service/data_store_service.cpp (12)
StartService(265-350), StartService(265-265), Read(443-476), Read(443-446), Read(478-521), Read(478-486), ScanClose(1039-1072), ScanClose(1039-1042), ScanClose(1074-1106), ScanClose(1074-1079), TTLWrapperCache(78-83), TTLWrapperCache(85-94)

data_store_service_client.cpp (6)

Read(1087-1096), Read(1087-1092), Read(3528-3544), Read(3528-3533), ScanClose(3795-3819), ScanClose(3795-3800)

eloq_data_store_service/rocksdb_data_store_common.cpp (5)

Read(436-497), Read(436-436), key(852-852), ScanClose(1067-1091), ScanClose(1067-1067)

eloq_data_store_service/data_store_service_config.cpp (6)

PrepareShardingError(918-962), PrepareShardingError(918-919), IsOwnerOfShard(286-294), IsOwnerOfShard(286-286), IsOwnerOfShard(691-702), IsOwnerOfShard(691-692)

eloq_data_store_service/data_store.h (1)
DataStore(42-143)
data_store_service_client.h (1)
data_store_service_client.cpp (22)
FindFreeNodeIndex(3269-3288)FindFreeNodeIndex(3269-3269)InitBucketsInfo(3132-3197)InitBucketsInfo(3132-3136)OnLeaderStart(2973-2990)OnLeaderStart(2973-2974)OnLeaderStop(2992-3005)OnLeaderStop(2992-2992)OnStartFollowing(3014-3063)OnStartFollowing(3014-3018)GetShardIdByPartitionId(3090-3115)GetShardIdByPartitionId(3090-3091)GetOwnerNodeIndexOfShard(3199-3209)GetOwnerNodeIndexOfShard(3199-3200)GetAllDataShards(3117-3130)GetAllDataShards(3117-3117)UpdateOwnerNodeIndexOfShard(3211-3267)UpdateOwnerNodeIndexOfShard(3211-3212)HandleShardingError(3290-3323)HandleShardingError(3290-3291)UpgradeShardVersion(3325-3372)UpgradeShardVersion(3325-3328)
eloq_data_store_service/rocksdb_data_store.cpp (3)
eloq_data_store_service/rocksdb_cloud_data_store.cpp (2)
Shutdown(167-188)Shutdown(167-167)eloq_data_store_service/rocksdb_data_store_common.cpp (2)
Shutdown(287-301)Shutdown(287-287)eloq_data_store_service/thread_worker_pool.cpp (2)
Shutdown(104-122)Shutdown(104-104)
data_store_service_client.cpp (2)
data_store_service_client.h (4)
TxPort2DssPort(134-137)need_bootstrap_(688-859)KvPartitionIdOf(629-634)bind_data_shard_with_ng_(689-813)eloq_data_store_service/data_store_service.cpp (16)
BatchWriteRecords(897-945)BatchWriteRecords(897-901)BatchWriteRecords(1178-1243)BatchWriteRecords(1178-1191)Read(443-476)Read(443-446)Read(478-521)Read(478-486)ScanNext(947-1003)ScanNext(947-962)ScanNext(1005-1037)ScanNext(1005-1008)DeleteRange(613-660)DeleteRange(613-617)DeleteRange(662-715)DeleteRange(662-669)
🪛 Clang (14.0.6)
eloq_data_store_service/data_store_service_util.h
[error] 24-24: 'atomic' file not found
(clang-diagnostic-error)
🔇 Additional comments (28)
eloq_data_store_service/thread_worker_pool.h (1)
42-42: Initialize() addition matches reusable pool design
Exposing Initialize() publicly is consistent with RocksDBDataStoreCommon::Initialize() reusing an existing ThreadWorkerPool after Shutdown(). Given the cpp implementation only starts threads when workers_.empty() and shutdown_indicator_ is false, repeated Initialize()/Shutdown() cycles look safe and idempotent.
eloq_data_store_service/rocksdb_data_store_common.h (1)
165-166: Common Shutdown() centralizes worker-pool and scan-iter cleanup
Adding RocksDBDataStoreCommon::Shutdown() to stop query_worker_pool_ and call ForceEraseScanIters(shard_id_) cleanly separates common lifecycle from DB-specific teardown done in derived classes. Given ThreadWorkerPool::Shutdown() is idempotent and the pointer is kept for reuse via Initialize(), this override looks correct.
rocksdb_handler.h (1)
531-537: Handler interface updated consistently to be ng-aware
The added ng_id parameter in OnLeaderStart/OnStartFollowing matches the updated DataStoreHandler/DataStoreServiceClient interfaces, keeping RocksDBHandler aligned with the new per-NG callback contract. No further changes needed here.
data_store_service_client_closure.h (6)
499-505: Archive read callback now uses signed partition_id consistently
Switching ReadBaseForArchiveCallbackData::AddResult and partition_id_ to int32_t aligns this callback with the rest of the codebase (internal requests and proto using signed partition IDs). The change is local and only widens the accepted domain; looks safe.
Also applies to: 532-537
600-619: ReadClosure shard‑aware routing and idempotent reset look correct
- Reset(...) now captures int32_t partition_id_, uint32_t shard_id_, and std::string_view key_, and PrepareRequest sets both partition_id and shard_id plus key_str on the RPC.
- Shard re-routing on transport failure now correctly calls UpdateOwnerNodeIndexOfShard(shard_id_, remote_node_index_, ...), matching the new shard-based ownership model.
- Clear() initializing partition_id_ to INT32_MAX and shard_id_ to UINT32_MAX gives a clear "invalid" sentinel for debugging and reuse without impacting normal flows.

Overall, the refactor cleanly separates partition vs shard and uses shard_id consistently in retry logic.
Also applies to: 631-639, 661-663, 688-689, 757-770, 886-890
1154-1177: DeleteRangeClosure correctly propagates shard_id into RPC and retries
The updated Reset(...) signature and state storing int32_t partition_id_ plus uint32_t shard_id_, and wiring shard_id_ into DeleteRangeRequest::set_shard_id(...) and UpdateOwnerNodeIndexOfShard(shard_id_, ...), bring DeleteRange into line with the shard-centric routing model. The new ShardId() accessor is also useful for local handling paths. No functional issues spotted.
Also applies to: 1197-1201, 1226-1229, 1296-1305, 1356-1358
1605-1607: BatchWriteRecordsClosure shard‑aware write path looks consistent
- Both single-record and batched Reset(...) overloads now accept and store a uint32_t shard_id, used in BatchWriteRecordsRequest::set_shard_id(shard_id_).
- Retry on transport failure updates ownership via UpdateOwnerNodeIndexOfShard(shard_id_, remote_node_index_, ...), matching the other closures.
- Clear() uses INT32_MAX/UINT32_MAX sentinels, and the new PartitionId()/ShardId() const accessors expose the stored routing context cleanly.

The changes are cohesive with the rest of the shard-aware client logic.
Also applies to: 1629-1631, 1660-1664, 1675-1681, 1714-1716, 1777-1778, 1850-1858, 1868-1871
1913-1915: ScanNextClosure: shard_id threading and retry behavior are wired correctly
- Reset(...) now takes int32_t partition_id and uint32_t shard_id and stores them; PrepareRequest writes both fields into ScanRequest (partition_id + shard_id).
- On RPC/channel errors, UpdateOwnerNodeIndexOfShard(shard_id_, remote_node_index_, ...) plus retry mirrors the pattern used in Read/Delete/Write.
- Accessors PartitionId()/ShardId() keep callers shard-aware for local scans as well.

Search condition handling and session_id logic remain unchanged; only routing context has been extended, which looks correct.
Also applies to: 1935-1938, 1956-1958, 2008-2009, 2055-2056, 2123-2126, 2128-2131, 2275-2277
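The shared retry pattern these closures use (on a transport failure, move the cached shard-to-node mapping off the failed node, then retry against the new owner) can be reduced to a small sketch. All names here are stand-ins for illustration, not the project's real API.

```cpp
#include <cstdint>
#include <map>

// Illustrative reduction of the re-route-on-failure pattern: a routing
// table maps shard_id -> node index; MarkFailed only moves ownership
// away from the node we just failed against, so a stale failure report
// does not clobber a newer mapping (mirroring the intent of
// UpdateOwnerNodeIndexOfShard(shard_id_, remote_node_index_, ...)).
class ShardRoutingTable
{
public:
    explicit ShardRoutingTable(uint32_t node_count) : node_count_(node_count)
    {
    }

    uint32_t OwnerOf(uint32_t shard_id)
    {
        auto it = owner_.find(shard_id);
        // Default placement by modulo until an update is recorded.
        return it == owner_.end() ? shard_id % node_count_ : it->second;
    }

    void MarkFailed(uint32_t shard_id, uint32_t failed_node)
    {
        // Only re-route if the failed node is still the recorded owner.
        if (OwnerOf(shard_id) == failed_node)
        {
            owner_[shard_id] = (failed_node + 1) % node_count_;
        }
    }

private:
    uint32_t node_count_;
    std::map<uint32_t, uint32_t> owner_;
};
```

The "only move off the node we failed against" guard is the important part: it keeps concurrent updates from different retry paths from ping-ponging ownership.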
2931-2950: FetchArchivesCallbackData partition_id type now matches other components
FetchArchivesCallbackData now stores partition_id_ as int32_t, matching other request/closure types. Since this value is only forwarded and compared elsewhere, the signedness change should be a no-op behaviorally and simplifies interop with updated protos and internal requests.
eloq_data_store_service/data_store_service_util.h (1)
24-29: Additional standard headers are fine; static-analysis error is likely tooling-related
Including <atomic>, <map>, <memory>, <set>, and <unordered_map> here is standard practice and prepares this utility header for richer data structures/state. The reported 'atomic' file not found from clang static analysis is almost certainly due to how the tool is invoked (e.g., wrong language mode or include paths) rather than an issue with these includes themselves.
69-83: Revised Shutdown() ordering cleanly separates common vs DB teardown
Calling RocksDBDataStoreCommon::Shutdown() first to stop the query worker pool and erase scan iterators, then taking db_mux_ and closing/deleting db_, gives a sensible and race-resistant shutdown sequence: no new work can be enqueued once the pool is stopped, and all DB access is serialized during close. The destructor's conditional Shutdown() (which may run after an explicit Shutdown) is safe, since both the worker-pool Shutdown and DB close paths are effectively idempotent after the first call.
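The two-step ordering described above (stop the worker pool first so no new work is enqueued, then close the DB under the mutex) can be sketched as a minimal stand-alone pattern. The class and member names here are illustrative, not the project's actual types.

```cpp
#include <atomic>
#include <mutex>

// Minimal sketch of a two-layer shutdown: the common layer stops the
// worker pool (no new work can be enqueued), then the derived layer
// closes the DB handle while access is serialized by a mutex. Both
// steps are idempotent, so calling Shutdown() from both an explicit
// call site and the destructor is safe.
class WorkerPool
{
public:
    void Shutdown()
    {
        // Idempotent: flipping the flag twice is harmless.
        stopped_.store(true, std::memory_order_release);
    }
    bool Stopped() const
    {
        return stopped_.load(std::memory_order_acquire);
    }

private:
    std::atomic<bool> stopped_{false};
};

class DataStoreSketch
{
public:
    void Shutdown()
    {
        pool_.Shutdown();  // 1) stop accepting/queueing work
        std::lock_guard<std::mutex> lk(db_mux_);
        db_open_ = false;  // 2) close the DB while access is serialized
    }
    ~DataStoreSketch()
    {
        Shutdown();  // safe even after an explicit Shutdown()
    }
    bool DbOpen() const
    {
        return db_open_;
    }
    bool PoolStopped() const
    {
        return pool_.Stopped();
    }

private:
    WorkerPool pool_;
    std::mutex db_mux_;
    bool db_open_{true};
};
```

The destructor delegating to the same idempotent Shutdown() is what makes the "conditional Shutdown() in the destructor" pattern race-free in this sketch.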
64-81: Shard-aware ScanNext/ScanClose wiring looks consistent.
SinglePartitionScanner now threads data_shard_id_ into client->ScanNext and client->ScanClose, which matches the new shard-aware client surface and the added data_shard_id_ member on the scanner. Given Reset always sets data_shard_id_ before any fetch/close, these calls look safe and correct.
Also applies to: 102-115
413-423: Confirm partition_id → shard_id mapping correctness in Init().
Here part_cnt (0…HASH_PARTITION_COUNT-1) is:

- cast to int32_t and passed to GetShardIdByPartitionId, and
- also stored as partition_id_ via Reset.

This assumes that the logical hash-partition id and the KV partition id expected by GetShardIdByPartitionId are identical. That matches the existing usage pattern but is worth double-checking against KvPartitionIdOf semantics to avoid subtle misrouting if those ever diverge.
eloq_data_store_service/ds_request.proto (2)
164-171: New shard_id fields in DeleteRangeRequest and ScanRequest look compatible.
For DeleteRangeRequest and ScanRequest, shard_id is appended at new, higher tag numbers (6 and 12 respectively), and existing fields keep their original numbers. This is the correct pattern for evolving the schema with additional routing context; older peers will ignore shard_id, and newer peers will see it default to 0 when receiving from older senders.
Also applies to: 305-323
95-100: Review comment is incorrect: no cardinality change to key_str occurred.
The review claims ReadRequest.key_str was changed from repeated bytes to bytes (singular), but evidence shows this never happened:

- Current proto file: bytes key_str = 3; (singular)
- Multi-shard commit (dfd01f7): bytes key_str = 3; (identical, singular)
- C++ code: uses set_key_str() and key_str() (singular accessors), not add_key_str() or repeated patterns

The field has always been singular. The review may have conflated the actual BatchWriteRecordsRequest field renumbering (skip_wal and items) with a non-existent change to ReadRequest.key_str.
Likely an incorrect or invalid review comment.
data_store_service_client_closure.cpp (5)
71-82: Key handling via ReadClosure::Key() looks safe and clearer.
Using read_closure->Key() (and storing it as std::string_view) instead of accessing Key().back():
- Avoids the somewhat opaque “last element” semantics and directly reflects the single-key nature of these callbacks.
- Makes log messages and archive decode errors include the exact packed key bytes, which is useful for debugging.
Assuming ReadClosure::Key() already returns a stable view for the lifetime of the callback (as it did before), these changes are behaviorally safe.
Also applies to: 96-106, 177-181
278-299: Shard-aware pagination for FetchBucketData is wired correctly.
When the bucket isn't drained and bucket_data_items_ is still empty, you now:

- Map bucket_id_ → hash partition (MapBucketIdToHashPartitionId),
- Map that to a KV partition via KvPartitionIdOf,
- Derive data_shard_id with GetShardIdByPartitionId(kv_partition_id, false), and
- Use both kv_partition_id and data_shard_id in the recursive ScanNext call.

Given the retrieved learning that bucket IDs are bounded well below UINT16_MAX, the int32_t intermediates are safe. This preserves the invariant that all subsequent scans for this bucket stay on the same KV partition and shard.
516-532: PartitionBatchCallback now reuses the original shard_id; this matches partition-scoped coordination.
The callback now casts closure to BatchWriteRecordsClosure, reads ShardId() once, and reuses that data_shard_id for all follow-up BatchWriteRecords calls for the same partition.
This keeps all batches for a partition on the same shard and aligns with the existing design where the global coordinator tracks completion at the partition level, not per-batch. As long as PartitionBatchCallback is only ever invoked with a BatchWriteRecordsClosure (which matches the call sites), this change is correct.
617-631: ShardId propagation across ScanNext/Read continuations looks coherent.
All the changed continuations (FetchAllDatabaseCallback, DiscoverAllTableNamesCallback, FetchTableRangesCallback, FetchRangeSlicesCallback, FetchTableStatsCallback, LoadRangeSliceCallback, FetchArchivesCallback, FetchRecordArchivesCallback) now:

- Extract ShardId() from the current ScanNextClosure or ReadClosure, and
- Pass it into subsequent ScanNext/Read calls along with the same partition/session id.

This is the right pattern for multi-shard routing: once the initial request determined the shard, all pagination and follow-up reads stay pinned to that shard rather than re-deriving it. Provided the initial closures are constructed with the correct shard_id (per DataStoreServiceClient changes), this wiring looks consistent.
Also applies to: 686-700, 819-833, 917-923, 1013-1020, 1132-1145, 1239-1252, 1308-1321, 1417-1431, 1443-1456
245-276: Verify: FetchBucketDataCc and AddDataItem cannot be located in the repository.
The review comment raises concerns about a key format change, but critical components cannot be found for verification:

- FetchBucketDataCc class definition not found in any codebase files
- AddDataItem method signature not found in headers
- DecodeKvKeyForHashPart function confirmed NOT to exist

Additionally, the reviewed code at lines 264-265 uses the same pattern already established in rocksdb_handler.cpp:1383, suggesting this is not a novel change. The commented code shows a previous TxKey variant, but rocksdb_handler.cpp already uses raw key_str, indicating the current pattern may have been working.
Please confirm:

- Whether FetchBucketDataCc is defined externally or generated
- What AddDataItem expects downstream (raw KV key or wrapped form)
- Whether the rocksdb_handler.cpp usage (same pattern as lines 264-265) has been validated

data_store_service_scanner.h (2)
207-221: SinglePartitionScanner now carries data_shard_id; lifecycle/reset behavior is sound.
Reset now initializes both partition_id_ (int32_t) and data_shard_id_ (uint32_t), and Clear() zeroes them along with the rest of the scanner state. Combined with the use in data_store_service_scanner.cpp (Init → Reset → FetchNextBatch/ScanClose), this ensures:

- data_shard_id_ is always set before use, and
- pooled scanners are fully scrubbed when returned to the ObjectPool.

This looks correct for per-shard scanner reuse.
Also applies to: 248-259, 264-273
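The Reset-before-use / Clear-on-return invariant described above can be shown with a toy pooled object. The field names mirror the review (partition_id_, data_shard_id_), but the class itself is a hypothetical sketch, not the project's scanner.

```cpp
#include <cstdint>
#include <string>

// Sketch of the pooled-scanner invariant: Reset() must set every routing
// field before first use, and Clear() must scrub them before the object
// goes back to the pool, so a reused object never leaks a stale shard id.
class PooledScannerSketch
{
public:
    void Reset(int32_t partition_id, uint32_t data_shard_id)
    {
        partition_id_ = partition_id;
        data_shard_id_ = data_shard_id;
        session_id_.clear();
    }

    void Clear()
    {
        // Scrub all state on return to the pool.
        partition_id_ = 0;
        data_shard_id_ = 0;
        session_id_.clear();
    }

    int32_t PartitionId() const
    {
        return partition_id_;
    }
    uint32_t DataShardId() const
    {
        return data_shard_id_;
    }

private:
    int32_t partition_id_{0};
    uint32_t data_shard_id_{0};
    std::string session_id_;
};
```

A pool that always calls Clear() on return and Reset() on checkout makes the "data_shard_id_ is always set before use" property structural rather than a convention each caller must remember.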
279-286: HASH_PARTITION_COUNT tied to txservice::total_hash_partitions is a good centralization.
Defining static constexpr uint32_t HASH_PARTITION_COUNT = txservice::total_hash_partitions; removes the previous local duplication/ifdefs and keeps the scanner aligned with the system-wide partition count. This simplifies configuration and reduces the chance of mismatched constants.
eloq_data_store_service/data_store_service.cpp (4)
76-154: TTLWrapperCache lifecycle and concurrency look correct
The TTL-based iterator cache:

- Uses a dedicated ThreadWorkerPool with a single worker to run TTLCheckWorker().
- Protects ttl_wrapper_cache_, in_use_, and last_access_time_ behind bthread::Mutex, both in the worker and in Emplace/Borrow/Return/Clear/ForceEraseIters.
- Shuts down the worker in ~TTLWrapperCache() before calling Clear(), ensuring no background thread races with destruction.

Given ForceEraseIters() is only used from shutdown paths (e.g., ForceEraseScanIters from RocksDBDataStoreCommon::Shutdown), the combination of ForceEraseIters() followed by Clear() is safe (the map is already empty, so the InUse()==false assertion doesn't fire).
Overall this design is reasonable for managing scan iterators per shard.
Also applies to: 188-207, 1127-1176
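The borrow/return protocol that the cache enforces under its mutex can be reduced to a small sketch. The background TTL worker and bthread::Mutex are omitted; std::mutex and the method names below are illustrative stand-ins, not the real TTLWrapperCache API.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>

// Reduced sketch of the iterator-cache protocol: entries are keyed by
// session id, marked in-use on Borrow(), and only returnable entries can
// be borrowed again. Clear() models ForceEraseIters-style cleanup that
// drops everything regardless of TTL.
class IterCacheSketch
{
public:
    void Emplace(const std::string &session_id)
    {
        std::lock_guard<std::mutex> lk(mux_);
        in_use_.emplace(session_id, false);
    }

    bool Borrow(const std::string &session_id)
    {
        std::lock_guard<std::mutex> lk(mux_);
        auto it = in_use_.find(session_id);
        if (it == in_use_.end() || it->second)
        {
            return false;  // unknown session, or already borrowed
        }
        it->second = true;
        return true;
    }

    void Return(const std::string &session_id)
    {
        std::lock_guard<std::mutex> lk(mux_);
        in_use_[session_id] = false;
    }

    void Clear()
    {
        std::lock_guard<std::mutex> lk(mux_);
        in_use_.clear();
    }

    std::size_t Size()
    {
        std::lock_guard<std::mutex> lk(mux_);
        return in_use_.size();
    }

private:
    std::mutex mux_;
    std::map<std::string, bool> in_use_;
};
```

Holding one mutex across every operation, as the real cache does for its three maps, is what makes the in-use bookkeeping safe against the TTL worker.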
443-945: Per-shard gating and ownership checks across read and write paths are consistent
The refactor of Read, FlushData, DeleteRange, CreateTable, DropTable, and BatchWriteRecords to:

- Derive shard_id from the RPC/local request,
- Guard with IsOwnerOfShard(shard_id) and PrepareShardingError,
- Then load ds_ref.shard_status_ and require appropriate states (ReadOnly/ReadWrite for reads/scans, ReadWrite only for writes), and
- Call into ds_ref.data_store_ with per-shard request objects

is internally consistent and matches the new multi-shard design. The write paths pair IncreaseWriteReqCount(shard_id) with the corresponding DecreaseWriteReqCount calls in SetFinish() of the request wrappers, maintaining the per-shard in-flight write counters.
1341-1397: Shard-status checks in SyncFileCache and ProcessSyncFileCache are sensible
The new SyncFileCache/ProcessSyncFileCache implementation:

- Only processes requests when the target shard's status is Closed, treating non-Closed shards as non-standby and logging a warning.
- Validates the storage path via data_store_factory_ before mutating the filesystem.
- During directory cleanup and S3 download, repeatedly checks that ds_ref.shard_status_ remains Closed and aborts if the shard is opened mid-operation.
- Properly finishes the RPC via SyncFileCacheLocalRequest::Finish() and, on the S3 path, ensures is_file_sync_running_ is reset.

Aside from the is_file_sync_running_ race noted separately, the shard-status gating and I/O error handling here look correct and aligned with the intended "only sync when DB is closed on a standby" semantics.
Also applies to: 1459-1477, 1505-1512, 1571-1579
2888-2987: FileCacheSyncWorker multi-shard iteration and primary-only sync behavior look correct
The new FileCacheSyncWorker:

- Iterates all entries in data_shards_ and only considers shards whose shard_status_ is ReadWrite, i.e., primaries.
- Skips shards with data_store_ == nullptr, then dynamic_casts to RocksDBCloudDataStore and continues only if the cast succeeds.
- Collects cached SST files via CollectCachedSstFiles, then sends a SyncFileCache RPC with the per-shard shard_id and file list to each non-self member node.

This aligns with the expectation that primaries push cache state to their standbys. The per-shard status check prevents followers or closed shards from attempting to sync, and the dynamic_cast keeps this path specific to cloud-backed stores.
eloq_data_store_service/internal_request.h (1)
36-327: Shard-aware write requests and per-shard write counters are wired correctly
The additions of GetShardId() to WriteRecordsRequest, FlushDataRequest, and DeleteRangeRequest, plus the corresponding implementations in:

- WriteRecordsRpcRequest / WriteRecordsLocalRequest,
- FlushDataRpcRequest / FlushDataLocalRequest,
- DeleteRangeRpcRequest / DeleteRangeLocalRequest,

ensure that each write request now carries its shard id explicitly. The SetFinish methods consistently call data_store_service_->DecreaseWriteReqCount(GetShardId()); (or DecreaseWriteReqCount(shard_id_);), which matches the per-shard IncreaseWriteReqCount(shard_id) calls in DataStoreService. Clear()/Reset() correctly initialize shard_id_ to UINT32_MAX and then to the real shard on Reset, preventing stale counters.
This looks coherent with the new multi-shard design.
Also applies to: 329-469
eloq_data_store_service/data_store_service.h (1)
692-728: Per-shard DataShard encapsulation and shutdown logic align with multi-shard design
The new DataShard struct cleanly encapsulates all per-shard state: shard_id_, data_store_, shard_status_, ongoing_write_requests_, scan_iter_cache_ (a TTLWrapperCache instance), and is_file_sync_running_ to coordinate with file-cache sync.
DataShard::ShutDown() calls data_store_->Shutdown() and clears the scan iterator cache (destroying its TTL worker) without nulling data_store_, which is appropriate given the 2-phase close semantics (status moves to Closed and the pointer remains until DataStoreService itself is destroyed). This matches the usage in the service destructor and the shard lifecycle helpers.
uint32_t DataStoreServiceClient::GetShardIdByPartitionId(
    int32_t partition_id, bool is_range_partition) const
{
    uint16_t bucket_id;
    if (is_range_partition)
    {
        bucket_id = txservice::Sharder::MapRangeIdToBucketId(partition_id);
    }
    else
    {
        bucket_id =
            txservice::Sharder::MapHashPartitionIdToBucketId(partition_id);
    }

    auto it = bucket_infos_.find(bucket_id);
    assert(it != bucket_infos_.end());
    if (it != bucket_infos_.end())
    {
        uint32_t shard_id = it->second->BucketOwner();
        assert(dss_shard_ids_.find(shard_id) != dss_shard_ids_.end());
        return shard_id;
    }
    LOG(ERROR) << "Bucket not found for partition_id=" << partition_id
               << " (bucket_id=" << bucket_id << ")";
    return UINT32_MAX;
}
Do not return a sentinel shard_id that callers ignore
GetShardIdByPartitionId now logs and returns UINT32_MAX when a bucket lookup fails, but every caller still assumes the value is usable and immediately indexes dss_shards_/dss_nodes_. In a bootstrap configuration with zero shards (or any inconsistent topology), we hit the LOG(ERROR) path, return UINT32_MAX, and then execute dss_nodes_[UINT32_MAX].Channel() in the remote paths (e.g. ReadInternal, BatchWriteRecordsInternal), which is undefined behaviour.
Previously this path LOG(FATAL)'d, so we never proceeded with an invalid shard_id. Please either keep the fatal guard or propagate a real failure to callers and abort the request before touching dss_nodes_. For example:
- auto it = bucket_infos_.find(bucket_id);
- assert(it != bucket_infos_.end());
- if (it != bucket_infos_.end())
- {
- uint32_t shard_id = it->second->BucketOwner();
- assert(dss_shard_ids_.find(shard_id) != dss_shard_ids_.end());
- return shard_id;
- }
- LOG(ERROR) << "Bucket not found for partition_id=" << partition_id
- << " (bucket_id=" << bucket_id << ")";
- return UINT32_MAX;
+ auto it = bucket_infos_.find(bucket_id);
+ if (it == bucket_infos_.end())
+ {
+ LOG(FATAL) << "Bucket not found for partition_id=" << partition_id
+ << " (bucket_id=" << bucket_id << ")";
+ }
+ uint32_t shard_id = it->second->BucketOwner();
+ if (dss_shard_ids_.find(shard_id) == dss_shard_ids_.end())
+ {
+ LOG(FATAL) << "Shard " << shard_id
+ << " not present in dss_shard_ids_ for partition_id="
+ << partition_id << " (bucket_id=" << bucket_id << ")";
+ }
+ return shard_id;

Without reinstating a hard guard (or updating every caller to bail when UINT32_MAX is returned), we're still one invalid config away from corrupting memory.
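If the sentinel return is kept instead of the fatal guard, every caller must check it before indexing dss_nodes_. A hypothetical caller-side helper (the names and the 1:1 shard-to-node mapping are assumptions for illustration, not the client's real routing code) could look like:

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Sentinel matching the UINT32_MAX return discussed above.
constexpr uint32_t kInvalidShardId = std::numeric_limits<uint32_t>::max();

// Stand-in for the decision a caller like ReadInternal would make:
// fail the request instead of indexing node state with the sentinel,
// which is the undefined behaviour the comment warns about.
bool ResolveShardOrFail(uint32_t shard_id, std::size_t node_count,
                        std::size_t *node_index_out)
{
    if (shard_id == kInvalidShardId || shard_id >= node_count)
    {
        return false;  // abort the request; never touch dss_nodes_
    }
    *node_index_out = shard_id;  // illustrative 1:1 shard->node mapping
    return true;
}
```

The point of the helper is that the sentinel check happens exactly once, at the boundary where the shard id is produced, rather than being re-implemented (or forgotten) at each of the remote-path call sites.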
Related PRs:
eloqdata/tx_service#199
eloqdata/eloqdoc#298
eloqdata/eloqsql#158
eloqdata/eloqkv#265
Summary by CodeRabbit
New Features
Refactor
Chores
Bug Fixes
✏️ Tip: You can customize this high-level summary in your review settings.