Walkthrough

Replaces the on-disk DSS config with an in-memory DSS cluster config.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Operator
    participant RedisSvc as redis_service.cpp
    participant ConfigMaker as Config builder / Peer fetch
    participant DataStore as DataStoreService
    Operator->>RedisSvc: start node
    RedisSvc->>RedisSvc: determine native_ng_id and node identity
    RedisSvc->>RedisSvc: compute dss_node_id and dss_leader_id (bootstrap/single-node logic)
    alt eloq_dss_peer_node set
        RedisSvc->>ConfigMaker: SetThisNode(...) & FetchConfigFromPeer(...)
        ConfigMaker-->>RedisSvc: ds_config populated / error
    else no peer configured
        RedisSvc->>ConfigMaker: TxConfigsToDssClusterConfig(dss_node_id, native_ng_id, ng_configs, dss_leader_id, ds_config)
        ConfigMaker-->>RedisSvc: ds_config populated
    end
    RedisSvc->>DataStore: StartService(bootstrap, dss_leader_id, dss_node_id)
    DataStore-->>RedisSvc: started
    alt bootstrap or single-node
        RedisSvc->>RedisSvc: store_hd_.Connect() immediately
    else non-bootstrap
        RedisSvc->>RedisSvc: defer store_hd_.Connect() until role assigned
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ✅ Passed checks (2 passed)
Force-pushed from f73f40c to 1f595a7.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/redis_service.cpp (1)
1161-1168: Connection to `store_hd_` is missing for non-bootstrap/non-single-node cases; `store_hd_` will remain unconnected despite being used.

The code calls `store_hd_->AppendPreBuiltTable()` (line 1158) before the Connect condition, and for non-bootstrap/non-single-node scenarios, the `Connect()` call at line 1163 is skipped entirely. Additionally, `RegisterKvMetrics()` (line 1481) and `SetTxService()` (line 1484) are called unconditionally without any guarantee that `store_hd_` was connected.

The comment at line 1162 states "connect when become leader or follower," but this deferred connection mechanism does not exist in the codebase: there are no `OnLeaderStart`, `OnFollower`, or equivalent callbacks that invoke `Connect()`. This means non-bootstrap/non-single-node nodes will have an unconnected store handler used for table operations, metrics registration, and transaction service binding, which will likely cause failures.

Either move `AppendPreBuiltTable()` inside the Connect conditional block, or implement the promised deferred connection logic in the leadership/follower initialization code path.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/redis_service.cpp (4 hunks)
- store_handler (1 hunks)
- tx_service (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- tx_service
- store_handler
🧰 Additional context used
🧬 Code graph analysis (1)
src/redis_service.cpp (1)
include/redis_service.h (1)
node_id(137-137)
🔇 Additional comments (1)
src/redis_service.cpp (1)
1020-1020: Add error handling for TxConfigsToDssClusterConfig and verify empty file path handling.

The redesign switches from on-disk to in-memory DSS cluster configuration. While the empty `dss_config_file_path` may be intentional, you should verify that `EloqDS::DataStoreService` correctly handles an empty config file path parameter (line 1137).

Additionally, line 1048 calls `TxConfigsToDssClusterConfig()` without checking for errors; this could silently fail and produce an invalid `ds_config`. Add error handling or validation after this call to ensure the config is properly initialized.

```shell
#!/bin/bash
# Check if TxConfigsToDssClusterConfig has a return value or error reporting mechanism
rg -n "TxConfigsToDssClusterConfig" --type=cpp --type=h -B2 -A8
```

Also applies to: 1048-1048, 1135-1139
```cpp
        EloqDS::DataStoreServiceClient::TxConfigsToDssClusterConfig(
            dss_node_id,
            native_ng_id,
            ng_configs,
            dss_leader_id,
            ds_config);
    }
```
🛠️ Refactor suggestion | 🟠 Major
Add error handling for TxConfigsToDssClusterConfig.
The call to TxConfigsToDssClusterConfig lacks error handling, unlike the peer fetch path (lines 1038-1044). If this function can fail, add appropriate error checking and return false on failure to maintain consistency with the peer fetch error handling.
🤖 Prompt for AI Agents
In src/redis_service.cpp around lines 1048 to 1054, the call to
EloqDS::DataStoreServiceClient::TxConfigsToDssClusterConfig has no error
handling; mirror the peer-fetch path at lines 1038–1044 by checking the
function's return value, logging a clear error (including context such as
dss_node_id/native_ng_id) when it fails, and returning false from the
surrounding function on failure so behavior is consistent with the peer fetch
error handling.
Force-pushed from 27fef8f to 49de4ff.
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/redis_service.cpp (1)
1048-1054: Missing error handling on `TxConfigsToDssClusterConfig`.

Mirror the peer-fetch path: check the return status, log context, and fail fast on error.
Apply this diff:
```diff
-    EloqDS::DataStoreServiceClient::TxConfigsToDssClusterConfig(
-        dss_node_id,
-        native_ng_id,
-        ng_configs,
-        dss_leader_id,
-        ds_config);
+    if (!EloqDS::DataStoreServiceClient::TxConfigsToDssClusterConfig(
+            dss_node_id,
+            native_ng_id,
+            ng_configs,
+            dss_leader_id,
+            ds_config))
+    {
+        LOG(ERROR) << "TxConfigsToDssClusterConfig failed: "
+                   << "dss_node_id=" << dss_node_id
+                   << " native_ng_id=" << native_ng_id
+                   << " leader_id=" << dss_leader_id;
+        return false;
+    }
```
🧹 Nitpick comments (5)
src/redis_service.cpp (5)
1024-1031: Ensure DSS node ID matches final DSS topology.

Setting `dss_node_id = node_id` assumes DSS IDs mirror Tx IDs. After building or fetching `ds_config`, derive/validate the DSS node id from `ds_config` (e.g., the "this node" entry) to avoid ID drift, especially in the peer-fetch path. Add a sanity check and log both IDs.
1032-1045: Peer-fetch: validate "this node" presence and improve error context.

After `FetchConfigFromPeer(...)`, verify `ds_config` contains this node and matches `local_ip` / `TxPort2DssPort(local_tx_port)`. On failure, log the ip, dss_port, `node_id`, and proposed `dss_node_id` to ease diagnosis.
1020-1021: Empty `dss_config_file_path`: confirm supported semantics.

`dss_config_file_path` is set to `""` but passed into `DataStoreService`. Ensure an empty path is explicitly supported (no attempts to read/write a file). If not, gate usage or add a clear comment.
1056-1058: Remove stale commented-out file path.

The legacy on-disk path is dead code. Delete to reduce noise.
1161-1164: Deferred Connect requires a role-change hook; clarify single-node flow.

Connecting only for `bootstrap || is_single_node` implies a later `Connect()` on role changes (leader or follower). Point to the actual callbacks that invoke it; otherwise this can leave the client disconnected in steady state. Also, regarding the earlier reviewer's question at line 1163: verify whether single-node triggers `OnLeaderStart` (or equivalent). If not, the explicit `Connect()` here is the only activation; document this behavior.
🔇 Additional comments (1)
src/redis_service.cpp (1)
1144-1144: StartService parameters: UNKNOWN leader on multi-node path.

On non-bootstrap, non-single-node starts, `dss_leader_id` remains `UNKNOWN`. Confirm `StartService(..., dss_leader_id, dss_node_id)` tolerates this and that leadership will be resolved via DSS membership changes. Also confirm `dss_node_id` matches `ds_config` (see earlier comment).