
Support multi dss shards #158

Merged
lzxddz merged 2 commits into eloqdata:eloq-10.6.10 from lzxddz:dss_multi_shard-rebase
Nov 21, 2025

Conversation


@lzxddz lzxddz commented Nov 9, 2025

Summary by CodeRabbit

  • Refactor

    • Simplified data-store startup/config to use a unified bootstrap/single-node flag and clearer leader mapping; startup flow streamlined and logging level now respects build-time debug/info settings.
  • Chores

    • Updated storage subsystem submodule references.
  • Tests

    • Adjusted multi-node and fault-injection tests to target different nodes and updated expected restart outcomes.


@lzxddz lzxddz self-assigned this Nov 9, 2025

coderabbitai bot commented Nov 9, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Replace explicit DSS leader/node IDs with a node-centric config flow: parse cluster config to derive node_id and ng_leaders, call TxConfigsToDssClusterConfig(node_id,...), change StartService and DataStoreServiceClient signatures to accept a bootstrap/single-node flag first, bump two storage/eloq submodules, and update tests to target different nodes.

Changes

Cohort / File(s) Summary
DataStore startup/config wiring
storage/eloq/ha_eloq.cc
Replace direct use of dss_leader_id/dss_node_id with node-centric flow: parse cluster config, derive node_id, build ng_leaders, call TxConfigsToDssClusterConfig(node_id, ng_configs, ng_leaders, ds_config), construct DataStoreServiceClient(bootstrap_flag, catalog_factory, ds_config, ds_peer_node.empty(), data_store_service_.get()), and call StartService(bootstrap_flag) (removed explicit leader/node ID params). Add compile-time guard around AWS loggingOptions.logLevel.
Submodule pointer updates
storage/eloq/store_handler
storage/eloq/tx_service
Bump submodule commit references for store_handler and tx_service. No functional code in this diff.
Tests / fixtures
storage/eloq/mysql-test/mono_multi/t/fault_inject_for_upsert_1.test
storage/eloq/mysql-test/mono_multi/r/fault_inject_for_upsert_1.result
storage/eloq/mysql-test/mono_multi/t/multi_nodes.test
storage/eloq/mysql-test/mono_multi/r/multi_nodes.result
Adjust test comments, debug session node IDs, connection targets, and expected restart/expect files to reference different node IDs/connections (changes between server-2 and server-3).

Sequence Diagram(s)

sequenceDiagram
    participant HA as ha_eloq.cc
    participant CFG as TxConfigsToDssClusterConfig
    participant DSC as DataStoreServiceClient
    participant DS as DataStoreService

    rect rgb(245,250,255)
    Note over HA,CFG: Bootstrap path parses cluster config,\nderive node_id and ng_leaders
    HA->>CFG: TxConfigsToDssClusterConfig(node_id, ng_configs, ng_leaders, ds_config)
    CFG-->>HA: ds_cluster_config
    end

    rect rgb(240,255,245)
    Note over HA,DSC: Construct client with bootstrap/single-node flag first
    HA->>DSC: DataStoreServiceClient(bootstrap_flag, catalog_factory, ds_config, ds_peer_empty, service)
    DSC-->>HA: client_instance
    end

    rect rgb(255,250,240)
    Note over HA,DS: Start service with bootstrap/single-node flag (no explicit leader/node IDs)
    HA->>DS: StartService(bootstrap_flag)
    DS-->>HA: started
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review ng_leaders population and node_id derivation in both bootstrap and non-bootstrap flows.
  • Verify all call sites and dependency injection updated to the new DataStoreServiceClient signature.
  • Confirm StartService overload changes match service expectations.
  • Check tests for correct node-target and expectation adjustments.

Possibly related PRs

Suggested reviewers

  • xiexiaoy

Poem

🐰 I hopped through configs, found each node,
traced leaders on a quiet road.
One flag to start, no IDs to chase,
bootstrap tucked in a snug little space.
Nibble, commit — a tidy code embrace. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed: The title 'Support multi dss shards' directly relates to the main architectural changes in the PR, which refactor DSS (DataStoreService) bootstrap and configuration handling to support multiple distributed data store shards.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c38de00 and 522adc8.

📒 Files selected for processing (7)
  • storage/eloq/ha_eloq.cc (5 hunks)
  • storage/eloq/mysql-test/mono_multi/r/fault_inject_for_upsert_1.result (1 hunks)
  • storage/eloq/mysql-test/mono_multi/r/multi_nodes.result (1 hunks)
  • storage/eloq/mysql-test/mono_multi/t/fault_inject_for_upsert_1.test (1 hunks)
  • storage/eloq/mysql-test/mono_multi/t/multi_nodes.test (1 hunks)
  • storage/eloq/store_handler (1 hunks)
  • storage/eloq/tx_service (1 hunks)



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
storage/eloq/ha_eloq.cc (1)

2662-2667: is_single_node is hard-coded true; breaks multi-shard/peer flows

is_single_node never flips to false, so StartService(...) and the Client ctor always see true. On cloud/EloqStore this can incorrectly bootstrap/create on multi-node runs. Compute it from ng_configs and ds_peer_node before first use.

Apply near the first use (before StartService / Client construction):

-    bool is_single_node= true;
+    bool is_single_node= true;  // will be recomputed after topology is known
     std::string ds_peer_node= eloq_dss_peer_node;
     std::string ds_branch_name= eloq_dss_branch_name;

And just before building DSS config / starting service (right after you populate ds_config):

+    // Single-node when no peer is specified and only one NG in topology.
+    if (!ds_peer_node.empty()) {
+      is_single_node = false;
+    } else {
+      is_single_node = (ng_configs.size() == 1);
+    }
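The derivation suggested in this diff can be checked in isolation. The sketch below models it with a simplified stand-in NodeConfig (the real txservice::NodeConfig has more fields) and a hypothetical helper name, folding the peer check and the topology count into one place:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// NodeConfig is a simplified stand-in for txservice::NodeConfig; only
// the member needed by this heuristic is modeled here.
struct NodeConfig {
    uint32_t node_id_;
};

// Single-node iff no DSS peer is configured and the topology holds
// exactly one node group containing exactly one node.
bool derive_single_node(
    const std::string &ds_peer_node,
    const std::unordered_map<uint32_t, std::vector<NodeConfig>> &ng_configs) {
    if (!ds_peer_node.empty())
        return false;
    std::size_t total_nodes = 0;
    for (const auto &ng : ng_configs)
        total_nodes += ng.second.size();
    return ng_configs.size() == 1 && total_nodes == 1;
}
```

With this shape, only the one-NG/one-node topology with no peer yields true; any configured peer or extra node forces the multi-node path.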
🧹 Nitpick comments (3)
storage/eloq/ha_eloq.cc (3)

2760-2772: Seed ng_leaders for single-NG non-bootstrap; pass consistent topology

When not bootstrapping and running a single NG locally (no peer), ng_leaders stays empty, which can leave leader choice ambiguous. Seed it explicitly. This also aligns the input to TxConfigsToDssClusterConfig.

       std::unordered_map<uint32_t, uint32_t> ng_leaders;
       if (opt_bootstrap)
       {
         // For bootstrap, start all data store shards in current node.
         for (auto &ng : ng_configs)
         {
           ng_leaders.emplace(ng.first, node_id);
         }
       }
+      else if (ng_configs.size() == 1 && ds_peer_node.empty())
+      {
+        // Single‑NG local run: elect local node as leader explicitly.
+        ng_leaders.emplace(ng_configs.begin()->first, dss_node_id);
+      }
 
       EloqDS::DataStoreServiceClient::TxConfigsToDssClusterConfig(
           dss_node_id, ng_configs, ng_leaders, ds_config);

2725-2736: Remove now-unused dss_leader_id

dss_leader_id is set but never used after the API changes; keep the code clean to avoid warnings.

-    uint32_t dss_leader_id= EloqDS::UNKNOWN_DSS_LEADER_NODE_ID;
+    // no explicit DSS leader needed here; leader election comes from ng_leaders
@@
-    if (opt_bootstrap || is_single_node)
-    {
-      dss_leader_id= node_id;
-    }

2593-2614: Minor: break early once local node is found

Once node_id/native_ng_id are set, break out of both loops to avoid needless scans. Based on learnings.

   for (auto &pair : ng_configs)
   {
     auto &ng_nodes= pair.second;
     for (size_t i= 0; i < ng_nodes.size(); i++)
     {
       if (ng_nodes[i].host_name_ == local_ip &&
           ng_nodes[i].port_ == local_port)
       {
         node_id= ng_nodes[i].node_id_;
         found= true;
         if (ng_nodes[i].is_candidate_)
         {
           // found native_ng_id.
           native_ng_id= pair.first;
-          break;
+          // Break out of both loops once the local node is found.
+          goto found_local;
         }
       }
     }
   }
+found_local:
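As an alternative to the goto above, hoisting the nested scan into a helper and returning early exits both loops without a label. Node, NgMap, and find_local_candidate below are simplified stand-ins (the real code matches on host name and port rather than node id):

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Simplified stand-ins for the repo's node/topology types.
struct Node {
    uint32_t node_id_;
    bool is_candidate_;
};
using NgMap = std::unordered_map<uint32_t, std::vector<Node>>;

// Returning from the helper exits both loops at once. Yields
// (node_id, native_ng_id) when the local candidate node is found.
std::optional<std::pair<uint32_t, uint32_t>> find_local_candidate(
    const NgMap &ngs, uint32_t local_id) {
    for (const auto &pair : ngs) {
        for (const auto &n : pair.second) {
            if (n.node_id_ == local_id && n.is_candidate_) {
                return std::pair<uint32_t, uint32_t>(n.node_id_, pair.first);
            }
        }
    }
    return std::nullopt;
}
```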
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a785d29 and a11e3fc.

📒 Files selected for processing (3)
  • storage/eloq/ha_eloq.cc (3 hunks)
  • storage/eloq/store_handler (1 hunks)
  • storage/eloq/tx_service (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: githubzilla
Repo: eloqdata/eloqsql PR: 154
File: storage/eloq/ha_eloq.cc:2571-2623
Timestamp: 2025-11-07T07:10:40.624Z
Learning: Repo: eloqdata/eloqsql PR: 154
File/Area: storage/eloq/ha_eloq.cc (eloq_init_func cluster membership resolution)
Learning: In this repo’s ng_config, the local node is guaranteed to be a candidate (NodeConfig::is_candidate_ == true). Therefore, assigning native_ng_id only when is_candidate_ is true is acceptable; the key improvement is to break early from both loops once the local node is found.
Learnt from: githubzilla
Repo: eloqdata/eloqsql PR: 154
File: storage/eloq/ha_eloq.cc:2770-2781
Timestamp: 2025-11-07T05:12:43.190Z
Learning: In eloqdata/eloqsql, storage/eloq/ha_eloq.cc (KV_ELOQDS path): for DATA_STORE_TYPE_ELOQDSS_ROCKSDB (local RocksDB), each eloqsql instance owns its own RocksDB instance, and passing create_if_missing=true to DataStoreService::StartService is intentional and safe.
📚 Learning: 2025-11-07T07:10:40.624Z
Learnt from: githubzilla
Repo: eloqdata/eloqsql PR: 154
File: storage/eloq/ha_eloq.cc:2571-2623
Timestamp: 2025-11-07T07:10:40.624Z
Learning: Repo: eloqdata/eloqsql PR: 154
File/Area: storage/eloq/ha_eloq.cc (eloq_init_func cluster membership resolution)
Learning: In this repo’s ng_config, the local node is guaranteed to be a candidate (NodeConfig::is_candidate_ == true). Therefore, assigning native_ng_id only when is_candidate_ is true is acceptable; the key improvement is to break early from both loops once the local node is found.

Applied to files:

  • storage/eloq/ha_eloq.cc
📚 Learning: 2025-11-07T05:12:43.190Z
Learnt from: githubzilla
Repo: eloqdata/eloqsql PR: 154
File: storage/eloq/ha_eloq.cc:2770-2781
Timestamp: 2025-11-07T05:12:43.190Z
Learning: In eloqdata/eloqsql, storage/eloq/ha_eloq.cc (KV_ELOQDS path): for DATA_STORE_TYPE_ELOQDSS_ROCKSDB (local RocksDB), each eloqsql instance owns its own RocksDB instance, and passing create_if_missing=true to DataStoreService::StartService is intentional and safe.

Applied to files:

  • storage/eloq/ha_eloq.cc
📚 Learning: 2025-09-25T11:41:40.111Z
Learnt from: githubzilla
Repo: eloqdata/eloqsql PR: 127
File: storage/eloq/build_log_service.cmake:97-124
Timestamp: 2025-09-25T11:41:40.111Z
Learning: The `storage/eloq/build_log_service.cmake` file is specifically for the open log service build and only supports ROCKSDB as its log state backend. Unlike the enterprise version in `build_eloq_log_service.cmake`, it does not need to handle cloud-specific variants like ROCKSDB_CLOUD_S3 or ROCKSDB_CLOUD_GCS.

Applied to files:

  • storage/eloq/ha_eloq.cc
🔇 Additional comments (4)
storage/eloq/tx_service (1)

1-1: Verify submodule commit hash and alignment with multi-DSS refactoring.

The submodule pointer has been updated to a new commit. Per the PR objectives and the AI summary, this update should include API changes for multi-DSS shard support, such as:

  • DataStoreServiceClient constructor now accepting an additional bootstrap/standalone boolean parameter
  • TxConfigsToDssClusterConfig method signature refactored to accept ng_leaders map instead of separate leader/node identifiers
  • Simplified StartService calls with fewer arguments

Please confirm that:

  1. The new commit hash (819ffecaae4346b66b759580063930f6c1ec99d1) exists and is valid in the upstream submodule repository
  2. The submodule changes contain the expected API signature updates referenced in the PR objectives
  3. Related integration points in storage/eloq/ha_eloq.cc are compatible with these new APIs
storage/eloq/store_handler (1)

1-1: Submodule pointer update—request clarification on scope.

This PR file contains only a submodule pointer change for storage/eloq/store_handler. However, the AI summary indicates that the primary functional changes for "Support multi dss shards" are in storage/eloq/ha_eloq.cc (e.g., TxConfigsToDssClusterConfig signature refactoring, ng_leaders map usage, StartService call simplification), which is not provided for review.

Submodule pointer updates alone cannot be meaningfully reviewed without visibility into the actual code changes in the target commit.

Could you clarify:

  1. Are the functional code changes in storage/eloq/ha_eloq.cc (and potentially other files) part of this PR and intended for review?
  2. Is the store_handler submodule update necessary to support the multi-dss shard functionality, or is it an ancillary dependency update?
  3. What is the relationship between this submodule update and the multi-dss shard feature?

Providing the actual functional code changes would enable a complete review of the topology refactoring and bootstrap logic changes.

storage/eloq/ha_eloq.cc (2)

2800-2801: Submodule verification required for constructor signatures

The review comment requests verification of DataStoreServiceClient constructor consistency in the submodule headers. The class is defined in the storage/eloq/store_handler submodule (https://github.com/eloqdata/store_handler.git), which is not cloned in this environment.

The constructor usage at lines 2800-2801 shows a 4-parameter call: EloqDS::DataStoreServiceClient(bool, catalog_factory, ds_config, data_store_service_.get()). To complete verification of whether headers match and no older overloads remain, inspect store_handler/data_store_service_client.h locally to confirm the constructor signature and that no deprecated overloads are present.


2782-2789: is_single_node is never recomputed; verify if hardcoded true is intentional

Verification shows is_single_node is initialized to true at line 2663 and never reassigned. The review comment expects this value to be "recomputed" before line 2788, but the code does not recompute it—it remains hardcoded to true for all usages (lines 2732, 2788, 2799).

While this may be correct by design (since non-peer deployments with ng_configs.size() > 1 error out, making them effectively single-node), this contradicts the review's explicit assumption. Confirm whether hardcoding is_single_node= true is intentional or if it should be derived from cluster topology (e.g., ng_configs.size() == 1).


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a11e3fc and fcd3636.

📒 Files selected for processing (1)
  • storage/eloq/ha_eloq.cc (3 hunks)
🔇 Additional comments (3)
storage/eloq/ha_eloq.cc (3)

2770-2770: ROCKSDB local DSS: StartService(true) is correct

Passing true here to always create-if-missing for per-instance RocksDB matches intended behavior. Looks good. Based on learnings


2783-2785: DataStoreServiceClient bootstrap flag: verified as correctly updated

Verified that only one DataStoreServiceClient instantiation exists in the repository at storage/eloq/ha_eloq.cc:2782, and it correctly uses the updated signature with (opt_bootstrap || is_single_node) as the first parameter, consistent with the is_single_node recomputation from this PR.


2772-2772: Semantics verified: boolean correctly controls bootstrap vs. join behavior

After examining the code context, the StartService boolean parameter semantics are confirmed correct:

  • ROCKSDB branch (line 2770): StartService(true) with explicit comment "we always set create_if_missing to true" for non-shared local storage (confirmed safe by prior learning)
  • Non-ROCKSDB branch (line 2772): StartService((opt_bootstrap || is_single_node)) correctly implements the intended behavior:
    • true when bootstrapping or running in single-node mode → creates the data store
    • false otherwise → joins an existing shared store without lazy creation

The false value will not cause unintended lazy creation in shared stores; it correctly signals "join existing cluster." The same pattern is used when initializing DataStoreServiceClient, confirming the semantic consistency across the codebase.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
storage/eloq/ha_eloq.cc (2)

2663-2664: Guard and make is_single_node computation robust (no behavior change).

Defensively handle empty configs and consider all nodes; keeps semantics clear and avoids accidental UB if configs were ever empty.

-    bool is_single_node=
-        (ng_configs.size() == 1 && ng_configs.begin()->second.size() == 1);
+    bool is_single_node = false;
+    if (!ng_configs.empty()) {
+      size_t total_nodes = 0;
+      for (const auto &ng : ng_configs) total_nodes += ng.second.size();
+      is_single_node = (ng_configs.size() == 1 && total_nodes == 1);
+    }

2745-2776: Recompute is_single_node after loading tmp_ng_configs; avoid re‑reading if not needed.

  • Recalculate the flag from tmp_ng_configs to prevent drift if it differs from the earlier ng_configs.
  • Minor: If ng_configs already contains all NGs, you can skip re-reading to tmp_ng_configs and reuse it.
-      if (opt_bootstrap || is_single_node)
+      if (opt_bootstrap || is_single_node)
       {
         // Must parse all node groups to generate data store shards configs.
         std::unordered_map<uint32_t, std::vector<NodeConfig>> tmp_ng_configs;
         if (!txservice::ReadClusterConfigFile(
                 cluster_config_file, tmp_ng_configs, cluster_config_version))
         {
           // Read cluster topology from general config file in this case
           auto parse_res= txservice::ParseNgConfig(
               eloq_ip_list, eloq_standby_ip_list, eloq_voter_ip_list,
               tmp_ng_configs, eloq_node_group_replica_num, 0);
           if (!parse_res)
           {
             LOG(ERROR)
                 << "Failed to extract cluster configs from ip_port_list.";
             DBUG_RETURN(eloq_init_abort());
           }
         }
+        // Recompute single-node from the fully populated topology.
+        {
+          size_t total_nodes = 0;
+          for (const auto &ng : tmp_ng_configs) total_nodes += ng.second.size();
+          is_single_node = (tmp_ng_configs.size() == 1 && total_nodes == 1);
+        }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fcd3636 and e3e6fbf.

📒 Files selected for processing (1)
  • storage/eloq/ha_eloq.cc (4 hunks)
🔇 Additional comments (2)
storage/eloq/ha_eloq.cc (2)

2798-2798: StartService flag logic looks right; RocksDB path preserved.

Using (opt_bootstrap || is_single_node) for non-RocksDB aligns bootstrap vs multi-node behavior; RocksDB stays true as intended. Please run a multi-shard, multi-node smoke test with and without dss_peer_node to validate startup modes.

Based on learnings.


2809-2811: Constructor parameter order change verified — no stragglers found in codebase.

The single DataStoreServiceClient instantiation in the repository (storage/eloq/ha_eloq.cc:2808) has been updated with the new parameter order: (opt_bootstrap || is_single_node) first. No additional call sites with outdated parameter ordering exist. The codebase-level change is complete and consistent.

Submodule compatibility and CI coverage remain external concerns that should be validated through your integration testing and release process.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
storage/eloq/ha_eloq.cc (2)

2667-2668: Safer single‑node heuristic

Current check inspects only the first NG’s size. Prefer counting total nodes across all NGs to avoid false positives if topology shape changes.

-    bool is_single_node=
-        (ng_configs.size() == 1 && ng_configs.begin()->second.size() == 1);
+    size_t total_nodes = 0;
+    for (const auto &ng : ng_configs) total_nodes += ng.second.size();
+    bool is_single_node = (ng_configs.size() == 1 && total_nodes == 1);

2749-2785: Bootstrap vs non‑bootstrap leader map: confirm empty map semantics

Populating ng_leaders only for (opt_bootstrap || is_single_node) makes sense. Please confirm TxConfigsToDssClusterConfig treats an empty ng_leaders as “derive leaders from tx topology” and doesn’t default to local. If that’s not guaranteed, we should pass the computed leaders for non‑bootstrap as well.
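The bootstrap-only population described here can be sketched as a pure helper. NgConfigs is a simplified stand-in (ng_id mapped to node ids) and build_ng_leaders is a hypothetical name, not the repo's API:

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// ng_id -> node ids; a simplified stand-in for the parsed ng_configs.
using NgConfigs = std::unordered_map<uint32_t, std::vector<uint32_t>>;

// On bootstrap (or single-node), every data store shard is led by the
// local node; otherwise the map stays empty, on the assumption that
// TxConfigsToDssClusterConfig derives leaders from the tx topology.
std::unordered_map<uint32_t, uint32_t> build_ng_leaders(
    bool bootstrap_or_single, uint32_t local_node_id,
    const NgConfigs &ng_configs) {
    std::unordered_map<uint32_t, uint32_t> ng_leaders;
    if (bootstrap_or_single) {
        for (const auto &ng : ng_configs) {
            ng_leaders.emplace(ng.first, local_node_id);
        }
    }
    return ng_leaders;
}
```

If the empty-map path does default to local instead of deriving leaders, the non-bootstrap branch would need to pass computed leaders as well, which is exactly the confirmation requested above.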

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e3e6fbf and 1c3bc1c.

📒 Files selected for processing (2)
  • storage/eloq/ha_eloq.cc (5 hunks)
  • storage/eloq/store_handler (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • storage/eloq/store_handler
🔇 Additional comments (4)
storage/eloq/ha_eloq.cc (4)

2301-2305: AWS log level switch looks good

Debug in non-DBUG builds and Info otherwise is appropriate here.


2800-2800: RocksDB StartService(true) is correct

Always create_if_missing for local RocksDB aligns with prior repo guidance. Based on learnings


2802-2802: StartService flag wiring

Passing (opt_bootstrap || is_single_node) for non‑RocksDB variants is consistent with the new flow.


2813-2815: DataStoreServiceClient ctor update

New leading bootstrap/single‑node flag is correctly threaded here.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
storage/eloq/ha_eloq.cc (1)

2730-2731: Single-node recompute and ng_leaders population look correct; confirm multi-NG single-host intent

  • Initializing is_single_node=false and recomputing it before use fixes the prior bootstrap/single-node misclassification.
  • Populating ng_leaders only on bootstrap or strictly 1 NG with 1 node is consistent.

Question: In a “single host but multi-NG” topology (multi DSS shards on one node), do we still want single-node behavior (and local ng_leaders)? The current check (ng_configs.size()==1 && ng_configs.begin()->second.size()==1) treats that as multi-node, which may be intended. Please confirm. If you do want to treat “all shards on one host” as single-node, compute total_nodes across NGs instead.

Example tweak (only if desired):

-        is_single_node=
-            (ng_configs.size() == 1 && ng_configs.begin()->second.size() == 1);
+        size_t total_nodes = 0;
+        for (const auto& ng : ng_configs) total_nodes += ng.second.size();
+        is_single_node = (total_nodes == 1);

Also applies to: 2748-2824

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c3bc1c and 60515d4.

📒 Files selected for processing (3)
  • storage/eloq/ha_eloq.cc (5 hunks)
  • storage/eloq/store_handler (1 hunks)
  • storage/eloq/tx_service (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • storage/eloq/tx_service
  • storage/eloq/store_handler
🔇 Additional comments (3)
storage/eloq/ha_eloq.cc (3)

2301-2305: AWS SDK log level conditional — OK

Switching to Debug when DBUG is enabled and Info otherwise is sensible for build-time verbosity.


2833-2843: StartService flagging — matches storage type requirements

  • RocksDB path uses StartService(true) to allow create-if-missing; non-RocksDB uses (opt_bootstrap || is_single_node). This aligns with prior guidance for local RocksDB instances.

Based on learnings


2851-2854: DataStoreServiceClient ctor updated — verify all call sites

New signature with the bootstrap/single-node flag is used here. Please ensure all other bindings (incl. non-MySQL utilities/tests) were updated to the new ctor.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
storage/eloq/ha_eloq.cc (1)

2811-2824: Compute single‑node robustly across all NGs

Current check only inspects the first NG’s size. Prefer counting total nodes across all NGs to avoid edge cases (e.g., multiple NG entries but only one actual node).

-        is_single_node=
-            (ng_configs.size() == 1 && ng_configs.begin()->second.size() == 1);
+        size_t total_nodes = 0;
+        for (const auto &ng : ng_configs) total_nodes += ng.second.size();
+        is_single_node = (total_nodes == 1);
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 1c3bc1c and 60515d4.

📒 Files selected for processing (3)
  • storage/eloq/ha_eloq.cc (5 hunks)
  • storage/eloq/store_handler (1 hunks)
  • storage/eloq/tx_service (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • storage/eloq/store_handler
  • storage/eloq/tx_service
🔇 Additional comments (4)
storage/eloq/ha_eloq.cc (4)

2301-2305: AWS SDK log level guarded by DBUG flag — looks good

Debug builds get verbose logs; release sticks to Info. No issues spotted.


2748-2809: Bootstrap ng_leaders derivation — logic looks correct

Parsing full topology, locating current node, and assigning all NG leaders to this node during bootstrap is sound. Early breaks after ‘found’ reduce scan cost.


2839-2842: StartService flags: RocksDB path set to true — OK; others use (bootstrap || single‑node)

Unconditionally passing true for DATA_STORE_TYPE_ELOQDSS_ROCKSDB to allow create_if_missing is consistent with repository guidance. Non‑RocksDB path now respects bootstrap/single‑node mode — good.

Based on learnings


2852-2853: Constructor parameter signature verified at single visible call site

The one DataStoreServiceClient constructor invocation found in the codebase (line 2851 in storage/eloq/ha_eloq.cc) already has the bool parameter (opt_bootstrap || is_single_node) positioned first, matching the expected signature. However, the class definition could not be located during verification, so manual confirmation that all call sites across the full codebase (including any in header files or conditionally compiled sections) are correctly updated is recommended.

Comment on lines 2728 to 2731
std::string dss_config_file_path= "";
EloqDS::DataStoreServiceClusterManager ds_config;
uint32_t dss_leader_id= EloqDS::UNKNOWN_DSS_LEADER_NODE_ID;

// use tx node id as the dss node id
// since they are deployed together
uint32_t dss_node_id= node_id;
if (opt_bootstrap || is_single_node)
{
dss_leader_id= node_id;
}
bool is_single_node= false;


⚠️ Potential issue | 🟠 Major

Bug: user-provided DSS config file path is ignored (shadowed variable)

Local std::string dss_config_file_path is hardcoded to "", shadowing the global sysvar eloq_dss_config_file_path. This drops any configured path at startup.

Apply this fix:

-    std::string dss_config_file_path= "";
+    // Honor the sysvar if set; otherwise leave empty.
+    std::string dss_config_file_path =
+        (eloq_dss_config_file_path && std::strlen(eloq_dss_config_file_path) > 0)
+            ? std::string(eloq_dss_config_file_path)
+            : std::string();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
std::string dss_config_file_path= "";
EloqDS::DataStoreServiceClusterManager ds_config;
uint32_t dss_leader_id= EloqDS::UNKNOWN_DSS_LEADER_NODE_ID;
// use tx node id as the dss node id
// since they are deployed together
uint32_t dss_node_id= node_id;
if (opt_bootstrap || is_single_node)
{
dss_leader_id= node_id;
}
bool is_single_node= false;
// Honor the sysvar if set; otherwise leave empty.
std::string dss_config_file_path =
(eloq_dss_config_file_path && std::strlen(eloq_dss_config_file_path) > 0)
? std::string(eloq_dss_config_file_path)
: std::string();
EloqDS::DataStoreServiceClusterManager ds_config;
bool is_single_node= false;
🤖 Prompt for AI Agents
In storage/eloq/ha_eloq.cc around lines 2728 to 2731, a local variable
std::string dss_config_file_path is initialized to "" which shadows the global
sysvar eloq_dss_config_file_path and drops any user-provided path; remove the
hardcoded local initialization and either use the global
eloq_dss_config_file_path directly or initialize the local variable from that
global (e.g., std::string dss_config_file_path = eloq_dss_config_file_path) so
the configured path is preserved, and ensure no other local variable with the
same name exists in this scope.

@lzxddz lzxddz force-pushed the dss_multi_shard-rebase branch 2 times, most recently from 25ec8be to 372f40a Compare November 14, 2025 09:46
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
storage/eloq/mysql-test/mono_multi/t/fault_inject_for_upsert_1.test (1)

1-1: Update comment to reflect node_id change.

The comment references "remote node 1" but line 19 now targets node_id=2. Update the comment to maintain consistency with the actual test behavior.

Apply this diff:

-# This case is to test what will happen if remote node 1 crash before it acquire and send message.
+# This case is to test what will happen if remote node 2 crash before it acquire and send message.
storage/eloq/mysql-test/mono_multi/t/multi_nodes.test (1)

48-48: Update comment to match the updated test scenario.

This comment states "The record is located on server 2" but line 41 indicates the row is on server-3, and the test now restarts server-3 (line 44). Update this comment to reflect the current test behavior.

Apply this diff:

-# The record is located on server 2, validation will be failed for node term changed after server being restarted.
+# The record is located on server-3, validation will be failed for node term changed after server being restarted.
♻️ Duplicate comments (1)
storage/eloq/ha_eloq.cc (1)

2728-2731: Bug: user-provided DSS config file path is still ignored (local var shadows sysvar).

Local std::string dss_config_file_path is set to "", discarding eloq_dss_config_file_path. Honor the sysvar.

Apply this fix:

-    std::string dss_config_file_path= "";
+    // Honor sysvar if set; otherwise leave empty.
+    std::string dss_config_file_path =
+        (eloq_dss_config_file_path && std::strlen(eloq_dss_config_file_path) > 0)
+            ? std::string(eloq_dss_config_file_path)
+            : std::string();
🧹 Nitpick comments (1)
storage/eloq/ha_eloq.cc (1)

2301-2305: Reasonable AWS SDK log-level guard; consider less verbosity in prod.

Debug in non-DBUG builds is avoided; for production you might prefer Warning to reduce noise and I/O. Optional only.

-  aws_options.loggingOptions.logLevel= Aws::Utils::Logging::LogLevel::Info;
+  aws_options.loggingOptions.logLevel= Aws::Utils::Logging::LogLevel::Warn;
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 60515d4 and 372f40a.

📒 Files selected for processing (7)
  • storage/eloq/ha_eloq.cc (5 hunks)
  • storage/eloq/mysql-test/mono_multi/r/fault_inject_for_upsert_1.result (1 hunks)
  • storage/eloq/mysql-test/mono_multi/r/multi_nodes.result (1 hunks)
  • storage/eloq/mysql-test/mono_multi/t/fault_inject_for_upsert_1.test (1 hunks)
  • storage/eloq/mysql-test/mono_multi/t/multi_nodes.test (1 hunks)
  • storage/eloq/store_handler (1 hunks)
  • storage/eloq/tx_service (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • storage/eloq/store_handler
🚧 Files skipped from review as they are similar to previous changes (1)
  • storage/eloq/tx_service
🔇 Additional comments (7)
storage/eloq/mysql-test/mono_multi/r/multi_nodes.result (1)

24-24: LGTM! Result file correctly reflects test update.

The connection change from serv_2 to serv_3 aligns with the corresponding test file modification and reflects the updated node targeting behavior for the restart scenario.

storage/eloq/mysql-test/mono_multi/t/fault_inject_for_upsert_1.test (1)

19-21: LGTM! Node targeting changes are consistent.

The update from node_id=1 to node_id=2 and the corresponding restart expectation change from mysqld.2.expect to mysqld.3.expect are consistent with the comment on line 18 indicating the row will be on "server3(ng2)".

storage/eloq/mysql-test/mono_multi/r/fault_inject_for_upsert_1.result (1)

9-9: LGTM! Result file correctly reflects test update.

The debug flag change to node_id=2 correctly mirrors the corresponding test file modification.

storage/eloq/mysql-test/mono_multi/t/multi_nodes.test (1)

41-44: LGTM! Comments and code changes are consistent.

The updates correctly reflect the shift to targeting server-3 for the restart scenario. The comments clearly explain the test behavior: the row is located on server-3, which is then restarted to trigger the node term change.

storage/eloq/ha_eloq.cc (3)

2750-2826: Single-node heuristic and ng_leaders usage now correct.

You recompute is_single_node from ng_configs and only populate ng_leaders for bootstrap/single-node paths; bootstrap builds tmp_ng_configs and maps all NGs to this node. This fixes the earlier “forced local-leader” issue. Looks good.

  • Please run a multi-NG, multi-node smoke test (bootstrap=false) and confirm DSS shards map to the correct leaders from Tx topology (empty ng_leaders map path).
  • Minor: after finding the local node (Lines 2600-2618), consider breaking outer loop too to avoid extra scans. Based on learnings.

2837-2844: RocksDB create_if_missing and single-node StartService flag usage

  • RocksDB: StartService(true) is intentional for local RocksDB instances. Good. As per learnings.
  • Other backends: gating StartService with (opt_bootstrap || is_single_node) matches the new topology logic.

Based on learnings


2854-2856: The DataStoreServiceClient class definition cannot be found in the repository. Without access to the constructor signature, I cannot verify whether the arguments match the expected types and order.

Verify the DataStoreServiceClient constructor signature in the codebase or external dependency.

The class is instantiated at lines 2854-2856 with five arguments: (opt_bootstrap || is_single_node), catalog_factory, ds_config, ds_peer_node.empty(), and data_store_service_.get(). The constructor definition is not present in this repository, so please:

  1. Confirm the class definition and its constructor signature
  2. Verify the argument types and order match the call site
  3. Confirm that ds_peer_node.empty() correctly represents "has local DSS bound" semantics

@lzxddz lzxddz force-pushed the dss_multi_shard-rebase branch from 372f40a to c38de00 Compare November 17, 2025 09:22
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
storage/eloq/ha_eloq.cc (1)

2516-2554: Nit: hm_port may be uninitialized

If eloq_hm_ip lacks a port and FORK_HM_PROCESS is not defined, hm_port isn’t set before use. Initialize it to a safe default.

-  uint16_t hm_port;
+  uint16_t hm_port = 0;
♻️ Duplicate comments (1)
storage/eloq/ha_eloq.cc (1)

2728-2731: Bug: user-provided DSS config path is ignored (local var shadows sysvar)

dss_config_file_path is hardcoded to "", dropping eloq_dss_config_file_path.

Apply this fix:

-    std::string dss_config_file_path= "";
+    // Honor the sysvar if set; otherwise leave empty.
+    std::string dss_config_file_path =
+        (eloq_dss_config_file_path && std::strlen(eloq_dss_config_file_path) > 0)
+            ? std::string(eloq_dss_config_file_path)
+            : std::string();

Also ensure `<cstring>` is available for std::strlen if not already included:

// near other includes
#include <cstring>
🧹 Nitpick comments (1)
storage/eloq/ha_eloq.cc (1)

2599-2618: Minor: break outer loop after finding local node

After setting node_id/native_ng_id, break the outer NG loop to avoid redundant scanning and accidental overrides.

   for (auto &pair : ng_configs)
   {
     auto &ng_nodes= pair.second;
     for (size_t i= 0; i < ng_nodes.size(); i++)
     {
       if (ng_nodes[i].host_name_ == local_ip &&
           ng_nodes[i].port_ == local_port)
       {
         node_id= ng_nodes[i].node_id_;
         found= true;
         if (ng_nodes[i].is_candidate_)
         {
           native_ng_id= pair.first;
           break;
         }
       }
     }
+    if (found) break;
   }

Based on learnings

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 372f40a and c38de00.

📒 Files selected for processing (7)
  • storage/eloq/ha_eloq.cc (5 hunks)
  • storage/eloq/mysql-test/mono_multi/r/fault_inject_for_upsert_1.result (1 hunks)
  • storage/eloq/mysql-test/mono_multi/r/multi_nodes.result (1 hunks)
  • storage/eloq/mysql-test/mono_multi/t/fault_inject_for_upsert_1.test (1 hunks)
  • storage/eloq/mysql-test/mono_multi/t/multi_nodes.test (1 hunks)
  • storage/eloq/store_handler (1 hunks)
  • storage/eloq/tx_service (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • storage/eloq/mysql-test/mono_multi/t/multi_nodes.test
  • storage/eloq/store_handler
  • storage/eloq/mysql-test/mono_multi/r/fault_inject_for_upsert_1.result
  • storage/eloq/mysql-test/mono_multi/t/fault_inject_for_upsert_1.test
🔇 Additional comments (6)
storage/eloq/mysql-test/mono_multi/r/multi_nodes.result (1)

24-24: Connection change is properly configured and verified.

The change from serv_2 to serv_3 is correct and properly verified:

  • The corresponding .test file (line 44) contains the matching --connection serv_3; command
  • serv_3 is dynamically created by mono_init.inc when $mono_server_count= 3 is set in the test file (lines 10-11)
  • This follows the established pattern used throughout the test suite
storage/eloq/tx_service (1)

1-1: Submodule changes verified and aligned with multi-shard support requirements.

The integration of the updated submodule pointer is correct:

  • TxConfigsToDssClusterConfig (lines 2809-2810, 2824-2825): Correctly called with signature (node_id, ng_configs, ng_leaders, ds_config)
  • StartService (lines 2841, 2843): Correctly receives bootstrap/single-node flag as boolean parameter ✓
  • Multi-shard configuration: ng_configs and ng_leaders structures properly support node-centric multi-shard DSS deployment ✓

The parent code implementation confirms the submodule changes support the stated objectives.

storage/eloq/ha_eloq.cc (4)

2301-2305: AWS SDK log-level guard looks good

Sets Debug with asserts-on builds and Info otherwise. No action needed.


2750-2831: is_single_node recompute/leader mapping: logic OK; verify ds_peer path

The bootstrap and non-bootstrap branches recompute is_single_node and only fill ng_leaders for bootstrap or single-node. Good. In the ds_peer_node path, is_single_node is derived from ng_configs (TX topology), not from the fetched ds_config; please verify multi-NG/multi-node deployments with a peer-provided config don’t incorrectly enter single-node mode.

Would you confirm with a quick run that:

  • ds_peer_node set + ng_configs spanning >1 NG or >1 node → is_single_node is false, ng_leaders remains empty, and StartService(false) is used?

2837-2844: StartService flags match storage type intent

  • RocksDB (local) forced true is correct (create_if_missing semantics).
  • Others use (opt_bootstrap || is_single_node), which aligns with the new cluster config flow.

Based on learnings


2854-2856: Manual verification required: DataStoreServiceClient constructor definition not found in codebase

The review comment asks to verify the parameter order/semantics of DataStoreServiceClient constructor, specifically whether the 4th argument (ds_peer_node.empty()) correctly maps to its intended parameter meaning.

The call site at storage/eloq/ha_eloq.cc:2853-2855 passes:

  • Arg 1: (opt_bootstrap || is_single_node) (bool)
  • Arg 2: catalog_factory
  • Arg 3: ds_config
  • Arg 4: ds_peer_node.empty() (bool)
  • Arg 5: data_store_service_.get() (DataStoreService*)

However, the DataStoreServiceClient class definition could not be located in the accessible codebase, despite the include directive pointing to store_handler/data_store_service_client.h. The class definition may be in an uncloned submodule, a build artifact, or external dependency.

Please confirm:

  1. The constructor parameter order matches the call site arguments
  2. The 4th parameter semantics (appears to mean "is peer node empty" → likely represents whether to register locally-only)
  3. All other call sites in the codebase are similarly updated

@lzxddz lzxddz requested a review from githubzilla November 17, 2025 10:52
@lzxddz lzxddz force-pushed the dss_multi_shard-rebase branch from bd93901 to acca682 Compare November 18, 2025 08:06
dss client add arg bind_data_shard_with_ng

update mono_multi test cases: the node the key is located on changed
@lzxddz lzxddz force-pushed the dss_multi_shard-rebase branch from acca682 to 522adc8 Compare November 21, 2025 03:32
@lzxddz lzxddz merged commit 0405b32 into eloqdata:eloq-10.6.10 Nov 21, 2025
1 of 2 checks passed