Skip to content

Fix build#131

Merged
lokax merged 3 commits intoeloqdata:eloq-10.6.10from
lokax:hash-part-scan
Nov 3, 2025
Merged

Fix build#131
lokax merged 3 commits intoeloqdata:eloq-10.6.10from
lokax:hash-part-scan

Conversation

@lokax
Copy link
Collaborator

@lokax lokax commented Oct 9, 2025

Summary by CodeRabbit

  • Refactor

    • Improved bucketed scan replay with stronger state management, resumable plan-driven progression, and updated resume/yield behavior
    • Swapped the range-partition scanner implementation to alter scanning behavior for range queries
    • Harmonized public API types for pushdown condition handling for more consistent type scoping
  • Chores

    • Updated storage-related submodule pointers

@coderabbitai
Copy link

coderabbitai bot commented Oct 9, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Replaces the CCM scanner used for range partitions with HashParitionCcScanner; refactors bucketed scan replay to use BucketScanSavePoint and a plan-driven loop; moves several DataStore types from txservice::store to txservice; adds sys/types.h; advances two storage submodule pointers.

Changes

Cohort / File(s) Summary
Scanner Type Update
storage/eloq/eloq_catalog_factory.cpp
Replaces TemplateCcScanner<EloqKey, txservice::RangeRecord> with HashParitionCcScanner<EloqKey, txservice::RangeRecord> in CreateRangeCcmScanner.
Bucketed Scan Replay & Pushdown API
storage/eloq/ha_eloq.cc, storage/eloq/ha_eloq.h
Adds #include <sys/types.h>; introduces BucketScanSavePoint and converts replay to a plan-driven loop using plan size/current index; ScanOpenTxRequest/ScanBatchTxRequest updated to accept savepoint/plan context; replaces txservice::store::DataStoreSearchCond/txservice::store::DataStoreDataType with txservice::DataStoreSearchCond/txservice::DataStoreDataType; updates BindPushedCond and PushedConditionString signatures/usages.
Submodule Pointer Advances
storage/eloq/store_handler, storage/eloq/tx_service
Advances submodule commits (store_handler, tx_service); no direct source-code logic changes in this diff.

Sequence Diagram(s)

sequenceDiagram
    participant Worker as Scan Worker
    participant SavePoint as BucketScanSavePoint
    participant Plan as BucketScanPlan
    participant Batch as ScanBatchTxRequest
    participant Scanner as CCM Scanner\n(HashParitionCcScanner)

    rect rgba(100,150,220,0.06)
      Note over Worker,SavePoint: initialize save point & plan
      Worker->>SavePoint: create & pin save_point
      SavePoint->>Plan: expose plan and currentIndex
    end

    rect rgba(120,200,150,0.06)
      loop while currentIndex < Plan.size()
        Worker->>Batch: submit batch (with plan ref & currentIndex)
        Batch->>Scanner: scan partition(s)
        Scanner-->>Batch: results / progress
        Batch-->>Worker: status (progress / yield)
        alt progress
          Worker->>SavePoint: advance currentIndex
        else yield/no progress
          Worker->>SavePoint: persist position (yield)
        end
      end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Focus review on:
    • storage/eloq/ha_eloq.cc — correctness of plan-driven loop, savepoint lifecycle, resume/yield transitions.
    • API/type migrations from txservice::store to txservice — ensure all callers updated.
    • storage/eloq/eloq_catalog_factory.cpp — validate scanner substitution and constructor parameters.

Possibly related PRs

Suggested labels

trigger-ci

Suggested reviewers

  • xiexiaoy
  • MrGuin

Poem

🐰 I hop where scanners learn to hash,
Savepoints pin each careful pass,
Types shed their prefixes, tidy and light,
Plans guide the loop from dawn to night,
Submodules roll forward — a sprightly little dash!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title Check ❓ Inconclusive The pull request title "Fix build" is extremely vague and generic, failing to convey meaningful information about the changeset. While the changes likely address build-related issues (evidenced by the API refactoring, namespace updates, and submodule pointer changes), the title does not communicate which changes were made or why. A teammate scanning PR history would have no clear understanding of the actual modifications, such as the scanner type replacement, API signature changes, or scan replay rework that constitute the main content of this PR. Consider revising the title to be more specific about the primary changes, such as "Refactor pushdown condition types and update scanner implementation" or "Update txservice namespace for DataStoreSearchCond and related types," which would better communicate the actual scope and intent of the changes to reviewers and future developers.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6b1767b and 8b3f3f8.

📒 Files selected for processing (2)
  • storage/eloq/store_handler (1 hunks)
  • storage/eloq/tx_service (1 hunks)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
storage/eloq/ha_eloq.cc (2)

4876-4883: Potential tight loop; clarify ScanBatchTxRequest::Result() semantics and add progress guard

If Result() returns false and no tuples are returned for a plan, current_index never advances, causing a spin. Please confirm that tx side guarantees progress for the same plan (e.g., save_point advances internally). If not, add a guard to break or back off when no progress is observed.

Example minimal guard:

  • Track last_seen_count; if scan_batch.empty() for N consecutive iterations with same current_index, break and log.

Also, consider reading prev_pause_idx_ semantics: if it refers to the “next” plan index, you might want to start from prev_pause_idx_ + 1 to avoid re-scanning.

I recommend:

  • Document Result() meaning in a comment.
  • Add a no-progress safeguard.

Also applies to: 4903-4924


7331-7506: Pushdown type refactor LGTM; small fixes to improve robustness

  • Good: Switch to txservice::DataStoreSearchCond and DataStoreDataType aligns with new API.
  • Suggest: Expand literal type check to include MYSQL_TYPE_VAR_STRING so varchar constants don’t get skipped.
  • Bug: assert(true && "Unsupported cmp function.") never fires. Use assert(false && ...) or remove and handle via default.

Diff sketch:

-      pushdown= (val_field->field_type() == MYSQL_TYPE_STRING ||
-                 val_field->field_type() == MYSQL_TYPE_VARCHAR)
+      pushdown= (val_field->field_type() == MYSQL_TYPE_STRING ||
+                 val_field->field_type() == MYSQL_TYPE_VARCHAR ||
+                 val_field->field_type() == MYSQL_TYPE_VAR_STRING)
                  ? true
                  : false;
...
-    default:
-      assert(true && "Unsupported cmp function.");
-      break;
+    default:
+      assert(false && "Unsupported cmp function.");
+      break;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 211f9e9 and 50f742e.

📒 Files selected for processing (3)
  • storage/eloq/eloq_catalog_factory.cpp (1 hunks)
  • storage/eloq/ha_eloq.cc (12 hunks)
  • storage/eloq/ha_eloq.h (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
storage/eloq/ha_eloq.h (1)
storage/eloq/ha_eloq.cc (2)
  • BindPushedCond (7331-7506)
  • BindPushedCond (7331-7331)
🔇 Additional comments (4)
storage/eloq/ha_eloq.h (1)

669-669: Approve namespace simplification
No remaining references to store::DataStoreSearchCond or store::DataStoreDataType found.

storage/eloq/eloq_catalog_factory.cpp (1)

687-687: Verify spelling of HashParitionCcScanner: No local definition was found in this codebase. Confirm the correct class name in the txservice API (likely HashPartitionCcScanner) and update this reference at line 687 accordingly.

storage/eloq/ha_eloq.cc (2)

119-119: Include sys/types.h is correct

Needed for ssize_t (e.g., readlink usage). No issues.


7509-7572: PushedConditionString enum switch updated correctly

String quoting, blob hex, and numeric passthrough look consistent with new txservice::DataStoreDataType values.

@lokax lokax force-pushed the hash-part-scan branch 2 times, most recently from 5294331 to db7af96 Compare October 14, 2025 08:07
@lokax lokax force-pushed the hash-part-scan branch 2 times, most recently from ff944b9 to d507e74 Compare October 21, 2025 03:37
@lokax lokax force-pushed the hash-part-scan branch 2 times, most recently from 0ec099a to 3a3352b Compare October 28, 2025 02:53
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
storage/eloq/ha_eloq.cc (1)

4870-4895: Fix OOB: PickPlan must occur after bounds check

You still call save_point->PickPlan(current_index) before verifying current_index < plan_size. If prev_pause_idx_ equals plan_size, this dereferences past end. Move PickPlan inside the while and stop re-picking within the body.

Apply this diff:

-    size_t plan_size= save_point->PlanSize();
-    BucketScanPlan plan= save_point->PickPlan(current_index);
-    while (current_index < plan_size)
+    size_t plan_size= save_point->PlanSize();
+    while (current_index < plan_size)
     {
+      BucketScanPlan plan= save_point->PickPlan(current_index);
       scan_batch.clear();
       ScanBatchTxRequest scan_batch_req(scan_alias, scan_table_name,
                                         &scan_batch, yield_func, resume_func,
-                                        txm, -1, {}, &plan);
+                                        txm, -1, {}, &plan);
       txm->Execute(&scan_batch_req);
       scan_batch_req.Wait();
       if (scan_batch_req.IsError())
       {
         ret= HA_ADMIN_FAILED;
         break;
       }
-
-      if (scan_batch_req.Result())
-      {
-        current_index++;
-        if (current_index < plan_size)
-        {
-          plan= save_point->PickPlan(current_index);
-        }
-      }
+      if (scan_batch_req.Result()) {
+        current_index++;
+      }
     }
🧹 Nitpick comments (1)
storage/eloq/ha_eloq.cc (1)

7303-7478: Pushdown safety and small robustness tweaks in BindPushedCond

  • Skip NULL constants to avoid meaningless “col op NULL” pushdowns.
  • For VARCHAR/TEXT with '=' you switch to '>=' only; add an upper bound (“< nextPrefix(val)”) to avoid scanning supersets while keeping correctness.
  • Minor: reserve result capacity.

Minimal changes:

 std::vector<txservice::DataStoreSearchCond> ha_eloq::BindPushedCond()
 {
+  // Reserve to reduce reallocs
+  std::vector<txservice::DataStoreSearchCond> res;
+  res.reserve(pushed_conds_.size());
@@
     Item_field *col_field= pushed_cond.col_field_;
     Item_literal *val_field= pushed_cond.val_field_;
+    // Do not push down comparisons with NULL
+    if (val_field->null_value) {
+      continue;
+    }
@@
-    switch (func_type)
+    switch (func_type)
     {
     case Item_func::EQ_FUNC:
       op.append("=");
       break;
@@
     }
@@
     switch (field_type.data_type_)
     {
@@
     default: {
       StringBuffer<MAX_FIELD_WIDTH> str;
       field->val_str(&str);
       val_str.append((char *) str.ptr(), str.length());
       // VARCHAR/TEXT type
       if (field_type.data_type_ == EloqDataType::VarString)
       {
         // Remove trailing space.
         size_t endpos= val_str.find_last_not_of(' ');
         if (endpos != std::string::npos && endpos < val_str.length() - 1)
         {
           val_str.replace((endpos + 1), (val_str.length() - endpos - 1), "");
         }
-        // Fetch values from kv without regard for trailing sapces.
-        if (!op.compare("="))
-        {
-          op.assign(">=");
-        }
+        // For '=' push both lower and upper bounds to avoid supersets:
+        // >= trimmed and < next prefix
+        if (op == "=") {
+          // Lower bound (>=)
+          res.push_back({field_name, std::string(">="), val_str, data_type});
+          // Upper bound: next lexicographic prefix
+          std::string hi = val_str;
+          hi.push_back('\xFF');
+          res.push_back({field_name, std::string("<"), hi, data_type});
+          // Restore original field and continue (already pushed both)
+          field->move_field(table->record[0] + field_offset,
+                            maybe_null ? table->record[0] + null_offset : nullptr,
+                            field->null_bit);
+          continue;
+        }
       }
       break;
     }
     }
@@
-    res.push_back({field_name, op, val_str, data_type});
+    res.push_back({field_name, op, val_str, data_type});

Note: nextPrefix via appending 0xFF is acceptable under binary collations used for pushdown here. If non-binary collations are later allowed, use the key-def ceiling helper to compute a safe bound.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 17760cf and 3a3352b.

📒 Files selected for processing (5)
  • storage/eloq/eloq_catalog_factory.cpp (1 hunks)
  • storage/eloq/ha_eloq.cc (16 hunks)
  • storage/eloq/ha_eloq.h (2 hunks)
  • storage/eloq/store_handler (1 hunks)
  • storage/eloq/tx_service (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • storage/eloq/store_handler
🧰 Additional context used
🧬 Code graph analysis (1)
storage/eloq/ha_eloq.h (1)
storage/eloq/ha_eloq.cc (2)
  • BindPushedCond (7303-7478)
  • BindPushedCond (7303-7303)
🔇 Additional comments (6)
storage/eloq/tx_service (1)

1-1: Verify submodule pointer updates and provide build fix rationale.

Two concerns prevent approval:

  1. Unverifiable commits: The old tx_service commit (0a9fe67a54) returns "not our ref" error, suggesting it may have been deleted or force-pushed. The new commit (eb799bb36cc5169d4961e6e5c94b410fc983c665) cannot be independently verified as existing in the remote repository.

  2. Missing context: The commit message "update submodule" provides no explanation of what specific build issue is being fixed. No code changes in the main repository clarify the rationale.

Before merging, confirm:

  • Both submodule commits exist and are stable in their remote repositories
  • What build failure this PR resolves and that the fix has been tested
  • Whether the "not our ref" error on the old commit indicates intentional history changes
storage/eloq/ha_eloq.h (2)

669-669: LGTM: Namespace change for DataStoreSearchCond

The parameter type update correctly reflects the move of DataStoreSearchCond from the txservice::store:: namespace to the top-level txservice:: namespace. This change is consistent with the implementation in ha_eloq.cc.


746-746: LGTM: Namespace change for BindPushedCond return type

The return type update correctly reflects the move of DataStoreSearchCond from the txservice::store:: namespace to the top-level txservice:: namespace. This aligns with the function implementation and the broader namespace refactoring.

storage/eloq/eloq_catalog_factory.cpp (1)

686-692: Verify class name against txservice library documentation or build output

The class name HashParitionCcScanner appears to have a spelling inconsistency ("Parition" vs. "Partition"), but the class definition is not found in the accessible codebase—it comes from the external txservice library. The typo claim cannot be verified without access to the txservice library source or documentation. If the class name is indeed incorrect, the compilation will fail. Verify the correct spelling in txservice headers or build logs before proceeding.

storage/eloq/ha_eloq.cc (2)

119-119: LGTM: portability and guard cleanups

sys/types.h include and the conditional LOG/DS macro guards look fine; aws_deinit moved under matching compile-time guards in deinit is correct.

Also applies to: 169-172, 216-219, 226-229, 3304-3307


7481-7544: String/blob rendering for diagnostics looks correct

Switching on txservice::DataStoreDataType with proper escaping/hex formatting is sound. No issues.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
storage/eloq/ha_eloq.cc (1)

7303-7317: Pushdown type migration LGTM; fix default branch to avoid empty operator

Type shift to txservice::DataStoreSearchCond/DataStoreDataType looks correct. However, the default in the func_type switch uses assert(true), which never trips and may leave op empty.

Apply:

-    default:
-      assert(true && "Unsupported cmp function.");
-      break;
+    default:
+      DBUG_ASSERT(false && "Unsupported cmp function.");
+      continue; // skip unsupported comparator

Also applies to: 7328-7360, 7395-7397

♻️ Duplicate comments (1)
storage/eloq/ha_eloq.cc (1)

4870-4872: Critical: PickPlan before bounds check can read past end (still present)

If current_index == plan_size (e.g., prev_pause_idx_ equals PlanSize), PickPlan(current_index) dereferences out of range. Move PickPlan inside the while after verifying bounds. Also drop the mid-loop reassignment.

Apply:

-    size_t plan_size= save_point->PlanSize();
-    BucketScanPlan plan= save_point->PickPlan(current_index);
-    while (current_index < plan_size)
+    size_t plan_size= save_point->PlanSize();
+    while (current_index < plan_size)
     {
+      BucketScanPlan plan = save_point->PickPlan(current_index);
       scan_batch.clear();
       ...
-      if (scan_batch_req.Result())
-      {
-        current_index++;
-        if (current_index < plan_size)
-        {
-          plan= save_point->PickPlan(current_index);
-        }
-      }
+      if (scan_batch_req.Result()) {
+        current_index++;
+      }
     }

Also applies to: 4889-4893

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 305f631 and 9674aa1.

📒 Files selected for processing (5)
  • storage/eloq/eloq_catalog_factory.cpp (1 hunks)
  • storage/eloq/ha_eloq.cc (16 hunks)
  • storage/eloq/ha_eloq.h (2 hunks)
  • storage/eloq/store_handler (1 hunks)
  • storage/eloq/tx_service (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • storage/eloq/store_handler
🚧 Files skipped from review as they are similar to previous changes (2)
  • storage/eloq/ha_eloq.h
  • storage/eloq/eloq_catalog_factory.cpp
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-25T11:41:40.111Z
Learnt from: githubzilla
PR: eloqdata/eloqsql#127
File: storage/eloq/build_log_service.cmake:97-124
Timestamp: 2025-09-25T11:41:40.111Z
Learning: The `storage/eloq/build_log_service.cmake` file is specifically for the open log service build and only supports ROCKSDB as its log state backend. Unlike the enterprise version in `build_eloq_log_service.cmake`, it does not need to handle cloud-specific variants like ROCKSDB_CLOUD_S3 or ROCKSDB_CLOUD_GCS.

Applied to files:

  • storage/eloq/tx_service
  • storage/eloq/ha_eloq.cc
📚 Learning: 2025-09-25T11:33:33.221Z
Learnt from: githubzilla
PR: eloqdata/eloqsql#127
File: concourse/scripts/build_tarball_open.bash:136-136
Timestamp: 2025-09-25T11:33:33.221Z
Learning: The open log service (OPEN_LOG_SERVICE=ON) in concourse/scripts/build_tarball_open.bash only supports ROCKSDB as its log state and does not use the WITH_LOG_STATE parameter that was introduced to replace USE_ROCKSDB_LOG_STATE in other parts of the codebase.

Applied to files:

  • storage/eloq/ha_eloq.cc
🔇 Additional comments (6)
storage/eloq/tx_service (1)

1-1: Verify submodule compatibility — manual review of commits required.

The submodule update cannot be directly verified in the sandbox environment (repository access limitations). However, the parent repository extensively references txservice:: types (30+ occurrences across type definitions, method signatures, and data structures), indicating the update is intentionally aligned with namespace restructuring and type migration mentioned in the PR summary.

Recommend manually verifying:

  • The commits included in the submodule pointer advancement contain the necessary txservice:: type definitions
  • No breaking changes to type signatures or public APIs referenced in parent repo (e.g., txservice::DataStoreSearchCond, txservice::TransactionExecution)
  • The store::DataStoreScanner usage at line 798 of storage/eloq/ha_eloq.h remains compatible with the updated submodule
storage/eloq/ha_eloq.cc (5)

119-119: Good fix: add sys/types.h for ssize_t

Needed for readlink/ssize_t. No concerns.


169-172: Cloud DS guard is clearer and safer

Defining ELOQDS_RKDB_CLOUD when S3/GCS variants are set improves readability.

Please build once with each DS combo (ROCKSDB_CLOUD_S3, ROCKSDB_CLOUD_GCS, ROCKSDB, ELOQSTORE) to ensure no conflicting defines.


216-219: Deriving LOG_STATE_TYPE_RKDB_ from S3/GCS/RKDB defines*

These guards make downstream includes (#if defined(LOG_STATE_TYPE_RKDB_CLOUD/ALL)) robust.

Sanity-check compile with OPEN_LOG_SERVICE=ON/OFF and each LOG_STATE variant to confirm only intended branches activate.

Also applies to: 226-229


3304-3307: Pair aws_deinit with init paths

Symmetric shutdown avoids SDK leaks. Looks good.


7481-7481: Stringification update matches new DataStoreDataType

Covers String/Blob/Numeric correctly, with escaping/hex handling.

Confirm no callers rely on the old store:: namespace types.

Also applies to: 7498-7535

}
}

scan_batch.clear();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Likely lock leak: clearing scan_batch before ScanCloseTxRequest

You pass scan_batch to ScanCloseTxRequest, but clear it immediately before. This likely drops the last batch’s unlock metadata.

Apply:

-    scan_batch.clear();
+    /* keep the last batch for ScanCloseTxRequest to unlock */
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
scan_batch.clear();
/* keep the last batch for ScanCloseTxRequest to unlock */
🤖 Prompt for AI Agents
In storage/eloq/ha_eloq.cc around line 4896, clearing scan_batch before calling
ScanCloseTxRequest risks dropping the last batch’s unlock metadata and causing a
lock leak; move the scan_batch.clear() so it executes after ScanCloseTxRequest
(or otherwise preserve the batch contents until the call completes) so that
ScanCloseTxRequest receives the full batch data before any clearing.

Copy link
Collaborator

@yi-xmu yi-xmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@lokax lokax removed the trigger-ci label Nov 3, 2025
@lokax lokax merged commit 4d659e6 into eloqdata:eloq-10.6.10 Nov 3, 2025
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants