Skip to content

chore: update data_substrate — fix BackFill dirty_data_key_count_ underflow#458

Open
githubzilla wants to merge 1 commit intoeloqdata:mainfrom
githubzilla:fix/backfill-dirty-key-count-underflow
Open

chore: update data_substrate — fix BackFill dirty_data_key_count_ underflow#458
githubzilla wants to merge 1 commit intoeloqdata:mainfrom
githubzilla:fix/backfill-dirty-key-count-underflow

Conversation

@githubzilla
Copy link
Collaborator

@githubzilla githubzilla commented Mar 18, 2026

Summary

Update data_substrate submodule to include the fix for an intermittent assertion failure during cluster scale-out.

Problem

test_add_node_with_double_write intermittently crashes with:

eloqkv: cc_shard.cpp:428: void txservice::CcShard::AdjustDataKeyStats(...):
Assertion `dirty_delta >= 0 || dirty_data_key_count_ >= static_cast<size_t>(-dirty_delta)' failed.

dirty_data_key_count_ is being decremented below zero — more entries are "un-dirtied" than were ever counted as dirty.

Root Cause

BackFill() in TemplateCcMap called operations in the wrong order:

  1. SetCkptTs(commit_ts) — sets the flush bit (marks entry as persistent)
  2. OnFlushed() — accounting sees the entry as persistent, no dirty-count increment
  3. SetCommitTsPayloadStatus(commit_ts, status)overwrites the field entirely, clearing the flush bit

The entry ends up dirty with no matching dirty-count increment. When checkpoint later flushes and decrements the counter, it underflows.

Fix (in data_substrate)

  1. template_cc_map.hBackFill(): Reorder so SetCommitTsPayloadStatus() runs before SetCkptTs(), preserving the flush bit. Add OnCommittedUpdate() to reconcile stats.
  2. template_cc_map.hRemoteReadOutsideCc(): Add OnCommittedUpdate() for consistency.
  3. cc_req_misc.cppUpdateCceCkptTsCc::Execute(): Relax 4 assertions that checked !IsPersistent()BackFill can now legitimately mark entries persistent before the checkpoint callback fires.

Testing

Ran test_add_node_with_double_write 6 consecutive times after the fix — all passed with zero assertion failures:

Run Result Time
1 OK 589s
2 OK 560s
3 OK 583s
4 OK 665s
5 OK 621s
6 OK 603s

Summary by CodeRabbit

  • Chores
    • Updated an internal dependency reference to a newer commit; no user-facing functionality changed.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 87dd4a8b-eb59-4863-aad9-be3c15173679

📥 Commits

Reviewing files that changed from the base of the PR and between ec5f92a and da92602.

📒 Files selected for processing (1)
  • data_substrate
🚧 Files skipped from review as they are similar to previous changes (1)
  • data_substrate

Walkthrough

This PR updates the data_substrate git submodule pointer from 1a4729b71e08b72f0461bd3fbe107fd728592516 to 60c8665f33beca1e87d5c9d5290b4d80588b075c. No functional code changes were made.

Changes

Cohort / File(s) Summary
Submodule Update
./.gitmodules, path/to/submodules/data_substrate
Bumped data_substrate submodule commit pointer to 60c8665f33be... (metadata-only change).

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~1 minute

Possibly related PRs

Suggested reviewers

  • thweetkomputer
  • liunyl
  • MrGuin

Poem

🐰 I hopped through hashes, light and spry,
A tiny pointer reached the sky,
From old to new the commit did glide,
Upstream whispers now inside,
Thump-thump — the rabbit stamps with pride!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: updating the data_substrate submodule to fix a BackFill dirty_data_key_count_ underflow issue. It is concise, specific, and directly related to the primary purpose of the PR.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

…erflow

Update data_substrate submodule to include the fix for an intermittent
assertion failure in AdjustDataKeyStats where dirty_data_key_count_
underflows during cluster scale-out.

See eloqdata/tx_service#465 for details.
@githubzilla githubzilla force-pushed the fix/backfill-dirty-key-count-underflow branch from ec5f92a to da92602 Compare March 18, 2026 15:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant