Dirty memory ratio and size can trigger ckpt#406
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds periodic dirty-memory-based checkpoint triggering: CcShard samples AdjustDataKeyStats calls and runs CheckAndTriggerCkptByDirtyMemory(), estimating dirty memory from heap usage and dirty/data key counts, then requests an immediate checkpoint when configured or default thresholds are exceeded; removes the prior memory-full notify path. Changes
Sequence Diagram(s)sequenceDiagram
participant Adjust as AdjustDataKeyStats
participant Shard as CcShard
participant Heap as ShardHeap
participant Ckp as Checkpointer
Adjust->>Shard: call / increment adjust_stats_call_count_
alt sampling interval reached
Adjust->>Shard: CheckAndTriggerCkptByDirtyMemory()
Shard->>Heap: GetShardHeap()->Full(...)
Heap-->>Shard: memory_limit, used, heap_allocated
Shard->>Shard: compute dirty_memory = heap_allocated * (dirty_data_key_count_/data_key_count_)
Shard->>Shard: determine threshold (config MB or 10% memory_limit)
alt dirty_memory > threshold
Shard->>Ckp: ckpter_->Notify(true)
Ckp-->>Shard: checkpoint initiated
Shard->>Shard: log shard id, dirty_memory_MB, threshold_MB, key counts
else
Shard-->>Adjust: continue normal operation
end
else
Shard-->>Adjust: continue normal operation
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@tx_service/src/cc/cc_shard.cpp`:
- Around line 430-469: In CcShard::CheckAndTriggerCkptByDirtyMemory, add a
nullptr check for GetShardHeap() before calling Full() and return early if it is
null to avoid dereferencing an uninitialized heap; also guard the allocated
value (returned by Full()) so that if allocated <= 0 you treat it as 0 (or clamp
to 0) before computing dirty_memory and casting to uint64_t to avoid producing a
huge unsigned value when allocated is negative; update the dirty_memory
calculation to use the clamped allocated value and keep the rest of the
threshold logic and ckpter_->Notify(true) behavior unchanged.
🧹 Nitpick comments (1)
tx_service/src/cc/cc_shard.cpp (1)
63-73: Missing blank line beforeCcShard::CcShardand flag naming nit.Minor: There's no blank line separating the gflag definitions from the
CcShardconstructor on Line 74, which hurts readability slightly.More substantively, consider whether the default
dirty_memory_ratio_thresholdof0.20anddirty_memory_size_threshold_mbof50are appropriate for all deployment sizes. A 50 MB absolute threshold could be too low for large-memory nodes or too high for small ones. Since these are gflags, users can override them, but documenting guidance in the flag help string would be useful.
| void CcShard::CheckAndTriggerCkptByDirtyMemory() | ||
| { | ||
| if (memory_limit_ == 0 || data_key_count_ == 0 || ckpter_ == nullptr) | ||
| { | ||
| return; | ||
| } | ||
|
|
||
| // Get current memory usage | ||
| int64_t allocated = 0, committed = 0; | ||
| GetShardHeap()->Full(&allocated, &committed); | ||
|
|
||
| // Calculate dirty memory | ||
| double dirty_key_ratio = static_cast<double>(dirty_data_key_count_) / | ||
| static_cast<double>(data_key_count_); | ||
| uint64_t dirty_memory = static_cast<uint64_t>(allocated * dirty_key_ratio); | ||
| double dirty_memory_ratio = | ||
| static_cast<double>(dirty_memory) / static_cast<double>(memory_limit_); | ||
|
|
||
| // Check thresholds | ||
| uint64_t size_threshold = | ||
| FLAGS_dirty_memory_size_threshold_mb * 1024 * 1024; | ||
| bool ratio_exceeded = dirty_memory_ratio > FLAGS_dirty_memory_ratio_threshold; | ||
| bool size_exceeded = dirty_memory > size_threshold; | ||
|
|
||
| // Trigger checkpoint only when BOTH thresholds are exceeded | ||
| if (ratio_exceeded && size_exceeded) | ||
| { | ||
| DLOG(INFO) << "Shard " << core_id_ | ||
| << " triggering checkpoint - dirty_memory_ratio=" << std::fixed | ||
| << std::setprecision(1) << (dirty_memory_ratio * 100) << "%" | ||
| << " (threshold=" | ||
| << (FLAGS_dirty_memory_ratio_threshold * 100) | ||
| << "%), dirty_memory=" << (dirty_memory / (1024 * 1024)) | ||
| << "MB (threshold=" << FLAGS_dirty_memory_size_threshold_mb | ||
| << "MB), dirty_keys=" << dirty_data_key_count_ << "/" | ||
| << data_key_count_; | ||
|
|
||
| ckpter_->Notify(true); // Request immediate checkpoint | ||
| } | ||
| } |
There was a problem hiding this comment.
Null-check GetShardHeap() before dereferencing.
GetShardHeap() can return nullptr if InitializeShardHeap() hasn't been called yet. The existing guards (memory_limit_ == 0, data_key_count_ == 0) don't guarantee the heap is initialized.
Additionally, allocated is int64_t — if it's zero or negative (edge case), casting allocated * dirty_key_ratio to uint64_t on Line 444 would produce an incorrect large value.
Proposed fix
void CcShard::CheckAndTriggerCkptByDirtyMemory()
{
if (memory_limit_ == 0 || data_key_count_ == 0 || ckpter_ == nullptr)
{
return;
}
+ if (GetShardHeap() == nullptr)
+ {
+ return;
+ }
+
// Get current memory usage
int64_t allocated = 0, committed = 0;
GetShardHeap()->Full(&allocated, &committed);
+ if (allocated <= 0)
+ {
+ return;
+ }
+
// Calculate dirty memory🤖 Prompt for AI Agents
In `@tx_service/src/cc/cc_shard.cpp` around lines 430 - 469, In
CcShard::CheckAndTriggerCkptByDirtyMemory, add a nullptr check for
GetShardHeap() before calling Full() and return early if it is null to avoid
dereferencing an uninitialized heap; also guard the allocated value (returned by
Full()) so that if allocated <= 0 you treat it as 0 (or clamp to 0) before
computing dirty_memory and casting to uint64_t to avoid producing a huge
unsigned value when allocated is negative; update the dirty_memory calculation
to use the clamped allocated value and keep the rest of the threshold logic and
ckpter_->Notify(true) behavior unchanged.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@tx_service/src/cc/cc_shard.cpp`:
- Around line 446-455: The computed threshold in the shard's checkpoint logic
(variable threshold computed from FLAGS_dirty_memory_size_threshold_mb and
memory_limit_) can become zero for tiny memory_limit_ values, causing
dirty_memory > threshold to always be true; update the calculation in
cc_shard.cpp (the block that sets threshold using memory_limit_ and
FLAGS_dirty_memory_size_threshold_mb) to enforce a reasonable minimum floor
(e.g., a constant MIN_DIRTY_THRESHOLD_BYTES) or early-return when the computed
threshold would be zero so you don't trigger spurious checkpoint notifications;
ensure you reference memory_limit_, FLAGS_dirty_memory_size_threshold_mb, and
the local variable threshold when making the change.
🧹 Nitpick comments (1)
tx_service/src/cc/cc_shard.cpp (1)
30-30: Unused<iomanip>include.The comment says
std::setprecisionbut no<iomanip>facilities are used in this file. Consider removing it to keep includes clean.Proposed fix
-#include <iomanip> // std::setprecision
tx_service/src/cc/cc_shard.cpp
Outdated
| uint64_t threshold; | ||
| if (FLAGS_dirty_memory_size_threshold_mb == 0) | ||
| { | ||
| // Default: 10% of memory limit | ||
| threshold = static_cast<uint64_t>(memory_limit_ * 0.1); | ||
| } | ||
| else | ||
| { | ||
| threshold = FLAGS_dirty_memory_size_threshold_mb * 1024 * 1024; | ||
| } |
There was a problem hiding this comment.
threshold may be zero when memory_limit_ is small.
When FLAGS_dirty_memory_size_threshold_mb == 0, the threshold is memory_limit_ * 0.1. If memory_limit_ is very small (e.g., < 10 bytes in a test), the truncation to uint64_t yields 0, making dirty_memory > threshold trivially true and spamming checkpoint notifications. Consider a minimum floor or an early return.
🤖 Prompt for AI Agents
In `@tx_service/src/cc/cc_shard.cpp` around lines 446 - 455, The computed
threshold in the shard's checkpoint logic (variable threshold computed from
FLAGS_dirty_memory_size_threshold_mb and memory_limit_) can become zero for tiny
memory_limit_ values, causing dirty_memory > threshold to always be true; update
the calculation in cc_shard.cpp (the block that sets threshold using
memory_limit_ and FLAGS_dirty_memory_size_threshold_mb) to enforce a reasonable
minimum floor (e.g., a constant MIN_DIRTY_THRESHOLD_BYTES) or early-return when
the computed threshold would be zero so you don't trigger spurious checkpoint
notifications; ensure you reference memory_limit_,
FLAGS_dirty_memory_size_threshold_mb, and the local variable threshold when
making the change.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/src/tx_service_init.cpp (1)
217-234:⚠️ Potential issue | 🟡 MinorAvoid silent narrowing of dirty-memory values in
tx_service_conf.
tx_service_confstoresuint32_t, but the new values areuint64_t; large settings will truncate without warning. Add a guard (or widen the map) before inserting.🛠️ Suggested guard + explicit cast
+ if (dirty_memory_check_interval > std::numeric_limits<uint32_t>::max() || + dirty_memory_size_threshold_mb > std::numeric_limits<uint32_t>::max()) + { + LOG(ERROR) << "dirty_memory_* exceeds uint32_t max"; + return false; + } std::map<std::string, uint32_t> tx_service_conf{ @@ - {"dirty_memory_check_interval", dirty_memory_check_interval}, - {"dirty_memory_size_threshold_mb", dirty_memory_size_threshold_mb}}; + {"dirty_memory_check_interval", + static_cast<uint32_t>(dirty_memory_check_interval)}, + {"dirty_memory_size_threshold_mb", + static_cast<uint32_t>(dirty_memory_size_threshold_mb)}};// Add near includes if not already present `#include` <limits>
🧹 Nitpick comments (2)
tx_service/include/cc/cc_shard.h (1)
324-328: Consider makingCheckAndTriggerCkptByDirtyMemory()private.The docstring states this is "called periodically from
AdjustDataKeyStats", which is an internal call path. If no external caller needs it, moving it to the private section would better encapsulate the implementation detail. That said, keeping it public is acceptable if it simplifies unit testing.tx_service/src/cc/cc_shard.cpp (1)
30-30: Unused#include <iomanip>.The comment says this is for
std::setprecision, but it's never used in the file. Remove the include to avoid clutter.-#include <iomanip> // std::setprecision
a222b76 to
13dbcc1
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@core/src/tx_service_init.cpp`:
- Around line 232-234: The map tx_service_conf is typed std::map<std::string,
uint32_t> but variables dirty_memory_check_interval and
dirty_memory_size_threshold_mb are uint64_t, risking silent truncation; fix by
either (A) clamping/casting the two values to uint32_t with a runtime check that
if value > UINT32_MAX it is set to UINT32_MAX (and optionally log/warn) before
inserting into tx_service_conf, or (B) add a static_assert or explicit runtime
assertion near the definitions of
dirty_memory_check_interval/dirty_memory_size_threshold_mb to ensure they never
exceed UINT32_MAX; update the insertion code that uses the keys
"dirty_memory_check_interval" and "dirty_memory_size_threshold_mb" accordingly
(perform clamp or assert prior to populating tx_service_conf).
🧹 Nitpick comments (2)
tx_service/src/cc/cc_shard.cpp (2)
427-434: Sampling placement: check only fires whendirty_delta != 0.The counter increment and modulo check are inside the
if (dirty_delta != 0)block (line 409), so the flag description ("every N calls to AdjustDataKeyStats") is slightly misleading — it's actually "every N calls where dirty_delta is non-zero." This is arguably more meaningful (only check when dirtiness changes), but the flag description could be updated for clarity.Also, note that
adjust_stats_call_count_will be0on the very first call, and++adjust_stats_call_count_ % dirty_memory_check_interval_will equal1 % N, so the first check won't happen until the N-th call — which is the expected behavior.
30-30: Remove unused<iomanip>include.The
std::setprecisionfunction is not used in the file. TheDLOGstatement at line 476 performs integer division without any formatting precision requirements, making this include unnecessary.Proposed fix
-#include <iomanip> // std::setprecision
core/src/tx_service_init.cpp
Outdated
| {"range_slice_memory_limit_percent", range_slice_memory_limit_percent}, | ||
| {"dirty_memory_check_interval", dirty_memory_check_interval}, | ||
| {"dirty_memory_size_threshold_mb", dirty_memory_size_threshold_mb}}; |
There was a problem hiding this comment.
Narrowing from uint64_t to uint32_t when inserting into tx_service_conf.
The map is std::map<std::string, uint32_t>, but dirty_memory_check_interval and dirty_memory_size_threshold_mb are uint64_t. Values exceeding UINT32_MAX will be silently truncated. This is unlikely to be a practical problem for these specific settings, and other entries in this map already have the same narrowing pattern (e.g., checkpointer_interval), but worth noting for correctness.
If the map's value type can't be widened (to avoid a larger refactor), consider adding a static_assert or runtime check clamping these values to UINT32_MAX before insertion.
🤖 Prompt for AI Agents
In `@core/src/tx_service_init.cpp` around lines 232 - 234, The map tx_service_conf
is typed std::map<std::string, uint32_t> but variables
dirty_memory_check_interval and dirty_memory_size_threshold_mb are uint64_t,
risking silent truncation; fix by either (A) clamping/casting the two values to
uint32_t with a runtime check that if value > UINT32_MAX it is set to UINT32_MAX
(and optionally log/warn) before inserting into tx_service_conf, or (B) add a
static_assert or explicit runtime assertion near the definitions of
dirty_memory_check_interval/dirty_memory_size_threshold_mb to ensure they never
exceed UINT32_MAX; update the insertion code that uses the keys
"dirty_memory_check_interval" and "dirty_memory_size_threshold_mb" accordingly
(perform clamp or assert prior to populating tx_service_conf).
tx_service/src/cc/cc_shard.cpp
Outdated
|
|
||
| // Check dirty memory thresholds periodically | ||
| if (dirty_memory_check_interval_ > 0 && | ||
| ++adjust_stats_call_count_ % dirty_memory_check_interval_ == 0) |
There was a problem hiding this comment.
just clear adjust_stats_call_count_ when it reaches check interval to avoid expensive % call
tx_service/src/cc/cc_shard.cpp
Outdated
| // Determine threshold: use configured value or default to 10% of | ||
| // memory_limit | ||
| uint64_t threshold; | ||
| if (dirty_memory_size_threshold_mb_ == 0) |
There was a problem hiding this comment.
this should be pre calculated at cc shard initialization
tx_service/src/cc/cc_shard.cpp
Outdated
| << "MB), dirty_keys=" << dirty_data_key_count_ << "/" | ||
| << data_key_count_; | ||
|
|
||
| ckpter_->Notify(true); // Request immediate checkpoint |
There was a problem hiding this comment.
you should not spam notify when there's already a checkpoint ongoing
tx_service/include/cc/cc_shard.h
Outdated
|
|
||
| // Config for dirty memory checkpoint triggering. | ||
| uint64_t dirty_memory_check_interval_{1000}; | ||
| uint64_t dirty_memory_size_threshold_mb_{0}; |
There was a problem hiding this comment.
why do you need 2 fields here?
There was a problem hiding this comment.
dirty_memory_check_interval_: check dirty memory size every 1000 times of AdjustDataKeyStats call
dirty_memory_size_threshold_mb_: trigger ckpt when dirty memory size bigger than this value
There was a problem hiding this comment.
I mean dirty_memory_size_threshold_mb_ and dirty_memory_threshold_bytes_
There was a problem hiding this comment.
I mean dirty_memory_threshold_bytes_ and dirty_memory_size_threshold_mb_
| std::mutex ckpt_mux_; | ||
| std::condition_variable ckpt_cv_; | ||
| bool request_ckpt_; | ||
| std::atomic<bool> request_ckpt_{false}; |
There was a problem hiding this comment.
why is this changed to atomic? atomic and mutex + cv does not work well together
|
We have more work to do, since liuyl and I have new idea to control the frequency of the NotifyCKPT should be triggered
|
tx_service/include/cc/cc_shard.h
Outdated
|
|
||
| // Config for dirty memory checkpoint triggering. | ||
| uint64_t dirty_memory_check_interval_{1000}; | ||
| uint64_t dirty_memory_size_threshold_mb_{0}; |
There was a problem hiding this comment.
I mean dirty_memory_size_threshold_mb_ and dirty_memory_threshold_bytes_
| // Notify the checkpointer thread to do checkpoint if there | ||
| // is not freeable entries to be kicked out from ccmap and | ||
| // if the shard is not doing defrag. | ||
| if (free_count_ == 0 && !shard_heap->IsDefragHeapCcOnFly() && |
There was a problem hiding this comment.
We should keep this as a safety check. This should not trigger. But if user set some terrible params like check interval = 1000000 and dirty ratio = 99%, we might be stuck at memory full and unable to trigger checkpoint
tx_service/src/checkpointer.cpp
Outdated
| { | ||
| request_ckpt_ = true; | ||
| if (std::chrono::high_resolution_clock::now() < | ||
| last_checkpoint_request_ts_.load(std::memory_order_acquire) + |
There was a problem hiding this comment.
no need to use acquire.
no need to use high_resolution_clock.
Too expensive.
Save the current timestamp so you don't have do it twice
tx_service/include/cc/cc_shard.h
Outdated
|
|
||
| // Config for dirty memory checkpoint triggering. | ||
| uint64_t dirty_memory_check_interval_{1000}; | ||
| uint64_t dirty_memory_size_threshold_mb_{0}; |
There was a problem hiding this comment.
I mean dirty_memory_threshold_bytes_ and dirty_memory_size_threshold_mb_
tx_service/src/cc/cc_req_misc.cpp
Outdated
| { | ||
| ccs.NotifyCkpt(); | ||
| } | ||
| // Note: Checkpoint is now triggered based on dirty memory |
There was a problem hiding this comment.
We should keep the memory full check as a safety check. In case some terrible params are set, like check interval = 1000000 and dirty memory ratio = 99%, we might stuck at memory full and unable to trigger checkpoint.
Here are some reminders before you submit the pull request
fixes eloqdb/tx_service#issue_id./mtr --suite=mono_main,mono_multi,mono_basicSummary by CodeRabbit
New Features
Bug Fixes / Improvements