
Add payload-size-aware large-value zone eviction for EloqKV (ObjectCcMap)#415

Draft
Copilot wants to merge 15 commits into main from copilot/enhance-cache-eviction-policy

Copilot AI commented Feb 25, 2026

EloqKV holds large values that are expensive to reload from disk. The standard LRU eviction policy does not account for payload size, causing large-value pages to drift to the eviction head at the same rate as cheap small-value entries.

Design

The LRU list is divided into two zones at a stable pointer boundary:

head ← [small-value pages] ← lru_large_value_zone_head_ ← [large-value pages] ← tail
  • Small-value (SV) pages insert before lru_large_value_zone_head_, so they cycle through the LRU normally.
  • Large-value (LV) pages (PayloadSize() > txservice_large_value_threshold) insert before tail_ccp_, so they stay in the zone nearest the MRU end regardless of how many small-value accesses occur.
  • The eviction scan (head → tail) exhausts SV pages before reaching the LV zone. LV pages can still be evicted under extreme memory pressure once the SV zone is empty.
  • No zone-size cap: memory accounting relies on mimalloc heap, not page count, so a fixed-ratio limit is not meaningful. In the extreme all-large-value case, SV pages are evicted first and that is accepted.
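
The zone layout above can be illustrated with a toy model. This is a minimal sketch, not the CcShard code: `Page`, `TwoZoneLru`, `Touch`, and `EvictionOrder` are hypothetical names, and only the insert-point and boundary-advance rules described in the design are modelled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for LruPage (assumption: only these fields matter here).
struct Page {
    std::string name;
    bool has_large_value = false;
    Page *lru_prev = nullptr;
    Page *lru_next = nullptr;
};

// Toy two-zone LRU: head <-> [SV pages] <-> [LV pages] <-> tail.
class TwoZoneLru {
public:
    TwoZoneLru() {
        head_.lru_next = &tail_;
        tail_.lru_prev = &head_;
        zone_head_ = &tail_;  // an empty LV zone is represented by the tail sentinel
    }

    // Insert a page, or move it to the MRU end of its own zone.
    void Touch(Page *page) {
        assert(page != nullptr);
        if (page->lru_next != nullptr) {
            Detach(page);  // already linked: unlink first
        }
        Page *insert_before = page->has_large_value ? &tail_ : zone_head_;
        page->lru_prev = insert_before->lru_prev;
        page->lru_next = insert_before;
        insert_before->lru_prev->lru_next = page;
        insert_before->lru_prev = page;
        // The first LV page becomes the zone boundary.
        if (page->has_large_value && zone_head_ == &tail_) {
            zone_head_ = page;
        }
    }

    void Detach(Page *page) {
        // Removing the boundary page advances the boundary toward the tail.
        if (zone_head_ == page) {
            zone_head_ = page->lru_next;
        }
        page->lru_prev->lru_next = page->lru_next;
        page->lru_next->lru_prev = page->lru_prev;
        page->lru_prev = page->lru_next = nullptr;
    }

    // The eviction scan visits pages from head to tail.
    std::vector<std::string> EvictionOrder() const {
        std::vector<std::string> order;
        for (const Page *p = head_.lru_next; p != &tail_; p = p->lru_next) {
            order.push_back(p->name);
        }
        return order;
    }

private:
    Page head_;
    Page tail_;
    Page *zone_head_;
};
```

In this model a small-value page that is touched repeatedly still lands before every large-value page in eviction order, which is the property the design relies on.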

Key changes

CcShard (cc_shard.h / cc_shard.cpp)

  • Add LruPage *lru_large_value_zone_head_ (init &tail_ccp_).
  • UpdateLruList: insert point is &tail_ccp_ for LV pages, lru_large_value_zone_head_ for SV pages; advance boundary when the first LV page is inserted.
  • DetachLru / ReplaceLru: advance/update lru_large_value_zone_head_ when the boundary page is removed/replaced.

LruPage (cc_entry.h)

  • Add bool has_large_value_{false} — marks a page as belonging to the LV zone; set once, never cleared.

CcMap / ObjectCcMap (cc_map.h, object_cc_map.h)

  • virtual bool IsLargeValueZoneEnabled() const { return false; } — default off for all maps.
  • ObjectCcMap overrides to return txservice_large_value_threshold > 0 — policy is EloqKV-only; RangeCcMap / CatalogCcMap are unaffected.
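
The opt-in mechanism can be sketched as below. The `*Sketch` types, `g_large_value_threshold`, and `ZoneAppliesTo` are hypothetical stand-ins for illustration; only the virtual method name and its default/override behaviour come from the description above.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for txservice_large_value_threshold (assumption: 0 disables the policy).
static std::size_t g_large_value_threshold = 0;

struct CcMapSketch {
    virtual ~CcMapSketch() = default;
    // Default: the large-value zone is off for every map type.
    virtual bool IsLargeValueZoneEnabled() const { return false; }
};

struct ObjectCcMapSketch : CcMapSketch {
    // Only the EloqKV map opts in, and only when a threshold is configured.
    bool IsLargeValueZoneEnabled() const override {
        return g_large_value_threshold > 0;
    }
};

// Inherits the default, so the policy never applies to it.
struct RangeCcMapSketch : CcMapSketch {};

// Mirrors the kind of check a clean guard could make through a page's parent map.
bool ZoneAppliesTo(const CcMapSketch *parent_map) {
    return parent_map != nullptr && parent_map->IsLargeValueZoneEnabled();
}
```

Routing the decision through a virtual call keeps the threshold global out of shared code paths, so map types that never override the method stay unaffected even when the threshold is set.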

TemplateCcMap (template_cc_map.h)

  • MaybeMarkAndRezoneAsLargeValue(LruPage*, size_t) private helper: sets has_large_value_ and calls UpdateLruList to eagerly move the page into the LV zone on first detection.

ObjectCcMap (object_cc_map.h) — 6 eager re-zone call sites

  • Execute(ApplyCc): apply_and_commit_ path after payload commit
  • Execute(PostWriteCc) — after commit when commit_ts > 0
  • Execute(UploadBatchCc) — after PassInCurrentPayload for Normal records
  • Execute(KeyObjectStandbyForwardCc) — after commands committed to payload
  • Execute(ReplayLogCc) — after EmplaceAndCommitBufferedTxnCommand
  • BackFill — after DeserializeCurrentPayload for Normal records

CcPageCleanGuardWithoutKickoutCc (cc_page_clean_guard.h) — lazy fallback

  • On each scan, sets has_large_value_ and signals re-zone for any LV entry not yet detected by the eager path (safety net only).

tx_service_common.h

  • inline size_t txservice_large_value_threshold = 0 — threshold in bytes; 0 disables the policy entirely.

tx_service.h

  • Config key large_value_threshold parsed into txservice_large_value_threshold.

Here are some reminders before you submit the pull request

  • Add tests for the change
  • Document changes
  • Reference the link of issue using fixes eloqdb/tx_service#issue_id
  • Reference the link of RFC if exists
  • Pass ./mtr --suite=mono_main,mono_multi,mono_basic


- Add txservice_large_value_threshold and txservice_large_value_eviction_age
  to tx_service_common.h. When threshold > 0, entries whose PayloadSize()
  exceeds the threshold are protected from LRU eviction until the page's
  LRU age (access_counter_ - last_access_ts_) reaches the eviction age.
- Expose AccessCounter() getter on CcShard for use by the clean guard.
- Modify CcPageCleanGuardWithoutKickoutCc::CanBeCleaned() to implement
  the protection logic, cooperating with existing LRU-based eviction.
- Add optional configuration keys large_value_threshold and
  large_value_eviction_age to TxService::Init().
- Add BulkEmplaceFreeForTest() and SetPayloadForTest() helpers to
  TemplateCcMap for deterministic test setup.
- Add test case validating that large-value entries are protected when the
  policy is active and evicted normally when protection is disabled.
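
A minimal sketch of the age-based rule described in this commit, assuming the guard only needs the payload size and the two counters. `EvictionPolicy` and `CanEvict` are hypothetical names, not the signature of CanBeCleaned().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bundle of the two knobs named in the commit message.
struct EvictionPolicy {
    std::size_t large_value_threshold;       // bytes; 0 disables protection
    std::uint64_t large_value_eviction_age;  // measured in access-counter ticks
};

// Returns true when an entry may be evicted under the age-based design.
bool CanEvict(const EvictionPolicy &policy, std::size_t payload_size,
              std::uint64_t access_counter, std::uint64_t last_access_ts) {
    // Policy disabled or small payload: ordinary LRU eviction applies.
    if (policy.large_value_threshold == 0 ||
        payload_size <= policy.large_value_threshold) {
        return true;
    }
    // last_access_ts is a snapshot of the counter, so it never exceeds it.
    assert(access_counter >= last_access_ts);
    std::uint64_t age = access_counter - last_access_ts;
    // Large values stay protected until their LRU age reaches the eviction age.
    return age >= policy.large_value_eviction_age;
}
```

Because the age is measured in shard-wide counter ticks, the right value for `large_value_eviction_age` depends on the overall access rate of the shard.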

Co-authored-by: xiexiaoy <5558602+xiexiaoy@users.noreply.github.com>

Copilot AI changed the title [WIP] Add payload-size-aware cache eviction policy Add payload-size-aware cache eviction policy Feb 25, 2026
Copilot AI requested a review from xiexiaoy February 25, 2026 07:40
- Expand the access_counter_ comment in cc_shard.h to cover both its
  primary use (O(1) relative-order comparison of two LRU pages during
  merge/redistribute) and its secondary use (measuring LRU age for the
  payload-size-aware eviction guard).
- Add a comment to last_access_ts_ in cc_entry.h explaining that it
  stores the snapshot of access_counter_ at last-access time, that the
  difference (access_counter_ - last_access_ts_) is always >= 0 and a
  valid LRU-age proxy.
- Enrich the comment in CcPageCleanGuardWithoutKickoutCc::CanBeCleaned()
  to explain the invariant that makes the age calculation correct and
  why it cooperates naturally with standard LRU eviction.

Co-authored-by: xiexiaoy <5558602+xiexiaoy@users.noreply.github.com>
Copilot AI changed the title Add payload-size-aware cache eviction policy Add payload-size-aware cache eviction cooperating with LRU Feb 25, 2026
{
return {false, false};
}
}
Collaborator

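This algorithm has two problems:

shard.access_counter_ is a shard-level value, and a single hot key can make it grow very large. Users do not know how to configure txservice_large_value_eviction_age, and they do not want to configure it.

Large values tend to settle at the head of the LRU list, but because their age is less than txservice_large_value_eviction_age they are not evicted. As a result the LRU scan does useless work, visiting large values that cannot be evicted.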

Author

Fixed in commit 55925be.

Both problems are resolved by replacing the absolute txservice_large_value_eviction_age threshold with a self-calibrating relative LRU-position check:

uint64_t now        = cc_shard_->AccessCounter();
uint64_t page_age   = now - page_->last_access_ts_;
uint64_t total_span = now - cc_shard_->LruOldestTs();
if (page_age * 2 < total_span)   // page in recent half → protect
    return {false, false};
// page in old half → allow eviction (fall through)

Problem 1 (unconfigurable threshold): txservice_large_value_eviction_age is removed entirely. The check compares a page's age against the total span of the LRU list, so it self-calibrates regardless of how fast access_counter_ grows. No user configuration is needed.

Problem 2 (cold large values blocking the scan): A cold large-value page near the LRU head has page_age ≈ total_span, so page_age * 2 >= total_span — it is immediately eligible for eviction rather than wasting scan work.

Co-authored-by: xiexiaoy <5558602+xiexiaoy@users.noreply.github.com>
…e LRU check

Problem 1: access_counter_ grows at unpredictable rates with hot keys,
making txservice_large_value_eviction_age impossible to configure.

Problem 2: cold large-value pages near the LRU head accumulate a large
absolute age that is still below the threshold, so the scan wastes work
on entries that cannot be evicted.

Fix: replace the absolute threshold with a self-calibrating relative
LRU-position check:
  page_age * 2 < total_span  → page in recent half → protect
  page_age * 2 >= total_span → page in old half    → evict

where:
  page_age   = access_counter_ - page->last_access_ts_
  total_span = access_counter_ - LruOldestTs()

This requires no user configuration and automatically adapts to any
access rate. Cold large-value pages (near the head) always satisfy
page_age * 2 >= total_span, so they are correctly evicted instead of
blocking the scan. Hot pages (near the tail) always satisfy
page_age * 2 < total_span, so they are correctly protected.

- Add CcShard::LruOldestTs() returning last_access_ts_ of the LRU-head page.
- Remove txservice_large_value_eviction_age global and config parsing.
- Update comments in cc_shard.h, cc_entry.h, cc_page_clean_guard.h.
- Update test: use two maps (old vs new) to verify old-half entries are
  evicted while new-half entries are retained under the relative check.

Co-authored-by: xiexiaoy <5558602+xiexiaoy@users.noreply.github.com>
The previous relative-position check (page_age * 2 < total_span) was
fundamentally flawed: when a large-value page drifts to the LRU head it
IS the oldest page, so total_span == page_age and page_age * 2 >=
total_span is always true — the page was always evictable at the head.
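
The flaw can be checked with a few lines of arithmetic. This is a sketch of the relative rule only; `IsProtectedByRelativeCheck` is a hypothetical name and the timestamps are made-up values.

```cpp
#include <cassert>
#include <cstdint>

// Relative rule from the previous commit: protect a page only while
// page_age * 2 < total_span, i.e. while it sits in the recent half of the list.
bool IsProtectedByRelativeCheck(std::uint64_t now,
                                std::uint64_t page_last_access_ts,
                                std::uint64_t lru_oldest_ts) {
    assert(now >= page_last_access_ts && page_last_access_ts >= lru_oldest_ts);
    std::uint64_t page_age = now - page_last_access_ts;
    std::uint64_t total_span = now - lru_oldest_ts;
    return page_age * 2 < total_span;
}
```

When the page under inspection is the head page, `page_last_access_ts` equals `lru_oldest_ts`, so `page_age == total_span` and the function returns false for every value of `now`. For example, with `now = 1000` and both timestamps at 200, the check is `1600 < 800`, which fails, so the head page is never protected.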

New approach: LRU boost on protection.
- CanBeCleaned (WithoutKickoutCc) protects any free entry whose
  PayloadSize() > txservice_large_value_threshold and sets a new
  mutable flag has_blocked_large_value_ in the guard.
- CleanPage propagates the flag via new out_has_blocked_large_value
  parameter.
- CleanPageAndReBalance: after RebalancePage captures next_page, if
  the flag is set and the page is non-empty and still in the LRU list,
  call UpdateLruList(page, false) to move it to the tail. After the
  boost page->lru_next_ == &tail_ccp_, so next_page = page->lru_next_
  naturally stops the current scan.

Effect:
- Large-value pages are never evicted by the regular scan (they keep
  getting boosted whenever the scan visits them).
- Small-value pages at the head are evicted normally (scan progresses
  past them before hitting the first large-value page).
- The scan never loops: after boosting a page and stopping, the next
  scan starts from the head which no longer contains that large-value
  page (it was just moved to the tail).
- No user-visible configuration change: txservice_large_value_threshold
  is still the only knob.
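
The scan behaviour listed above can be sketched with a `std::list` standing in for the LRU list. This is a simplification under stated assumptions: every small-value page is treated as fully evictable, and `PageInfo` and `ScanOnce` are hypothetical names rather than the CleanPageAndReBalance code.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical page summary: a name plus the largest payload it holds.
struct PageInfo {
    std::string name;
    std::size_t payload_size;
};

// One scan pass over an LRU list ordered head (front) to tail (back).
// Small pages are evicted; the first large page met is boosted to the
// tail and the pass stops there.
std::vector<std::string> ScanOnce(std::list<PageInfo> &lru,
                                  std::size_t threshold) {
    std::vector<std::string> evicted;
    auto it = lru.begin();
    while (it != lru.end()) {
        if (threshold > 0 && it->payload_size > threshold) {
            // Boost: relink the page at the MRU end without copying it.
            lru.splice(lru.end(), lru, it);
            break;  // the boosted page is now last, so nothing follows it
        }
        evicted.push_back(it->name);
        it = lru.erase(it);
    }
    return evicted;
}
```

Each pass in this model makes progress only on the small-value pages that precede the first large-value page, and a list holding only large-value pages is rotated without evicting anything.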

Removes LruOldestTs() (no longer needed) from cc_shard.h.
Restores single-map test (no more two-map approach).

Co-authored-by: xiexiaoy <5558602+xiexiaoy@users.noreply.github.com>
Copilot AI changed the title Add payload-size-aware cache eviction via LRU zone separation Add payload-size-aware cache eviction via LRU zone separation with SLRU-style ratio cap Feb 26, 2026
The system uses mimalloc heap for memory statistics and pages are not
fixed size, so a fixed-capacity ratio (large_value_zone_max_ratio) based
on page count does not map to actual memory usage and is removed.

The zone-separation approach is retained unchanged:
  head ← [small-value pages] ← lru_large_value_zone_head_ ← [large-value pages] ← tail

Small-value pages are inserted before lru_large_value_zone_head_.
Large-value pages are inserted before &tail_ccp_ (true MRU end).
In the extreme case where almost all pages are large-value, the SV zone
is nearly empty and SV pages are evicted almost immediately, even if they
are accessed frequently. This is explicitly accepted.

Removed:
- txservice_large_value_zone_max_ratio global (tx_service_common.h)
- bool in_large_value_zone_ from LruPage (cc_entry.h)
- uint64_t large_value_zone_page_count_ / total_lru_page_count_ from
  CcShard (cc_shard.h)
- LargeValueZonePageCount() / TotalLruPageCount() getters (cc_shard.h)
- Count maintenance (++/-- in UpdateLruList, DetachLru) (cc_shard.cpp)
- in_large_value_zone_ inheritance in ReplaceLru (cc_shard.cpp)
- Ratio-enforcement while loop from UpdateLruList (cc_shard.cpp)
- PART 2 ratio-enforcement test section (CcPage-Test.cpp)
- References to removed globals/methods in test (CcPage-Test.cpp)

Co-authored-by: xiexiaoy <5558602+xiexiaoy@users.noreply.github.com>
Copilot AI changed the title Add payload-size-aware cache eviction via LRU zone separation with SLRU-style ratio cap Add payload-size-aware cache eviction via LRU zone separation Feb 27, 2026
Copilot stopped work on behalf of xiexiaoy due to an error February 27, 2026 08:56
… sentinel

Evaluation: dummy sentinel is strictly better than the pointer approach.

Pointer approach (old):
  - DetachLru must advance lru_large_value_zone_head_ when the zone-head
    page is removed (3 extra lines).
  - ReplaceLru must update lru_large_value_zone_head_ when the zone-head
    page is replaced by defrag (3 extra lines).
  - UpdateLruList must initialize lru_large_value_zone_head_ on first LV
    page insertion and handle the sentinel-vs-data-page duality
    (lru_large_value_zone_head_ == &tail_ccp_ for empty zone, data page
    otherwise) (3 extra lines + conceptual complexity).

Dummy sentinel approach (new):
  - head_large_ccp_ is always in the list as the permanent boundary:
      head_ccp_ ← [SV pages] ← head_large_ccp_ ← [LV pages] ← tail_ccp_
  - DetachLru: no zone-boundary update needed (sentinel never moves).
  - ReplaceLru: no zone-boundary update needed (sentinel never moves).
  - UpdateLruList: insert_before = has_large_value_ ? &tail_ccp_
    : &head_large_ccp_. No empty-zone check needed.
  - Clean(): skip sentinel pages (parent_map_ == nullptr) in the scan
    loop (4 extra lines) — a small, clean addition consistent with how
    the loop already skips head_ccp_ (via start-offset) and tail_ccp_
    (via loop condition).
  - Consistent with head_ccp_ / tail_ccp_ design.
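
For comparison with the pointer approach, a minimal sketch of the sentinel variant follows. `Node` and `SentinelLru` are hypothetical names; the only details taken from the commit message are the fixed insert points and the convention that a null parent map marks a sentinel.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical list node; parent_map == nullptr marks a sentinel.
struct Node {
    std::string name;
    const void *parent_map = nullptr;
    bool has_large_value = false;
    Node *prev = nullptr;
    Node *next = nullptr;
};

// head_ <-> [SV pages] <-> head_large_ <-> [LV pages] <-> tail_
class SentinelLru {
public:
    SentinelLru() {
        head_.next = &head_large_;
        head_large_.prev = &head_;
        head_large_.next = &tail_;
        tail_.prev = &head_large_;
    }

    // The insert point depends only on the flag; no empty-zone check is needed.
    void Insert(Node *n) {
        assert(n != nullptr && n->parent_map != nullptr);
        Node *insert_before = n->has_large_value ? &tail_ : &head_large_;
        n->prev = insert_before->prev;
        n->next = insert_before;
        insert_before->prev->next = n;
        insert_before->prev = n;
    }

    // No boundary bookkeeping: the sentinel never moves.
    void Detach(Node *n) {
        n->prev->next = n->next;
        n->next->prev = n->prev;
        n->prev = n->next = nullptr;
    }

    // The scan skips the boundary sentinel, recognised by its null parent map.
    std::vector<std::string> ScanOrder() const {
        std::vector<std::string> out;
        for (const Node *n = head_.next; n != &tail_; n = n->next) {
            if (n->parent_map == nullptr) {
                continue;
            }
            out.push_back(n->name);
        }
        return out;
    }

private:
    Node head_;
    Node head_large_;
    Node tail_;
};
```

The trade-off visible here is that detach and replace logic stays untouched, while every list walk has to tolerate one extra non-data node.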

Renamed:  lru_large_value_zone_head_  →  head_large_ccp_
          LruLargeValueZoneHead() returns &head_large_ccp_ (always valid,
          parent_map_ == nullptr indicating it is a sentinel).

Updated test: zone_head → zone_sentinel with parent_map_ == nullptr
check; zone-structure walk split into SV walk (up to sentinel) and LV
walk (after sentinel); added assertion that newly inserted SV page lands
immediately before the sentinel.

Co-authored-by: xiexiaoy <5558602+xiexiaoy@users.noreply.github.com>
Copilot AI changed the title Add payload-size-aware cache eviction via LRU zone separation Add payload-size-aware cache eviction with LRU zone separation Feb 27, 2026
After comparing the two approaches the pointer-based zone boundary is
preferred over the dummy-sentinel approach (head_large_ccp_).

Restores cc_shard.h, cc_shard.cpp, and CcPage-Test.cpp to the state
they had before the sentinel commit (991a12f), using:
  git checkout 42d757e~1 -- <files>

Co-authored-by: xiexiaoy <5558602+xiexiaoy@users.noreply.github.com>
Copilot AI and others added 2 commits February 27, 2026 11:02
…et sites

Feasibility evaluation: FEASIBLE. All 7 payload-assignment paths already
have the LruPage* in scope and execute on the single-threaded shard
coroutine, so there are no concurrency or scoping issues.

New private helper MaybeMarkAndRezoneAsLargeValue(LruPage*, size_t):
- No-op when txservice_large_value_threshold == 0 (disabled), page is null,
  has_large_value_ already set, or payload_size <= threshold.
- Otherwise sets page->has_large_value_ = true and calls UpdateLruList to
  move the page into the large-value zone immediately.
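
The guard conditions above can be sketched as a free function. `PageState`, `MaybeMarkAndRezone`, and the `rezone_calls` counter are hypothetical; the counter merely stands in for the UpdateLruList call so the one-shot behaviour is observable.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical page state: the flag plus a counter that records re-zoning.
struct PageState {
    bool has_large_value = false;
    int rezone_calls = 0;
};

// Returns true only when the page is newly marked and re-zoned.
bool MaybeMarkAndRezone(PageState *page, std::size_t payload_size,
                        std::size_t threshold) {
    if (threshold == 0 ||            // policy disabled
        page == nullptr ||           // no page to mark
        page->has_large_value ||     // already in the large-value zone
        payload_size <= threshold) { // payload is not large
        return false;
    }
    page->has_large_value = true;
    ++page->rezone_calls;  // stands in for moving the page via UpdateLruList
    return true;
}
```

Since the flag is set once and checked first, repeated commits of large payloads to the same page cost only a flag test after the first detection.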

Called eagerly at all 7 payload-assignment sites:
1. PostWriteCc         — after SetCurrentPayload / DeserializeCurrentPayload
2. AcquireAllCc        — after SetCurrentPayload(payload)
3. ReadCc (outside)    — after PassInCurrentPayload
4. RemoteReadOutside   — after DeserializeCurrentPayload (Normal path)
5. ReplayLogCc         — after DeserializeCurrentPayload (Insert/Update)
6. UploadBatchCc       — after SetCurrentPayload(commit_val) for Normal
7. BackFill            — after DeserializeCurrentPayload (non-Deleted path)

The lazy fallback in CanBeCleaned (has_blocked_large_value_ / needs_rezoning)
is retained as a safety net for any path not listed above.

Also:
- TriggerEagerRezoneForTest() public test helper: invokes
  MaybeMarkAndRezoneAsLargeValue for every page in the map using its
  actual max entry PayloadSize(). Simulates what production commit /
  backfill paths do without going through Execute handlers.
- New test case "Eager re-zone on large-value payload": inserts two maps
  (SV + LV), assigns large payloads, calls TriggerEagerRezoneForTest
  WITHOUT any clean scan, then asserts correct zone structure. Confirms
  the eager path fires before CanBeCleaned is ever invoked.

Co-authored-by: xiexiaoy <5558602+xiexiaoy@users.noreply.github.com>
Co-authored-by: xiexiaoy <5558602+xiexiaoy@users.noreply.github.com>
Copilot AI changed the title Add payload-size-aware cache eviction with LRU zone separation Add payload-size-aware cache eviction: zone separation with eager re-zone Feb 27, 2026
Copilot AI requested a review from xiexiaoy February 27, 2026 11:07
The payload-size-aware large-value zone policy is only intended for
EloqKV (ObjectCcMap). It must not affect EloqSQL / EloqDoc tables
served by RangeCcMap, CatalogCcMap, or ClusterConfigCcMap.

Add virtual bool CcMap::IsLargeValueZoneEnabled() const { return false; }.
  Default: disabled for every CcMap subclass.

Override in ObjectCcMap: return txservice_large_value_threshold > 0.
  Only ObjectCcMap / EloqKV enables the zone.

Update MaybeMarkAndRezoneAsLargeValue in TemplateCcMap:
  replace: txservice_large_value_threshold == 0
  with:    !IsLargeValueZoneEnabled()
  Virtual dispatch on 'this' → no-op for RangeCcMap instances.

Update CcPageCleanGuardWithoutKickoutCc::CanBeCleaned:
  replace: txservice_large_value_threshold > 0
  with:    page_->parent_map_ != nullptr &&
           page_->parent_map_->IsLargeValueZoneEnabled()
  Virtual dispatch via parent_map_ → no-op for non-ObjectCcMap pages.

Update CcPage-Test.cpp:
  The tests use TemplateCcMap<..., true, true> (not ObjectCcMap).
  Add LargeValueTestCcMap thin test subclass that overrides
  IsLargeValueZoneEnabled() to return txservice_large_value_threshold > 0.
  Use it in the two large-value test cases.

Co-authored-by: xiexiaoy <5558602+xiexiaoy@users.noreply.github.com>
Copilot AI changed the title Add payload-size-aware cache eviction: zone separation with eager re-zone Add payload-size-aware cache eviction for EloqKV (large-value LRU zone) Feb 28, 2026
EloqKV uses ApplyCc (not ReadCc/AcquireCc) to read and write ObjectCcMap.
The previous implementation placed MaybeMarkAndRezoneAsLargeValue calls in
TemplateCcMap's Execute overloads (PostWriteCc, AcquireAllCc, ReadCc,
ReplayLogCc, UploadBatchCc, BackFill). However, ObjectCcMap overrides all
of these, so those calls were dead code for EloqKV.

Remove all 7 MaybeMarkAndRezoneAsLargeValue calls from TemplateCcMap:
- Dead code for EloqKV (ObjectCcMap overrides every method they were in)
- No-ops for EloqSQL/EloqDoc (IsLargeValueZoneEnabled() returns false)

Add MaybeMarkAndRezoneAsLargeValue at 6 correct payload-set sites in ObjectCcMap:
1. Execute(ApplyCc): apply_and_commit_ path after payload commit (status == Normal)
2. Execute(PostWriteCc): after dirty/pending-command commit when commit_ts > 0
3. Execute(UploadBatchCc): after PassInCurrentPayload for Normal records (migration)
4. Execute(KeyObjectStandbyForwardCc): after commands committed to payload (standby)
5. Execute(ReplayLogCc): after EmplaceAndCommitBufferedTxnCommand, payload == Normal
6. BackFill: after DeserializeCurrentPayload for Normal records (KV load path)

Update MaybeMarkAndRezoneAsLargeValue doc comment to list the correct sites.

Co-authored-by: xiexiaoy <5558602+xiexiaoy@users.noreply.github.com>
Copilot AI changed the title Add payload-size-aware cache eviction for EloqKV (large-value LRU zone) Add payload-size-aware large-value zone eviction for EloqKV (ObjectCcMap) Feb 28, 2026
3 participants