
Support optional large object LRU policy#426

Draft
xiexiaoy wants to merge 2 commits into main from lo_lru

Conversation


@xiexiaoy xiexiaoy commented Mar 4, 2026

In EloqKV, objects can become very large and are expensive to reload from
storage. An alternative is to keep large objects resident in memory for as
long as possible, in preference to small objects.

This PR implements a variant of basic LRU, named large-object LRU (LO-LRU).
It logically divides the LRU list into two lists: one for large objects and
one for small objects.

When releasing memory, LO-LRU evicts from the small-object list first, then
from the large-object list. In the implementation there is still a single
physical list; a separation pointer marks the boundary between the two
logical lists.

[head_ccp] <----> [large_obj_ccp]  <----> [tail_ccp]
                        ^
                protected_head_page

Each large object occupies one page exclusively.
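The list layout above can be modeled with a small illustrative sketch. This is not the PR's code: the real implementation links CcPage nodes between head_ccp/tail_ccp sentinels inside CcShard, while the sketch below uses std::list and invented names (Page, LoLruList) purely to show the ordering and eviction-order invariant.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <list>

// Illustrative model of LO-LRU: one physical list, small pages in front of
// the separation pointer, large pages behind it. Names are hypothetical.
struct Page
{
    uint64_t id;
    bool large_obj_page;
};

struct LoLruList
{
    // Single physical list: [small pages ...][large pages ...]
    std::list<Page> pages;
    // Separation pointer: first large-object page, or end() if none.
    std::list<Page>::iterator protected_head;

    LoLruList() : protected_head(pages.end()) {}

    void Insert(Page p)
    {
        if (p.large_obj_page)
        {
            // Large pages go to the physical tail.
            pages.push_back(p);
            if (protected_head == pages.end())
            {
                protected_head = std::prev(pages.end());
            }
        }
        else
        {
            // Small pages are inserted just before the protected head,
            // i.e. at the tail of the small section.
            pages.insert(protected_head, p);
        }
    }

    // Eviction pops from the physical head, so the small section is always
    // drained before the first large-object page is touched.
    Page EvictOne()
    {
        assert(!pages.empty());
        if (protected_head == pages.begin())
        {
            protected_head = std::next(protected_head);
        }
        Page victim = pages.front();
        pages.pop_front();
        return victim;
    }
};
```

Note how eviction order falls out of the single-list design: no policy check is needed at eviction time, because insertion already placed small pages ahead of the separation pointer.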

Configuration options:

  • cache_evict_policy: LRU | LO_LRU. Defaults to LRU.
  • lolru_large_obj_threshold_kb: size threshold (in KB) above which an object is treated as large. Defaults to 1024.
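The threshold flag is specified in KB while the in-memory comparison is against bytes, so the conversion must be done in 64-bit arithmetic to avoid a 32-bit intermediate overflow. A minimal sketch of the conversion (the flag name comes from the PR; the function name is illustrative):

```cpp
#include <cassert>
#include <cstdint>

// Converts lolru_large_obj_threshold_kb to bytes. The 64-bit literal
// (1024ull) forces the multiply to happen in 64-bit, so large KB values
// cannot wrap in a 32-bit intermediate. Function name is hypothetical.
uint64_t ThresholdBytes(uint32_t threshold_kb)
{
    return static_cast<uint64_t>(threshold_kb) * 1024ull;
}
```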

A ccmap method EnsureLargeObjOccupyPageAlone() is implemented to:

  • Detach an object whose size exceeds lolru_large_obj_threshold_kb to a
    standalone CcPage if its original CcPage contains more than one object.
  • Ensure small-object pages reside between head_ccp and protected_head_page.
  • Ensure large-object pages reside between protected_head_page and tail_ccp.

Inserting a record into a ccmap works in two steps: (1) FindEmplace creates a
CcEntry inside a CcPage; (2) the payload is committed. LO_LRU does not change
this workflow, but the commit-payload phase calls
EnsureLargeObjOccupyPageAlone() to maintain the invariants above.

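The commit-phase decision can be sketched as a small function, assuming the semantics described above. OnPayloadCommitted and the Action names are hypothetical stand-ins, not the PR's API; the real EnsureLargeObjOccupyPageAlone() operates on CcPage/CcEntry directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 1 MiB, matching the default lolru_large_obj_threshold_kb of 1024.
constexpr uint64_t kThresholdBytes = 1024ull * 1024;

enum class Action { None, DetachToStandalonePage, MarkPageLarge };

// Decision made after a payload commits under LO_LRU (illustrative).
Action OnPayloadCommitted(uint64_t payload_bytes, std::size_t entries_on_page)
{
    if (payload_bytes <= kThresholdBytes)
    {
        // Small object: its page stays in the small section.
        return Action::None;
    }
    // Oversized payload: it must end up alone on a large-object page.
    // Detach only if it currently shares the page with other entries.
    return entries_on_page > 1 ? Action::DetachToStandalonePage
                               : Action::MarkPageLarge;
}
```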

Summary by CodeRabbit

  • New Features

    • Configurable cache eviction: choose classic LRU or Large-Object-aware LRU via public flags, with a tunable large-object threshold applied at startup and exposed via new public APIs.
    • Large objects can be isolated to dedicated pages to improve eviction behavior and memory stability.
  • Tests

    • New and expanded tests validating large-object eviction, page isolation, protected-list LRU behavior, and related invariants.


coderabbitai bot commented Mar 4, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a configurable cache-eviction policy (LRU or LO_LRU), exposes and initializes policy flags, integrates LO_LRU-aware page isolation and LRU list semantics in shard/map/page code, updates public APIs, and adds comprehensive unit tests exercising large-object LRU behavior.

Changes

Cohort / File(s) Summary
Policy config & init
core/src/tx_service_init.cpp, tx_service/include/type.h, tx_service/include/tx_service.h
Adds CLI flags cache_evict_policy, lolru_large_obj_threshold_kb; new CacheEvictPolicy enum and policy structs; adds SetupPolicyLRU() / SetupPolicyLoLRU(uint32_t) hooks and validation at init.
Local shard policy plumbing
tx_service/include/cc/local_cc_shards.h, tx_service/include/cc/cc_shard.h, tx_service/src/cc/cc_shard.cpp
Introduces per-shard policy state and accessors, exposes several fields, adds protected_head_page_, and implements LO_LRU-aware LRU insertion/removal/update semantics and threshold access.
Page, entry & map logic
tx_service/include/cc/cc_entry.h, tx_service/include/cc/object_cc_map.h, tx_service/include/cc/template_cc_map.h
Marks pages as large-object pages; adds CcPage helpers (Emplace, MoveEntry, UpdateSmallestTTL); adds EnsureLargeObjOccupyPageAlone; prevents merging/borrowing of LO pages and isolates oversized payloads into dedicated pages.
API signature changes
tx_service/include/store/int_mem_store.h, tx_service/include/tx_service.h
Adds TableEngine table_engine parameter to several IntMemoryStore methods; adds TxService/TxServiceModule delegation methods for policy setup.
Tests & build
tx_service/CMakeLists.txt, tx_service/tests/CMakeLists.txt, tx_service/tests/LargeObjLRU-Test.cpp, tx_service/tests/CcPage-Test.cpp, tx_service/tests/StartTsCollector-Test.cpp, tx_service/tests/include/mock/mock_catalog_factory.h
Adds LargeObjLRU test and registers it; updates tests to pass catalog_factory arrays and new config keys; extends MockCatalogFactory signatures; adds extensive LO_LRU unit tests verifying isolation and LRU invariants.

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant TxSvc as TxService
    participant LocalCS as LocalCcShards
    participant Shard as CcShard
    participant Evict as EvictionMgr
    participant Mem as MemoryMgr

    App->>TxSvc: SetupPolicyLoLRU(threshold_kb)
    activate TxSvc
    TxSvc->>LocalCS: SetupPolicyLoLRU(threshold_kb)
    activate LocalCS
    LocalCS->>LocalCS: set cache_evict_policy_ = LO_LRU<br/>set lo_lru.large_obj_threshold_bytes
    deactivate LocalCS
    deactivate TxSvc

    App->>TxSvc: Start()
    TxSvc->>Shard: init shard (protected_head_page_ = tail)

    Mem->>Shard: memory pressure / page update
    activate Shard
    alt LO_LRU active
        Shard->>Shard: check page.large_obj_page_
        alt large object page
            Shard->>Evict: place at tail / maintain protected list
        else small object page
            Shard->>Evict: insert at protected_head_page_
        end
    else LRU
        Shard->>Evict: insert at tail
    end
    Evict->>Mem: evict selected page
    deactivate Shard

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Suggested reviewers

  • liunyl

"I hop through pages both big and small,
I guard the head and fence the hall,
When payloads grow, I make them alone,
A cozy page and a peaceful home,
Hooray — nibble victory for one and all!" 🐇

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 14.75%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)

  • Title check ✅ Passed — The title 'Support optional large object LRU policy' accurately and concisely summarizes the main feature added: a new optional LO-LRU eviction policy that prioritizes keeping large objects in memory.
  • Description check ✅ Passed — The PR description is clear and comprehensive, covering the implementation details of LO-LRU, configuration options, and the workflow changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@xiexiaoy xiexiaoy force-pushed the lo_lru branch 7 times, most recently from 567c302 to 637ea62 Compare March 10, 2026 10:15

@xiexiaoy xiexiaoy force-pushed the lo_lru branch 3 times, most recently from e1081e5 to 946379f Compare March 11, 2026 06:52
@xiexiaoy xiexiaoy requested a review from MrGuin March 11, 2026 09:23
@xiexiaoy xiexiaoy marked this pull request as ready for review March 11, 2026 09:23

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

🧹 Nitpick comments (4)
tx_service/include/tx_service.h (1)

1303-1311: Make these policy setters init-only.

These methods read like live reconfiguration APIs, but the implementation is just shared state mutation in LocalCcShards. That is fine during bootstrap, but risky once worker threads are running. Consider making them private/startup-only, or at least enforcing the pre-Start() contract with an assert.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/include/tx_service.h` around lines 1303 - 1311, The two public
setters SetupPolicyLRU and SetupPolicyLoLRU mutate shared LocalCcShards and must
be init-only; either move their declarations into the private/startup-only
section of the class or add a precondition check that Start() has not yet run
(e.g., assert(!started_) or equivalent) at the top of SetupPolicyLRU and
SetupPolicyLoLRU to prevent live reconfiguration; update the header/class
visibility or add the assert and document the pre-Start contract for
LocalCcShards usage.
tx_service/tests/include/mock/mock_catalog_factory.h (1)

326-329: Fail fast in CreateTxRecord().

This new mock hook is the only unsupported factory override here that silently returns nullptr. If a LO-LRU test path starts using it, the failure will show up later as a null dereference instead of at the mock boundary. Match the other stubs with assert(false) or return a trivial mock record.

♻️ Suggested change
     std::unique_ptr<TxRecord> CreateTxRecord() const override
     {
-        return nullptr;
+        assert(false);
+        return nullptr;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/tests/include/mock/mock_catalog_factory.h` around lines 326 - 329,
The CreateTxRecord() mock currently returns nullptr which hides errors; change
it to fail fast by either asserting or returning a trivial mock TxRecord: update
the CreateTxRecord() implementation in mock_catalog_factory (method
CreateTxRecord) to call assert(false) (or equivalent test-fail macro) with a
clear message, or allocate and return a minimal TxRecord stub object that
satisfies callers instead of nullptr so tests fail at the mock boundary.
tx_service/include/cc/cc_entry.h (1)

2571-2579: Preserve the page invariant when extracting an entry.

MoveEntry() leaves keys_[idx] in place but nulls entries_[idx] until a separate Remove(). Iterator::UpdateCurrent() and the scan loops in template_cc_map.h dereference entries_[idx] unconditionally, so this API makes it easy to create a crash-prone transient state. Prefer an extract-and-erase helper that removes the slot atomically.

♻️ Suggested direction
-    std::unique_ptr<CcEntry<KeyT, ValueT, VersionedRecord, RangePartitioned>>
-    MoveEntry(size_t idx)
+    std::pair<
+        KeyT,
+        std::unique_ptr<CcEntry<KeyT, ValueT, VersionedRecord,
+                                RangePartitioned>>>
+    ExtractEntry(size_t idx)
     {
-        std::unique_ptr<
-            CcEntry<KeyT, ValueT, VersionedRecord, RangePartitioned>>
-            entry = std::move(entries_[idx]);
+        assert(idx < entries_.size());
+        KeyT key = std::move(keys_[idx]);
+        auto entry = std::move(entries_[idx]);
         assert(entry != nullptr);
-        return entry;
+        keys_.erase(keys_.begin() + idx);
+        entries_.erase(entries_.begin() + idx);
+        return {std::move(key), std::move(entry)};
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/include/cc/cc_entry.h` around lines 2571 - 2579, MoveEntry leaves
keys_[idx] intact while nulling entries_[idx], creating a transient state that
Iterator::UpdateCurrent and scan loops in template_cc_map.h can dereference;
replace it with an atomic extract-and-erase helper (e.g.,
ExtractAndEraseEntry(size_t idx)) that moves the unique_ptr out, then removes
the slot from both entries_ and keys_ (or otherwise updates both containers in
the same operation so their alignment/invariant is preserved), and update
callers (currently using MoveEntry and Remove) to call this new helper so no
caller observes a null entry with a valid key; reference MoveEntry, entries_,
keys_, Remove(), Iterator::UpdateCurrent and the template_cc_map.h scan code
while making the change.
tx_service/include/cc/cc_shard.h (1)

1155-1167: Prefer targeted accessors over a new public state block.

The LO_LRU work already has narrow entry points (GetCacheEvictPolicy() / LargeObjThresholdBytes()), and the shard IDs already have helpers (LocalCoreId() / GlobalCoreId()). Making meta_data_mux_, local_shards_, memory_limit_, and realtime_sampling_ public widens the API surface more than this feature needs and lets callers bypass CcShard's locking/memory invariants.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/include/cc/cc_shard.h` around lines 1155 - 1167, Publicizing
meta_data_mux_, local_shards_, memory_limit_, and realtime_sampling_ expands the
class surface and allows bypassing CcShard invariants; change those members back
to private and add narrow accessor methods (e.g., int32_t MetaDataMux() const /
std::atomic<int32_t>& MetaDataMuxRef() for meta_data_mux_, LocalCcShards&
LocalShards() for local_shards_, uint64_t MemoryLimit() const and void
SetMemoryLimit(uint64_t) for memory_limit_, and bool RealtimeSampling() const
for realtime_sampling_) that enforce any required locking or validation in
CcShard, then update all call sites to use these new accessors instead of
directly accessing meta_data_mux_, local_shards_, memory_limit_, or
realtime_sampling_.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/src/tx_service_init.cpp`:
- Around line 233-265: The variable lolru_large_obj_threshold_kb is only set in
the LO_LRU branch but is logged unconditionally later, causing a read of an
uninitialized value; fix by either initializing lolru_large_obj_threshold_kb at
declaration (e.g., 0 or a sensible default) or change the LOG that prints ">>
lru: ..., threshold: ..." to only include the threshold when cache_evict_policy
== txservice::CacheEvictPolicy::LO_LRU (use the cache_evict_policy and
cache_evict_policy_str checks to gate the log), ensuring the variable is never
read before assignment.

In `@tx_service/include/cc/cc_entry.h`:
- Around line 2303-2314: The code in cc_entry.h uses
entry->payload_.cur_payload_->HasTTL() and then only updates smallest_ttl_ when
ttl > smallest_ttl_, which is wrong for tracking the page minimum; change the
logic to set smallest_ttl_ = std::min(smallest_ttl_, ttl) (or, if smallest_ttl_
is UINT64_MAX, assign ttl) whenever HasTTL() is true so a fresh page's
smallest_ttl_ gets initialized and future smaller TTLs reduce it; keep the
existing update to last_dirty_commit_ts_ as-is.
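The min-tracking fix suggested above can be sketched as follows; PageTtl and Observe are illustrative stand-ins for CcPage's smallest_ttl_ field and its TTL-update path, not the actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Track the page-minimum TTL: a fresh page starts at UINT64_MAX and every
// observed TTL can only lower it. Names are hypothetical.
struct PageTtl
{
    uint64_t smallest_ttl = std::numeric_limits<uint64_t>::max();

    void Observe(uint64_t ttl)
    {
        // min, not max: the buggy version only updated on ttl > smallest_ttl.
        smallest_ttl = std::min(smallest_ttl, ttl);
    }
};
```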

In `@tx_service/include/cc/local_cc_shards.h`:
- Around line 1821-1825: The multiplication in SetupPolicyLoLRU uses 1024ul
which can be 32-bit on some platforms and may overflow before assignment to
u_cache_evict_policy_.lo_lru.large_obj_threshold_bytes (uint64_t); fix by
ensuring the 1024 operand is 64-bit (e.g., use 1024ull or cast the multiplier to
uint64_t) so the expression large_obj_threshold_kb * 1024 is evaluated in 64-bit
and then stored into large_obj_threshold_bytes.

In `@tx_service/include/cc/object_cc_map.h`:
- Around line 1267-1269: The LO_LRU reshaping call must only run on fully
committed, unlocked entries; update each call site that currently invokes
EnsureLargeObjOccupyPageAlone (e.g., in ApplyCc, UploadTxCommandsCc,
CreateDirtyPayloadFromPendingCommand, PostWriteCc) to first confirm
shard_->GetCacheEvictPolicy() == CacheEvictPolicy::LO_LRU AND that the target
entry is unlocked and in a committed/normal or deleted state before calling
EnsureLargeObjOccupyPageAlone; specifically check cce->GetKeyLock() == nullptr
and the entry state (e.g., cce->GetState() == EntryState::Normal ||
cce->GetState() == EntryState::Deleted or an equivalent
IsCommitted()/IsUnlocked() predicate) and ensure there are no pending
write/command attachments (e.g., !cce->HasPendingCommand() or pending command
pointer == nullptr); if these conditions are not met skip the call (or defer
until after unlock/commit) to avoid assertions or relocations while
locked/pending.

In `@tx_service/include/cc/template_cc_map.h`:
- Line 11630: Remove the noisy INFO log in the page-clean hot path by replacing
the unconditional LOG(INFO) << ">> FreeCount: " << clean_guard->FreedCount()
with a debug-only log; update the call in the CleanPage path (referenced by
CleanPage and clean_guard->FreedCount()) to use DLOG(INFO) or VLOG(1) (or wrap
it behind an existing debug flag) so it only emits when verbose/debug logging is
enabled.
- Around line 11966-11989: The new large_obj_ccp_uptr page is left with default
page-level metadata so scans/cleanup can incorrectly skip/handle it; after
Emplace(*key, ccp->MoveEntry(idx_in_ccp)) and before inserting into ccmp_ and
updating LRU, initialize the page's metadata (e.g., set
large_obj_ccp_uptr->last_dirty_commit_ts_ and large_obj_ccp_uptr->smallest_ttl_)
from the moved record or by calling the page-level recompute routine used
elsewhere (or a helper like ComputePageMetadata on CcPage) so the page reflects
the moved entry's dirty/TTL state, then proceed with TryUpdatePageKey,
try_emplace into ccmp_, and shard_->UpdateLruList.
- Around line 11945-11975: The assert that cce->GetKeyLock() == nullptr is
unsafe because we move the CCE into a new CcPage (via Emplace/MOVEEntry) while a
live or recycled key lock may still reference the old page; fix by ensuring the
CCE has no live lock before relocating or by repairing the moved entry’s
lock/page metadata after the move. Concretely, either call
ReleaseCceLock/RecycleKeyLock on the CCE (or ensure the caller did so) prior to
creating large_obj_ccp_uptr and calling Emplace/MoveEntry, or after
Emplace/MoveEntry update the moved CCE’s internal pointer(s) (e.g., its page
reference and key-lock pointer via the same mechanisms used by
ReleaseCceLock/RecycleKeyLock) so the lock object points at the new CcPage;
apply this change around the block that constructs large_obj_ccp_uptr and
invokes Emplace/MoveEntry/Remove to ensure no dangling lock references.

In `@tx_service/src/cc/cc_entry.cpp`:
- Around line 72-73: The unconditional LOG(INFO) in the hot path should be
removed or gated: replace the LOG(INFO) << ">> commit_ts: " << CommitTs() << ",
checkpoint_ts: " << CkptTs(); with a gated log (e.g., VLOG(1) or DLOG(INFO)) or
wrap it behind the cheap predicate if (IsPersistent()) { LOG(INFO) ... } so the
hot-path during eviction/checkpoint scans doesn't unconditionally emit
INFO-level logs; update the logging call near CommitTs() and CkptTs()
accordingly.

In `@tx_service/src/cc/cc_shard.cpp`:
- Around line 929-939: The protected-list head handling assumes
protected_head_page_ is still a large object page; guard against pages that were
flipped (large_obj_page_ toggled) before this path: in the LO_LRU branch where
you compare protected_head_page_ == page, first advance protected_head_page_
while it is not &tail_ccp_ and protected_head_page_->large_obj_page_ is false
(skip demoted pages) so you never assert on a small page; also ensure any cached
insert_before pointer (and code in DetachLru/DetachLru-related logic) is cleared
or updated when a page is detached/demoted so reinsert paths do not dereference
a now-detached page’s lru_prev_; apply the same guard/update to the other LO_LRU
handling region noted (the block around the other occurrence).
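The protected-head guard suggested above can be sketched as a small advance loop; PageNode and AdvanceProtectedHead are illustrative stand-ins for CcPage (with its large_obj_page_ flag and LRU links) and the in-shard logic, not the actual implementation.

```cpp
#include <cassert>

// Skip pages that were demoted (large_obj_page flipped to false) so the
// protected head never rests on a small page. Names are hypothetical.
struct PageNode
{
    bool large_obj_page;
    PageNode *lru_next;
};

PageNode *AdvanceProtectedHead(PageNode *protected_head, PageNode *tail)
{
    while (protected_head != tail && !protected_head->large_obj_page)
    {
        protected_head = protected_head->lru_next;
    }
    return protected_head;
}
```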


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: aecd631c-d37c-47a2-9705-9265507182de

📥 Commits

Reviewing files that changed from the base of the PR and between f94af00 and 946379f.

📒 Files selected for processing (17)
  • core/src/tx_service_init.cpp
  • tx_service/CMakeLists.txt
  • tx_service/include/cc/cc_entry.h
  • tx_service/include/cc/cc_shard.h
  • tx_service/include/cc/local_cc_shards.h
  • tx_service/include/cc/object_cc_map.h
  • tx_service/include/cc/template_cc_map.h
  • tx_service/include/store/int_mem_store.h
  • tx_service/include/tx_service.h
  • tx_service/include/type.h
  • tx_service/src/cc/cc_entry.cpp
  • tx_service/src/cc/cc_shard.cpp
  • tx_service/tests/CMakeLists.txt
  • tx_service/tests/CcPage-Test.cpp
  • tx_service/tests/LargeObjLRU-Test.cpp
  • tx_service/tests/StartTsCollector-Test.cpp
  • tx_service/tests/include/mock/mock_catalog_factory.h

Comment on lines +1267 to +1269
if (shard_->GetCacheEvictPolicy() == CacheEvictPolicy::LO_LRU)
{
EnsureLargeObjOccupyPageAlone(ccp, cce);

⚠️ Potential issue | 🔴 Critical

Run LO-LRU reshaping only on committed, fully unlocked entries.

EnsureLargeObjOccupyPageAlone() in tx_service/include/cc/template_cc_map.h:11941-11999 immediately asserts cce->GetKeyLock() == nullptr and only handles Normal / Deleted. These new call sites do not enforce that contract: ApplyCc, UploadTxCommandsCc, and CreateDirtyPayloadFromPendingCommand invoke it while write/pending-command state is still attached; PostWriteCc can hit it on an aborted IgnoreOldValue() entry that is still Unknown; and the replay/standby/backfill paths can still retain extra data while buffered commands remain. With LO_LRU enabled, this can assert or relocate a locked entry between pages.

Also applies to: 1522-1525, 1871-1875, 2103-2106, 2555-2558, 2736-2739, 2825-2830


const KeyT &key = page->keys_[0];
page_it = ccmp_.find(key);
}
LOG(INFO) << ">> FreeCount: " << clean_guard->FreedCount();

⚠️ Potential issue | 🟠 Major

Remove the INFO log from the page-clean hot path.

CleanPage() runs during eviction/cleanup. Logging every FreedCount at INFO will flood logs and add I/O exactly when the shard is already under memory pressure. Please gate this behind DLOG/VLOG or a debug flag instead.


Comment on lines +72 to +73
LOG(INFO) << ">> commit_ts: " << CommitTs()
<< ", checkpoint_ts: " << CkptTs();

⚠️ Potential issue | 🟠 Major

Remove the unconditional INFO log from this hot path.

IsPersistent() is a cheap predicate today; turning every call into LOG(INFO) will flood logs and add avoidable I/O/formatting overhead during eviction/checkpoint scans. If you still need this temporarily, gate it behind VLOG/DLOG instead.

Proposed fix
-        LOG(INFO) << ">> commit_ts: " << CommitTs()
-                  << ", checkpoint_ts: " << CkptTs();
+        VLOG(1) << "commit_ts=" << CommitTs()
+                << ", checkpoint_ts=" << CkptTs();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/src/cc/cc_entry.cpp` around lines 72 - 73, The unconditional
LOG(INFO) in the hot path should be removed or gated: replace the LOG(INFO) <<
">> commit_ts: " << CommitTs() << ", checkpoint_ts: " << CkptTs(); with a gated
log (e.g., VLOG(1) or DLOG(INFO)) or wrap it behind the cheap predicate if
(IsPersistent()) { LOG(INFO) ... } so the hot-path during eviction/checkpoint
scans doesn't unconditionally emit INFO-level logs; update the logging call near
CommitTs() and CkptTs() accordingly.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (2)
tx_service/include/cc/template_cc_map.h (2)

11962-12020: ⚠️ Potential issue | 🔴 Critical

Relocating the large object here breaks page invariants.

large_obj_ccp_uptr is always linked after ccp, so this stays ordered only when the extracted key is already ccp's last key. If idx_in_ccp is 0 or in the middle, ccp still keeps keys that must sort on both sides of *key, and ccmp_/iterator lookups can walk the wrong page order. This block also bypasses the usual cross-page repair work: the moved entry is not reparented with UpdateCcPage(...), and the new page keeps default dirty/TTL metadata. Please split ccp around idx_in_ccp (or restrict this fast path to tail keys) and initialize/reparent the relocated page before inserting it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/include/cc/template_cc_map.h` around lines 11962 - 12020,
EnsureLargeObjOccupyPageAlone currently relocates an entry into a new
large_obj_ccp_uptr positioned after ccp without preserving page key ordering or
performing cross-page repair; either restrict this fast path to when idx_in_ccp
is the tail key or split ccp at idx_in_ccp so keys left/right remain correctly
ordered, then before ccmp_.try_emplace(*key, ...) call perform proper
reparenting/initialization: run UpdateCcPage(...) (or equivalent) on
large_obj_ccp_uptr to set parent pointers, update dirty/TTL/size metadata, and
call shard_->UpdateLruList(...) for the new page (and update ccp metadata) so
the moved entry is fully initialized and pages remain invariant-consistent.
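The ordering concern above — that the page must be split around the extracted index so both remaining halves stay sorted — can be illustrated with a minimal sketch. `Page` and `SplitAround` are hypothetical simplifications; the real `CcPage` split would also move `CcEntry` payloads, repair `prev_page_`/`next_page_` links, and re-register pages in `ccmp_` and the shard's LRU list.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative only: split a sorted "page" of keys around index idx so the
// extracted key can move to its own page while both remaining halves keep
// their sort order.
struct Page
{
    std::vector<int> keys;
};

// Returns {left, right}; the key at idx is handed back through `extracted`.
std::pair<Page, Page> SplitAround(const Page &page,
                                  std::size_t idx,
                                  int &extracted)
{
    extracted = page.keys[idx];
    Page left, right;
    left.keys.assign(page.keys.begin(), page.keys.begin() + idx);
    right.keys.assign(page.keys.begin() + idx + 1, page.keys.end());
    return {std::move(left), std::move(right)};
}
```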

11651-11651: ⚠️ Potential issue | 🟠 Major

Keep the page-clean hot path out of INFO logs.

CleanPage() sits on the eviction/cleanup path. Emitting FreedCount at Line 11651 will spam logs exactly when the shard is under memory pressure; please gate this behind DLOG/VLOG instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/include/cc/template_cc_map.h` at line 11651, The log statement on
the page-clean hot path is too noisy: replace the LOG(INFO) << ">> FreeCount: "
<< clean_guard->FreedCount(); call in CleanPage() with a debug-only log (e.g.,
DLOG(INFO) << ... or VLOG(1) << ...) so FreedCount() is only emitted under
debug/verbose logging and not during normal eviction/cleanup activity; locate
the statement using the clean_guard->FreedCount() reference and update the macro
accordingly.
🧹 Nitpick comments (2)
tx_service/tests/LargeObjLRU-Test.cpp (2)

6-7: The #define private public hack can cause ODR violations.

While this is a common pattern for white-box testing, redefining access specifiers can lead to One Definition Rule violations if the same headers are included differently in other translation units linked into the same test binary. This can cause subtle, hard-to-debug issues.

Consider using friend declarations in the classes under test, or a dedicated test accessor class pattern for safer access to internals.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/tests/LargeObjLRU-Test.cpp` around lines 6 - 7, The test currently
redefines access specifiers via the macros "#define protected public" and
"#define private public", which risks ODR violations; remove these macro hacks
from LargeObjLRU-Test.cpp and instead expose needed internals by adding explicit
test-only access (e.g., add friend test declarations or a TestAccessor class in
the production class such as LargeObjLRU) or by providing accessors/methods
behind a TESTING build flag; update the test to include the new TestAccessor or
friend-enabled methods and ensure no headers are pulled in with different
access-specifier definitions in other translation units.
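A minimal sketch of the friend/test-accessor alternative this nitpick suggests, with hypothetical names (`Counter`, `CounterTestAccess`) rather than the repo's actual classes:

```cpp
#include <cassert>

// The class under test exposes a single, explicit test-only backdoor instead
// of having its access specifiers redefined by macros.
class Counter
{
public:
    void Bump() { ++count_; }

private:
    friend struct CounterTestAccess;  // explicit test-only access
    int count_ = 0;
};

// Defined only in test code; reads internals without macro hacks, so every
// translation unit sees one consistent class definition (no ODR risk).
struct CounterTestAccess
{
    static int Count(const Counter &c) { return c.count_; }
};
```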

200-205: Raw pointers returned from helper risk memory leaks on test failure.

If a REQUIRE assertion fails before the delete statements at the end of each test, the allocated pages will leak. Consider using std::unique_ptr or RAII wrappers to ensure cleanup even on early exits.

♻️ Suggested approach using unique_ptr
template <typename KeyT, typename ValT>
static auto make_ccmap_and_pages(
    CcShard &sh,
    std::unique_ptr<TemplateCcMap<KeyT, ValT, true, true>> &ccmap_uptr)
{
    // ... existing setup ...
    
    struct PageSet {
        std::unique_ptr<CcPage<KeyT, ValT, true, true>> neg_inf;
        std::unique_ptr<CcPage<KeyT, ValT, true, true>> pos_inf;
        std::unique_ptr<CcPage<KeyT, ValT, true, true>> p1;
        std::unique_ptr<CcPage<KeyT, ValT, true, true>> p2;
    };
    
    PageSet pages;
    pages.neg_inf = std::make_unique<CcPage<KeyT, ValT, true, true>>(cc_map);
    pages.pos_inf = std::make_unique<CcPage<KeyT, ValT, true, true>>(cc_map);
    pages.p1 = std::make_unique<CcPage<KeyT, ValT, true, true>>(
        cc_map, pages.neg_inf.get(), pages.pos_inf.get());
    pages.p2 = std::make_unique<CcPage<KeyT, ValT, true, true>>(
        cc_map, pages.p1.get(), pages.pos_inf.get());
    return pages;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/tests/LargeObjLRU-Test.cpp` around lines 200 - 205, The helper
make_ccmap_and_pages currently returns raw CcPage pointers which can leak if a
REQUIRE triggers early exit; change it to return RAII-managed pages (e.g., a
PageSet struct holding std::unique_ptr<CcPage<KeyT,ValT,true,true>>) and
allocate pages with std::make_unique, passing raw .get() pointers where links
are required; also update the caller tests to accept the PageSet and use the
unique_ptrs (or .get()) so deletions are automatic on scope exit.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tx_service/include/cc/template_cc_map.h`:
- Around line 10346-10365: The branch is wrongly re-checking
target_page->large_obj_page_ (always true) when deciding to emplace a new page;
update the condition to inspect the candidate iterator's page instead. In the
block inside CacheEvictPolicy::LO_LRU where you advance target_it, change the
emplace condition to check (target_it == ccmp_.end() ||
target_it->second->large_obj_page_ || target_it->second->Full()) so you only
allocate when there is no candidate or the candidate is a LO page or is full;
keep using CcPage<KeyT,ValueT,VersionedRecord,RangePartitioned> and target_page
= target_it->second.get() afterwards.

In `@tx_service/tests/include/mock/mock_catalog_factory.h`:
- Around line 308-318: The two CreateTxKey overrides (CreateTxKey() and
CreateTxKey(const char*, size_t)) and the CreateTxRecord stub must return
concrete mock instances instead of asserting; replace the assert(false) + empty
TxKey() with construction of a testable TxKey backed by the same mock CcMap
types used elsewhere in this mock (use the mock CcMap/Mock* types already
present in mock_catalog_factory.h), and for the const char*,size overload
initialize the mock key data from the provided buffer/size so tests get a usable
key; do the same for CreateTxRecord() so it returns a real mock record instance
compatible with the mock CcMap.
- Around line 320-323: PackedNegativeInfinity currently asserts and returns
nullptr which diverges from production; replace it with a stable non-null
sentinel TxKey instance that represents packed negative infinity and return its
pointer. Implement this by creating a static (or static-allocated) TxKey
sentinel inside the mock (so lifetime is process-wide) initialized the same way
production does for its negative-infinity boundary, and have
PackedNegativeInfinity() return &sentinel; ensure the sentinel initialization
uses the same packing/key representation as the real factory so mock behavior
matches CatalogFactory::PackedNegativeInfinity().

---

Duplicate comments:
In `@tx_service/include/cc/template_cc_map.h`:
- Around line 11962-12020: EnsureLargeObjOccupyPageAlone currently relocates an
entry into a new large_obj_ccp_uptr positioned after ccp without preserving page
key ordering or performing cross-page repair; either restrict this fast path to
when idx_in_ccp is the tail key or split ccp at idx_in_ccp so keys left/right
remain correctly ordered, then before ccmp_.try_emplace(*key, ...) call perform
proper reparenting/initialization: run UpdateCcPage(...) (or equivalent) on
large_obj_ccp_uptr to set parent pointers, update dirty/TTL/size metadata, and
call shard_->UpdateLruList(...) for the new page (and update ccp metadata) so
the moved entry is fully initialized and pages remain invariant-consistent.
- Line 11651: The log statement on the page-clean hot path is too noisy: replace
the LOG(INFO) << ">> FreeCount: " << clean_guard->FreedCount(); call in
CleanPage() with a debug-only log (e.g., DLOG(INFO) << ... or VLOG(1) << ...) so
FreedCount() is only emitted under debug/verbose logging and not during normal
eviction/cleanup activity; locate the statement using the
clean_guard->FreedCount() reference and update the macro accordingly.

---

Nitpick comments:
In `@tx_service/tests/LargeObjLRU-Test.cpp`:
- Around line 6-7: The test currently redefines access specifiers via the macros
"#define protected public" and "#define private public", which risks ODR
violations; remove these macro hacks from LargeObjLRU-Test.cpp and instead
expose needed internals by adding explicit test-only access (e.g., add friend
test declarations or a TestAccessor class in the production class such as
LargeObjLRU) or by providing accessors/methods behind a TESTING build flag;
update the test to include the new TestAccessor or friend-enabled methods and
ensure no headers are pulled in with different access-specifier definitions in
other translation units.
- Around line 200-205: The helper make_ccmap_and_pages currently returns raw
CcPage pointers which can leak if a REQUIRE triggers early exit; change it to
return RAII-managed pages (e.g., a PageSet struct holding
std::unique_ptr<CcPage<KeyT,ValT,true,true>>) and allocate pages with
std::make_unique, passing raw .get() pointers where links are required; also
update the caller tests to accept the PageSet and use the unique_ptrs (or
.get()) so deletions are automatic on scope exit.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4be5298d-1f38-477a-8c68-47e4c1c24b63

📥 Commits

Reviewing files that changed from the base of the PR and between 946379f and 4ebaf4c.

📒 Files selected for processing (12)
  • core/src/tx_service_init.cpp
  • tx_service/CMakeLists.txt
  • tx_service/include/cc/cc_entry.h
  • tx_service/include/cc/object_cc_map.h
  • tx_service/include/cc/template_cc_map.h
  • tx_service/include/store/int_mem_store.h
  • tx_service/src/cc/cc_shard.cpp
  • tx_service/tests/CMakeLists.txt
  • tx_service/tests/CcPage-Test.cpp
  • tx_service/tests/LargeObjLRU-Test.cpp
  • tx_service/tests/StartTsCollector-Test.cpp
  • tx_service/tests/include/mock/mock_catalog_factory.h
🚧 Files skipped from review as they are similar to previous changes (5)
  • tx_service/tests/StartTsCollector-Test.cpp
  • tx_service/include/cc/cc_entry.h
  • tx_service/tests/CMakeLists.txt
  • tx_service/CMakeLists.txt
  • tx_service/include/store/int_mem_store.h

Comment on lines +308 to +318
TxKey CreateTxKey() const override
{
assert(false);
return TxKey();
}

TxKey CreateTxKey(const char *data, size_t size) const override
{
assert(false);
return TxKey();
}

⚠️ Potential issue | 🟠 Major

Implement the new creation APIs with usable mock objects.

These overrides were added to match the CatalogFactory contract, but they still return empty/null placeholders. If the new policy-driven test setup reaches them, the failure mode comes from the mock rather than the LO-LRU code under test. Please back CreateTxKey(...) and CreateTxRecord() with concrete test key/record instances that match the mock CcMap types.

Also applies to: 326-328

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/tests/include/mock/mock_catalog_factory.h` around lines 308 - 318,
The two CreateTxKey overrides (CreateTxKey() and CreateTxKey(const char*,
size_t)) and the CreateTxRecord stub must return concrete mock instances instead
of asserting; replace the assert(false) + empty TxKey() with construction of a
testable TxKey backed by the same mock CcMap types used elsewhere in this mock
(use the mock CcMap/Mock* types already present in mock_catalog_factory.h), and
for the const char*,size overload initialize the mock key data from the provided
buffer/size so tests get a usable key; do the same for CreateTxRecord() so it
returns a real mock record instance compatible with the mock CcMap.
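One way to satisfy this, sketched with a hypothetical `MockTxKey` stand-in (the real fix would construct the same mock key/record types the rest of mock_catalog_factory.h already uses):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch: the mock builds a usable key instead of asserting.
// MockTxKey stands in for the repo's TxKey.
struct MockTxKey
{
    std::string data;
};

struct MockFactorySketch
{
    // Default-constructed but valid key, rather than assert(false).
    MockTxKey CreateTxKey() const { return MockTxKey{}; }

    // Initialize the key from the caller-supplied buffer.
    MockTxKey CreateTxKey(const char *data, std::size_t size) const
    {
        return MockTxKey{std::string(data, size)};
    }
};
```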

Comment on lines +320 to +323
const TxKey *PackedNegativeInfinity() const override
{
assert(false);
return nullptr;

⚠️ Potential issue | 🟠 Major

Return a real packed negative-infinity sentinel here.

CatalogFactory::PackedNegativeInfinity() is a boundary-key API, and the concrete factory returns a stable non-null TxKey * in tx_service/src/eloq_basic_catalog_factory.cpp:208-211. Returning nullptr on Line 323 makes this mock diverge from production semantics and can break range/scan paths before the LO-LRU behavior is exercised.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/tests/include/mock/mock_catalog_factory.h` around lines 320 - 323,
PackedNegativeInfinity currently asserts and returns nullptr which diverges from
production; replace it with a stable non-null sentinel TxKey instance that
represents packed negative infinity and return its pointer. Implement this by
creating a static (or static-allocated) TxKey sentinel inside the mock (so
lifetime is process-wide) initialized the same way production does for its
negative-infinity boundary, and have PackedNegativeInfinity() return &sentinel;
ensure the sentinel initialization uses the same packing/key representation as
the real factory so mock behavior matches
CatalogFactory::PackedNegativeInfinity().
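The function-local-static sentinel pattern the comment describes looks roughly like this; `TxKeyStub` is a hypothetical stand-in for the real TxKey, and the actual sentinel must use the same packed representation as eloq_basic_catalog_factory.cpp:

```cpp
#include <cassert>
#include <string>

// Sketch of the fix: return a pointer to a function-local static whose
// lifetime spans the whole process, so callers never see nullptr.
struct TxKeyStub
{
    std::string packed;
};

const TxKeyStub *PackedNegativeInfinitySketch()
{
    // Initialized once on first call, thread-safe since C++11; the same
    // address is returned on every call.
    static const TxKeyStub neg_inf{std::string(1, '\0')};
    return &neg_inf;
}
```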

@xiexiaoy xiexiaoy force-pushed the lo_lru branch 2 times, most recently from 936cd55 to 1b27b7a Compare March 12, 2026 09:07
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (4)
tx_service/tests/include/mock/mock_catalog_factory.h (2)

320-323: ⚠️ Potential issue | 🟠 Major

Keep PackedNegativeInfinity() non-null.

Line 323 returns nullptr for a boundary-key factory method. That makes range/scan-style tests trip over the mock before they reach the eviction-policy logic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/tests/include/mock/mock_catalog_factory.h` around lines 320 - 323,
PackedNegativeInfinity() currently asserts and returns nullptr which breaks
range/scan tests; modify the MockCatalogFactory::PackedNegativeInfinity()
implementation to return a valid non-null TxKey sentinel instead of nullptr
(remove the assert(false)); create or reuse a static/constexpr TxKey instance
representing negative-infinity (e.g., a static TxKey kPackedNegInf) and return
its pointer, ensuring lifetime covers test usage so callers never get nullptr.

308-318: ⚠️ Potential issue | 🟠 Major

Return usable mock objects from the new const factory APIs.

Lines 308-318 and Line 326 still hard-fail or return nullptr, so any test path that now exercises these overrides fails in the mock rather than the LO-LRU code under test.

Also applies to: 326-328

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/tests/include/mock/mock_catalog_factory.h` around lines 308 - 318,
The two const factory overrides CreateTxKey() const and CreateTxKey(const char
*data, size_t size) const currently assert(false) and return an invalid TxKey,
causing tests to hard-fail; change them to return valid mock TxKey objects
(e.g., construct and return a default/valid TxKey or build one from the provided
data/size) so the mock factory exercises LO-LRU code paths instead of aborting;
apply the same fix pattern to the other failing const override referenced (the
method around lines 326-328) so all const factory APIs return usable mock
objects rather than asserting or returning null.
tx_service/include/cc/template_cc_map.h (1)

10346-10363: ⚠️ Potential issue | 🔴 Critical

Keep the fallback page anchored to the current LO page.

After Line 10350, target_it is dereferenced before the end() check, and the fallback page is still constructed relative to the candidate next page instead of the current large-object page. In the “LO page is last” / “next page is unusable” paths this can either walk past ccmp_.end() or splice the new page into the wrong spot in the page chain.

♻️ Minimal fix
         if (shard_->GetCacheEvictPolicy() == CacheEvictPolicy::LO_LRU)
         {
             if (target_page->large_obj_page_)
             {
-                target_it++;
-                target_page = target_it->second.get();
-                if (target_it == ccmp_.end() || target_page->large_obj_page_ ||
-                    target_page->Full())
+                auto *lo_page = target_page;
+                ++target_it;
+                auto *next_page =
+                    target_it == ccmp_.end() ? nullptr : target_it->second.get();
+                if (next_page == nullptr || next_page->large_obj_page_ ||
+                    next_page->Full())
                 {
                     target_it = ccmp_.try_emplace(
                         target_it,
                         key,
                         std::make_unique<CcPage<KeyT,
                                                 ValueT,
                                                 VersionedRecord,
                                                 RangePartitioned>>(
-                            this, target_page, target_page->next_page_));
-                    target_page = target_it->second.get();
+                            this, lo_page, lo_page->next_page_));
+                    next_page = target_it->second.get();
                 }
+                target_page = next_page;
             }
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/include/cc/template_cc_map.h` around lines 10346 - 10363, The code
dereferences target_it and constructs the fallback page relative to the wrong
anchor when handling LO pages under shard_->GetCacheEvictPolicy() ==
CacheEvictPolicy::LO_LRU; fix by first checking target_it != ccmp_.end() before
using *target_it, and when you need to insert a fallback LO page use the current
large-object page (the original target_page before increment) as the anchor for
try_emplace/constructor (not the advanced iterator or its next_page_), adjusting
the iterator manipulation so you never dereference past ccmp_.end() and the new
CcPage<KeyT,ValueT,VersionedRecord,RangePartitioned> is spliced after the actual
current LO page.
tx_service/include/cc/object_cc_map.h (1)

1267-1270: ⚠️ Potential issue | 🔴 Critical

Defer LO-LRU reshaping until the entry is fully committed and unlocked.

Line 1267 and Line 1871 still run before ReleaseCceLock(), Line 2825 runs on dirty/uncommitted state, and Line 1522 can still see an aborted Unknown entry. The same risk is still present in the standby/replay paths when buffered state or a key lock remains. EnsureLargeObjOccupyPageAlone() should only run on a final Normal/Deleted entry with no lock, no pending command, and no buffered commands; otherwise it can relocate an entry under an active txn or trip its preconditions.

🔒 Minimal guard/defer pattern
- if (shard_->GetCacheEvictPolicy() == CacheEvictPolicy::LO_LRU)
+ if (shard_->GetCacheEvictPolicy() == CacheEvictPolicy::LO_LRU &&
+     cce->GetKeyLock() == nullptr &&
+     cce->IsNullPendingCmd() &&
+     !cce->HasBufferedCommandList() &&
+     (cce->PayloadStatus() == RecordStatus::Normal ||
+      cce->PayloadStatus() == RecordStatus::Deleted))
  {
-     EnsureLargeObjOccupyPageAlone(ccp, cce);
+     EnsureLargeObjOccupyPageAlone(
+         static_cast<CcPage<KeyT, ValueT, false, false> *>(cce->GetCcPage()),
+         cce);
  }

For the pre-unlock paths, move the call to after ReleaseCceLock() rather than just guarding it.

Also applies to: 1522-1525, 1871-1875, 2103-2107, 2555-2559, 2825-2830

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/include/cc/object_cc_map.h` around lines 1267 - 1270, The LO-LRU
reshaping call EnsureLargeObjOccupyPageAlone(ccp, cce) is running too early
(while entry may be locked/dirty/unfinished); move and guard this call so it
only runs after the entry is fully committed/unlocked: for pre-unlock code
paths, relocate the EnsureLargeObjOccupyPageAlone call to after
ReleaseCceLock(), and add explicit checks before invoking it that
shard_->GetCacheEvictPolicy() == CacheEvictPolicy::LO_LRU and the cce entry is
in a final state (Normal or Deleted), has no lock, no pending command, and no
buffered commands; apply this change to all sites that currently call
EnsureLargeObjOccupyPageAlone (the occurrences around the GetCacheEvictPolicy
check and usages of ccp/cce) so standby/replay and dirty/uncommitted paths are
protected.
🧹 Nitpick comments (2)
tx_service/include/cc/cc_entry.h (1)

2571-2579: Add the usual bounds check in MoveEntry().

Line 2576 indexes entries_[idx] directly, unlike the other index-based page helpers. An assert(idx < entries_.size()) here would turn a bad re-homing index into a fast failure instead of UB.

Suggested change
     MoveEntry(size_t idx)
     {
+        assert(idx < entries_.size());
         std::unique_ptr<
             CcEntry<KeyT, ValueT, VersionedRecord, RangePartitioned>>
             entry = std::move(entries_[idx]);
         assert(entry != nullptr);
         return entry;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/include/cc/cc_entry.h` around lines 2571 - 2579, MoveEntry
currently indexes entries_[idx] without validating bounds; add the same bounds
check used elsewhere to fail fast. Inside CcEntry::MoveEntry (the MoveEntry
method returning std::unique_ptr<CcEntry<KeyT, ValueT, VersionedRecord,
RangePartitioned>>), insert an assertion like assert(idx < entries_.size())
before accessing entries_[idx] so an out-of-range re-homing index triggers a
fast failure instead of undefined behavior.
tx_service/include/cc/local_cc_shards.h (1)

1816-1826: Make the policy setters startup-only or rebuild shard state.

These methods only flip cache_evict_policy_ and the threshold. If they are called after pages are already in the LRU, existing shards keep the old layout and LO-LRU invariants can drift. An assertion or documented precondition that they run before TxService::Start() would make this public API much safer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/include/cc/local_cc_shards.h` around lines 1816 - 1826, The
setters SetupPolicyLRU and SetupPolicyLoLRU currently flip cache_evict_policy_
and the threshold after runtime which can break invariants for
already-initialized shards; make them startup-only by checking a "started" state
or rebuild shard state: either assert the service hasn't started (e.g., require
!started_ and throw/log/assert in SetupPolicyLRU and SetupPolicyLoLRU) or, if
called at runtime, iterate existing shards (use the shard container member) and
update each shard's LRU data structures and thresholds to match
cache_evict_policy_ and u_cache_evict_policy_.lo_lru.large_obj_threshold_bytes
so in-memory shard layout stays consistent with the new policy.
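The startup-only guard suggested here can be sketched as follows; `ShardsConfigSketch` and its members are illustrative, not the actual `LocalCcShards` API:

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Sketch of a startup-only policy setter: configuration is rejected once the
// service has started, so live shards never see a mid-flight policy flip.
enum class CacheEvictPolicy
{
    LRU,
    LO_LRU
};

class ShardsConfigSketch
{
public:
    void SetupPolicyLoLRU(uint64_t threshold_kb)
    {
        if (started_)
        {
            throw std::logic_error(
                "eviction policy must be set before Start()");
        }
        policy_ = CacheEvictPolicy::LO_LRU;
        large_obj_threshold_bytes_ = threshold_kb * 1024;
    }

    void Start() { started_ = true; }

    CacheEvictPolicy Policy() const { return policy_; }
    uint64_t ThresholdBytes() const { return large_obj_threshold_bytes_; }

private:
    bool started_ = false;
    CacheEvictPolicy policy_ = CacheEvictPolicy::LRU;
    uint64_t large_obj_threshold_bytes_ = 1024 * 1024;
};
```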
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tx_service/include/cc/cc_entry.h`:
- Around line 2303-2310: The code dereferences entry->payload_.cur_payload_
without checking for null or tombstone; update the logic that updates
smallest_ttl_ to first check if entry->payload_.cur_payload_ is non-null and not
a tombstone before calling HasTTL/GetTTL, and when walking the full-recompute
path ensure the deleted-entry sentinel is applied (set smallest_ttl_ = 0) if any
entry has a null/tombstone payload; apply the same null/tombstone guard to the
other occurrence around lines referencing entry->payload_.cur_payload_ (the
block at 2581-2597) so you never dereference a null payload and you correctly
set smallest_ttl_ = 0 for deleted entries.

In `@tx_service/include/cc/template_cc_map.h`:
- Around line 11982-12011: The code moves an arbitrary entry into a new
large_obj page but always links/places it after ccp, breaking sorted ordering
when idx_in_ccp is not the tail; fix by splitting ccp at idx_in_ccp (or moving
all entries after idx_in_ccp into the new page rather than a single entry) so
the new page contains the correct contiguous key range, update
ccp->Remove/UpdateSmallestTTL on the source and set
large_obj_ccp_uptr->large_obj_page_, prev_page_/next_page_ pointers and LRU via
shard_->UpdateLruList for both pages, then insert the new page into ccmp_ at the
proper sorted position (use ccmp_.upper_bound on the new page key and adjust
iterator like the existing logic) and call TryUpdatePageKey for the correct
neighbor(s) so iterator traversal and ccmp_ ordering remain consistent.

In `@tx_service/tests/LargeObjLRU-Test.cpp`:
- Around line 67-123: The helper make_shard (and make_shard_config) currently
creates ng_configs and raft_path on the stack and passes their addresses into
LocalCcShards and Sharder::Instance, which leaves dangling pointers when the
helper returns; fix by heap-allocating and transferring ownership of those
objects so they outlive the returned shard objects — e.g., create a
std::shared_ptr<std::unordered_map<uint32_t, std::vector<NodeConfig>>>
ng_configs_ptr and a std::shared_ptr<std::string> raft_path_ptr (or return a
struct/tuple that includes owning unique_ptr/shared_ptr for ng_configs and
raft_path) and pass ng_configs_ptr.get() and raft_path_ptr.get() into
LocalCcShards, CcShard and Sharder::Instance; apply the same change in
make_shard_config so the addresses passed into LocalCcShards/CcShard/Sharder
come from heap-owned objects that are returned/owned alongside the shard
fixtures.

---

Duplicate comments:
In `@tx_service/include/cc/object_cc_map.h`:
- Around line 1267-1270: The LO-LRU reshaping call
EnsureLargeObjOccupyPageAlone(ccp, cce) is running too early (while entry may be
locked/dirty/unfinished); move and guard this call so it only runs after the
entry is fully committed/unlocked: for pre-unlock code paths, relocate the
EnsureLargeObjOccupyPageAlone call to after ReleaseCceLock(), and add explicit
checks before invoking it that shard_->GetCacheEvictPolicy() ==
CacheEvictPolicy::LO_LRU and the cce entry is in a final state (Normal or
Deleted), has no lock, no pending command, and no buffered commands; apply this
change to all sites that currently call EnsureLargeObjOccupyPageAlone (the
occurrences around the GetCacheEvictPolicy check and usages of ccp/cce) so
standby/replay and dirty/uncommitted paths are protected.

In `@tx_service/include/cc/template_cc_map.h`:
- Around line 10346-10363: The code dereferences target_it and constructs the
fallback page relative to the wrong anchor when handling LO pages under
shard_->GetCacheEvictPolicy() == CacheEvictPolicy::LO_LRU; fix by first checking
target_it != ccmp_.end() before using *target_it, and when you need to insert a
fallback LO page use the current large-object page (the original target_page
before increment) as the anchor for try_emplace/constructor (not the advanced
iterator or its next_page_), adjusting the iterator manipulation so you never
dereference past ccmp_.end() and the new
CcPage<KeyT,ValueT,VersionedRecord,RangePartitioned> is spliced after the actual
current LO page.

In `@tx_service/tests/include/mock/mock_catalog_factory.h`:
- Around line 320-323: PackedNegativeInfinity() currently asserts and returns
nullptr which breaks range/scan tests; modify the
MockCatalogFactory::PackedNegativeInfinity() implementation to return a valid
non-null TxKey sentinel instead of nullptr (remove the assert(false)); create or
reuse a static/constexpr TxKey instance representing negative-infinity (e.g., a
static TxKey kPackedNegInf) and return its pointer, ensuring lifetime covers
test usage so callers never get nullptr.
- Around line 308-318: The two const factory overrides CreateTxKey() const and
CreateTxKey(const char *data, size_t size) const currently assert(false) and
return an invalid TxKey, causing tests to hard-fail; change them to return valid
mock TxKey objects (e.g., construct and return a default/valid TxKey or build
one from the provided data/size) so the mock factory exercises LO-LRU code paths
instead of aborting; apply the same fix pattern to the other failing const
override referenced (the method around lines 326-328) so all const factory APIs
return usable mock objects rather than asserting or returning null.

---

Nitpick comments:
In `@tx_service/include/cc/cc_entry.h`:
- Around line 2571-2579: MoveEntry currently indexes entries_[idx] without
validating bounds; add the same bounds check used elsewhere to fail fast. Inside
CcEntry::MoveEntry (the MoveEntry method returning std::unique_ptr<CcEntry<KeyT,
ValueT, VersionedRecord, RangePartitioned>>), insert an assertion like
assert(idx < entries_.size()) before accessing entries_[idx] so an out-of-range
re-homing index triggers a fast failure instead of undefined behavior.

In `@tx_service/include/cc/local_cc_shards.h`:
- Around line 1816-1826: The setters SetupPolicyLRU and SetupPolicyLoLRU
currently flip cache_evict_policy_ and the threshold after runtime which can
break invariants for already-initialized shards; make them startup-only by
checking a "started" state or rebuild shard state: either assert the service
hasn't started (e.g., require !started_ and throw/log/assert in SetupPolicyLRU
and SetupPolicyLoLRU) or, if called at runtime, iterate existing shards (use the
shard container member) and update each shard's LRU data structures and
thresholds to match cache_evict_policy_ and
u_cache_evict_policy_.lo_lru.large_obj_threshold_bytes so in-memory shard layout
stays consistent with the new policy.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 03a2152b-ec14-4682-9923-870494976e28

📥 Commits

Reviewing files that changed from the base of the PR and between 4ebaf4c and 936cd55.

📒 Files selected for processing (13)
  • core/src/tx_service_init.cpp
  • tx_service/CMakeLists.txt
  • tx_service/include/cc/cc_entry.h
  • tx_service/include/cc/local_cc_shards.h
  • tx_service/include/cc/object_cc_map.h
  • tx_service/include/cc/template_cc_map.h
  • tx_service/include/store/int_mem_store.h
  • tx_service/src/cc/cc_shard.cpp
  • tx_service/tests/CMakeLists.txt
  • tx_service/tests/CcPage-Test.cpp
  • tx_service/tests/LargeObjLRU-Test.cpp
  • tx_service/tests/StartTsCollector-Test.cpp
  • tx_service/tests/include/mock/mock_catalog_factory.h
🚧 Files skipped from review as they are similar to previous changes (2)
  • tx_service/tests/CMakeLists.txt
  • tx_service/CMakeLists.txt

Comment on lines +2303 to +2310
if (entry->payload_.cur_payload_->HasTTL())
{
uint64_t ttl = entry->payload_.cur_payload_->GetTTL();
if (ttl < smallest_ttl_)
{
smallest_ttl_ = ttl;
}
}

⚠️ Potential issue | 🟠 Major

Handle tombstones and null payloads when updating smallest_ttl_.

Line 2303 and Line 2588 dereference cur_payload_ unconditionally, but deleted entries are represented with a null payload. The full recompute path also never applies the documented deleted-entry sentinel (smallest_ttl_ = 0), so after re-homing, a page with tombstones can either crash here or be skipped by purge logic.

Proposed fix
-        if (entry->payload_.cur_payload_->HasTTL())
+        if (entry->PayloadStatus() == RecordStatus::Deleted)
+        {
+            smallest_ttl_ = 0;
+        }
+        else if (smallest_ttl_ != 0 &&
+                 entry->PayloadStatus() == RecordStatus::Normal &&
+                 entry->payload_.cur_payload_ != nullptr &&
+                 entry->payload_.cur_payload_->HasTTL())
         {
-            uint64_t ttl = entry->payload_.cur_payload_->GetTTL();
-            if (ttl < smallest_ttl_)
-            {
-                smallest_ttl_ = ttl;
-            }
+            smallest_ttl_ =
+                std::min(smallest_ttl_,
+                         entry->payload_.cur_payload_->GetTTL());
         }
@@
     void UpdateSmallestTTL()
     {
         smallest_ttl_ = UINT64_MAX;
         for (std::unique_ptr<
                  CcEntry<KeyT, ValueT, VersionedRecord, RangePartitioned>>
                  &entry : entries_)
         {
-            if (entry->payload_.cur_payload_->HasTTL())
+            if (smallest_ttl_ == 0)
+            {
+                break;
+            }
+            if (entry->PayloadStatus() == RecordStatus::Deleted)
+            {
+                smallest_ttl_ = 0;
+            }
+            else if (entry->PayloadStatus() == RecordStatus::Normal &&
+                     entry->payload_.cur_payload_ != nullptr &&
+                     entry->payload_.cur_payload_->HasTTL())
             {
-                uint64_t ttl = entry->payload_.cur_payload_->GetTTL();
-                if (ttl < smallest_ttl_)
-                {
-                    smallest_ttl_ = ttl;
-                }
+                smallest_ttl_ =
+                    std::min(smallest_ttl_,
+                             entry->payload_.cur_payload_->GetTTL());
             }
         }
     }

Also applies to: 2581-2597

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/include/cc/cc_entry.h` around lines 2303 - 2310, The code
dereferences entry->payload_.cur_payload_ without checking for null or
tombstone; update the logic that updates smallest_ttl_ to first check if
entry->payload_.cur_payload_ is non-null and not a tombstone before calling
HasTTL/GetTTL, and when walking the full-recompute path ensure the deleted-entry
sentinel is applied (set smallest_ttl_ = 0) if any entry has a null/tombstone
payload; apply the same null/tombstone guard to the other occurrence around
lines referencing entry->payload_.cur_payload_ (the block at 2581-2597) so you
never dereference a null payload and you correctly set smallest_ttl_ = 0 for
deleted entries.

Comment on lines +67 to +123
static std::pair<std::unique_ptr<LocalCcShards>, std::unique_ptr<CcShard>>
make_shard()
{
std::unordered_map<uint32_t, std::vector<NodeConfig>> ng_configs{
{0, {NodeConfig(0, "127.0.0.1", 8600)}}};
std::map<std::string, uint32_t> tx_cnf{
{"node_memory_limit_mb", 1000},
{"enable_key_cache", 0},
{"reltime_sampling", 0},
{"range_split_worker_num", 1},
{"range_slice_memory_limit_percent", 20},
{"core_num", 1},
{"realtime_sampling", 0},
{"checkpointer_interval", 10},
{"checkpointer_delay_seconds", 0},
{"checkpointer_min_ckpt_request_interval", 5},
{"enable_shard_heap_defragment", 0},
{"node_log_limit_mb", 1000},
{"collect_active_tx_ts_interval_seconds", 2},
{"rep_group_cnt", 1},
};

CatalogFactory *catalog_factory[5] = {&mock_catalog_factory,
&mock_catalog_factory,
&mock_catalog_factory,
&mock_catalog_factory,
&mock_catalog_factory};

auto local = std::make_unique<LocalCcShards>(0,
0,
tx_cnf,
catalog_factory,
nullptr,
&ng_configs,
2,
nullptr,
nullptr,
true);
// Use LO_LRU for this shard
local->SetupPolicyLoLRU(1);

auto shard = std::make_unique<CcShard>(0,
1,
10000,
false,
0,
*local,
catalog_factory,
nullptr,
&ng_configs,
2);
shard->Init();
std::string raft_path("");
Sharder::Instance(
0, &ng_configs, 0, nullptr, nullptr, local.get(), nullptr, &raft_path);
return {std::move(local), std::move(shard)};
}

⚠️ Potential issue | 🔴 Critical

Don't return shard fixtures that still point at dead stack config.

Both helpers pass &ng_configs and &raft_path into LocalCcShards / Sharder::Instance(), then return only the shard objects. After the helper returns, those addresses dangle, so later test code can read freed memory through the singleton or shard config pointers.

🧪 Safer fixture pattern
+struct ShardFixture
+{
+    std::unordered_map<uint32_t, std::vector<NodeConfig>> ng_configs{
+        {0, {NodeConfig(0, "127.0.0.1", 8600)}}};
+    std::string raft_path;
+    std::unique_ptr<LocalCcShards> local;
+    std::unique_ptr<CcShard> shard;
+};

-static std::pair<std::unique_ptr<LocalCcShards>, std::unique_ptr<CcShard>>
+static ShardFixture make_shard()
 {
-    std::unordered_map<uint32_t, std::vector<NodeConfig>> ng_configs{
-        {0, {NodeConfig(0, "127.0.0.1", 8600)}}};
+    ShardFixture fixture;
     ...
-    auto local = std::make_unique<LocalCcShards>(..., &ng_configs, ...);
+    fixture.local = std::make_unique<LocalCcShards>(
+        ..., &fixture.ng_configs, ...);
     ...
-    std::string raft_path("");
-    Sharder::Instance(
-        0, &ng_configs, 0, nullptr, nullptr, local.get(), nullptr, &raft_path);
-    return {std::move(local), std::move(shard)};
+    Sharder::Instance(0,
+                      &fixture.ng_configs,
+                      0,
+                      nullptr,
+                      nullptr,
+                      fixture.local.get(),
+                      nullptr,
+                      &fixture.raft_path);
+    return fixture;
 }

Apply the same ownership fix to make_shard_config().

Also applies to: 126-183

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/tests/LargeObjLRU-Test.cpp` around lines 67 - 123, The helper
make_shard (and make_shard_config) currently creates ng_configs and raft_path on
the stack and passes their addresses into LocalCcShards and Sharder::Instance,
which leaves dangling pointers when the helper returns; fix by heap-allocating
and transferring ownership of those objects so they outlive the returned shard
objects — e.g., create a std::shared_ptr<std::unordered_map<uint32_t,
std::vector<NodeConfig>>> ng_configs_ptr and a std::shared_ptr<std::string>
raft_path_ptr (or return a struct/tuple that includes owning
unique_ptr/shared_ptr for ng_configs and raft_path) and pass
ng_configs_ptr.get() and raft_path_ptr.get() into LocalCcShards, CcShard and
Sharder::Instance; apply the same change in make_shard_config so the addresses
passed into LocalCcShards/CcShard/Sharder come from heap-owned objects that are
returned/owned alongside the shard fixtures.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (5)
tx_service/include/cc/template_cc_map.h (2)

10346-10363: ⚠️ Potential issue | 🔴 Critical

Insert the fallback page after the current LO page, not after the candidate page.

Line 10351 dereferences target_it before checking ccmp_.end(). Also, when the next page exists but is unusable, the constructor uses that candidate as prev_page_, which links the new page after it even though key < target_it->first. That breaks the page chain ordering.

♻️ Minimal fix
         if (shard_->GetCacheEvictPolicy() == CacheEvictPolicy::LO_LRU)
         {
             if (target_page->large_obj_page_)
             {
-                target_it++;
-                target_page = target_it->second.get();
-                if (target_it == ccmp_.end() || target_page->large_obj_page_ ||
-                    target_page->Full())
+                auto *lo_page = target_page;
+                ++target_it;
+                auto *next_page =
+                    target_it == ccmp_.end() ? nullptr : target_it->second.get();
+                if (next_page == nullptr || next_page->large_obj_page_ ||
+                    next_page->Full())
                 {
                     target_it = ccmp_.try_emplace(
                         target_it,
                         key,
                         std::make_unique<CcPage<KeyT,
                                                 ValueT,
                                                 VersionedRecord,
                                                 RangePartitioned>>(
-                            this, target_page, target_page->next_page_));
-                    target_page = target_it->second.get();
+                            this, lo_page, lo_page->next_page_));
+                    next_page = target_it->second.get();
                 }
+                target_page = next_page;
             }
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/include/cc/template_cc_map.h` around lines 10346 - 10363, The code
dereferences target_it before checking ccmp_.end() and sets the new page's
prev_page_ to a candidate that may be after the intended LO page, breaking
ordering; to fix, first capture the current LO page pointer as prev_page =
target_page (the page after which we want to insert), then advance target_it
(target_it++) only to probe the next candidate, and check target_it ==
ccmp_.end() or unusable conditions before dereferencing; when constructing the
fallback CcPage in the ccmp_.try_emplace call, pass prev_page (the original LO
page) as the prev_page_ argument so the new page is inserted after the LO page,
and avoid dereferencing target_it until after the end() check. Ensure these
changes touch the block handling CacheEvictPolicy::LO_LRU, the
target_it/target_page manipulation, and the ccmp_.try_emplace call that
constructs CcPage.

11982-12010: ⚠️ Potential issue | 🔴 Critical

Relocating an arbitrary entry here still corrupts page state.

Line 11991 always links the new page after ccp, so if idx_in_ccp is not the tail, iterator traversal no longer matches key order. And Lines 11993-11995 move the CcEntry without repairing its page pointer, so resumed lock/page lookups can still reference the source page. Either split around idx_in_ccp before isolating, or restrict this helper to tail entries and update the moved entry with UpdateCcPage() immediately after emplacement.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/include/cc/template_cc_map.h` around lines 11982 - 12010, The code
always links the new large_obj page after ccp and moves an entry without fixing
its page pointer, corrupting key order and stale page references when idx_in_ccp
is not the tail; fix by ensuring correct insertion order and repairing the moved
entry's page pointer: compute the correct map insertion point using
lower_bound/upper_bound on *key so the new page is inserted where key order
dictates (not always after ccp), and immediately after Emplace(*key,
ccp->MoveEntry(idx_in_ccp)) call the entry/page-pointer fixup routine (e.g.,
UpdateCcPage on the moved CcEntry or equivalent on large_obj_ccp_ptr) so the
moved entry references large_obj_ccp_ptr before calling
ccp->Remove()/UpdateSmallestTTL(), then proceed to TryUpdatePageKey,
ccmp_.try_emplace(...) and shard_->UpdateLruList(...).
tx_service/tests/LargeObjLRU-Test.cpp (1)

67-123: ⚠️ Potential issue | 🟠 Major

Keep ng_configs and raft_path alive with the returned fixture.

Both helpers pass addresses of stack locals into LocalCcShards and Sharder::Instance(), then return only local/shard. After the helper returns, those pointers dangle and the singleton can read freed memory in later assertions or later tests.

📦 Safer fixture shape
+struct ShardFixture
+{
+    std::unordered_map<uint32_t, std::vector<NodeConfig>> ng_configs{
+        {0, {NodeConfig(0, "127.0.0.1", 8600)}}};
+    std::string raft_path;
+    std::unique_ptr<LocalCcShards> local;
+    std::unique_ptr<CcShard> shard;
+};
+
-static std::pair<std::unique_ptr<LocalCcShards>, std::unique_ptr<CcShard>>
+static ShardFixture
 make_shard()
 {
-    std::unordered_map<uint32_t, std::vector<NodeConfig>> ng_configs{ ... };
+    ShardFixture fixture;
     ...
-    auto local = std::make_unique<LocalCcShards>(..., &ng_configs, ...);
+    fixture.local = std::make_unique<LocalCcShards>(
+        ..., &fixture.ng_configs, ...);
     ...
-    std::string raft_path("");
-    Sharder::Instance(0, &ng_configs, 0, nullptr, nullptr, local.get(), nullptr, &raft_path);
-    return {std::move(local), std::move(shard)};
+    Sharder::Instance(0,
+                      &fixture.ng_configs,
+                      0,
+                      nullptr,
+                      nullptr,
+                      fixture.local.get(),
+                      nullptr,
+                      &fixture.raft_path);
+    return fixture;
 }

Apply the same ownership fix to make_shard_config().

Also applies to: 126-183

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/tests/LargeObjLRU-Test.cpp` around lines 67 - 123, The helper
make_shard is returning local/shard while passing addresses of stack variables
(ng_configs, raft_path) into LocalCcShards and Sharder::Instance, causing
dangling pointer use; change the fixture to own those objects (allocate
ng_configs and raft_path with heap storage or include them inside a returned
fixture struct) and update make_shard (and make_shard_config) to return/hold
ng_configs and raft_path alongside LocalCcShards and CcShard so the pointers
passed into LocalCcShards::LocalCcShards and Sharder::Instance remain valid for
the lifetime of the returned fixtures.
tx_service/src/cc/cc_shard.cpp (1)

929-939: ⚠️ Potential issue | 🟠 Major

Handle in-place tail promotions before the fast return.

TemplateCcMap::EnsureLargeObjOccupyPageAlone() can promote a one-entry page by flipping large_obj_page_ and then calling UpdateLruList(). If that page is already tail_ccp_.lru_prev_, Lines 1030-1035 return before protected_head_page_ is set, so the next small-page insertion can land after a large page and the invariant asserted in Lines 933-939 becomes stale.

🔁 Small fix sketch
     if (page->lru_next_ == insert_before && insert_before->lru_prev_ == page)
     {
+        if (GetCacheEvictPolicy() == CacheEvictPolicy::LO_LRU &&
+            page->large_obj_page_ && protected_head_page_ == &tail_ccp_)
+        {
+            protected_head_page_ = page;
+        }
         ++access_counter_;
         page->last_access_ts_ = access_counter_;
         return;
     }

Also applies to: 1002-1069

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/src/cc/cc_shard.cpp` around lines 929 - 939, Ensure
protected_head_page_ is updated when a one-entry page is promoted in-place
(TemplateCcMap::EnsureLargeObjOccupyPageAlone may flip large_obj_page_ and call
UpdateLruList()), so before any early return that assumes protected_head_page_
is valid, detect if the promoted page was tail_ccp_.lru_prev_ (or otherwise
became the tail predecessor) and advance protected_head_page_ accordingly; in
practice, after UpdateLruList() or any promotion path in
EnsureLargeObjOccupyPageAlone(), set protected_head_page_ =
protected_head_page_->lru_next_ (or to &tail_ccp_ as appropriate) when
protected_head_page_ == page to preserve the invariant used by the LO_LRU branch
that checks protected_head_page_ against tail_ccp_.
tx_service/include/cc/object_cc_map.h (1)

1267-1269: ⚠️ Potential issue | 🔴 Critical

Only reshape LO_LRU pages after unlock/commit.

EnsureLargeObjOccupyPageAlone() still assumes a committed Normal/Deleted entry with no key-lock/pending state attached. Several of these sites run before ReleaseCceLock(), and Line 1522 can still see an aborted Unknown entry. That can trip the helper's asserts or relocate a locked/buffered entry onto a different page.

Also applies to: 1522-1525, 1871-1875, 2103-2107, 2555-2559, 2736-2739, 2825-2830

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/include/cc/object_cc_map.h` around lines 1267 - 1269, The call to
EnsureLargeObjOccupyPageAlone assumes the CCE is committed and unlocked but is
currently invoked while the CCE may still be locked/Unknown; move the
EnsureLargeObjOccupyPageAlone(ccp, cce) invocation out of the
pre-unlock/pre-commit paths and place it only after ReleaseCceLock()/commit
completes (or in the post-commit/unlocked cleanup path) so the entry is
guaranteed to be Normal/Deleted with no key-lock/pending state; apply the same
change to the other occurrences you noted (around the blocks currently at lines
~1522-1525, 1871-1875, 2103-2107, 2555-2559, 2736-2739, 2825-2830) and, as a
safety, check the CCE state before calling EnsureLargeObjOccupyPageAlone.
🧹 Nitpick comments (1)
tx_service/tests/include/mock/mock_catalog_factory.h (1)

326-329: Inconsistent stub pattern: missing assert(false).

Unlike the other newly added methods, CreateTxRecord() silently returns nullptr without asserting. This inconsistency may mask test failures if this path is unexpectedly reached.

♻️ Suggested fix for consistency
     std::unique_ptr<TxRecord> CreateTxRecord() const override
     {
+        assert(false);
         return nullptr;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/tests/include/mock/mock_catalog_factory.h` around lines 326 - 329,
The CreateTxRecord() stub in mock_catalog_factory.h currently returns nullptr
silently; change it to follow the other stubs by asserting when reached (e.g.,
call assert(false) or equivalent test-failure macro) and then return nullptr so
tests fail fast if this path is hit; update the CreateTxRecord() implementation
to include the assertion before returning to match the pattern used by the other
mock methods.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tx_service/include/cc/object_cc_map.h`:
- Around line 1871-1875: The current guards only call
EnsureLargeObjOccupyPageAlone(ccp, cce) when payload_status ==
RecordStatus::Normal, which skips the Deleted reclassification path that clears
large_obj_page_ and moves pages back to the small-object region; update the
condition to also handle RecordStatus::Deleted (or otherwise call
EnsureLargeObjOccupyPageAlone for Deleted) and move the call to a safe unlocked
location so deletes applied via upload/replay/standby correctly clear
large-object state; apply the same change for the cce->PayloadStatus() checks in
the standby and replay paths (locations referenced by
EnsureLargeObjOccupyPageAlone, shard_->GetCacheEvictPolicy(),
RecordStatus::Normal/Deleted, and cce->PayloadStatus()).

In `@tx_service/tests/CcPage-Test.cpp`:
- Around line 96-102: Header declaration for the CatalogFactory array must be
changed to match the implementation: update the declaration of CatalogFactory
*catalog_factory in cc_shard.h from size [6] to [5] so it aligns with the
implementation in cc_shard.cpp and the class member catalog_factory_[5]; ensure
GetCatalogFactory() indexing (0–4 for TableEngine values 1–5) remains valid
after the header fix.

---

Duplicate comments:
In `@tx_service/include/cc/object_cc_map.h`:
- Around line 1267-1269: The call to EnsureLargeObjOccupyPageAlone assumes the
CCE is committed and unlocked but is currently invoked while the CCE may still
be locked/Unknown; move the EnsureLargeObjOccupyPageAlone(ccp, cce) invocation
out of the pre-unlock/pre-commit paths and place it only after
ReleaseCceLock()/commit completes (or in the post-commit/unlocked cleanup path)
so the entry is guaranteed to be Normal/Deleted with no key-lock/pending state;
apply the same change to the other occurrences you noted (around the blocks
currently at lines ~1522-1525, 1871-1875, 2103-2107, 2555-2559, 2736-2739,
2825-2830) and, as a safety, check the CCE state before calling
EnsureLargeObjOccupyPageAlone.

In `@tx_service/include/cc/template_cc_map.h`:
- Around line 10346-10363: The code dereferences target_it before checking
ccmp_.end() and sets the new page's prev_page_ to a candidate that may be after
the intended LO page, breaking ordering; to fix, first capture the current LO
page pointer as prev_page = target_page (the page after which we want to
insert), then advance target_it (target_it++) only to probe the next candidate,
and check target_it == ccmp_.end() or unusable conditions before dereferencing;
when constructing the fallback CcPage in the ccmp_.try_emplace call, pass
prev_page (the original LO page) as the prev_page_ argument so the new page is
inserted after the LO page, and avoid dereferencing target_it until after the
end() check. Ensure these changes touch the block handling
CacheEvictPolicy::LO_LRU, the target_it/target_page manipulation, and the
ccmp_.try_emplace call that constructs CcPage.
- Around line 11982-12010: The code always links the new large_obj page after
ccp and moves an entry without fixing its page pointer, corrupting key order and
stale page references when idx_in_ccp is not the tail; fix by ensuring correct
insertion order and repairing the moved entry's page pointer: compute the
correct map insertion point using lower_bound/upper_bound on *key so the new
page is inserted where key order dictates (not always after ccp), and
immediately after Emplace(*key, ccp->MoveEntry(idx_in_ccp)) call the
entry/page-pointer fixup routine (e.g., UpdateCcPage on the moved CcEntry or
equivalent on large_obj_ccp_ptr) so the moved entry references large_obj_ccp_ptr
before calling ccp->Remove()/UpdateSmallestTTL(), then proceed to
TryUpdatePageKey, ccmp_.try_emplace(...) and shard_->UpdateLruList(...).

In `@tx_service/src/cc/cc_shard.cpp`:
- Around line 929-939: Ensure protected_head_page_ is updated when a one-entry
page is promoted in-place (TemplateCcMap::EnsureLargeObjOccupyPageAlone may flip
large_obj_page_ and call UpdateLruList()), so before any early return that
assumes protected_head_page_ is valid, detect if the promoted page was
tail_ccp_.lru_prev_ (or otherwise became the tail predecessor) and advance
protected_head_page_ accordingly; in practice, after UpdateLruList() or any
promotion path in EnsureLargeObjOccupyPageAlone(), set protected_head_page_ =
protected_head_page_->lru_next_ (or to &tail_ccp_ as appropriate) when
protected_head_page_ == page to preserve the invariant used by the LO_LRU branch
that checks protected_head_page_ against tail_ccp_.

In `@tx_service/tests/LargeObjLRU-Test.cpp`:
- Around line 67-123: The helper make_shard is returning local/shard while
passing addresses of stack variables (ng_configs, raft_path) into LocalCcShards
and Sharder::Instance, causing dangling pointer use; change the fixture to own
those objects (allocate ng_configs and raft_path with heap storage or include
them inside a returned fixture struct) and update make_shard (and
make_shard_config) to return/hold ng_configs and raft_path alongside
LocalCcShards and CcShard so the pointers passed into
LocalCcShards::LocalCcShards and Sharder::Instance remain valid for the lifetime
of the returned fixtures.

---

Nitpick comments:
In `@tx_service/tests/include/mock/mock_catalog_factory.h`:
- Around line 326-329: The CreateTxRecord() stub in mock_catalog_factory.h
currently returns nullptr silently; change it to follow the other stubs by
asserting when reached (e.g., call assert(false) or equivalent test-failure
macro) and then return nullptr so tests fail fast if this path is hit; update
the CreateTxRecord() implementation to include the assertion before returning to
match the pattern used by the other mock methods.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8b1e7bd7-edd4-45b0-a952-60be9af77aac

📥 Commits

Reviewing files that changed from the base of the PR and between 936cd55 and 1b27b7a.

📒 Files selected for processing (13)
  • core/src/tx_service_init.cpp
  • tx_service/CMakeLists.txt
  • tx_service/include/cc/cc_entry.h
  • tx_service/include/cc/local_cc_shards.h
  • tx_service/include/cc/object_cc_map.h
  • tx_service/include/cc/template_cc_map.h
  • tx_service/include/store/int_mem_store.h
  • tx_service/src/cc/cc_shard.cpp
  • tx_service/tests/CMakeLists.txt
  • tx_service/tests/CcPage-Test.cpp
  • tx_service/tests/LargeObjLRU-Test.cpp
  • tx_service/tests/StartTsCollector-Test.cpp
  • tx_service/tests/include/mock/mock_catalog_factory.h
🚧 Files skipped from review as they are similar to previous changes (5)
  • tx_service/tests/StartTsCollector-Test.cpp
  • tx_service/include/cc/local_cc_shards.h
  • tx_service/tests/CMakeLists.txt
  • core/src/tx_service_init.cpp
  • tx_service/include/cc/cc_entry.h

Comment on lines +1871 to +1875
if (shard_->GetCacheEvictPolicy() == CacheEvictPolicy::LO_LRU &&
payload_status == RecordStatus::Normal)
{
EnsureLargeObjOccupyPageAlone(ccp, cce);
}

⚠️ Potential issue | 🟠 Major

Don't skip the Deleted reclassification path.

These guards only call EnsureLargeObjOccupyPageAlone() for RecordStatus::Normal, but the helper's Deleted branch is what clears large_obj_page_ and moves a formerly-large page back into the small-object region. A delete applied through upload/replay/standby can therefore leave the page incorrectly protected.

🧭 Minimal guard change
- if (shard_->GetCacheEvictPolicy() == CacheEvictPolicy::LO_LRU &&
-     payload_status == RecordStatus::Normal)
+ if (shard_->GetCacheEvictPolicy() == CacheEvictPolicy::LO_LRU &&
+     (payload_status == RecordStatus::Normal ||
+      payload_status == RecordStatus::Deleted))
{
    EnsureLargeObjOccupyPageAlone(ccp, cce);
}

Apply the same change to the cce->PayloadStatus() guards in the standby/replay paths, after moving the call to a safe unlocked point.

Also applies to: 2103-2107, 2555-2559

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/include/cc/object_cc_map.h` around lines 1871 - 1875, The current
guards only call EnsureLargeObjOccupyPageAlone(ccp, cce) when payload_status ==
RecordStatus::Normal, which skips the Deleted reclassification path that clears
large_obj_page_ and moves pages back to the small-object region; update the
condition to also handle RecordStatus::Deleted (or otherwise call
EnsureLargeObjOccupyPageAlone for Deleted) and move the call to a safe unlocked
location so deletes applied via upload/replay/standby correctly clear
large-object state; apply the same change for the cce->PayloadStatus() checks in
the standby and replay paths (locations referenced by
EnsureLargeObjOccupyPageAlone, shard_->GetCacheEvictPolicy(),
RecordStatus::Normal/Deleted, and cce->PayloadStatus()).

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (6)
tx_service/include/cc/template_cc_map.h (3)

10346-10372: ⚠️ Potential issue | 🔴 Critical

Keep the fallback page linked after the original LO page.

Once target_it has advanced, target_page may already point at the candidate next page. In the allocation branch, Lines 10364-10371 then build the new page with that candidate as prev_page_, so the page chain ends up linking the new page after the candidate while ccmp_ inserts it before the candidate. That splits map order from prev_page_/next_page_ traversal.

♻️ Minimal fix
         if (shard_->GetCacheEvictPolicy() == CacheEvictPolicy::LO_LRU)
         {
             if (target_page->large_obj_page_)
             {
-                target_it++;
+                auto *lo_page = target_page;
+                ++target_it;
                 bool emplace_page = false;
                 if (target_it == ccmp_.end())
                 {
                     emplace_page = true;
                 }
@@
                 if (emplace_page)
                 {
                     target_it = ccmp_.try_emplace(
                         target_it,
                         key,
                         std::make_unique<CcPage<KeyT,
                                                 ValueT,
                                                 VersionedRecord,
                                                 RangePartitioned>>(
-                            this, target_page, target_page->next_page_));
+                            this, lo_page, lo_page->next_page_));
                     target_page = target_it->second.get();
                 }
             }
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/include/cc/template_cc_map.h` around lines 10346 - 10372, Save the
original target_page (the LO page you started from) before advancing target_it
and use that saved pointer as the prev_page_ when constructing the new CcPage;
specifically, capture e.g. original_page = target_page before doing target_it++
and then call std::make_unique<CcPage<...>>(this, original_page,
original_page->next_page_) so the new page's prev/next links remain consistent
with ccmp_ insertion order.

11995-12018: ⚠️ Potential issue | 🔴 Critical

Don't move an arbitrary entry into a page that is always linked after ccp.

idx_in_ccp can point at any entry in the source page, but the new LO page is always constructed as (this, ccp, ccp->next_page_). If the moved key is not the source-page tail, ccp still retains larger keys while the new single-key page is placed after it, so page-chain traversal no longer matches key order.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/include/cc/template_cc_map.h` around lines 11995 - 12018, The code
moves an arbitrary entry (idx_in_ccp) into a new large-object page constructed
as (this, ccp, ccp->next_page_) which always links the new page after ccp,
breaking global key order if the moved key is not the page tail; fix by ensuring
only the largest key from ccp is moved when constructing the new LO page placed
after ccp (e.g., compute tail index and use that instead of idx_in_ccp), or
alternatively construct/insert the new page before ccp when moving a smaller key
so chain order is preserved; update the code paths around the CcPage constructor
call, large_obj_ccp_uptr allocation, the call that uses idx_in_ccp,
TryUpdatePageKey, and the ccmp_.try_emplace usage to reflect moving the correct
element or inserting the page in the correct position.

12001-12019: ⚠️ Potential issue | 🔴 Critical

Repair the moved entry and initialize the new LO page before inserting it.

After Lines 12002-12005, the relocated entry is published on a brand-new page, but this path never updates the moved CcEntry's owning page and never seeds the new page's last_dirty_commit_ts_ / smallest_ttl_. Other move/split paths in this file do both explicitly, so this can leave stale page references and make scan/cleanup treat the LO page as clean or non-expiring.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/include/cc/template_cc_map.h` around lines 12001 - 12019, The
moved CcEntry must have its owning page set and the new LO page must be seeded
with the source page's dirty/TTL metadata before publishing: after
Emplace(...)/Remove(...)/UpdateSmallestTTL() set the moved entry's owning page
pointer to large_obj_ccp_uptr.get() (i.e. update the CcEntry's parent/page
reference returned/held by MoveEntry/Emplace), copy the relevant metadata from
the source ccp into the new page (e.g. seed
large_obj_ccp_uptr->last_dirty_commit_ts_ and large_obj_ccp_uptr->smallest_ttl_
from ccp's values or call the page init helper used elsewhere), then proceed to
TryUpdatePageKey, try_emplace, and shard_->UpdateLruList so the LO page is
correctly initialized and visible.
tx_service/tests/LargeObjLRU-Test.cpp (2)

67-123: ⚠️ Potential issue | 🔴 Critical

Keep ng_configs and raft_path owned by the returned fixture.

Both helpers pass &ng_configs and &raft_path into LocalCcShards / Sharder::Instance() and then return only local and shard. Those pointers dangle as soon as the helper returns, so the tests can later read freed memory through the shard or the singleton.

Also applies to: 126-183

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/tests/LargeObjLRU-Test.cpp` around lines 67 - 123, The ng_configs
and raft_path objects are allocated on the stack inside make_shard() but their
addresses are stored in LocalCcShards / passed to Sharder::Instance(), so they
dangle after return; fix by making the fixture own them: create a small return
struct (e.g., ShardFixture) that contains std::unique_ptr<LocalCcShards> local,
std::unique_ptr<CcShard> shard, the ng_configs map, and raft_path string; change
make_shard() to construct those members and pass &fixture.ng_configs and
&fixture.raft_path into LocalCcShards, CcShard and Sharder::Instance(); update
call sites/tests to use the new ShardFixture return type; apply the same change
to the other helper at lines ~126-183.

168-177: ⚠️ Potential issue | 🟡 Minor

make_shard_config() drops the requested shard memory limit.

The helper accepts node_memory_limit_mb, but Line 170 still constructs CcShard with 10000. Any test that expects shard-level memory/eviction behavior from this helper will run with a different limit than requested.

Suggested fix
-    auto shard = std::make_unique<CcShard>(0,
-                                           1,
-                                           10000,
+    auto shard = std::make_unique<CcShard>(0,
+                                           1,
+                                           node_memory_limit_mb,
                                            false,
                                            0,
                                            *local,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/tests/LargeObjLRU-Test.cpp` around lines 168 - 177, The helper
make_shard_config() accepts node_memory_limit_mb but the CcShard instantiation
still hardcodes 10000; update the CcShard constructor call in the helper to use
the node_memory_limit_mb parameter (or the equivalent field from the config it
returns) instead of the literal 10000 so shard-level memory/eviction behavior
reflects the requested limit; adjust any related signatures or callers of
make_shard_config() if needed to propagate node_memory_limit_mb to the CcShard
constructor.
tx_service/include/cc/object_cc_map.h (1)

1267-1269: ⚠️ Potential issue | 🔴 Critical

Only run LO-LRU reshaping on unlocked, committed Normal/Deleted entries.

These hooks still reach EnsureLargeObjOccupyPageAlone() while the entry can be write-locked or pending (ApplyCc, UploadTxCommandsCc, CreateDirtyPayloadFromPendingCommand), and some guards still exclude RecordStatus::Deleted in the upload/replay/standby paths. That means the helper can still assert on locked/unknown state, or fail to clear large_obj_page_ when a formerly-large object is deleted.

Also applies to: 1522-1525, 1871-1875, 2103-2107, 2555-2559, 2736-2739, 2825-2830

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/include/cc/object_cc_map.h` around lines 1267 - 1269, The LO-LRU
call to EnsureLargeObjOccupyPageAlone must only run for entries that are
unlocked, committed (not pending/applying), and whose RecordStatus is Normal or
Deleted; update each site that currently checks shard_->GetCacheEvictPolicy() ==
CacheEvictPolicy::LO_LRU to additionally verify the entry is not write-locked,
is in a committed state (not part of ApplyCc / UploadTxCommandsCc /
CreateDirtyPayloadFromPendingCommand paths), and that entry->GetStatus() (or
equivalent) is either RecordStatus::Normal or RecordStatus::Deleted before
invoking EnsureLargeObjOccupyPageAlone, so the helper never sees locked/unknown
states and large_obj_page_ is cleared for deleted entries.
🧹 Nitpick comments (2)
core/src/tx_service_init.cpp (1)

233-260: LGTM with optional enhancement.

The policy parsing logic is correct: proper initialization, validation against known policy names, and error handling on invalid input. The previous uninitialized variable issue has been addressed with the initialization at line 235.

Consider adding a log message to confirm the selected policy and threshold for observability during startup:

💡 Optional: Add policy configuration logging
     else
     {
         LOG(ERROR) << "Invalidate `cache_evict_policy` "
                    << cache_evict_policy_str;
         return false;
     }
+    LOG(INFO) << "Cache eviction policy: " << cache_evict_policy_str;
+    if (cache_evict_policy == txservice::CacheEvictPolicy::LO_LRU)
+    {
+        LOG(INFO) << "LO_LRU large object threshold: "
+                  << lolru_large_obj_threshold_kb << " KB";
+    }
 
     LOG(INFO) << "Data substrate memory limit: "
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/tx_service_init.cpp` around lines 233 - 260, Add a startup log that
reports which eviction policy and, if applicable, the LO_LRU threshold were
selected: after the branch that sets cache_evict_policy (using
cache_evict_policy_str, txservice::CacheEvictPolicy, and
lolru_large_obj_threshold_kb) emit an informative INFO log like "Selected
cache_evict_policy=<policy> [lolru_large_obj_threshold_kb=<value>]" so operators
can observe the configured policy and threshold at startup; ensure the message
only includes the threshold when cache_evict_policy ==
txservice::CacheEvictPolicy::LO_LRU.
tx_service/src/cc/cc_shard.cpp (1)

1017-1023: Consider adding an early assertion after advancing protected_head_page_.

When page == protected_head_page_ but page->large_obj_page_ is false (demoted page), the code advances protected_head_page_ to page->lru_next_. However, there's no immediate validation that the new protected head maintains the invariant.

The assertion at line 1068 will catch violations, but adding an early assertion here would fail faster and clarify the expectation.

🛡️ Suggested defensive assertion
         if (page == protected_head_page_)
         {
             assert(page != &tail_ccp_);
             protected_head_page_ = page->lru_next_;
+            assert(protected_head_page_ == &tail_ccp_ ||
+                   protected_head_page_->large_obj_page_ == true);
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/src/cc/cc_shard.cpp` around lines 1017 - 1023, After advancing
protected_head_page_ when page == protected_head_page_ (and
page->large_obj_page_ is false), add an immediate assertion that the new
protected_head_page_ still satisfies the invariant (e.g., is not tail_ccp_ and
has expected properties), so violations are caught early; locate the block
handling if (page == protected_head_page_) in cc_shard.cpp, update the branch
that sets protected_head_page_ = page->lru_next_ to assert the new
protected_head_page_ is valid (using protected_head_page_, tail_ccp_,
page->large_obj_page_, and lru_next_ to form the check).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tx_service/include/cc/local_cc_shards.h`:
- Around line 1816-1826: The setters SetupPolicyLRU and SetupPolicyLoLRU mutate
shared state (cache_evict_policy_ and
u_cache_evict_policy_.lo_lru.large_obj_threshold_bytes) that shard readers
access concurrently; make these operations safe by either (A) guarding the
fields with a mutex used by readers and writers (protect cache_evict_policy_ and
u_cache_evict_policy_.lo_lru.large_obj_threshold_bytes with a single std::mutex
or std::shared_mutex, lock for write in SetupPolicyLRU/SetupPolicyLoLRU and lock
for read where shards consult the policy), or (B) restrict the setters to
init-only by asserting the service is not started (add a started_ flag check in
SetupPolicyLRU/SetupPolicyLoLRU and document that they must be called before
startup). Update all reader sites that access
cache_evict_policy_/u_cache_evict_policy_.lo_lru.large_obj_threshold_bytes to
use the same synchronization (or skip if you choose the init-only path and
enforce the assert).

---

Duplicate comments:
In `@tx_service/include/cc/object_cc_map.h`:
- Around line 1267-1269: The LO-LRU call to EnsureLargeObjOccupyPageAlone must
only run for entries that are unlocked, committed (not pending/applying), and
whose RecordStatus is Normal or Deleted; update each site that currently checks
shard_->GetCacheEvictPolicy() == CacheEvictPolicy::LO_LRU to additionally verify
the entry is not write-locked, is in a committed state (not part of ApplyCc /
UploadTxCommandsCc / CreateDirtyPayloadFromPendingCommand paths), and that
entry->GetStatus() (or equivalent) is either RecordStatus::Normal or
RecordStatus::Deleted before invoking EnsureLargeObjOccupyPageAlone, so the
helper never sees locked/unknown states and large_obj_page_ is cleared for
deleted entries.

In `@tx_service/include/cc/template_cc_map.h`:
- Around line 10346-10372: Save the original target_page (the LO page you
started from) before advancing target_it and use that saved pointer as the
prev_page_ when constructing the new CcPage; specifically, capture e.g.
original_page = target_page before doing target_it++ and then call
std::make_unique<CcPage<...>>(this, original_page, original_page->next_page_) so
the new page's prev/next links remain consistent with ccmp_ insertion order.
- Around line 11995-12018: The code moves an arbitrary entry (idx_in_ccp) into a
new large-object page constructed as (this, ccp, ccp->next_page_) which always
links the new page after ccp, breaking global key order if the moved key is not
the page tail; fix by ensuring only the largest key from ccp is moved when
constructing the new LO page placed after ccp (e.g., compute tail index and use
that instead of idx_in_ccp), or alternatively construct/insert the new page
before ccp when moving a smaller key so chain order is preserved; update the
code paths around the CcPage constructor call, large_obj_ccp_uptr allocation,
the call that uses idx_in_ccp, TryUpdatePageKey, and the ccmp_.try_emplace usage
to reflect moving the correct element or inserting the page in the correct
position.
- Around line 12001-12019: The moved CcEntry must have its owning page set and
the new LO page must be seeded with the source page's dirty/TTL metadata before
publishing: after Emplace(...)/Remove(...)/UpdateSmallestTTL() set the moved
entry's owning page pointer to large_obj_ccp_uptr.get() (i.e. update the
CcEntry's parent/page reference returned/held by MoveEntry/Emplace), copy the
relevant metadata from the source ccp into the new page (e.g. seed
large_obj_ccp_uptr->last_dirty_commit_ts_ and large_obj_ccp_uptr->smallest_ttl_
from ccp's values or call the page init helper used elsewhere), then proceed to
TryUpdatePageKey, try_emplace, and shard_->UpdateLruList so the LO page is
correctly initialized and visible.

In `@tx_service/tests/LargeObjLRU-Test.cpp`:
- Around line 67-123: The ng_configs and raft_path objects are allocated on the
stack inside make_shard() but their addresses are stored in LocalCcShards /
passed to Sharder::Instance(), so they dangle after return; fix by making the
fixture own them: create a small return struct (e.g., ShardFixture) that
contains std::unique_ptr<LocalCcShards> local, std::unique_ptr<CcShard> shard,
the ng_configs map, and raft_path string; change make_shard() to construct those
members and pass &fixture.ng_configs and &fixture.raft_path into LocalCcShards,
CcShard and Sharder::Instance(); update call sites/tests to use the new
ShardFixture return type; apply the same change to the other helper at lines
~126-183.
- Around line 168-177: The helper make_shard_config() accepts
node_memory_limit_mb but the CcShard instantiation still hardcodes 10000; update
the CcShard constructor call in the helper to use the node_memory_limit_mb
parameter (or the equivalent field from the config it returns) instead of the
literal 10000 so shard-level memory/eviction behavior reflects the requested
limit; adjust any related signatures or callers of make_shard_config() if needed
to propagate node_memory_limit_mb to the CcShard constructor.

---

Nitpick comments:
In `@core/src/tx_service_init.cpp`:
- Around line 233-260: Add a startup log that reports which eviction policy and,
if applicable, the LO_LRU threshold were selected: after the branch that sets
cache_evict_policy (using cache_evict_policy_str, txservice::CacheEvictPolicy,
and lolru_large_obj_threshold_kb) emit an informative INFO log like "Selected
cache_evict_policy=<policy> [lolru_large_obj_threshold_kb=<value>]" so operators
can observe the configured policy and threshold at startup; ensure the message
only includes the threshold when cache_evict_policy ==
txservice::CacheEvictPolicy::LO_LRU.

In `@tx_service/src/cc/cc_shard.cpp`:
- Around line 1017-1023: After advancing protected_head_page_ when page ==
protected_head_page_ (and page->large_obj_page_ is false), add an immediate
assertion that the new protected_head_page_ still satisfies the invariant (e.g.,
is not tail_ccp_ and has expected properties), so violations are caught early;
locate the block handling if (page == protected_head_page_) in cc_shard.cpp,
update the branch that sets protected_head_page_ = page->lru_next_ to assert the
new protected_head_page_ is valid (using protected_head_page_, tail_ccp_,
page->large_obj_page_, and lru_next_ to form the check).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6f988722-3931-47a4-8f9e-bdfdc726d812

📥 Commits

Reviewing files that changed from the base of the PR and between 1b27b7a and 04f8368.

📒 Files selected for processing (13)
  • core/src/tx_service_init.cpp
  • tx_service/CMakeLists.txt
  • tx_service/include/cc/cc_entry.h
  • tx_service/include/cc/local_cc_shards.h
  • tx_service/include/cc/object_cc_map.h
  • tx_service/include/cc/template_cc_map.h
  • tx_service/include/store/int_mem_store.h
  • tx_service/src/cc/cc_shard.cpp
  • tx_service/tests/CMakeLists.txt
  • tx_service/tests/CcPage-Test.cpp
  • tx_service/tests/LargeObjLRU-Test.cpp
  • tx_service/tests/StartTsCollector-Test.cpp
  • tx_service/tests/include/mock/mock_catalog_factory.h
🚧 Files skipped from review as they are similar to previous changes (4)
  • tx_service/tests/StartTsCollector-Test.cpp
  • tx_service/tests/CMakeLists.txt
  • tx_service/include/store/int_mem_store.h
  • tx_service/CMakeLists.txt

Comment on lines +1816 to +1826
void SetupPolicyLRU()
{
    cache_evict_policy_ = CacheEvictPolicy::LRU;
}

void SetupPolicyLoLRU(uint32_t large_obj_threshold_kb)
{
    cache_evict_policy_ = CacheEvictPolicy::LO_LRU;
    u_cache_evict_policy_.lo_lru.large_obj_threshold_bytes =
        large_obj_threshold_kb * 1024ull;
}

⚠️ Potential issue | 🔴 Critical

Make policy switches synchronized or init-only.

SetupPolicyLRU() / SetupPolicyLoLRU() mutate cache_evict_policy_ and large_obj_threshold_bytes with plain writes, while shard code reads the same fields during request processing. Since TxService exposes these setters directly, calling them after startup is a read/write data race, and readers can observe LO_LRU before the threshold update is visible. Either guard this state with synchronization or explicitly restrict these setters to pre-start initialization.

Also applies to: 2060-2065

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tx_service/include/cc/local_cc_shards.h` around lines 1816 - 1826, The
setters SetupPolicyLRU and SetupPolicyLoLRU mutate shared state
(cache_evict_policy_ and u_cache_evict_policy_.lo_lru.large_obj_threshold_bytes)
that shard readers access concurrently; make these operations safe by either (A)
guarding the fields with a mutex used by readers and writers (protect
cache_evict_policy_ and u_cache_evict_policy_.lo_lru.large_obj_threshold_bytes
with a single std::mutex or std::shared_mutex, lock for write in
SetupPolicyLRU/SetupPolicyLoLRU and lock for read where shards consult the
policy), or (B) restrict the setters to init-only by asserting the service is
not started (add a started_ flag check in SetupPolicyLRU/SetupPolicyLoLRU and
document that they must be called before startup). Update all reader sites that
access
cache_evict_policy_/u_cache_evict_policy_.lo_lru.large_obj_threshold_bytes to
use the same synchronization (or skip if you choose the init-only path and
enforce the assert).

Successfully merging this pull request may close these issues.

[Feature]: Enhance cache eviction policy to support payload-size-aware eviction
