feat: state cache for ePBS #8868

Draft
ensi321 wants to merge 1 commit into nc/epbs-fc from nc/epbs-state-cache

@ensi321 ensi321 commented Feb 6, 2026

Needs nc/epbs-fc to be merged first. Will point to unstable after #8739 is merged.

Summary

This PR extends the Gloas ePBS state cache architecture to support dual state variants (block state and payload state) by threading the payloadPresent flag through the state cache layer, regeneration system, and archive store.

Context

Building on the fork choice changes in nc/epbs-fc (which stores checkpoints with payload status), this PR completes the state cache implementation for Gloas ePBS by:

  1. Extending all checkpoint cache operations to track payloadPresent
  2. Updating the regeneration layer to explicitly handle both state variants
  3. Propagating payload status from block import through to checkpoint caching

Key Changes

1. State Cache Type System Updates

packages/beacon-node/src/chain/stateCache/types.ts (+208/-111 lines total across files):

  • Renamed CheckpointHex → CheckpointHexPayload with required payloadPresent: boolean field
  • Updated CheckpointStateCache interface methods to accept payloadPresent parameter:
    • add(cp, state, payloadPresent) - explicitly marks state variant when adding to cache
    • getLatest(rootHex, maxEpoch, payloadPresent) - retrieves specific state variant
    • getOrReloadLatest(rootHex, maxEpoch, payloadPresent) - reloads specific state variant from disk
    • updatePreComputedCheckpoint(rootHex, epoch, payloadPresent) - tracks payload status for pre-computed states
  • Kept processState() method signature unchanged (manages both variants internally)
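
To make the signature changes above concrete, here is a minimal TypeScript sketch. Only CheckpointHexPayload, add, getLatest and the payloadPresent flag come from this PR description; StateLike and SketchCheckpointCache are hypothetical stand-ins, and the lookup logic is an assumption rather than Lodestar's actual implementation.

```typescript
// Sketch only: simplified stand-ins for the types named in this PR.
type RootHex = string;
type Epoch = number;

// Named in the PR; the field layout beyond payloadPresent is assumed.
type CheckpointHexPayload = {epoch: Epoch; rootHex: RootHex; payloadPresent: boolean};

// Hypothetical placeholder for the real cached beacon state type.
type StateLike = {slot: number};

class SketchCheckpointCache {
  private readonly items: {cp: CheckpointHexPayload; state: StateLike}[] = [];

  // The caller states explicitly which variant it is adding.
  add(cp: {epoch: Epoch; rootHex: RootHex}, state: StateLike, payloadPresent: boolean): void {
    this.items.push({cp: {...cp, payloadPresent}, state});
  }

  // Returns the highest-epoch state (<= maxEpoch) for this root and this variant only.
  getLatest(rootHex: RootHex, maxEpoch: Epoch, payloadPresent: boolean): StateLike | null {
    let best: {cp: CheckpointHexPayload; state: StateLike} | null = null;
    for (const item of this.items) {
      const {cp} = item;
      if (cp.rootHex !== rootHex || cp.payloadPresent !== payloadPresent || cp.epoch > maxEpoch) {
        continue;
      }
      if (best === null || cp.epoch > best.cp.epoch) {
        best = item;
      }
    }
    return best === null ? null : best.state;
  }
}
```

The point of the sketch is that the two variants never shadow each other: a lookup with payloadPresent = true cannot return a block-only state, even when both share the same root and epoch.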

2. PersistentCheckpointStateCache Implementation

packages/beacon-node/src/chain/stateCache/persistentCheckpointsCache.ts (~289 lines modified):

  • Extended cache key format from "epoch-rootHex" to "epoch-rootHex-payloadPresent"
  • Updated toCheckpointHexPayload() helper to include payload status in keys
  • Modified all cache operations (add, get, getLatest, etc.) to handle dual state variants
  • Implemented logic to iterate both payloadPresent variants in processPastEpoch() for memory management
  • Updated cache metrics and debugging utilities to reflect dual state architecture
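
The extended key format can be illustrated with a small round-trip sketch. The names toCacheKey and fromCacheKey match helpers mentioned elsewhere in this PR, but the bodies below are assumptions written for illustration, including the validation rules.

```typescript
// Sketch only: assumed encoding of the "epoch-rootHex-payloadPresent" key.
type CheckpointHexPayload = {epoch: number; rootHex: string; payloadPresent: boolean};

function toCacheKey(cp: CheckpointHexPayload): string {
  return `${cp.epoch}-${cp.rootHex}-${cp.payloadPresent}`;
}

function fromCacheKey(key: string): CheckpointHexPayload {
  // A hex root never contains "-", so splitting on it is unambiguous.
  const parts = key.split("-");
  if (parts.length !== 3) {
    throw new Error(`Unexpected cache key: ${key}`);
  }
  const [epochStr, rootHex, flag] = parts;
  const epoch = Number(epochStr);
  if (epochStr === "" || !Number.isInteger(epoch) || epoch < 0) {
    throw new Error(`Invalid epoch in cache key: ${key}`);
  }
  if (flag !== "true" && flag !== "false") {
    throw new Error(`Invalid payloadPresent flag in cache key: ${key}`);
  }
  return {epoch, rootHex, payloadPresent: flag === "true"};
}
```

Encoding the flag as the last segment keeps the epoch and root in their existing positions, so the two variants of one checkpoint differ only in the key suffix.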

3. Regeneration Layer - Dual State Support

packages/beacon-node/src/chain/regen/interface.ts & queued.ts & regen.ts:

  • Added processPayloadState(payloadState) method for explicit payload state caching (Gloas-only)
    • Called after processExecutionPayloadEnvelope() when payload is revealed
    • Complements processState() which handles block state caching
  • Updated addCheckpointState(cp, state, payloadPresent) to accept payload flag
  • Extended updatePreComputedCheckpoint() with payloadPresent parameter
  • Modified getCheckpointState() and related methods to pass payloadPresent through cache lookups

4. Block Import - Payload Status Propagation

packages/beacon-node/src/chain/blocks/importBlock.ts:

  • Derive payloadPresent from block type:
    • Pre-Gloas: payloadPresent = true (execution payload embedded in block, always FULL variant)
    • Post-Gloas: payloadPresent = false (block state only, PENDING/EMPTY variant, payload not yet revealed)
  • Thread payloadPresent through checkpoint caching operations:
    • regen.addCheckpointState(cp, checkpointState, payloadPresent)
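
The derivation rule above can be written as a small pure function. This is a sketch under assumptions: derivePayloadPresent and PayloadStatusSketch are hypothetical names, and the string labels only mirror the PENDING/EMPTY/FULL variants mentioned in this description.

```typescript
// Hypothetical status labels mirroring the variants named in the PR description.
type PayloadStatusSketch = "PENDING" | "EMPTY" | "FULL";

// Assumed helper, not taken from the PR: decides which state variant a post-state represents.
function derivePayloadPresent(isGloasBlock: boolean, payloadStatus: PayloadStatusSketch): boolean {
  // Pre-Gloas blocks embed the execution payload, so their post-state is always the FULL variant.
  if (!isGloasBlock) {
    return true;
  }
  // Gloas: the state includes the payload only once the payload has been processed.
  return payloadStatus === "FULL";
}
```

At block import time a Gloas block would typically still be PENDING, so this returns false, which matches the rule stated above.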

5. Archive Store - Historical State Management

packages/beacon-node/src/chain/archiveStore/:

  • Updated archiveStore.archiveState() to accept payloadPresent parameter
  • Modified archival strategies (frequencyStateArchiveStrategy) to propagate payload status
  • Ensured historical states maintain proper metadata for state variant tracking

6. API & Validation Layer Updates

packages/beacon-node/src/api/impl/:

  • Updated validator API to retrieve correct state variant with payloadPresent flag
  • Modified beacon state utilities to handle payload-aware checkpoint lookups
  • Ensured API endpoints return appropriate state variant based on block type

Technical Details

Checkpoint Key Format (Post-Gloas)

// Pre-Gloas (always single variant):
"100-0x1234abcd"  // epoch-rootHex

// Post-Gloas (dual variants):
"100-0x1234abcd-false"  // epoch-rootHex-payloadPresent (block state)
"100-0x1234abcd-true"   // epoch-rootHex-payloadPresent (payload state)

State Variant Semantics

  • Block State (payloadPresent = false): State after processing the beacon block, before the execution payload
    • Only exists for Gloas blocks (PENDING/EMPTY variants)
  • Payload State (payloadPresent = true): State after processing the execution payload
    • Exists for all pre-Gloas blocks (single variant)
    • Exists for Gloas blocks after payload revelation

Migration & Compatibility

  • Pre-Gloas blocks: All state cache operations default to payloadPresent = true
  • Backward compatibility: Checkpoint cache can handle mix of pre-Gloas and post-Gloas states
  • Cache key migration: Existing cache entries remain valid (treated as payloadPresent = true)
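
One possible way to treat existing two-part keys as payloadPresent = true is to normalize them on read. The description does not say how the PR does this, so normalizeCacheKey below is a hypothetical helper shown only to illustrate the compatibility rule.

```typescript
// Assumed helper: upgrades a pre-Gloas key ("epoch-rootHex") to the three-part form.
function normalizeCacheKey(key: string): string {
  const parts = key.split("-");
  if (parts.length === 2) {
    // Legacy entry: there was only one variant, which already included the payload.
    return `${key}-true`;
  }
  if (parts.length === 3 && (parts[2] === "true" || parts[2] === "false")) {
    return key;
  }
  throw new Error(`Unrecognized checkpoint cache key: ${key}`);
}
```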

Depends On


AI Disclosure: This PR was written primarily by Claude Code.

@gemini-code-assist

Summary of Changes

Hello @ensi321, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the beacon node's state caching mechanism to accommodate the requirements of ePBS. By introducing a payloadPresent flag, the system can now differentiate and cache distinct states for a given checkpoint: one representing the beacon block's state and another including the full execution payload. This foundational change enables more flexible and efficient state management in an ePBS context, where block and payload processing can occur independently. The modifications span type definitions, cache key generation, and core state processing logic across various components of the node.

Highlights

  • Enhanced State Caching for ePBS: Introduced a payloadPresent flag to checkpoint state types (CheckpointHexPayload, CheckpointWithPayload), allowing the state cache to distinguish and store both beacon block states and full execution payload states. This is crucial for supporting the decoupled block and payload processing in ePBS (Ethereum Proposer-Builder Separation).
  • Dynamic Cache Sizing: The FIFOBlockStateCache now dynamically adjusts its maximum capacity, doubling it when the Gloas (ePBS) fork is reached. This ensures adequate space to cache both block and payload states, maintaining effective block depth.
  • Comprehensive Integration: The payloadPresent flag and related logic have been integrated across numerous modules, including API resolvers, archive store, block import, state regenerator, and chain event handlers, ensuring consistent handling of state variants throughout the beacon node.

Changelog
  • packages/beacon-node/src/api/impl/beacon/state/utils.ts
    • Updated resolveStateId to use CheckpointWithPayload instead of CheckpointWithHex.
  • packages/beacon-node/src/api/impl/validator/index.ts
    • Imported PayloadStatus for determining payload presence.
    • Replaced CheckpointHex with CheckpointHexPayload in waitForCheckpointState signature.
    • Modified waitForCheckpointState call to pass payloadPresent based on head.payloadStatus.
  • packages/beacon-node/src/chain/archiveStore/archiveStore.ts
    • Replaced CheckpointWithHex with CheckpointWithPayload in JobItemQueue and onFinalizedCheckpoint.
  • packages/beacon-node/src/chain/archiveStore/interface.ts
    • Updated StateArchiveStrategy interface methods to use CheckpointWithPayload.
  • packages/beacon-node/src/chain/archiveStore/strategies/frequencyStateArchiveStrategy.ts
    • Replaced CheckpointWithHex with CheckpointWithPayload in method signatures.
    • Added fcCheckpointToHexPayload conversion when archiving states to include payloadPresent.
  • packages/beacon-node/src/chain/blocks/importBlock.ts
    • Replaced toCheckpointHex with toCheckpointHexPayload.
    • Added logic to determine payloadPresent based on isGloasBlock and blockSummary.payloadStatus when processing states and emitting checkpoint events.
  • packages/beacon-node/src/chain/chain.ts
    • Updated imports and method signatures (getStateByCheckpoint, getStateOrBytesByCheckpoint, justifiedBalancesGetter, closestJustifiedBalancesStateToCheckpoint, onForkChoiceJustified, onForkChoiceFinalized, updateValidatorsCustodyRequirement) to use CheckpointWithPayload.
    • Modified addCheckpointState to accept a payloadPresent argument.
  • packages/beacon-node/src/chain/interface.ts
    • Added CheckpointWithPayload to imports.
    • Updated getStateOrBytesByCheckpoint signature to use CheckpointWithPayload.
  • packages/beacon-node/src/chain/prepareNextSlot.ts
    • Imported PayloadStatus.
    • Updated updatePreComputedCheckpoint to pass payloadPresent based on headBlock.payloadStatus.
  • packages/beacon-node/src/chain/regen/interface.ts
    • Replaced CheckpointHex with CheckpointHexPayload in method signatures.
    • Added processPayloadState method.
    • Modified addCheckpointState and updatePreComputedCheckpoint to include a payloadPresent parameter.
  • packages/beacon-node/src/chain/regen/queued.ts
    • Replaced CheckpointHex with CheckpointHexPayload and imported PayloadStatus.
    • Updated getPreStateSync, getClosestHeadState, addCheckpointState, and updatePreComputedCheckpoint to handle payloadPresent.
    • Implemented processPayloadState to add payload states to the block state cache.
  • packages/beacon-node/src/chain/regen/regen.ts
    • Imported PayloadStatus and ForkSeq.
    • Updated getPreState and processSlotsToNearestCheckpoint to determine and use payloadPresent based on PayloadStatus or ForkSeq.
  • packages/beacon-node/src/chain/stateCache/fifoBlockStateCache.ts
    • Imported ForkSeq.
    • Introduced DEFAULT_MAX_BLOCK_STATES_GLOAS and logic to dynamically increase maxStates when the Gloas fork is reached.
  • packages/beacon-node/src/chain/stateCache/persistentCheckpointsCache.ts
    • Imported CheckpointWithPayload.
    • Replaced CheckpointHex with CheckpointHexPayload throughout the file.
    • Updated toCacheKey and fromCacheKey functions to incorporate the payloadPresent flag into cache keys.
    • Added fcCheckpointToHexPayload utility for converting fork-choice checkpoints.
    • Modified add, getLatest, getOrReloadLatest, updatePreComputedCheckpoint, findSeedStateToReload, processState, and pruneFinalized to correctly utilize the payloadPresent flag.
  • packages/beacon-node/src/chain/stateCache/types.ts
    • Defined CheckpointHexPayload type to include payloadPresent: boolean.
    • Updated CheckpointStateCache interface methods to use CheckpointHexPayload and accept the payloadPresent parameter where appropriate.
  • packages/beacon-node/test/unit-minimal/chain/stateCache/persistentCheckpointsCache.test.ts
    • Updated imports and test variables to reflect the new CheckpointHexPayload type.
    • Modified calls to add, getLatest, getOrReloadLatest, getStateOrBytes, and findSeedStateToReload to pass the payloadPresent argument.
  • packages/beacon-node/test/unit/chain/regen/regen.test.ts
    • Updated cache.add calls to include the payloadPresent argument.
  • packages/beacon-node/test/utils/node/simTest.ts
    • Modified getCheckpointStateSync call to explicitly set payloadPresent: true for pre-Gloas simulation in tests.
Activity
  • The pull request introduces a new feature (feat: state cache for ePBS).
  • It is noted that this PR depends on another PR (nc/epbs-fc) to be merged first.
  • The branch will be retargeted to unstable after #8739 ("feat: implement epbs fork choice") is merged, indicating a planned integration into a larger feature rollout.
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces significant refactoring to the state caching mechanism to support ePBS, where a block can have two distinct states: one with a payload and one without. The core of the changes is the introduction of a payloadPresent flag to differentiate these states, which is propagated through various types and functions related to state management. Key updates include modifying the in-memory cache key to incorporate this flag, adjusting state retrieval logic to handle both variants, and dynamically increasing cache sizes for the Gloas fork. The implementation is largely consistent and well-structured. However, I've identified a critical issue in the persistence logic where the datastore key for checkpoint states does not distinguish between the two state variants, potentially leading to data loss when both variants are persisted for the same block. This needs to be addressed to ensure the integrity of the state cache.

Comment on lines 714 to 784

for (const rootHex of cpRootHexes) {
  // Process both payloadPresent variants for each rootHex
  for (const payloadPresent of [true, false]) {
    const cpKey = toCacheKey({epoch: epoch, rootHex, payloadPresent});
    const cacheItem = this.cache.get(cpKey);

    if (cacheItem !== undefined && isInMemoryCacheItem(cacheItem)) {
      let {persistedKey} = cacheItem;
      const {state} = cacheItem;
      const logMeta = {
        stateSlot: state.slot,
        rootHex,
        payloadPresent,
        epochBoundaryHex,
        persistedKey: persistedKey ? toHex(persistedKey) : "",
      };

      if (persistedRootHexes.has(rootHex)) {
        if (persistedKey) {
          // we don't care if the checkpoint state is already persisted
          this.logger.verbose("Pruned checkpoint state from memory but no need to persist", logMeta);
        } else {
          // persist and do not update epochIndex
          this.metrics?.cpStateCache.statePersistSecFromSlot.observe(
            this.clock?.secFromSlot(this.clock?.currentSlot ?? 0) ?? 0
          );
          const cpPersist = {epoch: epoch, root: fromHex(rootHex)};
          // It's not sustainable to allocate ~240MB for each state every epoch, so we use buffer pool to reuse the memory.
          // As monitored on holesky as of Jan 2024:
          // - This does not increase heap allocation while gc time is the same
          // - It helps stabilize persist time and save ~300ms in average (1.5s vs 1.2s)
          // - It also helps the state reload to save ~500ms in average (4.3s vs 3.8s)
          // - Also `serializeState.test.ts` perf test shows a lot of differences allocating ~240MB once vs per state serialization
          const timer = this.metrics?.stateSerializeDuration.startTimer({
            source: AllocSource.PERSISTENT_CHECKPOINTS_CACHE_STATE,
          });
          persistedKey = await serializeState(
            state,
            AllocSource.PERSISTENT_CHECKPOINTS_CACHE_STATE,
            (stateBytes) => {
              timer?.();
              return this.datastore.write(cpPersist, stateBytes);
            },
            this.bufferPool
          );

          persistCount++;
          this.logger.verbose("Pruned checkpoint state from memory and persisted to disk", {
            ...logMeta,
            persistedKey: toHex(persistedKey),
          });
        }
        // overwrite cpKey, this means the state is deleted from memory
        this.cache.set(cpKey, {type: CacheItemType.persisted, value: persistedKey});
        // do not update epochIndex
      } else {
        if (persistedKey) {
          // persisted file will be eventually deleted by the archive task
          // this also means the state is deleted from memory
          this.cache.set(cpKey, {type: CacheItemType.persisted, value: persistedKey});
          // do not update epochIndex
        } else {
          // delete the state from memory
          this.cache.delete(cpKey);
          this.epochIndex.get(epoch)?.delete(rootHex);
        }
        this.metrics?.cpStateCache.statePruneFromMemoryCount.inc();
        this.logger.verbose("Pruned checkpoint state from memory", logMeta);
      }
    }
  }
}

critical

The current implementation for persisting checkpoint states in processPastEpoch may lead to data loss for state variants.

The function iterates through payloadPresent values [true, false] to handle both block and payload state variants. However, when persisting a state, the key used for the datastore (cpPersist) is derived only from epoch and rootHex, without including payloadPresent.

const cpPersist = {epoch: epoch, root: fromHex(rootHex)};
// ...
persistedKey = await serializeState(
  state,
  // ...
  (stateBytes) => {
    return this.datastore.write(cpPersist, stateBytes);
  },
  // ...
);

If both state variants (with payloadPresent: true and payloadPresent: false) for the same block root and epoch are present in memory and need to be persisted, they will be written to the same key in the datastore. This will cause one variant to overwrite the other, leading to the loss of one of the states from persistent storage.

While payload states might not be added to the checkpoint cache yet (as per the TODO in QueuedStateRegenerator), this logic is being introduced now and contains a latent but critical bug.

To fix this, the datastore keying mechanism should be updated to include the payloadPresent flag to uniquely identify and store each state variant.
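
A minimal sketch of the suggested fix, assuming the datastore key can simply carry the flag. PersistKey and SketchDatastore are hypothetical names, the key encoding is invented for illustration, and the real datastore interface in Lodestar may differ.

```typescript
// Hypothetical persisted-checkpoint key that includes the state variant.
type PersistKey = {epoch: number; rootHex: string; payloadPresent: boolean};

// In-memory stand-in for the on-disk datastore.
class SketchDatastore {
  private readonly db = new Map<string, Uint8Array>();

  // Including payloadPresent in the key keeps the two variants in separate slots.
  write(cp: PersistKey, stateBytes: Uint8Array): string {
    const key = `${cp.epoch}:${cp.rootHex}:${cp.payloadPresent ? 1 : 0}`;
    this.db.set(key, stateBytes);
    return key;
  }

  read(key: string): Uint8Array | null {
    return this.db.get(key) ?? null;
  }
}
```

With this keying, persisting the block state and the payload state for the same root and epoch yields two distinct entries instead of one overwriting the other.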

@@ -111,6 +124,13 @@ export class FIFOBlockStateCache implements BlockStateCache {
* In importBlock() steps, normally it'll call add() with isHead = false first. Then call setHeadState() to set the head.
*/
add(item: CachedBeaconStateAllForks, isHead = false): void {

need a 2nd required param payloadPresent: boolean:

  • on importing a gloas block, pass post state and payloadPresent as false
  • on gloas payload post state, pass post state and payloadPresent as true

for pre-gloas blocks, payloadPresent is always true
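
A rough sketch of what the requested signature could look like. SketchFifoBlockStateCache and StateLike are hypothetical, and keying by block root plus flag is an assumption made for illustration; the real FIFOBlockStateCache may key its entries differently.

```typescript
// Hypothetical placeholder for the real cached beacon state type.
type StateLike = {blockRootHex: string; slot: number};

class SketchFifoBlockStateCache {
  private readonly maxStates: number;
  // Newest entry first; the oldest entry is evicted from the end.
  private readonly entries: {key: string; state: StateLike}[] = [];

  constructor(maxStates: number) {
    this.maxStates = maxStates;
  }

  // payloadPresent is required, so every caller must say which variant it is adding.
  add(state: StateLike, payloadPresent: boolean): void {
    const key = `${state.blockRootHex}:${payloadPresent}`;
    if (this.entries.some((e) => e.key === key)) {
      return;
    }
    this.entries.unshift({key, state});
    while (this.entries.length > this.maxStates) {
      this.entries.pop();
    }
  }

  get(blockRootHex: string, payloadPresent: boolean): StateLike | null {
    const hit = this.entries.find((e) => e.key === `${blockRootHex}:${payloadPresent}`);
    return hit ? hit.state : null;
  }
}
```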

// Dynamically upgrade maxStates when Gloas fork is reached
// Gloas blocks can have two states (block state and payload state), so we need 2x capacity
if (!this.gloasMaxStatesActive && item.config.getForkSeq(item.slot) >= ForkSeq.gloas) {
this.maxStates = DEFAULT_MAX_BLOCK_STATES_GLOAS;

set this once on gloas fork transition, add some logs for it
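
One way to read this suggestion is a guarded, one-time upgrade that logs when it fires. The sketch below is an assumption about the shape of such a change; SketchMaxStatesUpgrader, onSlot, and the capacity values are hypothetical and not taken from the PR.

```typescript
// Hypothetical helper that raises the cache capacity exactly once at the fork.
class SketchMaxStatesUpgrader {
  maxStates: number;
  private upgraded = false;
  private readonly gloasMaxStates: number;
  private readonly log: (msg: string) => void;

  constructor(defaultMaxStates: number, gloasMaxStates: number, log: (msg: string) => void) {
    this.maxStates = defaultMaxStates;
    this.gloasMaxStates = gloasMaxStates;
    this.log = log;
  }

  // Returns true only on the call that performs the upgrade.
  onSlot(slot: number, gloasForkSlot: number): boolean {
    if (this.upgraded || slot < gloasForkSlot) {
      return false;
    }
    this.upgraded = true;
    this.maxStates = this.gloasMaxStates;
    this.log(`Gloas fork reached at slot ${slot}, maxStates increased to ${this.maxStates}`);
    return true;
  }
}
```

Driving the upgrade from the fork transition rather than from every add() call keeps the hot path free of the fork check and produces a single log line.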

}

export function toCheckpointKey(cp: CheckpointHexPayload): string {
return `${cp.rootHex}:${cp.epoch}`;

need payloadPresent as part of the key?

blockStateCache.add(anchorState);
blockStateCache.setHeadState(anchorState);
checkpointStateCache.add(checkpoint, anchorState);
// TODO: For Gloas, determine if anchor state is block state or payload state

should be able to determine now with anchorState.executionPayloadAvailability?
