Conversation
Summary of ChangesHello @ensi321, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the beacon node's state caching mechanism to accommodate the requirements of ePBS. By introducing a Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces significant refactoring to the state caching mechanism to support ePBS, where a block can have two distinct states: one with a payload and one without. The core of the changes is the introduction of a payloadPresent flag to differentiate these states, which is propagated through various types and functions related to state management. Key updates include modifying the in-memory cache key to incorporate this flag, adjusting state retrieval logic to handle both variants, and dynamically increasing cache sizes for the Gloas fork. The implementation is largely consistent and well-structured. However, I've identified a critical issue in the persistence logic where the datastore key for checkpoint states does not distinguish between the two state variants, potentially leading to data loss when both variants are persisted for the same block. This needs to be addressed to ensure the integrity of the state cache.
| for (const rootHex of cpRootHexes) { | ||
| const cpKey = toCacheKey({epoch: epoch, rootHex}); | ||
| const cacheItem = this.cache.get(cpKey); | ||
|
|
||
| if (cacheItem !== undefined && isInMemoryCacheItem(cacheItem)) { | ||
| let {persistedKey} = cacheItem; | ||
| const {state} = cacheItem; | ||
| const logMeta = { | ||
| stateSlot: state.slot, | ||
| rootHex, | ||
| epochBoundaryHex, | ||
| persistedKey: persistedKey ? toHex(persistedKey) : "", | ||
| }; | ||
|
|
||
| if (persistedRootHexes.has(rootHex)) { | ||
| if (persistedKey) { | ||
| // we don't care if the checkpoint state is already persisted | ||
| this.logger.verbose("Pruned checkpoint state from memory but no need to persist", logMeta); | ||
| } else { | ||
| // persist and do not update epochIndex | ||
| this.metrics?.cpStateCache.statePersistSecFromSlot.observe( | ||
| this.clock?.secFromSlot(this.clock?.currentSlot ?? 0) ?? 0 | ||
| ); | ||
| const cpPersist = {epoch: epoch, root: fromHex(rootHex)}; | ||
| // It's not sustainable to allocate ~240MB for each state every epoch, so we use buffer pool to reuse the memory. | ||
| // As monitored on holesky as of Jan 2024: | ||
| // - This does not increase heap allocation while gc time is the same | ||
| // - It helps stabilize persist time and save ~300ms in average (1.5s vs 1.2s) | ||
| // - It also helps the state reload to save ~500ms in average (4.3s vs 3.8s) | ||
| // - Also `serializeState.test.ts` perf test shows a lot of differences allocating ~240MB once vs per state serialization | ||
| const timer = this.metrics?.stateSerializeDuration.startTimer({ | ||
| source: AllocSource.PERSISTENT_CHECKPOINTS_CACHE_STATE, | ||
| }); | ||
| persistedKey = await serializeState( | ||
| state, | ||
| AllocSource.PERSISTENT_CHECKPOINTS_CACHE_STATE, | ||
| (stateBytes) => { | ||
| timer?.(); | ||
| return this.datastore.write(cpPersist, stateBytes); | ||
| }, | ||
| this.bufferPool | ||
| ); | ||
| // Process both payloadPresent variants for each rootHex | ||
| for (const payloadPresent of [true, false]) { | ||
| const cpKey = toCacheKey({epoch: epoch, rootHex, payloadPresent}); | ||
| const cacheItem = this.cache.get(cpKey); | ||
|
|
||
| persistCount++; | ||
| this.logger.verbose("Pruned checkpoint state from memory and persisted to disk", { | ||
| ...logMeta, | ||
| persistedKey: toHex(persistedKey), | ||
| }); | ||
| } | ||
| // overwrite cpKey, this means the state is deleted from memory | ||
| this.cache.set(cpKey, {type: CacheItemType.persisted, value: persistedKey}); | ||
| } else { | ||
| if (persistedKey) { | ||
| // persisted file will be eventually deleted by the archive task | ||
| // this also means the state is deleted from memory | ||
| if (cacheItem !== undefined && isInMemoryCacheItem(cacheItem)) { | ||
| let {persistedKey} = cacheItem; | ||
| const {state} = cacheItem; | ||
| const logMeta = { | ||
| stateSlot: state.slot, | ||
| rootHex, | ||
| payloadPresent, | ||
| epochBoundaryHex, | ||
| persistedKey: persistedKey ? toHex(persistedKey) : "", | ||
| }; | ||
|
|
||
| if (persistedRootHexes.has(rootHex)) { | ||
| if (persistedKey) { | ||
| // we don't care if the checkpoint state is already persisted | ||
| this.logger.verbose("Pruned checkpoint state from memory but no need to persist", logMeta); | ||
| } else { | ||
| // persist and do not update epochIndex | ||
| this.metrics?.cpStateCache.statePersistSecFromSlot.observe( | ||
| this.clock?.secFromSlot(this.clock?.currentSlot ?? 0) ?? 0 | ||
| ); | ||
| const cpPersist = {epoch: epoch, root: fromHex(rootHex)}; | ||
| // It's not sustainable to allocate ~240MB for each state every epoch, so we use buffer pool to reuse the memory. | ||
| // As monitored on holesky as of Jan 2024: | ||
| // - This does not increase heap allocation while gc time is the same | ||
| // - It helps stabilize persist time and save ~300ms in average (1.5s vs 1.2s) | ||
| // - It also helps the state reload to save ~500ms in average (4.3s vs 3.8s) | ||
| // - Also `serializeState.test.ts` perf test shows a lot of differences allocating ~240MB once vs per state serialization | ||
| const timer = this.metrics?.stateSerializeDuration.startTimer({ | ||
| source: AllocSource.PERSISTENT_CHECKPOINTS_CACHE_STATE, | ||
| }); | ||
| persistedKey = await serializeState( | ||
| state, | ||
| AllocSource.PERSISTENT_CHECKPOINTS_CACHE_STATE, | ||
| (stateBytes) => { | ||
| timer?.(); | ||
| return this.datastore.write(cpPersist, stateBytes); | ||
| }, | ||
| this.bufferPool | ||
| ); | ||
|
|
||
| persistCount++; | ||
| this.logger.verbose("Pruned checkpoint state from memory and persisted to disk", { | ||
| ...logMeta, | ||
| persistedKey: toHex(persistedKey), | ||
| }); | ||
| } | ||
| // overwrite cpKey, this means the state is deleted from memory | ||
| this.cache.set(cpKey, {type: CacheItemType.persisted, value: persistedKey}); | ||
| // do not update epochIndex | ||
| } else { | ||
| // delete the state from memory | ||
| this.cache.delete(cpKey); | ||
| this.epochIndex.get(epoch)?.delete(rootHex); | ||
| if (persistedKey) { | ||
| // persisted file will be eventually deleted by the archive task | ||
| // this also means the state is deleted from memory | ||
| this.cache.set(cpKey, {type: CacheItemType.persisted, value: persistedKey}); | ||
| // do not update epochIndex | ||
| } else { | ||
| // delete the state from memory | ||
| this.cache.delete(cpKey); | ||
| this.epochIndex.get(epoch)?.delete(rootHex); | ||
| } | ||
| this.metrics?.cpStateCache.statePruneFromMemoryCount.inc(); | ||
| this.logger.verbose("Pruned checkpoint state from memory", logMeta); | ||
| } | ||
| this.metrics?.cpStateCache.statePruneFromMemoryCount.inc(); | ||
| this.logger.verbose("Pruned checkpoint state from memory", logMeta); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
The current implementation for persisting checkpoint states in processPastEpoch may lead to data loss for state variants.
The function iterates through payloadPresent values [true, false] to handle both block and payload state variants. However, when persisting a state, the key used for the datastore (cpPersist) is derived only from epoch and rootHex, without including payloadPresent.
const cpPersist = {epoch: epoch, root: fromHex(rootHex)};
// ...
persistedKey = await serializeState(
state,
// ...
(stateBytes) => {
return this.datastore.write(cpPersist, stateBytes);
},
// ...
);If both state variants (with payloadPresent: true and payloadPresent: false) for the same block root and epoch are present in memory and need to be persisted, they will be written to the same key in the datastore. This will cause one variant to overwrite the other, leading to the loss of one of the states from persistent storage.
While payload states might not be added to the checkpoint cache yet (as per the TODO in QueuedStateRegenerator), this logic is being introduced now and contains a latent but critical bug.
To fix this, the datastore keying mechanism should be updated to include the payloadPresent flag to uniquely identify and store each state variant.
| @@ -111,6 +124,13 @@ export class FIFOBlockStateCache implements BlockStateCache { | |||
| * In importBlock() steps, normally it'll call add() with isHead = false first. Then call setHeadState() to set the head. | |||
| */ | |||
| add(item: CachedBeaconStateAllForks, isHead = false): void { | |||
There was a problem hiding this comment.
need a 2nd required param payloadPresent: boolean:
- on importing a gloas block, pass post state and
payloadPresentasfalse - on gloas payload post state, pass post state and
payloadPresetastrue
for pre-gloas blocks, payloadPresent is always true
| // Dynamically upgrade maxStates when Gloas fork is reached | ||
| // Gloas blocks can have two states (block state and payload state), so we need 2x capacity | ||
| if (!this.gloasMaxStatesActive && item.config.getForkSeq(item.slot) >= ForkSeq.gloas) { | ||
| this.maxStates = DEFAULT_MAX_BLOCK_STATES_GLOAS; |
There was a problem hiding this comment.
set this once on gloas fork transition, add some logs for it
| } | ||
|
|
||
| export function toCheckpointKey(cp: CheckpointHexPayload): string { | ||
| return `${cp.rootHex}:${cp.epoch}`; |
There was a problem hiding this comment.
need payloadPresent as part of the key?
| blockStateCache.add(anchorState); | ||
| blockStateCache.setHeadState(anchorState); | ||
| checkpointStateCache.add(checkpoint, anchorState); | ||
| // TODO: For Gloas, determine if anchor state is block state or payload state |
There was a problem hiding this comment.
should be able to determine now with anchorState.executionPayloadAvailability?
Needs
nc/epbs-fcto be merged first. Will point tounstableafter #8739 is merged.Summary
This PR extends the Gloas ePBS state cache architecture to support dual state variants (block state and payload state) by threading the
payloadPresentflag through the state cache layer, regeneration system, and archive store.Context
Building on the fork choice changes in
nc/epbs-fc(which stores checkpoints with payload status), this PR completes the state cache implementation for Gloas ePBS by:payloadPresentKey Changes
1. State Cache Type System Updates
packages/beacon-node/src/chain/stateCache/types.ts(+208/-111 lines total across files):CheckpointHex→CheckpointHexPayloadwith requiredpayloadPresent: booleanfieldCheckpointStateCacheinterface methods to acceptpayloadPresentparameter:add(cp, state, payloadPresent)- explicitly marks state variant when adding to cachegetLatest(rootHex, maxEpoch, payloadPresent)- retrieves specific state variantgetOrReloadLatest(rootHex, maxEpoch, payloadPresent)- reloads specific state variant from diskupdatePreComputedCheckpoint(rootHex, epoch, payloadPresent)- tracks payload status for pre-computed statesprocessState()method signature unchanged (manages both variants internally)2. PersistentCheckpointStateCache Implementation
packages/beacon-node/src/chain/stateCache/persistentCheckpointsCache.ts(~289 lines modified):"epoch-rootHex"to"epoch-rootHex-payloadPresent"toCheckpointHexPayload()helper to include payload status in keyspayloadPresentvariants inprocessPastEpoch()for memory management3. Regeneration Layer - Dual State Support
packages/beacon-node/src/chain/regen/interface.ts&queued.ts®en.ts:processPayloadState(payloadState)method for explicit payload state caching (Gloas-only)processExecutionPayloadEnvelope()when payload is revealedprocessState()which handles block state cachingaddCheckpointState(cp, state, payloadPresent)to accept payload flagupdatePreComputedCheckpoint()withpayloadPresentparametergetCheckpointState()and related methods to passpayloadPresentthrough cache lookups4. Block Import - Payload Status Propagation
packages/beacon-node/src/chain/blocks/importBlock.ts:payloadPresentfrom block type:payloadPresent = true(execution payload embedded in block, always FULL variant)payloadPresent = false(block state only, PENDING/EMPTY variant, payload not yet revealed)payloadPresentthrough checkpoint caching operations:regen.addCheckpointState(cp, checkpointState, payloadPresent)5. Archive Store - Historical State Management
packages/beacon-node/src/chain/archiveStore/:archiveStore.archiveState()to acceptpayloadPresentparameterfrequencyStateArchiveStrategy) to propagate payload status6. API & Validation Layer Updates
packages/beacon-node/src/api/impl/:payloadPresentflagTechnical Details
Checkpoint Key Format (Post-Gloas)
State Variant Semantics
Block State (
payloadPresent = false): State after processing beacon block, before execution payloadPayload State (
payloadPresent = true): State after processing execution payloadMigration & Compatibility
payloadPresent = truepayloadPresent = true)Depends On
nc/epbs-fc- Fork choice stores checkpoints with payload status (chore: fork choice stores checkpoints with payload status #8845)AI Disclosure: This PR was written primarily by Claude Code.