[codex] add checkpoint replay and shared primal ownership #9

Merged
shinaoka merged 1 commit into main from checkpoint-replay on Mar 29, 2026

@shinaoka
Member

Summary

  • add first-class checkpoint replay with CheckpointRecipe, ReplayResult, and Tape::record_checkpointed_op
  • execute pullback and HVP through execution-local replay contexts, with demand-driven HVP tangents and phase-local replay
  • share retained primals between graph nodes and attached TrackedValue handles so retained ops no longer require V: Clone
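As a rough illustration of the replay idea in the bullets above, here is a minimal sketch on a scalar chain. It is not tidu's API: `MiniTape`, `Node`, and `example_gradient` are hypothetical names, and the real `CheckpointRecipe`, `ReplayResult`, and `Tape::record_checkpointed_op` may look quite different. The sketch assumes a checkpointed node keeps only its input plus a recipe for rebuilding the local derivative, and that replayed values live in a context owned by a single pullback call.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for illustration only; not tidu's actual types.
enum Node {
    // An input variable with no upstream node.
    Leaf,
    // Retained: the local derivative dy/dx was materialized at record time.
    Retained { input: usize, dydx: f64 },
    // Checkpointed: only the input value and a recipe are kept; the local
    // derivative is rebuilt on demand during reverse execution.
    Checkpointed { input: usize, x: f64, recipe: fn(f64) -> f64 },
}

struct MiniTape {
    nodes: Vec<Node>,
}

impl MiniTape {
    // Reverse sweep from `output`. Takes `&self`, so nothing replayed here
    // can be persisted on the tape. Returns the gradients and the number of
    // checkpointed nodes that were replayed.
    fn pullback(&self, output: usize, seed: f64) -> (Vec<f64>, usize) {
        // Execution-local replay context: dropped when this call returns.
        let mut replayed: HashMap<usize, f64> = HashMap::new();
        let mut grads = vec![0.0; self.nodes.len()];
        grads[output] = seed;

        for idx in (0..=output).rev() {
            let (input, dydx) = match &self.nodes[idx] {
                Node::Leaf => continue,
                Node::Retained { input, dydx } => (*input, *dydx),
                Node::Checkpointed { input, x, recipe } => {
                    // Replay lazily, at most once per execution.
                    let d = *replayed.entry(idx).or_insert_with(|| recipe(*x));
                    (*input, d)
                }
            };
            let upstream = grads[idx];
            grads[input] += upstream * dydx;
        }
        (grads, replayed.len())
    }
}

// Usage: z = 3 * sin(x) at x = 0, with sin checkpointed (derivative: cos).
fn example_gradient() -> f64 {
    let tape = MiniTape {
        nodes: vec![
            Node::Leaf,
            Node::Checkpointed { input: 0, x: 0.0, recipe: f64::cos },
            Node::Retained { input: 1, dydx: 3.0 },
        ],
    };
    let (grads, _replays) = tape.pullback(2, 1.0);
    grads[0] // dz/dx = 3 * cos(0)
}
```

Keeping the replay cache inside `pullback` rather than on the tape is the design choice the sketch is meant to highlight: repeated executions stay independent, and the tape itself is never mutated by reverse execution.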

Root Cause

tidu previously assumed that reverse execution always used permanently materialized rules. The checkpoint replay refactor removed that assumption for pullback and HVP, but its first cut stored retained primals twice, once on the tape and once in TrackedValue, which imposed an unnecessary V: Clone bound on retained operations.
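The ownership change can be pictured with a small sketch. It assumes reference-counted sharing via `std::rc::Rc`; the PR text does not say which mechanism tidu uses, and `SharedTape`, `SharedNode`, `Handle`, and `NoClone` are hypothetical names, not tidu's types. The point is only that cloning an `Rc<V>` copies a pointer, so no `V: Clone` bound is needed.

```rust
use std::rc::Rc;

// A value type that deliberately does NOT implement Clone.
struct NoClone(f64);

// Hypothetical stand-ins for illustration only; not tidu's actual types.
struct SharedNode<V> {
    primal: Rc<V>,
}

struct Handle<V> {
    primal: Rc<V>,
    node: usize,
}

struct SharedTape<V> {
    nodes: Vec<SharedNode<V>>,
}

// Note: no `V: Clone` bound anywhere on this impl.
impl<V> SharedTape<V> {
    fn new() -> Self {
        Self { nodes: Vec::new() }
    }

    // Record a retained value. The graph node and the returned handle
    // point at the same allocation instead of each holding a copy.
    fn retain(&mut self, value: V) -> Handle<V> {
        let shared = Rc::new(value);
        self.nodes.push(SharedNode { primal: Rc::clone(&shared) });
        Handle { primal: shared, node: self.nodes.len() - 1 }
    }
}

// Usage: retain a non-Clone value and read it back through the handle.
fn example_shared() -> f64 {
    let mut tape: SharedTape<NoClone> = SharedTape::new();
    let handle = tape.retain(NoClone(2.5));
    handle.primal.0
}
```

A multi-threaded tape would presumably need `Arc` instead of `Rc`; the sketch uses `Rc` only to keep the example short.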

Impact

  • checkpointed nodes now replay lazily during pullback and HVP without persisting replay state on the tape
  • retained nodes reuse shared primal storage while checkpointed outputs still keep their forward value available
  • README, crate docs, and plan docs document the replay and shared-ownership model

Validation

  • cargo fmt --all --check
  • cargo clippy --workspace
  • cargo nextest run --release --workspace --no-fail-fast
  • cargo test --doc --release --workspace
  • cargo llvm-cov nextest --workspace --release --json --output-path coverage.json
  • python3 scripts/check-coverage.py coverage.json
  • cargo doc --workspace --no-deps
  • python3 scripts/check-docs-site.py
  • bash scripts/build_docs_site.sh

Generated with Codex.

@shinaoka shinaoka enabled auto-merge (squash) March 29, 2026 04:22
@shinaoka shinaoka merged commit eab2e0c into main Mar 29, 2026
5 checks passed