feat(core): ✨ add opt-in CAS retry and unify Option interface #164

Merged: justapithecus merged 2 commits into main from andrew/feat/core/cas-retry on Mar 8, 2026

@justapithecus

Summary

Add bounded automatic retry on ErrSnapshotConflict for both Dataset and Volume write paths, and unify the Option interface so NewVolume accepts the same option type as NewDataset.

Highlights

  • WithRetryCount(n) enables opt-in CAS retry on commit conflict — data files are written once, only the manifest re-parent and pointer CAS are retried
  • Unified Option interface: applyVolume method added; VolumeOption type and WithVolumeChecksum removed; WithChecksum now works for both Dataset and Volume
  • WithRetryBaseDelay, WithRetryMaxDelay, WithRetryJitter — full backoff customization with sensible defaults (10ms base, 2s cap, full jitter)
  • Single retryOption type with closure pattern eliminates per-option boilerplate (one struct, three methods, four constructors)
  • commitWithRetry helper shared by Write, StreamWriteRecords, and StreamWriter.Commit — no retry code duplication
  • Volume Commit re-merges blocks against the refreshed parent on each retry; ErrOverlappingBlocks terminates immediately (non-retryable)

Breaking Changes

  • VolumeOption type removed — NewVolume now accepts ...Option
  • WithVolumeChecksum removed — use WithChecksum (same function, now works for both)

Test plan

  • Backoff bounds, deterministic no-jitter, cap, context cancellation (retry_test.go)
  • Dataset retry succeeds after CAS conflict
  • Dataset WithRetryCount(0) preserves current behavior (immediate ErrSnapshotConflict)
  • Option validation: negative count, out-of-range jitter, non-positive delays
  • Reader rejects retry options (ErrOptionNotValidForDatasetReader)
  • Volume retry succeeds with re-merged blocks (3 cumulative blocks after retry)
  • Volume WithRetryCount(0) preserves current behavior
  • Volume overlap on retry terminates with ErrOverlappingBlocks
  • Unified WithChecksum works for Volume
  • Dataset-only options rejected by NewVolume (ErrOptionNotValidForVolume)
  • All existing tests pass unchanged
  • Linter clean (0 issues)

Closes #163

🤖 Generated with Claude Code

justapithecus and others added 2 commits March 8, 2026 10:17
Introduce bounded automatic retry on ErrSnapshotConflict for both
Dataset and Volume write paths. Data files are written once; on
conflict only the manifest is re-parented and the pointer CAS retried.

Unify the Option system: Volume now accepts the same Option interface
as Dataset (applyVolume added). WithVolumeChecksum and VolumeOption
type removed; WithChecksum works for both Dataset and Volume.

- Add WithRetryCount, WithRetryBaseDelay, WithRetryMaxDelay, WithRetryJitter
- Single retryOption type with closure to eliminate boilerplate
- Extract commitWithRetry helper shared by Write, StreamWriteRecords,
  and StreamWriter.Commit
- Volume.Commit re-merges blocks against refreshed parent on retry;
  ErrOverlappingBlocks terminates immediately (non-retryable)
- Add ErrOptionNotValidForVolume sentinel
- Update contracts and PUBLIC_API.md

Closes #163

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
commitWithRetry was generating a new snapshotID on each retry attempt,
but data files were already written under paths derived from the original
ID. After retry, the manifest would advertise the new ID while FileRef
paths pointed to the old snapshot directory, breaking layout co-location.

Fix: keep the same snapshotID across retries. On CAS failure, no manifest
was written for this ID (pointer is written before manifest), so the
namespace is safe to reuse. Only the parent ID and pointer CAS change.

Also fixes CONTRACT_VOLUME.md code example that still showed the removed
VolumeOption type in the NewVolume signature.

- Add path-consistency assertion to retry test
- Update contract docs to say "same snapshot ID" not "new snapshot ID"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@justapithecus justapithecus merged commit be520ea into main Mar 8, 2026
8 checks passed
@justapithecus justapithecus deleted the andrew/feat/core/cas-retry branch March 8, 2026 14:38

Development

Successfully merging this pull request may close these issues.

feat: CAS retry on snapshot conflict in commit path
