HoloCompute provides a carefully designed consistency model that balances performance with correctness. It uses a lease-based coherence protocol with explicit synchronization barriers to provide predictable behavior for distributed computations.
Each page in a SharedArray follows a Single Writer, Multiple Readers (SWMR) model:
- Only one node can hold a write lease for a page at any time
- Multiple nodes can hold read leases simultaneously
- Readers access versioned, immutable snapshots of pages
- Writers modify pages exclusively and atomically
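The SWMR invariant above can be sketched as a small lease table. This is a minimal illustration with hypothetical names (`pageLease`, `grantWrite`, `grantRead`), not the actual HoloCompute implementation:

```go
package main

import "fmt"

// pageLease tracks hypothetical lease state for one page under SWMR:
// either one writer, or any number of readers, never both.
type pageLease struct {
	writer  string   // node holding the write lease ("" if none)
	readers []string // nodes holding read leases
}

// grantWrite succeeds only when no other node holds any lease.
func (p *pageLease) grantWrite(node string) bool {
	if p.writer != "" || len(p.readers) > 0 {
		return false
	}
	p.writer = node
	return true
}

// grantRead succeeds whenever no writer holds the page.
func (p *pageLease) grantRead(node string) bool {
	if p.writer != "" {
		return false
	}
	p.readers = append(p.readers, node)
	return true
}

func main() {
	var p pageLease
	fmt.Println(p.grantRead("n1"))  // readers may share a page
	fmt.Println(p.grantRead("n2"))  // multiple readers are fine
	fmt.Println(p.grantWrite("n3")) // write blocked while readers hold leases
}
```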
Page access is controlled through leases managed by the control plane:
- Read Leases: Allow reading a specific version of a page
- Write Leases: Allow reading and modifying a page
- Lease Duration: Short-lived (default: seconds) to minimize blocking
- Lease Expiration: Automatic cleanup of stale leases on node failure
Deterministic computation phases are achieved through explicit synchronization:
Sync() operations act as global barriers:
- All pending writes are flushed before barrier completion
- Reader caches are invalidated after barrier completion
- Array versions are incremented to ensure visibility of new data
A read access proceeds through the following steps:
- Cache Check: Reader first checks its local page cache
- Lease Acquisition: If not cached, acquire read lease from control plane
- Page Fetch: Retrieve page from owner node if needed
- Local Caching: Cache page locally for subsequent accesses
- Data Access: Access data from cached page
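The read path above can be sketched as a single function. The `cache`, `fetch`, and `lease` hooks here are illustrative stand-ins for the control plane and owner-node interactions, not HoloCompute's actual interfaces:

```go
package main

import "fmt"

// reader sketches the read path: cache check, lease acquisition,
// page fetch, local caching, then data access.
type reader struct {
	cache map[int][]byte        // local page cache
	fetch func(page int) []byte // retrieve page from owner node
	lease func(page int) bool   // acquire read lease from control plane
}

func (r *reader) readPage(page int) ([]byte, error) {
	if p, ok := r.cache[page]; ok { // 1. cache check
		return p, nil
	}
	if !r.lease(page) { // 2. lease acquisition
		return nil, fmt.Errorf("read lease denied for page %d", page)
	}
	p := r.fetch(page) // 3. page fetch from owner
	r.cache[page] = p  // 4. local caching
	return p, nil      // 5. data access
}

func main() {
	fetches := 0
	r := &reader{
		cache: map[int][]byte{},
		fetch: func(int) []byte { fetches++; return []byte("page-data") },
		lease: func(int) bool { return true },
	}
	r.readPage(7)
	r.readPage(7) // second access served from local cache
	fmt.Println("fetches:", fetches)
}
```

The second access never touches the network, which is why the hit ratio of the local cache dominates read latency.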
A write access proceeds as follows:
- Lease Check: Verify no conflicting leases exist
- Lease Acquisition: Acquire write lease from control plane
- Page Fetch: Retrieve current page content from owner
- Local Modification: Modify page in local memory
- Write Buffering: Queue page for eventual flush
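The write path can be sketched similarly. Note the final step: writes are buffered rather than flushed immediately, so they only become visible at the next Sync(). All names here (`writer`, `writePage`) are hypothetical:

```go
package main

import "fmt"

// writer sketches the write path: lease acquisition, local
// modification, and buffering for an eventual flush at Sync().
type writer struct {
	pages    map[int][]byte // locally modified page contents
	buffered []int          // pages queued for flush at the next Sync()
}

func (w *writer) writePage(page int, data []byte, acquireLease func(int) bool) error {
	if !acquireLease(page) { // 1-2. lease check + acquisition
		return fmt.Errorf("write lease conflict on page %d", page)
	}
	w.pages[page] = data                  // 3-4. modify page in local memory
	w.buffered = append(w.buffered, page) // 5. queue for eventual flush
	return nil
}

func main() {
	w := &writer{pages: map[int][]byte{}}
	w.writePage(3, []byte("v2"), func(int) bool { return true })
	fmt.Println("pending flushes:", len(w.buffered))
}
```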
When Sync() is invoked, the barrier executes these steps in order:
- Write Flush: All buffered writes are flushed to owner nodes
- Lease Revocation: All write leases are revoked
- Cache Invalidation: Reader caches are invalidated cluster-wide
- Version Bump: Array version is incremented
- Barrier Completion: Sync operation returns to caller
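The ordering of these five phases is what makes the barrier deterministic: flushes happen before revocation, revocation before invalidation, and the version bump before the caller resumes. A sketch with stub phases (the printed steps stand in for the real cluster operations):

```go
package main

import "fmt"

// syncBarrier sketches the fixed phase order of a Sync() barrier and
// returns the incremented array version.
func syncBarrier(version int) int {
	fmt.Println("flush buffered writes")    // 1. write flush to owner nodes
	fmt.Println("revoke write leases")      // 2. lease revocation
	fmt.Println("invalidate reader caches") // 3. cluster-wide cache invalidation
	version++                               // 4. version bump
	fmt.Println("barrier complete")         // 5. return to caller
	return version
}

func main() {
	fmt.Println("new version:", syncBarrier(41))
}
```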
If a writer node fails before committing:
- Lease Expiration: Write lease expires after timeout
- State Reversion: Page reverts to previous version
- Task Replay: Failed tasks are rescheduled elsewhere
- No Data Loss: Committed data remains consistent
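Writer-failure recovery follows directly from buffering: until the flush, modifications exist only as uncommitted state, so discarding them reverts the page. A sketch with hypothetical `page` fields:

```go
package main

import "fmt"

// page separates the last committed version from in-flight edits, so a
// failed writer can be rolled back without touching committed data.
type page struct {
	committed []byte // last version made visible by a Sync()
	dirty     []byte // uncommitted modification (nil if none)
}

// onWriteLeaseExpired models recovery after a writer crash: the stale
// lease times out and the uncommitted state is discarded.
func (p *page) onWriteLeaseExpired() {
	p.dirty = nil // state reversion: page reverts to previous version
}

func main() {
	p := page{committed: []byte("v1"), dirty: []byte("v2-unflushed")}
	p.onWriteLeaseExpired()
	fmt.Printf("visible after failure: %s\n", p.committed)
}
```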
If a reader node fails:
- Lease Cleanup: Read leases are automatically cleaned up
- No Impact: No effect on writers or other readers
- Cache Loss: Only local cache is lost (re-fetch on next access)
During network partitions:
- Local Continuation: Nodes continue with cached data
- Lease Expiration: Remote leases expire in partition
- Partitioned Operation: Each partition operates independently
- Merge Recovery: Partitions reconcile on reconnection
HoloCompute provides the following guarantees:
- Atomic Writes: Page modifications are atomic
- Sequential Consistency: Sync() provides sequential consistency
- Linearizability: Within a single page, operations appear linearizable
- Durability: Committed writes survive node failures
- Cache Coherency: Reader caches eventually reflect latest writes
A few aspects are only eventually consistent:
- Membership Updates: Cluster topology changes propagate eventually
- Metric Aggregation: Observability data is eventually consistent
```go
// All writes visible to subsequent reads after Sync()
err := c.ParallelFor(1000, func(i int) error {
    return arr.Set(i, compute(i)) // Writer 1
})
arr.Sync() // Barrier - all writes committed
err = c.ParallelFor(1000, func(i int) error {
    v, _ := arr.Get(i)     // Reader sees committed values
    return arr.Set(i, v*2) // Writer 2 (different pages OK)
})
```

```go
// Writer conflict - second writer blocked until first commits
go writer1() // Acquires write lease on page 0
go writer2() // Blocks waiting for page 0 lease

func writer1() {
    arr.Set(0, 42) // Acquires lease
    time.Sleep(time.Second)
    arr.Sync() // Releases lease
}

func writer2() {
    arr.Set(0, 84) // Waits for lease, then succeeds
}
```

Lease operations are designed to stay off the critical path:
- Acquisition: Single RTT to control plane
- Renewal: Background renewal before expiration
- Revocation: Async notification to lease holders
Page caching keeps repeated accesses local:
- Hit Ratio: High for repeated accesses
- Prefetching: Background fetch of adjacent pages
- Eviction: Cost-aware policy minimizes re-fetches
Sync() barrier cost scales with outstanding work:
- Latency: Proportional to pending write volume
- Throughput: Pipeline-friendly with overlapping barriers
- Scalability: Distributed coordination with minimal contention