HoloCompute is a distributed memory + compute virtualization layer that turns a set of heterogeneous devices into one logical "personal supercomputer." It provides a unified memory space and distributed compute capabilities with strong consistency guarantees, fault tolerance, and security.
The Hyperbus is the foundational network layer that provides secure, reliable communication between nodes.
Key Features:
- Transport: QUIC protocol for connection-oriented, multiplexed streams with congestion control
- Security: Noise protocol framework with hybrid post-quantum KEM (X25519 + ML-KEM-768)
- Message Framing: Protocol Buffers for efficient serialization
- Stream Types: Control Plane (cluster management) and Data Plane (memory/compute operations)
The membership layer manages cluster topology and consensus.
Key Components:
- SWIM Gossip: Scalable Weakly-consistent Infection-style process group Membership for failure detection
- Raft Consensus: For cluster state management (ring assignments, shard mappings)
- Node Inventory: Tracks hardware capabilities (CPU, memory, GPU, NUMA topology)
- Placement Algorithms: Rendezvous hashing for shard ownership, consistent hash rings for resource classes
The DSM provides a unified virtual memory space across the cluster.
Key Features:
- Abstractions:
SharedArray[T]andDistBufferwith page-granular layout - Page Management: 64 KiB pages with locality-aware caching (2Q/ARC)
- Coherence Model: Lease-based write ownership (single-writer/multi-reader)
- Paging Protocol: Background prefetching, compression (LZ4/Zstd), checksums (BLAKE3)
- Eviction Policy: Cost-aware eviction based on size, latency, and access heat
The scheduler orchestrates distributed computation across the cluster.
Key Features:
- API:
ParallelFor,Map,Reduce,SubmitTask - Scheduling: Work-stealing with data locality scoring
- Fault Tolerance: Task replay, idempotency tokens, speculative execution
- Flow Control: Credit-based backpressure, adaptive concurrency
Secure execution environment for user code and hardware acceleration.
Key Components:
- WASM Sandbox: Deterministic execution with restricted syscalls
- GPU Plugins: gRPC interface for vector/matrix operations
- Serialization: CBOR/Protobuf contracts for data exchange
Persistent storage for cluster metadata and optional page caching.
Key Features:
- Metadata Store: BadgerDB/Pebble for Raft logs and DSM metadata
- Page Spill: Optional encrypted local disk cache
- Tunables: Configurable cache sizes and spill thresholds
Comprehensive monitoring and debugging capabilities.
Components:
- Metrics: Prometheus endpoint with key performance indicators
- Tracing: OpenTelemetry spans across the request lifecycle
- Profiling: pprof endpoints for performance analysis
- UIs: TUI ("HoloTop") and Web UI for cluster visualization
Command-line interface for cluster management and operations.
Commands:
holo agent: Run a nodeholo join/leave: Cluster membershipholo status/top: Cluster monitoringholo alloc: Resource allocationholo run: Task executionholo drain: Node maintenance
- Cluster Formation: Nodes join using SWIM gossip, establish QUIC connections via Hyperbus
- State Consensus: Raft elects leaders and replicates cluster state (rings, shard maps)
- Memory Allocation: Client allocates
SharedArray, control plane assigns pages to nodes - Data Access: Reader caches fetch pages; writers acquire leases from control plane
- Computation: Scheduler places tasks near data, executes via WASM or plugins
- Synchronization:
Sync()barriers flush writes, invalidate reader caches, bump versions
- Transport Security: Noise over QUIC with XChaCha20-Poly1305 or AES-GCM
- Post-Quantum: Hybrid KEM (X25519 + ML-KEM-768) with feature flag
- Identity: Ed25519 node identities with trust-on-first-use or pre-shared keys
- Access Control: RBAC with roles bound to node/user tokens
- Rate Limiting: Control endpoint protection
- Constant-Time: Key comparisons to prevent timing attacks
- Single Writer: Lease-based exclusive write access per page
- Multi Reader: Versioned immutable pages for concurrent reads
- Synchronization: Explicit
Sync()barriers for deterministic phases - Failure Recovery: Lease expiration on writer crash, Raft log replay for state recovery
- Page Fetch: P50 < 2ms, P99 < 10ms for 64 KiB over 1 Gbps LAN
- Speedup: 3.2× on 4-node
ParallelForof 100M elements vs. single node - Overhead: < 5% CPU on idle cluster
- Backpressure: Bounded queues, adaptive concurrency control