A modular distributed file storage system for learning and experimentation
Sandstore is a production-inspired distributed file storage system built in Go, designed as a hands-on learning playground for students and engineers exploring distributed systems concepts. Think of it as your personal sandbox for understanding how systems like Google File System (GFS) and Hadoop Distributed File System (HDFS) work under the hood.
Distributed systems can feel abstract and intimidating. Sandstore bridges that gap by providing:
- Real implementations of core distributed systems concepts (consensus, replication, fault tolerance)
- Modular architecture that lets you experiment with different components independently
- Production-grade patterns in a learner-friendly environment
- Hands-on experience with Raft consensus, leader election, and log replication
- Safe experimentation - break things, fix them, and learn without consequences
Whether you're a student diving into distributed systems for the first time or an experienced engineer wanting to understand these concepts more deeply, Sandstore gives you a real system to tinker with.
- Chunked file storage with configurable chunk sizes (default: 8MB)
- Automatic file splitting and reassembly across multiple nodes (the splitting step is sketched just after this list)
- Metadata management for file locations and chunk mappings
- Multi-node cluster support with health monitoring
- Raft consensus algorithm for metadata consistency
- Leader election with automatic failover
- Log replication across cluster nodes
- Chunk replication for data durability
- Fault tolerance with node failure detection
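
Taking the chunking bullets above as an example, the splitting step can be sketched in a few lines of Go. The function and constant names below are illustrative only, not Sandstore's actual API:

```go
package main

import "fmt"

// defaultChunkSize mirrors the documented 8 MB default; the name is illustrative.
const defaultChunkSize = 8 * 1024 * 1024

// splitIntoChunks is a hypothetical helper showing how a file's bytes
// could be cut into fixed-size pieces before being distributed to nodes.
func splitIntoChunks(data []byte, chunkSize int) [][]byte {
	var chunks [][]byte
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		chunks = append(chunks, data[start:end])
	}
	return chunks
}

func main() {
	data := make([]byte, 20*1024*1024) // a 20 MB file splits into 3 chunks
	fmt.Println(len(splitIntoChunks(data, defaultChunkSize)))
}
```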
- FileService - High-level file operations (store, read, delete)
- ChunkService - Low-level chunk storage and retrieval
- MetadataService - File metadata and chunk location tracking
- ClusterService - Node discovery and health management
- Communication - gRPC-based inter-node messaging
- Multiple server configurations (basic, replicated, Raft-enabled)
- Comprehensive logging for debugging and learning
- Test scenarios for failure simulation
- MCP integration for enhanced tooling support
Sandstore now ships with a single configurable entry point (`cmd/sandstore`) that can start different server modes:
- Simple server (single node, no replication):

  ```bash
  make simple
  ```

  This uses `scripts/dev/run-simple.sh` and keeps the server in the foreground so you can stop it with `Ctrl+C`.

  Optional overrides: `NODE_ID=<id> LISTEN_ADDR=:8090 DATA_DIR=./run/simple make simple`
- Raft demo cluster (5 nodes with consensus and replication):

  ```bash
  make cluster
  ```

  This runs `scripts/dev/run-5.sh`, launches five Raft nodes, and tails their logs to `run/logs/`. Stop the cluster with `Ctrl+C`.
If you prefer to run individual nodes manually, you can call `go run ./cmd/sandstore --server raft ...` directly (see below for flag details).
- Start a server (choose your complexity level):

  ```bash
  # Single-node sandbox
  make simple

  # Multi-node Raft example
  make cluster
  ```
- Store a file from another terminal:

  ```bash
  make client
  ```
- Watch the magic happen:
  - Leader election occurs automatically when you run the Raft cluster
  - Files get chunked and distributed across nodes
  - Metadata is replicated via consensus
  - Logs are written under `run/logs/` (cluster) or the configured `DATA_DIR` (simple server)
```go
// Store a file
storeRequest := communication.StoreFileRequest{
    Path: "my-document.txt",
    Data: []byte("Hello, distributed world!"),
}

// The system automatically:
// 1. Chunks the file (if large enough)
// 2. Replicates chunks across nodes
// 3. Updates metadata via Raft consensus
// 4. Returns success once committed
```

Sandstore follows a layered, service-oriented architecture:
- File Service: Handles high-level file operations, coordinates chunking
- Chunk Service: Manages physical chunk storage on local disk
- Metadata Service: Tracks file-to-chunk mappings and locations
- Cluster Service: Maintains node membership and health status
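
One way to picture this layering is as a set of small Go interfaces that the higher layers compose. The interface and method names below are a conceptual sketch, not the exported API of the services in `servers/`:

```go
package sandstore

import "context"

// Conceptual sketch of the layering described above; the real services
// may differ in names and signatures.

type FileService interface {
	Store(ctx context.Context, path string, data []byte) error
	Read(ctx context.Context, path string) ([]byte, error)
	Delete(ctx context.Context, path string) error
}

type ChunkService interface {
	PutChunk(ctx context.Context, id string, data []byte) error
	GetChunk(ctx context.Context, id string) ([]byte, error)
}

type MetadataService interface {
	// RecordFile maps a file path to the chunk IDs and node addresses that hold it.
	RecordFile(ctx context.Context, path string, chunkIDs []string, nodes []string) error
	LookupFile(ctx context.Context, path string) (chunkIDs []string, nodes []string, err error)
}

type ClusterService interface {
	Members(ctx context.Context) ([]string, error)
	IsHealthy(ctx context.Context, nodeID string) bool
}
```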
- Raft Implementation: Ensures metadata consistency across nodes
- Leader Election: Automatic leader selection and failover
- Log Replication: Distributes state changes to all nodes
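
To make the consensus bullets concrete: a Raft log entry carries the term it was created in plus an opaque command (here, a serialized metadata update), and an entry is applied only once a majority of nodes have replicated it. The types below are a generic sketch of that rule, not Sandstore's internal representation; they assume `matchIndex` holds the highest replicated index for every node, including the leader's own log:

```go
package raftsketch

// LogEntry is a generic Raft log entry: the term in which it was created
// plus the serialized command to apply once committed.
type LogEntry struct {
	Term    uint64
	Command []byte // e.g. an encoded "record file -> chunks" metadata update
}

// committed reports whether the entry at index has been replicated to a
// majority of the cluster, the condition for applying it to the state machine.
func committed(matchIndex []uint64, index uint64) bool {
	votes := 0
	for _, m := range matchIndex {
		if m >= index {
			votes++
		}
	}
	return votes*2 > len(matchIndex)
}
```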
- gRPC Protocol: Type-safe, efficient inter-node communication
- Message Routing: Handles request forwarding and response aggregation
- Health Checking: Monitors node availability and network partitions
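
The health-checking idea boils down to a periodic heartbeat loop that marks a peer as down after several consecutive failed probes. In the sketch below, the `ping` callback stands in for whatever gRPC call the real cluster service makes; the tracker type, thresholds, and names are assumptions for illustration:

```go
package healthsketch

import (
	"context"
	"sync"
	"time"
)

// HealthTracker marks a peer as down after `threshold` consecutive failed probes.
type HealthTracker struct {
	mu       sync.Mutex
	failures map[string]int
	down     map[string]bool
}

func NewHealthTracker() *HealthTracker {
	return &HealthTracker{failures: map[string]int{}, down: map[string]bool{}}
}

// IsDown reports whether a peer has crossed the failure threshold.
func (h *HealthTracker) IsDown(addr string) bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	return h.down[addr]
}

// Monitor probes each peer on every tick using the caller-supplied ping
// function (an abstract stand-in for an RPC) until the context is cancelled.
func (h *HealthTracker) Monitor(ctx context.Context, peers []string, interval time.Duration,
	threshold int, ping func(ctx context.Context, addr string) error) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			for _, p := range peers {
				err := ping(ctx, p)
				h.mu.Lock()
				if err != nil {
					h.failures[p]++
					if h.failures[p] >= threshold {
						h.down[p] = true
					}
				} else {
					h.failures[p] = 0
					h.down[p] = false
				}
				h.mu.Unlock()
			}
		}
	}
}
```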
- Local Disk Storage: Chunks stored as individual files
- Configurable Paths: Separate storage per node instance
- Atomic Operations: Ensures data consistency during writes
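
"Atomic operations" here typically means the classic write-to-a-temp-file-then-rename pattern, so a reader never observes a half-written chunk. The sketch below assumes a simple chunks-as-files layout and is not the project's exact on-disk format:

```go
package storesketch

import (
	"os"
	"path/filepath"
)

// writeChunkAtomically writes chunk data to a temporary file and renames it
// into place, so readers never see a partially written chunk.
func writeChunkAtomically(dataDir, chunkID string, data []byte) error {
	tmp, err := os.CreateTemp(dataDir, chunkID+".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // cleans up on failure; harmless after a successful rename

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil { // flush to disk before the rename
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	// On POSIX filesystems, a rename within the same directory is atomic.
	return os.Rename(tmp.Name(), filepath.Join(dataDir, chunkID))
}
```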
- Enhanced Raft implementation with log compaction
- Improved failure detection and recovery
- Performance benchmarking and optimization
- Web-based cluster monitoring dashboard
- Erasure coding for efficient storage
- Dynamic cluster membership changes
- Cross-datacenter replication
- Compression and deduplication
- REST API alongside gRPC
- Interactive tutorials for each distributed systems concept
- Failure scenario simulations
- Performance analysis tools
- Architecture deep-dive documentation
We welcome contributions from learners and experts alike! Whether you're fixing bugs, adding features, improving documentation, or sharing learning resources, your contributions help make distributed systems more accessible.
See CONTRIBUTING.md for detailed guidelines on:
- Setting up your development environment
- Code style and testing requirements
- Submitting issues and pull requests
- Joining our community discussions
New to distributed systems? Try this progression:
- Start simple: Run `make simple` to explore single-node behavior
- Scale out: Use `make cluster` (or customize `scripts/dev/run-5.sh`) for a full Raft experience
- Simulate failures: Kill nodes from the cluster and observe recovery
- Extend the system: Tweak the services in `servers/` and `cmd/sandstore`
This project is licensed under the MIT License - see the LICENSE file for details.
- Issues: Found a bug or have a feature request? Open an issue
- Discussions: Want to chat about distributed systems? Start a discussion
- Learning: Stuck on a concept? We're here to help - don't hesitate to ask!