The Fault-Tolerant File Storage System is a distributed storage architecture designed to ensure data reliability, high availability, and automatic self-recovery from node failures.
This project implements triple replication across multiple storage nodes built as Flask microservices, uses checkpointing to persist metadata, and recovers nodes automatically. The entire system is containerized with Docker Compose for realistic, isolated deployment.
| Feature | Description |
|---|---|
| Triple Replication | Each uploaded file is redundantly stored on 3 independent storage nodes to guarantee data persistence. |
| Fault Tolerance | The system keeps serving stored files even if up to two of the three nodes fail simultaneously, since one surviving replica is enough to satisfy a download. |
| Checkpointing | Nodes periodically save their current state (file data and metadata) to disk, enabling quick and consistent restoration. |
| Automatic Recovery | A node that rejoins the system automatically re-syncs missing files from active replicas to restore its complete dataset. |
| Web Dashboard | A user-friendly Bootstrap UI to manage file uploads, downloads, and control actions like manual checkpointing and recovery. |
| Dockerized Deployment | All components run in isolated Docker containers managed by Docker Compose for ease of setup and a realistic distributed environment. |
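
The replication and fault-tolerance rows above can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions: `StorageNode`, `replicate`, and `read_with_fallback` are hypothetical names, and the real nodes are Flask services reached over REST rather than Python objects.

```python
class StorageNode:
    """In-memory stand-in for one storage node (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.files = {}

    def store(self, file_id, data):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.files[file_id] = data

    def fetch(self, file_id):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.files[file_id]


def replicate(nodes, file_id, data):
    """Write the file to every node and return the names that accepted it."""
    stored_on = []
    for node in nodes:
        try:
            node.store(file_id, data)
            stored_on.append(node.name)
        except ConnectionError:
            continue  # a down node is skipped; recovery can re-sync it later
    return stored_on


def read_with_fallback(nodes, file_id):
    """Return the file from the first reachable replica that holds it."""
    for node in nodes:
        try:
            return node.fetch(file_id)
        except (ConnectionError, KeyError):
            continue  # try the next replica
    raise RuntimeError(f"no live replica holds {file_id}")


nodes = [StorageNode(f"node{i}") for i in (1, 2, 3)]
replicate(nodes, "report.txt", b"hello")

# Simulate two simultaneous node failures; the third replica still serves the file.
nodes[0].alive = False
nodes[1].alive = False
print(read_with_fallback(nodes, "report.txt"))  # b'hello'
```

With all three nodes down, `read_with_fallback` raises `RuntimeError`, which matches the two-failure limit stated in the table.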
The system follows a classic Coordinator-Worker pattern. The Coordinator handles client requests and metadata, while the Nodes manage file storage and replication.
| Component | Technology | Role |
|---|---|---|
| Language | Python 3.x | Core implementation logic for all services. |
| Microservice Framework | Flask | Provides the REST API for both Coordinator and Node services. |
| Frontend | HTML + Bootstrap | Simple, functional web dashboard for interaction. |
| Containerization | Docker + Docker Compose | Defines, builds, and runs the multi-container distributed environment. |
| Storage | Local Volumes / JSON Metadata | Persistent storage for file data and metadata. |
| Communication | REST APIs | Inter-service communication between Coordinator and Nodes. |
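
The JSON metadata storage listed above pairs naturally with checkpointing. A minimal sketch follows, assuming the metadata is a dictionary mapping file names to the nodes that hold them; that layout, and the helper names `save_checkpoint` and `load_checkpoint`, are assumptions for illustration and not taken from the project code. Writing to a temporary file and renaming it means a crash mid-write cannot leave a half-written checkpoint behind.

```python
import json
import os
import tempfile


def save_checkpoint(metadata, path):
    """Write metadata to `path` atomically: temp file first, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(metadata, handle, indent=2, sort_keys=True)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # replaces any previous checkpoint in one step
    except BaseException:
        os.unlink(tmp_path)  # do not leave a stray temp file on failure
        raise


def load_checkpoint(path):
    """Return the last saved metadata, or an empty dict if none exists."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {}


with tempfile.TemporaryDirectory() as workdir:
    checkpoint_path = os.path.join(workdir, "metadata.json")
    # Assumed layout: file name -> list of nodes holding a replica.
    save_checkpoint({"report.txt": ["node1", "node2", "node3"]}, checkpoint_path)
    restored = load_checkpoint(checkpoint_path)

print(restored)
```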

    fault_tolerant_storage/
    ├── docker-compose.yml
    ├── README.md
    ├── coordinator/
    │   ├── app.py
    │   ├── Dockerfile
    │   └── requirements.txt
    ├── coordinator_data/
    │   └── metadata.json
    ├── node/
    │   ├── app.py
    │   ├── Dockerfile
    │   └── requirements.txt
    ├── node1_data/
    ├── node2_data/
    └── node3_data/


    git clone https://github.com/TUSHAR91316/Fault_tolrent_storage.git
    cd Fault_tolrent_storage
    docker compose up --build -d
    docker ps

Coordinator endpoints:

| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Main web dashboard |
| `/files` | POST | Upload a new file |
| `/files/<file_id>` | GET | Download a file |
| `/checkpoint` | POST | Trigger a checkpoint |
| `/recover/<node_name>` | POST | Recover a node |
| `/status` | GET | Retrieve system metadata |
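
The endpoints in the table above can be called from a small client. The sketch below only builds the requests with the standard library so the method and URL of each call are visible. `BASE_URL` is an assumption, since the host and port are not stated here, and the helper names are hypothetical. The upload call is left out because the expected form field is not documented.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: the host and port of the Coordinator are not documented here.
BASE_URL = "http://localhost:5000"


def status_request(base_url=BASE_URL):
    """GET /status: retrieve system metadata."""
    return Request(f"{base_url}/status", method="GET")


def download_request(file_id, base_url=BASE_URL):
    """GET /files/<file_id>: download a stored file."""
    return Request(f"{base_url}/files/{quote(file_id, safe='')}", method="GET")


def checkpoint_request(base_url=BASE_URL):
    """POST /checkpoint: trigger a checkpoint."""
    return Request(f"{base_url}/checkpoint", data=b"", method="POST")


def recover_request(node_name, base_url=BASE_URL):
    """POST /recover/<node_name>: ask for a node to be recovered."""
    url = f"{base_url}/recover/{quote(node_name, safe='')}"
    return Request(url, data=b"", method="POST")


for request in (status_request(), download_request("report.txt"),
                checkpoint_request(), recover_request("node2")):
    print(request.get_method(), request.full_url)
```

Once the stack is running, each request can be sent with `urllib.request.urlopen(request)`.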
Storage node endpoints:

| Endpoint | Method | Description |
|---|---|---|
| `/store` | POST | Store a file |
| `/store/<file_id>` | GET | Retrieve a file |
| `/checkpoint` | POST | Create a checkpoint |
| `/health` | GET | Health check |
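
Automatic recovery comes down to deciding which files a rejoining node is missing and which healthy replica can supply each one. The sketch below shows one way to compute that plan. The metadata layout (file ID mapped to the nodes that should hold it) and the function name `plan_recovery` are assumptions for illustration, not the project's actual code.

```python
def plan_recovery(node_name, local_files, metadata, healthy_nodes):
    """Return {file_id: source_node} for files the rejoining node must re-fetch.

    node_name     -- the node that is rejoining
    local_files   -- set of file IDs the node still has on disk
    metadata      -- assumed layout: file ID -> list of nodes that should hold it
    healthy_nodes -- set of node names currently reachable
    """
    plan = {}
    for file_id, replicas in sorted(metadata.items()):
        if node_name not in replicas or file_id in local_files:
            continue  # not this node's responsibility, or already present
        sources = [n for n in replicas if n != node_name and n in healthy_nodes]
        if sources:
            plan[file_id] = sources[0]  # first healthy replica is enough
    return plan


metadata = {
    "a.txt": ["node1", "node2", "node3"],
    "b.txt": ["node1", "node2", "node3"],
}

# node2 rejoins with only a.txt on disk; node1 and node3 are healthy.
print(plan_recovery("node2", {"a.txt"}, metadata, {"node1", "node3"}))
# {'b.txt': 'node1'}
```

Each entry in the plan would then translate into a `GET /store/<file_id>` against the source node followed by a `POST /store` on the recovering node.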
- Understanding Fault-Tolerant Distributed Systems.
- Implementing Replication and Recovery Protocols.
- Developing Flask Microservices & REST APIs.
- Deploying Multi-Container Systems with Docker Compose.
- Applying Checkpointing Concepts for Consistency.
- Automated Health Checks and Self-Healing.
- Auto-Checkpoint Timer.
- PostgreSQL / Redis Integration.
- File Versioning and Integrity Checks.
- Kubernetes Deployment.
Educational project under 21CSE479T: Fault Tolerant Systems.
'A truly fault-tolerant system doesn't prevent failure; it recovers from it automatically.'
