TUSHAR91316/Fault_tolrent_storage


💾 Fault-Tolerant File Storage System

🧠 Course: 21CSE479T β€” Fault Tolerant Systems

👨‍💻 Developed by: Tushar


📖 Overview

The Fault-Tolerant File Storage System is a distributed storage architecture designed to ensure data reliability, high availability, and automatic self-recovery from node failures.

This project implements triple replication across multiple storage nodes using Flask microservices, uses checkpointing to persist metadata, and provides automatic node recovery. The entire system is containerized with Docker Compose for realistic, isolated deployment.


⚙️ Key Features

| Feature | Description |
| --- | --- |
| 🧱 Triple Replication | Each uploaded file is redundantly stored on 3 independent storage nodes to guarantee data persistence. |
| ⚡ Fault Tolerance | The system remains fully operational and data remains accessible even if up to two nodes fail simultaneously. |
| 💾 Checkpointing | Nodes periodically save their current state (file data and metadata) to disk, enabling quick and consistent restoration. |
| 🔄 Automatic Recovery | A node that rejoins the system automatically re-syncs missing files from active replicas to restore its complete dataset. |
| 🌐 Web Dashboard | A user-friendly Bootstrap UI to manage file uploads, downloads, and control actions like manual checkpointing and recovery. |
| 🐳 Dockerized Deployment | All components run in isolated Docker containers managed by Docker Compose for ease of setup and a realistic distributed environment. |

🏗️ System Architecture

The system follows a classic Coordinator-Worker pattern. The Coordinator handles client requests and metadata, while the Nodes manage file storage and replication.
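To make the Coordinator-Worker split concrete, here is a small in-memory sketch, assuming the Coordinator keeps a mapping from file ID to the nodes holding it and fans each upload out to every node. The `Node` and `Coordinator` classes and their method names are illustrative only; the real services communicate over REST and may be structured differently.

```python
from typing import Dict, List


class Node:
    """Hypothetical in-memory storage node; the real nodes are Flask services."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.files: Dict[str, bytes] = {}

    def store(self, file_id: str, data: bytes) -> None:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.files[file_id] = data

    def fetch(self, file_id: str) -> bytes:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.files[file_id]


class Coordinator:
    """Tracks which nodes hold each file and fans writes out to all of them."""

    def __init__(self, nodes: List[Node]) -> None:
        self.nodes = nodes
        self.metadata: Dict[str, List[str]] = {}  # file_id -> names of holders

    def upload(self, file_id: str, data: bytes) -> List[str]:
        stored_on = []
        for node in self.nodes:
            try:
                node.store(file_id, data)
                stored_on.append(node.name)
            except ConnectionError:
                continue  # skip unreachable nodes; they can re-sync later
        if not stored_on:
            raise RuntimeError("no storage node available")
        self.metadata[file_id] = stored_on
        return stored_on

    def download(self, file_id: str) -> bytes:
        holders = self.metadata[file_id]
        for node in self.nodes:
            if node.name in holders:
                try:
                    return node.fetch(file_id)
                except ConnectionError:
                    continue  # this replica is down; try the next one
        raise RuntimeError(f"no live replica of {file_id}")


if __name__ == "__main__":
    nodes = [Node("node1"), Node("node2"), Node("node3")]
    coordinator = Coordinator(nodes)
    print(coordinator.upload("report.pdf", b"contents"))
    nodes[0].alive = False
    nodes[1].alive = False  # two of three nodes fail
    print(coordinator.download("report.pdf"))
```

In this sketch a read succeeds as long as one holder is reachable, which is the property the triple-replication design relies on.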

[Image: system architecture diagram]

🧰 Tech Stack

| Component | Technology | Role |
| --- | --- | --- |
| Language | Python 3.x | Core implementation logic for all services. |
| Microservice Framework | Flask | Provides the REST API for both Coordinator and Node services. |
| Frontend | HTML + Bootstrap | Simple, functional web dashboard for interaction. |
| Containerization | Docker + Docker Compose | Defines, builds, and runs the multi-container distributed environment. |
| Storage | Local Volumes / JSON Metadata | Persistent storage for files and metadata. |
| Communication | REST APIs | Inter-service communication between Coordinator and Nodes. |
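Since metadata is persisted as JSON, a checkpoint can be as simple as dumping a dictionary to disk. The sketch below shows one common way to do that safely, assuming a write-to-temp-then-rename strategy so a crash mid-write never leaves a half-written file. The function names and the metadata layout (file ID mapped to a list of node names) are assumptions for illustration, not taken from the project's code.

```python
import json
import os
import tempfile


def save_checkpoint(metadata: dict, path: str) -> None:
    """Write metadata to `path` by writing a temp file and renaming it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(metadata, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # rename over the old checkpoint
    except BaseException:
        os.unlink(tmp_path)  # do not leave stray temp files behind
        raise


def load_checkpoint(path: str) -> dict:
    """Return the last saved metadata, or an empty dict if no checkpoint exists."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        checkpoint_path = os.path.join(workdir, "metadata.json")
        # Assumed layout: file_id -> names of the nodes holding a replica.
        save_checkpoint({"report.pdf": ["node1", "node2", "node3"]}, checkpoint_path)
        print(load_checkpoint(checkpoint_path))
```

Readers always see either the previous checkpoint or the new one, never a partial file, because the rename happens only after the temp file is fully written.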

📂 Folder Structure

fault_tolerant_storage/
├── docker-compose.yml
├── README.md
├── coordinator/
│   ├── app.py
│   ├── Dockerfile
│   └── requirements.txt
├── coordinator_data/
│   └── metadata.json
├── node/
│   ├── app.py
│   ├── Dockerfile
│   └── requirements.txt
├── node1_data/
├── node2_data/
└── node3_data/

🚀 Setup Instructions

1️⃣ Clone the Repository

git clone https://github.com/TUSHAR91316/Fault_tolrent_storage.git
cd Fault_tolrent_storage

2️⃣ Build and Run All Services

docker compose up --build -d

3️⃣ Verify Running Containers

docker ps

4️⃣ Access the Web Dashboard

👉 http://localhost:5000


🧪 How to Test and Demonstrate Fault Tolerance

Step 1: Upload and Replicate

Step 2: Simulate Node Failure

Step 3: Recover and Re-synchronize

Step 4: Create Checkpoint (Optional)

Step 5: Stop All Containers


🧩 Endpoints Summary (for Developers)

Coordinator API

| Endpoint | Method | Description |
| --- | --- | --- |
| `/` | GET | Main Web Dashboard |
| `/files` | POST | Upload a new file |
| `/files/<file_id>` | GET | Download file |
| `/checkpoint` | POST | Trigger checkpoint |
| `/recover/<node_name>` | POST | Recover node |
| `/status` | GET | Retrieve system metadata |
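For scripting against the Coordinator without the dashboard, the endpoints above can be wrapped in a few helper functions. The sketch below uses only the Python standard library and builds requests for the routes listed in the table. The helper names are hypothetical, the base URL is the dashboard address from the setup steps, and the response format is not documented here, so `send` simply returns the raw bytes. The upload route is omitted because the expected request body (for example, the form field name) is not specified in this README.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:5000"  # dashboard address from the setup steps


def status_request(base: str = BASE_URL) -> urllib.request.Request:
    """GET /status: retrieve system metadata."""
    return urllib.request.Request(f"{base}/status", method="GET")


def download_request(file_id: str, base: str = BASE_URL) -> urllib.request.Request:
    """GET /files/<file_id>: download a stored file."""
    return urllib.request.Request(f"{base}/files/{quote(file_id, safe='')}", method="GET")


def checkpoint_request(base: str = BASE_URL) -> urllib.request.Request:
    """POST /checkpoint: trigger a checkpoint."""
    return urllib.request.Request(f"{base}/checkpoint", data=b"", method="POST")


def recover_request(node_name: str, base: str = BASE_URL) -> urllib.request.Request:
    """POST /recover/<node_name>: ask the coordinator to recover a node."""
    return urllib.request.Request(
        f"{base}/recover/{quote(node_name, safe='')}", data=b"", method="POST"
    )


def send(request: urllib.request.Request, timeout: float = 5.0) -> bytes:
    """Send a prepared request and return the raw response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Print what would be sent; call send(...) once the containers are running.
    for req in (status_request(), checkpoint_request(), recover_request("node2")):
        print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the helpers easy to inspect and test without a running stack.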

Node API

| Endpoint | Method | Description |
| --- | --- | --- |
| `/store` | POST | Store a file |
| `/store/<file_id>` | GET | Retrieve a file |
| `/checkpoint` | POST | Create checkpoint |
| `/health` | GET | Health check |
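The `/health` endpoint makes it possible to detect which nodes are up and which have just come back. The sketch below is one way this could be done, assuming a node counts as healthy when `GET /health` answers with HTTP 200. The node URLs, the function names, and the idea of comparing two consecutive snapshots are all illustrative assumptions; the actual addresses depend on the Docker Compose configuration.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List


def http_health(url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <url>/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{url}/health", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False  # refused, timed out, or unreachable


def check_nodes(
    node_urls: Dict[str, str], probe: Callable[[str], bool] = http_health
) -> Dict[str, bool]:
    """Probe every node and return a name -> is_healthy snapshot."""
    return {name: probe(url) for name, url in node_urls.items()}


def rejoined_nodes(previous: Dict[str, bool], current: Dict[str, bool]) -> List[str]:
    """Names of nodes that were down in `previous` and are up in `current`."""
    return sorted(
        name for name, up in current.items() if up and not previous.get(name, False)
    )


if __name__ == "__main__":
    # Hypothetical addresses; a fake probe is used so the demo needs no network.
    urls = {"node1": "http://node1:5001", "node2": "http://node2:5002"}
    before = check_nodes(urls, probe=lambda url: "node1" in url)
    after = check_nodes(urls, probe=lambda url: True)
    print(rejoined_nodes(before, after))
```

Nodes reported by `rejoined_nodes` would be natural candidates for a `POST /recover/<node_name>` call on the Coordinator.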

🧠 Learning Outcomes

  • Understanding Fault-Tolerant Distributed Systems.
  • Implementing Replication and Recovery Protocols.
  • Developing Flask Microservices & REST APIs.
  • Deploying Multi-Container Systems with Docker Compose.
  • Applying Checkpointing Concepts for Consistency.

📊 Possible Extensions

  • Automated Health Checks and Self-Healing.
  • Auto-Checkpoint Timer.
  • PostgreSQL / Redis Integration.
  • File Versioning and Integrity Checks.
  • Kubernetes Deployment.
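As an illustration of the integrity-check extension listed above, one simple approach is to record a SHA-256 digest at upload time and compare each replica against it later. This is only a sketch of that idea, not something the project currently implements; the function names and the in-memory replica mapping are hypothetical.

```python
import hashlib
from typing import Dict, List


def sha256_of(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def find_corrupt_replicas(expected_digest: str, replicas: Dict[str, bytes]) -> List[str]:
    """Return the names of nodes whose copy does not match the recorded digest."""
    return sorted(
        name for name, data in replicas.items() if sha256_of(data) != expected_digest
    )


if __name__ == "__main__":
    original = b"important data"
    digest = sha256_of(original)  # would be stored in metadata at upload time
    replicas = {"node1": original, "node2": b"important dat4", "node3": original}
    print(find_corrupt_replicas(digest, replicas))
```

A replica flagged this way could then be overwritten from a copy whose digest still matches.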

📜 License

Educational project under 21CSE479T – Fault Tolerant Systems.


💡 'A truly fault-tolerant system doesn't prevent failure; it recovers from it automatically.'
