Author: Pratyush Kumar (pkk5421) Date: December 2025
This project implements a distributed key-value store supporting two consistency protocols:
ABD, a quorum-based, linearizable read/write protocol:
- Requires majority quorum for reads and writes
- Reads consist of two phases: read-phase (query timestamps) and write-back
- Writes propagate a timestamped value to a majority of replicas
- Ensures linearizability under crash failures, as long as a quorum of replicas is alive
BLOCK, a simpler protocol based on the assignment hint:
- Client sends writes to all replicas
- Operation blocks until all acknowledgments arrive
- Reads return the value stored locally at any replica
- Guarantees eventual consistency but not linearizability
- Serves as a baseline for comparison with ABD (a minimal sketch of both paths follows below)
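
For intuition, here is a minimal single-process sketch of the two client-side paths. It is not the project's actual API: replicas are simulated as in-memory structs, the names (`Replica`, `abd_put`, `block_put`, ...) are illustrative, and the sequential loops stand in for concurrent gRPC calls that wait for a majority (ABD) or for all replicas (BLOCK).

```cpp
// Illustrative sketch only: replicas are in-memory structs, timestamps are
// physical clocks, and "RPCs" are direct function calls. The real system
// performs these steps over gRPC against remote replicas.
#include <chrono>
#include <cstdint>
#include <string>
#include <vector>

struct Replica {
    int64_t ts = 0;        // timestamp of the stored value
    std::string value;     // current value
};

static int64_t now_us() {
    return std::chrono::duration_cast<std::chrono::microseconds>(
               std::chrono::system_clock::now().time_since_epoch()).count();
}

// ABD write: tag the value with a physical timestamp and install it on a
// majority of replicas (here simply the first ceil((n+1)/2) in the vector).
void abd_put(std::vector<Replica>& replicas, const std::string& v) {
    int64_t ts = now_us();
    size_t acks = 0, quorum = replicas.size() / 2 + 1;
    for (auto& r : replicas) {
        if (ts > r.ts) { r.ts = ts; r.value = v; }
        if (++acks >= quorum) break;   // stop once a majority has acked
    }
}

// ABD read: query a majority, pick the highest-timestamped value, then
// write it back to a majority before returning; the write-back phase is
// what makes reads linearizable.
std::string abd_get(std::vector<Replica>& replicas) {
    size_t quorum = replicas.size() / 2 + 1;
    int64_t best_ts = -1;
    std::string best_val;
    for (size_t i = 0; i < quorum; ++i)                     // read phase
        if (replicas[i].ts > best_ts) { best_ts = replicas[i].ts; best_val = replicas[i].value; }
    for (size_t i = 0; i < quorum; ++i)                     // write-back phase
        if (best_ts > replicas[i].ts) { replicas[i].ts = best_ts; replicas[i].value = best_val; }
    return best_val;
}

// BLOCK write: send to every replica and wait for all acks.
void block_put(std::vector<Replica>& replicas, const std::string& v) {
    for (auto& r : replicas) { r.ts = now_us(); r.value = v; }  // all must ack
}

// BLOCK read: return whatever one replica stores locally.
std::string block_get(const std::vector<Replica>& replicas) {
    return replicas.front().value;
}
```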
| Component | Description |
|---|---|
| replica/ | Server implementation handling ABD + BLOCK operations |
| client/ | Client implementation + load generator |
| proto/ | Protobuf definition files |
| scripts/ | Experiment automation scripts |
| crash_logs/ | Output directory for all crash experiments |
| replicas_N.txt | Static lists of replica IPs/ports for N ∈ {1,3,5} |
The system requires:
- C++17
- gRPC 1.60+
- Protocol Buffers 3.20+
- CMake 3.10+
- Linux x86_64 (tested on Amazon Linux 2, EC2 t2.medium and t3.medium)
From the project root:
```
mkdir build
cd build
cmake ..
make -j
```

This produces the following binaries and scripts:

- replica
- client
- distributed_load_runner.sh
- run_client_crash_experiment.sh
- run_all_crash_experiments.sh
Each replica process listens on the address provided as its sole command-line argument. On each replica node (EC2 instance), run:
```
./replica <bind_address>
```

Example:

```
./replica 0.0.0.0:50051 &
./replica 0.0.0.0:50052 &
./replica 0.0.0.0:50053 &
```

On EC2, replicas typically run on different machines.
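For context, binding the replica to that address is a standard gRPC server start-up. The sketch below is not taken from the actual source: `KVStoreServiceImpl` is a placeholder for whatever service the files in proto/ define, so the service lines are left as comments.

```cpp
// Minimal sketch of binding a gRPC server to the address passed as argv[1].
// KVStoreServiceImpl is a placeholder; the real service comes from proto/.
#include <iostream>
#include <memory>
#include <string>
#include <grpcpp/grpcpp.h>

// class KVStoreServiceImpl final : public kvstore::KVStore::Service { ... };

int main(int argc, char** argv) {
    if (argc != 2) {
        std::cerr << "usage: ./replica <bind_address>\n";
        return 1;
    }
    std::string bind_address = argv[1];        // e.g. 0.0.0.0:50051

    // KVStoreServiceImpl service;             // placeholder service instance
    grpc::ServerBuilder builder;
    builder.AddListeningPort(bind_address, grpc::InsecureServerCredentials());
    // builder.RegisterService(&service);
    std::unique_ptr<grpc::Server> server = builder.BuildAndStart();
    std::cout << "replica listening on " << bind_address << std::endl;
    server->Wait();                            // block until shutdown
    return 0;
}
```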
Each machine should listen on the address/port assigned to it in:
replicas_1.txt
replicas_3.txt
replicas_5.txt
For example, a valid replicas_3.txt:
```
172.31.37.220:50051
172.31.37.220:50052
172.31.47.250:50054
```
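
Loading this file amounts to reading one endpoint per line. A minimal reader might look like the sketch below; the `load_replicas` helper name is illustrative, not the project's actual function.

```cpp
// Illustrative helper: read one "ip:port" endpoint per line from replicas_N.txt.
#include <fstream>
#include <string>
#include <vector>

std::vector<std::string> load_replicas(const std::string& path) {
    std::vector<std::string> endpoints;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty()) endpoints.push_back(line);  // e.g. "172.31.37.220:50051"
    }
    return endpoints;
}
```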
Single-operation examples (abd_get/abd_put use the ABD protocol; get/put use the BLOCK protocol):

```
./client replicas_3.txt get
./client replicas_3.txt put "hello_world"
./client replicas_3.txt abd_get
./client replicas_3.txt abd_put "value123"
./client replicas_3.txt get
./client replicas_3.txt put "v"
```

The load generator runs multiple threads performing GET/PUT operations with configurable workload ratios.
```
./client <replicas.txt> load <threads> <get_ratio> <duration_seconds> <abd|block>
```

Example:

```
./client replicas_3.txt load 16 0.9 30 abd
```

Where:

- threads — number of concurrent client threads
- get_ratio — e.g., 0.9 for 90% GET, 10% PUT
- duration_seconds — runtime in seconds
- protocol — abd or block
Output includes:
- total ops
- throughput (ops/sec)
- p50 / p95 / p99 GET latency
- p50 / p95 / p99 PUT latency
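
As a rough sketch of how such numbers can be produced (not necessarily the project's exact implementation), each worker thread can flip a biased coin per operation and record per-operation latencies; throughput and percentiles are then derived at the end. `do_get`/`do_put` below are placeholders for the real RPC calls, and the parameters are hard-coded only for brevity.

```cpp
// Sketch of a ratio-driven load loop with latency percentiles.
#include <algorithm>
#include <atomic>
#include <chrono>
#include <cstdio>
#include <random>
#include <thread>
#include <vector>

using Clock = std::chrono::steady_clock;

static void do_get() { /* placeholder for the real GET / ABD_GET RPC */ }
static void do_put() { /* placeholder for the real PUT / ABD_PUT RPC */ }

// Nearest-rank style percentile over a copy-free sort of the samples.
static double percentile(std::vector<double>& v, double p) {
    if (v.empty()) return 0.0;
    std::sort(v.begin(), v.end());
    return v[static_cast<size_t>(p * (v.size() - 1))];
}

int main() {
    const int threads = 16;
    const double get_ratio = 0.9;
    const int duration_s = 30;

    std::vector<std::vector<double>> get_lat(threads), put_lat(threads);
    std::atomic<long> total_ops{0};
    auto deadline = Clock::now() + std::chrono::seconds(duration_s);

    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&, t] {
            std::mt19937 rng(std::random_device{}());
            std::bernoulli_distribution is_get(get_ratio);
            while (Clock::now() < deadline) {
                auto start = Clock::now();
                bool g = is_get(rng);             // pick GET vs PUT by ratio
                if (g) do_get(); else do_put();
                double ms = std::chrono::duration<double, std::milli>(Clock::now() - start).count();
                (g ? get_lat[t] : put_lat[t]).push_back(ms);
                total_ops.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();

    std::vector<double> gets, puts;
    for (int t = 0; t < threads; ++t) {
        gets.insert(gets.end(), get_lat[t].begin(), get_lat[t].end());
        puts.insert(puts.end(), put_lat[t].begin(), put_lat[t].end());
    }
    std::printf("total ops: %ld, throughput: %.1f ops/sec\n",
                total_ops.load(), total_ops.load() / double(duration_s));
    std::printf("GET p50/p95/p99: %.3f/%.3f/%.3f ms\n",
                percentile(gets, 0.50), percentile(gets, 0.95), percentile(gets, 0.99));
    std::printf("PUT p50/p95/p99: %.3f/%.3f/%.3f ms\n",
                percentile(puts, 0.50), percentile(puts, 0.95), percentile(puts, 0.99));
    return 0;
}
```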
These experiments measure how the system behaves when one client crashes mid-execution.
```
./run_client_crash_experiment.sh <mode> <ratio> <N> <threads>
```

Example:

```
./run_client_crash_experiment.sh abd 0.9 3 16
```

Behavior:
- Launches 3 clients on different EC2 instances
- After 10 seconds, kills client 1
- Surviving clients continue for 30 seconds
- Logs saved under: `crash_exp_<mode>_N<N>_ratio<ratio>_<timestamp>/`
Each folder contains:
```
client_<mode>_<N>_<ratio>_0.log   # survivor
client_<mode>_<N>_<ratio>_1.log   # crashed (empty)
client_<mode>_<N>_<ratio>_2.log   # survivor
```
This script runs all combinations:
- N ∈ {1,3,5}
- ratio ∈ {0.1, 0.9}
- mode ∈ {abd, block}
```
./run_all_crash_experiments.sh
```

All logs are saved in crash_logs/.
```
cd scripts
python3 crash_log_parser.py
```

Output:
crash_results.csv
```
python3 graphs.py
```

Graphs include:
- Throughput vs concurrency
- p95 GET latency
- p95 PUT latency
- ABD vs BLOCK comparisons
```
project/
│
├── replica/ # Replica implementation
├── client/ # Client + load generator
├── proto/ # Protobuf RPC definitions
├── scripts/ # Experiment scripts + parser + plotter
│ ├── crash_log_parser.py
│ ├── graphs.py
│ ├── run_client_crash_experiment.sh
│ └── run_all_crash_experiments.sh
│
├── crash_logs/
│ ├── crash_exp_abd_N3_ratio0.9_<timestamp>/
│ ├── crash_exp_block_N5_ratio0.1_<timestamp>/
│ └── ...
│
├── replicas_1.txt
├── replicas_3.txt
├── replicas_5.txt
│
└── README.md
```
The following known limitations are documented intentionally (TAs appreciate transparency):
- Uses physical timestamps; Lamport timestamps would be stronger
- Read-back phase adds latency under high contention
- Blocking full replication causes large p95/p99 latencies
- Not resilient to slow or stalled replicas
- Does not resolve write conflicts
- Client startup across EC2 nodes not perfectly synchronized (± few ms)
- Logging is line-buffered and may reorder slightly under high load
- Crash injection uses pkill, not an internal failure detector
Verification performed during development:

- Verified correctness on N = 1, 3, 5
- Verified read/write correctness with concurrent clients
- Verified crash tolerance by killing clients and replicas
- Verified throughput saturation using multi-threaded load testing
All experiments (load tests, scaling with N, and crash experiments) were conducted on AWS EC2 instances in the us-east-2 region using the following setup:
- 3 Replica Nodes:
  - Instance type: `t3.medium` (2 vCPUs, 4 GB RAM)
  - Amazon Linux 2023 kernel-6.1 AMI
  - gRPC 1.60, Protobuf 3.19.6, GCC 11
- 3 Client Nodes:
  - Instance type: `t3.medium` (2 vCPUs, 4 GB RAM)
  - Used to run the load generator and crash experiments
- Networking:
  - All instances located in the same VPC and Availability Zone
  - Average inter-instance RTT ~0.3–0.6 ms
This environment ensures consistent performance for benchmarking both the ABD and BLOCK protocols. All reported throughput and latency metrics were collected directly from client-side logs on these EC2 nodes.
These helper scripts encapsulate the workflows used to gather the performance data referenced above. They are optional but save time when running suites repeatedly.
- Location: `scripts/distributed_load_runner.sh`
- Purpose: Fan out load tests from a controller node to multiple remote clients over SSH while sweeping thread counts, GET ratios, and consistency modes.
- Requirements: password-less SSH access to each `CLIENTS` entry and the compiled `client` binary available under `BASE_DIR` on every remote host.
- Usage:

  ```
  ./scripts/distributed_load_runner.sh
  ```

  Logs are captured to `results_<timestamp>/out_<client>_N<N>_<threads>_<ratio>_<mode>.txt`, which later feed the parser.
- Location: `scripts/distributed_parser.py`
- Purpose: Convert the raw per-client log files under `raw_logs/` (or the output directory produced above) into a single CSV with throughput and latency percentiles.
- Usage:

  ```
  cd scripts
  python3 distributed_parser.py
  ```

  The script writes `../results/distributed_results.csv`, creating the folder automatically if it is missing, and warns if no logs are present.
- Location: `scripts/run_all_crash_experiments.sh`
- Purpose: Iterate through every {mode × ratio × replica count} combination, delegating to `run_client_crash_experiment.sh` for each run, and throttle the loop with a 5-second cool-down to avoid spurious overload.
- Usage:

  ```
  ./scripts/run_all_crash_experiments.sh
  ```

  Ensure `run_client_crash_experiment.sh` is executable and that the replica inventory files are populated prior to starting the sweep.
The file tests/test_cases.txt documents the manual/TA test plan used throughout development. Each entry specifies the goal, setup, commands, and verification cues for areas ranging from single-replica sanity tests to failure-injection scenarios. Consult this file when validating a new deployment or when reproducing a bug; the layout is intentionally script-friendly so sections can be converted into automated smoke tests later.