- Released: Week 1 of C3 Part 1
- Due: End of Week 5 C3 Part 1
This MP (Machine Programming assignment) is about implementing a membership protocol similar to one discussed in class.
Since it is infeasible to run a thousand cluster nodes (peers) over a real network, we provide an implementation of an emulated network layer (EmulNet). Your membership protocol implementation will sit above EmulNet in a peer-to-peer (P2P) layer, but below an Application layer.
Conceptually this is a three-layer protocol stack:
Application Layer
P2P Layer
EmulNet Layer
Your protocol must satisfy:
-
Completeness at all times:
Every non-faulty process must detect every node join, failure, and leave. -
Accuracy of failure detection:
When there are no message losses and message delays are small, failure detection must be accurate. -
Robustness under message loss:
When messages are lost, completeness must still hold and accuracy should remain high. -
The protocol must function correctly even under simultaneous multiple failures.
You may implement any of the membership protocols discussed in class:
- All-to-all heartbeating
- Gossip-style heartbeating
- SWIM-style membership
Recommendation: Implement gossip or SWIM, as they provide deeper learning.
The template code is written in C++, one of the most commonly used languages in systems programming. You will need C++11 or later (gcc ≥ 4.7).
Template code download:
https://spark-public.s3.amazonaws.com/cloudcomputing/assignments/mp1/mp1.zip
Autograder scripts (unit tests) will also be provided to test your program.
All work must be individual.
You may discuss:
- The MP specification
- General concepts
You may not:
- Discuss solutions
- Share code
- Copy code
Your code may be checked for structural similarity.
The framework allows running multiple peers inside a single process using a single-threaded simulation engine.
EmulNet provides the following functions that your membership protocol should use:
void *ENinit(Address *myaddr, short port);
int ENsend(Address *myaddr, struct address *addr, string data);
int ENrecv(Address *myaddr, int (* enqueue)(void *, char *, int), struct timeval *t, int times, void *queue);
int ENcleanup();ENinit
- Called once by each node
- Initializes its address (
myaddr)
ENsend
- Sends a message to another peer
ENrecv
- Receives messages waiting for a node
- Enqueues them using the function pointer
enqueue()
ENcleanup
- Cleans up the EmulNet implementation at the end of the simulation
Notes:
- Parameters
tandtimesinENrecvare currently unused. ENsendandENrecvcan be assumed reliable when the network has no message losses.
EmulNet.cpp
EmulNet.h
These files will be replaced during testing. Access the network layer only through the provided functions.
This layer drives the simulation.
Files:
Application.cpp
Application.h
Look at the main() function.
The simulation runs in synchronous periods, controlled by the variable: globaltime
During each period:
- Some peers may start
- Some peers may crash-stop
For every peer that is alive, the function: nodeLoop() is called.
nodeLoop() is implemented in the P2P layer (MP1Node.cpp).
Its responsibilities include:
- Receiving messages sent during the previous period
- Checking whether the application has new requests
Application.cpp
Application.h
Files:
MP1Node.cpp
MP1Node.h
This layer implements the membership protocol. All your implementation code should go here.
This layer could eventually support distributed system operations such as:
- File insertion
- File lookup
- File removal
Currently the code prints debugging messages into: dbg.log
Format:
node_address [globaltime] message
Debugging can be enabled or disabled by commenting/uncommenting:
#define DEBUGLOGin the header file: stdincludes.h
Two message types currently exist in the P2P layer:
JOINREQJOINREP
JOINREQmessages are received by the introducer.- The introducer is the first peer to join the system.
For Linux systems this is typically: 1.0.0.0:0 due to big-endianness.
A good starting point is to implement JOINREP responses from the introducer.
All code must go inside:
MP1Node.cpp
MP1Node.h
Do not modify any other files, since they will be replaced during grading.
You must implement:
nodeLoopOps()recvCallBack()
These functions are invoked by nodeLoop() to perform periodic protocol operations.
Each new peer contacts a well-known peer (the introducer).
This uses:
JOINREQJOINREP
Current behavior:
JOINREQreaches the introducerJOINREPis not implemented
You must implement JOINREP so that it includes the cluster membership list.
The introducer does not need a full list of all peers. A partial list of fixed size is acceptable.
Your protocol must satisfy:
- Completeness for joins and failures
- Accuracy when there are no delays or message losses
- High accuracy when losses or delays occur
Recommended approaches:
- Gossip-style heartbeating
- SWIM membership
You will likely need to modify:
struct member
enum MsgTypesin: Mp1Node.h
New message types should be processed using functions similar to those used for JOINREQ and JOINREP.
Logging is critical because the grader uses the logs to evaluate correctness.
Files:
Log.cpp
Log.h
The logging system includes:
LOG()→ prints node status todbg.loglogNodeAdd()→ logs a node being addedlogNodeRemove()→ logs a node being removed
Whenever a node adds or removes a member, call the appropriate function.
Function parameters:
- Address of the recording process
- Address of the node being added or removed
Example:
logNodeAdd(self_address, added_node_address)
logNodeRemove(self_address, removed_node_address)
The grader will check these entries.
Contains: setparams()
This initializes parameters such as:
- Number of peers (
EN_GPSZ) - Global time (
globaltime)
Contains necessary definitions and declarations.
Two main reasons:
The EN*() functions can be replaced by implementations using TCP sockets.
With minor changes such as:
- Replacing periodic calls with threads
- Modifying
nodeLoop()scheduling
the system can run on a real distributed network.
Running many distributed processes on a real network is hard to debug.
This simulation framework allows:
- Debugging hundreds of peers
- Measuring protocol behavior
- Running everything on one host machine
Once the protocol works in simulation, it can be migrated to real networks.
Compile the code:
make
Run the program:
./Application testcases/<test_name>.conf
Configuration files contain parameters such as:
MAX_NNB: val
SINGLE_FAILURE: val
DROP_MSG: val
MSG_DROP_PROB: val
Parameter meanings:
- MAX_NNB – Maximum number of neighbors
- SINGLE_FAILURE – Single vs multiple failure scenario
- MSG_DROP – Whether message dropping is enabled
- MSG_DROP_PROB – Probability of message drop (0–1)
Script for grading is:
./Grader.sh
It evaluates three scenarios:
- Single node failure
- Multiple node failures
- Single node failure under a lossy network
Metrics tested:
- All nodes join correctly
- Failed nodes are detected (completeness)
- Correct node is identified as failed (accuracy)
Test configurations are stored in the: testcases/ directory.
Total points: 90 points
The grader reports how many points your implementation receives.
Your goal is to pass all tests on the Coursera grading platform.
Make the script executable:
chmod +x Grader.sh
Run the grader:
./Grader.sh
Ensure you are using C++11 or newer (gcc ≥ 4.7).