perform pre-processing and set initial cluster centroids (see issue #1)
assign N/P data points to each machine
while not converged:
    broadcast current centroids to all machines
    on each machine:                                        <-- MPI
        compute membership of each local point              <-- OpenMP
        broadcast membership vector
    compute new centroids based on membership vectors       <-- OpenMP
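
The steps above can be sketched in a few lines of plain Python. This is only a single-process simulation under stated assumptions, not the MPI/OpenMP implementation: the P "machines" are modelled as P contiguous chunks of the point list, a "broadcast" is just shared access to the same list, and the loops that would be OpenMP-parallel run serially. The function names (`partition`, `nearest`, `kmeans_simulated`), the convergence test (membership no longer changes), and the handling of empty clusters (keep the old centroid) are all assumptions made for illustration. Initial centroids are passed in by the caller, since the pre-processing step is not specified here.

```python
def partition(points, num_machines):
    """Split N points into P contiguous chunks of roughly N/P points each."""
    base, extra = divmod(len(points), num_machines)
    chunks, start = [], 0
    for rank in range(num_machines):
        size = base + (1 if rank < extra else 0)  # spread the remainder
        chunks.append(points[start:start + size])
        start += size
    return chunks


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    best, best_dist = 0, float("inf")
    for k, centroid in enumerate(centroids):
        dist = sum((a - b) ** 2 for a, b in zip(point, centroid))
        if dist < best_dist:
            best, best_dist = k, dist
    return best


def update_centroids(points, membership, old_centroids):
    """Mean of the points assigned to each cluster (assumed update rule)."""
    dim = len(old_centroids[0])
    sums = [[0.0] * dim for _ in old_centroids]
    counts = [0] * len(old_centroids)
    for point, k in zip(points, membership):
        counts[k] += 1
        for j in range(dim):
            sums[k][j] += point[j]
    # Assumption: a cluster with no members keeps its previous centroid.
    return [
        [s / counts[k] for s in sums[k]] if counts[k] else list(old_centroids[k])
        for k in range(len(old_centroids))
    ]


def kmeans_simulated(points, init_centroids, num_machines, max_iters=100):
    chunks = partition(points, num_machines)       # assign N/P points per machine
    centroids = [list(c) for c in init_centroids]
    membership = None
    for _ in range(max_iters):
        # "broadcast current centroids": every simulated machine reads `centroids`.
        # "on each machine": compute membership of its local points.
        local = [[nearest(p, centroids) for p in chunk] for chunk in chunks]
        # "broadcast membership vector": concatenate local vectors in rank order.
        new_membership = [k for vec in local for k in vec]
        if new_membership == membership:           # assumed convergence test
            break
        membership = new_membership
        centroids = update_centroids(points, membership, centroids)
    return centroids, membership
```

A small usage example, with made-up data:

```python
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centroids, membership = kmeans_simulated(points, [(0.0, 0.0), (10.0, 10.0)], num_machines=2)
```

Because the chunks are contiguous and concatenated in rank order, the combined membership vector lines up with the original point order, so the result should not depend on the number of machines.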