Needed for python utils like data_generator and scatter_plot.
pip install -r py_utils/requirements.txtUnit testing for C++.
git clone https://github.com/catchorg/Catch2.gitThis script can be used to generate random datasets. If attribute -k is specified the dataset contains clusterized points
python3 py_utils/data_generator.py -n 1000 -d 3 -k 4 -min 0 -max 10 -o datasets/3Dpoints.csvThis script can be used to display results (reading csv output files).
python3 py_utils/scatter_plot.py -f 3Dpoints.csv -d 3These are the commands that can be used to compile GPU Kmeans.
Notice: cmake is required.
mkdir -p build
cd build
cmake ..
cmake --build .This will build the target gpukmeans to build/src/bin/gpukmeans.
This code has is provided in scripts/build.sh so that (from the root of the project) you can run ./scripts/build.sh to build gpukmeans.
The target unit_kernels is excluded from ALL_TARGETS therefore it has to be compiled explicitly.
cmake --build . --target unit_kernelsUse the script scripts/run_tests.sh to build and run unit tests.
The executable gpukmeans reads inputs from stdin or from csv file. The directory datasets contains some examples of data input.
./build/src/bin/gpukmeans --help
gpukmeans is an implementation of the K-means algorithm that uses a GPU
Usage:
gpukmeans [OPTION...]
-h, --help Print usage
-d, --n-dimensions arg Number of dimensions of a point
-n, --n-samples arg Number of points
-k, --n-clusters arg Number of clusters
-m, --maxiter arg Maximum number of iterations
-o, --out-file arg Output filename
-i, --in-file arg Input filename
-r, --runs arg Number of k-means runs (default: 1)
-s, --seed arg Seed for centroids generator
-t, --tolerance arg Tolerance to declare convergenceExample:
./build/src/bin/gpukmeans -d 64 -n 1797 -k 10 -m 300 -o res.csv -i ./datasets/N1797_D64_digits-sklearn.csv -t 0.0001Runs gpukmeans on the dataset N1797_D64_digits-sklearn.csv considering: 64 dimensions, 1797 points, 10 clusters, 300 maximum iterations, tolerance 1e-4, and output the results to res.csv.