Clustering

We will implement several clustering algorithms from scratch and test them on two synthetic data sets, each made of 2-dimensional distributions of points. We will then apply the same algorithms to a real-world data set.

TODO: Generate data sets DS1 (non-overlapping blobs) and DS2 (overlapping blobs), and load DS3 from the file iris.csv.
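As a rough illustration of the two synthetic data sets, the sketch below draws isotropic Gaussian blobs with NumPy. The helper name `make_blobs_2d`, the centre coordinates, the number of points per blob and the seed are all assumptions made for this example; only the standard deviation of 2 comes from the assignment text. Loading DS3 is not shown because it depends on the column layout of iris.csv.

```python
import numpy as np

def make_blobs_2d(centers, std, n_per_cluster, seed=0):
    """Draw one isotropic Gaussian blob per centre (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    centers = np.asarray(centers, dtype=float)
    # One (n_per_cluster, 2) sample per centre, stacked row-wise.
    X = np.vstack([rng.normal(c, std, size=(n_per_cluster, 2)) for c in centers])
    # Ground-truth label = index of the blob each point was drawn from.
    y = np.repeat(np.arange(len(centers)), n_per_cluster)
    return X, y

# DS1: centres far apart relative to std, so the blobs do not overlap (assumed values).
X1, y1 = make_blobs_2d([(-10, -10), (0, 10), (10, -10)], std=2.0, n_per_cluster=100)
# DS2: centres close together relative to std, so the blobs overlap (assumed values).
X2, y2 = make_blobs_2d([(-2, -2), (0, 2), (2, -2)], std=2.0, n_per_cluster=100)
```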

Implement K-Means, Fuzzy C-Means and Graded Possibilistic C-Means.
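To make the soft-clustering step concrete, here is a minimal sketch of Fuzzy C-Means using the standard alternating update of memberships and centroids. The function names, the fuzzifier `m=2.0`, the stopping tolerance and the random initialisation from data points are assumptions for illustration, not requirements of the assignment. Graded Possibilistic C-Means is not covered by this sketch.

```python
import numpy as np

def fcm_memberships(X, centroids, m=2.0):
    """Fuzzy memberships: u_ij proportional to d_ij^(-2/(m-1)), rows sum to 1."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, 1e-12)            # avoid division by zero at a centroid
    inv = d2 ** (-1.0 / (m - 1.0))        # d2 is squared distance, hence -1/(m-1)
    return inv / inv.sum(axis=1, keepdims=True)

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Alternate membership and centroid updates until centroids stop moving."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Assumed initialisation: c distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=c, replace=False)]
    for _ in range(n_iter):
        U = fcm_memberships(X, centroids, m)
        W = U ** m
        # Each centroid is the mean of the data weighted by u_ij^m.
        new_centroids = (W.T @ X) / W.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    # Return memberships consistent with the final centroids.
    return centroids, fcm_memberships(X, centroids, m)
```

The returned matrix has one row per point and one column per cluster, which is the shape a defuzzifier such as WTA expects.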

Implement the WTA and α-cut defuzzifiers of fuzzy partitions.
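The two defuzzifiers can be sketched in a few lines, assuming the fuzzy partition is stored as a membership matrix `U` with one row per point and one column per cluster. The function names and the example matrix are made up for illustration, and the α-cut uses the usual "membership greater than or equal to α" convention.

```python
import numpy as np

def wta(U):
    """Winner-Takes-All: assign each point to its highest-membership cluster."""
    return np.argmax(np.asarray(U), axis=1)

def alpha_cut(U, alpha):
    """Alpha-cut: boolean matrix, True where membership >= alpha.

    A point may fall in several clusters or in none, so the result is
    not necessarily a partition.
    """
    return np.asarray(U) >= alpha

# Made-up membership matrix: 3 points, 3 clusters.
U_example = np.array([[0.70, 0.20, 0.10],
                      [0.40, 0.35, 0.25],
                      [0.10, 0.50, 0.40]])
hard_labels = wta(U_example)           # one cluster index per point
cut_mask = alpha_cut(U_example, 0.4)   # one boolean row per point
```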

Implement the Rand and Jaccard indices for hard partition comparison.
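Both indices can be computed by counting pairs of points, as in the sketch below. The function names are hypothetical, and the quadratic loop over all pairs is chosen for readability rather than speed. Returning 1.0 for the Jaccard index when no pair is co-clustered in either partition is a convention assumed here.

```python
from itertools import combinations

def pair_counts(labels_a, labels_b):
    """Count point pairs by whether they share a cluster in A and in B."""
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            ss += 1          # together in both partitions
        elif same_a:
            sd += 1          # together in A only
        elif same_b:
            ds += 1          # together in B only
        else:
            dd += 1          # separated in both partitions
    return ss, sd, ds, dd

def rand_index(labels_a, labels_b):
    """Fraction of pairs on which the two partitions agree."""
    ss, sd, ds, dd = pair_counts(labels_a, labels_b)
    return (ss + dd) / (ss + sd + ds + dd)

def jaccard_index(labels_a, labels_b):
    """Like the Rand index, but ignoring pairs separated in both partitions."""
    ss, sd, ds, _ = pair_counts(labels_a, labels_b)
    denom = ss + sd + ds
    return ss / denom if denom else 1.0  # assumed convention for the empty case
```

Because only co-membership of pairs is compared, both indices are unaffected by how the cluster labels are numbered.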

Apply K-Means, Fuzzy C-Means and Graded Possibilistic C-Means to the 3 data sets using a multi-start approach; search for 2, 3, and 4 clusters.
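One way to read the multi-start approach is to run the algorithm from several random initialisations and keep the run with the lowest objective value. The sketch below does this for K-Means, using the sum of squared distances as the objective. The function names, the number of restarts and the choice to leave an empty cluster's centroid unchanged are assumptions for illustration.

```python
import numpy as np

def k_means(X, k, rng, n_iter=100):
    """Plain K-Means (Lloyd iterations) from one random initialisation."""
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each centroid; keep the old one if its cluster is empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    sse = d2[np.arange(len(X)), labels].sum()   # objective of this run
    return centroids, labels, sse

def multi_start_k_means(X, k, n_starts=10, seed=0):
    """Run K-Means n_starts times and keep the run with the lowest SSE."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    runs = [k_means(X, k, rng) for _ in range(n_starts)]
    return min(runs, key=lambda run: run[2])
```

Searching for 2, 3 and 4 clusters then amounts to calling `multi_start_k_means(X, k)` for each `k` in `(2, 3, 4)`.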

Defuzzify the soft partitions of Fuzzy C-Means and Graded Possibilistic C-Means using the WTA (Winner-Takes-All) criterion.

Visualize the results on the scatter plot, highlighting the centroids and using a different color for each cluster.

Measure the accuracy of the hard partitions by comparing them with the ground truth given by the targets of the data sets. For the comparison, use the Rand and Jaccard indices.

For the Graded Possibilistic C-Means, use a possibilistic degree 𝛽 = 0.8. For data sets DS1 and DS2, use a value of 𝜂 (identical for each cluster) comparable with the standard deviation of the blobs (standard_dev = 2). For DS3 (Iris data set), 𝜂 must be selected by checking the value of the accuracy (model selection via grid search).
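The grid search over 𝜂 can be sketched independently of the clustering algorithm itself. In the example below, `cluster_fn` and `score_fn` are hypothetical callables: the first is assumed to run the clustering for a given 𝜂 and return hard labels, the second is assumed to return an accuracy score such as the Rand index. The candidate values of 𝜂 are left to the caller.

```python
def grid_search_eta(X, y_true, cluster_fn, score_fn, etas):
    """Try each candidate eta and keep the one with the highest score.

    cluster_fn(X, eta) -> hard labels   (hypothetical, supplied by the caller)
    score_fn(y_true, labels) -> float   (hypothetical, e.g. a Rand index)
    """
    results = []
    for eta in etas:
        labels = cluster_fn(X, eta)
        results.append((eta, score_fn(y_true, labels)))
    # On ties, max keeps the first candidate in the order given.
    best_eta, best_score = max(results, key=lambda pair: pair[1])
    return best_eta, best_score, results
```

Keeping the full `results` list makes it easy to inspect how the score varies with 𝜂 instead of looking only at the winner.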

Amaan895469/Clustering
