GapStatistics

Using Gap Statistics to find the optimal number of clusters in a dataset

I worked on a seminar project on this topic and wrote a few examples in RStudio to explain it properly. I also made a PowerPoint presentation and wrote my own research paper on the topic, both of which will be uploaded here soon. You will find the following four examples here:

NewDatasetExample - this is a dataset I created myself with latitude and longitude data and a third column with waste. The gap statistic is negative and keeps decreasing as the number of clusters grows, so there is no optimal number of clusters: the method finds no cluster structure in this dataset.
USArrestsExample - this is a dataset that ships with R and is widely used in cluster analysis projects. In this example I calculated the distances and created a distance matrix. Then I calculated the gap statistic, plotted the results and found the optimal number of clusters. I also analysed the data with the silhouette method and the elbow method for comparison.
RockExample - another dataset that ships with R. I calculated the distances and plotted the distance matrix just like in the previous example. Here I only calculated the gap statistic, to explain the method once again more thoroughly.
ISLR NCI60 - this is the original dataset used in the paper "Estimating the number of clusters in a data set via the gap statistic" by Robert Tibshirani, Guenther Walther and Trevor Hastie (Stanford University, USA). Here I have only plotted the hierarchical clustering using average linkage and Euclidean distance, to show an approach to cluster analysis other than K-Means.
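
To give a rough idea of the workflow described for USArrestsExample, here is a minimal sketch using `clusGap()` and `maxSE()` from the `cluster` package. It is not a copy of USArrestsExample.R: the scaling step, the choice of k-means, and the values of `K.max`, `B`, `nstart` and the seed are all assumptions made for illustration.

```r
# Sketch only: scaling, K.max, B, nstart and the seed are assumed values,
# not taken from USArrestsExample.R.
library(cluster)  # provides clusGap() and maxSE()

set.seed(42)
x <- scale(USArrests)  # standardise the four numeric columns

# Gap statistic for k = 1..8 with k-means, using B simulated reference datasets
gap <- clusGap(x, FUNcluster = kmeans, K.max = 8, B = 50,
               verbose = FALSE, nstart = 25)

# Selection rule from Tibshirani et al. (2001):
# the smallest k with Gap(k) >= Gap(k + 1) - s(k + 1)
k_opt <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"],
               method = "Tibs2001SEmax")

print(gap, method = "Tibs2001SEmax")
plot(gap, main = "Gap statistic for USArrests")
k_opt
```

The chosen k can change with the selection rule (`maxSE()` also offers `"firstSEmax"`, `"globalSEmax"` and others) and with the random reference datasets, so it is worth comparing the result against the silhouette and elbow plots.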
