Skip to content

rodinaahmed66/K-Means-Clustering-Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 

Repository files navigation

K-Means Clustering – README

This project demonstrates K-Means clustering using synthetic data generated with make_blobs. It includes data visualization, elbow method for optimal cluster selection, and clustering visualization with centroids.


📌 Project Overview

This notebook covers:

  • Synthetic dataset generation
  • Applying the K-Means algorithm
  • Visualizing clusters and centroids
  • Using the Elbow Method to find the optimal number of clusters
  • Plotting results clearly

📁 Dataset Generation

The dataset is generated using:

X, y = make_blobs(n_samples=2000, random_state=130)

This creates:

  • 2000 samples
  • Multiple cluster centers (randomly assigned)
  • Features suitable for clustering algorithms

🧭 Workflow Summary

1. Import Required Libraries

Includes NumPy, Pandas, Matplotlib, Seaborn, and Scikit-Learn.

2. Generate Synthetic Data

make_blobs creates a well-separated dataset ideal for clustering.

3. Apply K-Means Clustering

Initial clustering is done with:

k = 5
kmeans = KMeans(n_clusters=k)

Outputs:

  • Predicted cluster labels
  • Cluster centroids
  • Scatter plot of clustered data

4. Find Optimal Number of Clusters (k)

Using the Elbow Method:

  • Run K-Means for k = 1 to 10
  • Store inertia (WCSS)
  • Plot WCSS vs. k to identify the elbow point

5. Re-Cluster Using Optimal k

Example:

k = 3
kmeans = KMeans(n_clusters=k)

Visualizes:

  • Data points colored by cluster assignment
  • Centroids marked with X
  • Grid, labels, and layout improvements for readability

📊 Visualizations Included

  • Raw clustered data (k=5)
  • Elbow Method WCSS plot
  • Final clustering visualization (k=3) with centroids

🎯 Goal of the Project

To understand:

  • How K-Means clustering works
  • How synthetic datasets are useful for unsupervised learning experiments
  • How to select an appropriate number of clusters
  • How clustering results can be visualized effectively

⚙️ Technologies Used

  • Python 3
  • NumPy
  • Matplotlib & Seaborn
  • Scikit-Learn

🚀 How to Run This Notebook

  1. Install required dependencies:

    pip install numpy pandas matplotlib seaborn scikit-learn
  2. Run the notebook sequentially.

  3. Modify k values to experiment with different cluster counts.


Author

This project demonstrates basic unsupervised learning using K-Means clustering.


If you'd like, I can also:

  • Add silhouette score analysis
  • Add comparison with DBSCAN or hierarchical clustering
  • Combine all your READMEs into one portfolio guide

Releases

No releases published

Packages

 
 
 

Contributors