This project demonstrates K-Means clustering using synthetic data generated with make_blobs. It includes data visualization, elbow method for optimal cluster selection, and clustering visualization with centroids.
This notebook covers:
- Synthetic dataset generation
- Applying the K-Means algorithm
- Visualizing clusters and centroids
- Using the Elbow Method to find the optimal number of clusters
- Plotting results clearly
The dataset is generated using:
X, y = make_blobs(n_samples=2000, random_state=130)This creates:
- 2000 samples
- Multiple cluster centers (randomly assigned)
- Features suitable for clustering algorithms
Includes NumPy, Pandas, Matplotlib, Seaborn, and Scikit-Learn.
make_blobs creates a well-separated dataset ideal for clustering.
Initial clustering is done with:
k = 5
kmeans = KMeans(n_clusters=k)Outputs:
- Predicted cluster labels
- Cluster centroids
- Scatter plot of clustered data
Using the Elbow Method:
- Run K-Means for k = 1 to 10
- Store inertia (WCSS)
- Plot WCSS vs. k to identify the elbow point
Example:
k = 3
kmeans = KMeans(n_clusters=k)Visualizes:
- Data points colored by cluster assignment
- Centroids marked with
X - Grid, labels, and layout improvements for readability
- Raw clustered data (k=5)
- Elbow Method WCSS plot
- Final clustering visualization (k=3) with centroids
To understand:
- How K-Means clustering works
- How synthetic datasets are useful for unsupervised learning experiments
- How to select an appropriate number of clusters
- How clustering results can be visualized effectively
- Python 3
- NumPy
- Matplotlib & Seaborn
- Scikit-Learn
-
Install required dependencies:
pip install numpy pandas matplotlib seaborn scikit-learn
-
Run the notebook sequentially.
-
Modify
kvalues to experiment with different cluster counts.
This project demonstrates basic unsupervised learning using K-Means clustering.
If you'd like, I can also:
- Add silhouette score analysis
- Add comparison with DBSCAN or hierarchical clustering
- Combine all your READMEs into one portfolio guide