A comprehensive Data Science & Machine Learning foundation repository built using Python.
This repo focuses on hands-on analysis, real datasets, and practical ML workflows.
🔗 All notebooks are directly runnable on Google Colab — no local setup required.
This repository demonstrates end-to-end data analysis, statistics, and machine learning techniques, covering everything from raw data exploration to clustering and decision tree models.
It is designed as a learning and portfolio repository, with an emphasis on practice rather than theory alone.
- Data Visualization & EDA
- Statistical Analysis & Probability
- Missing Data Handling
- Data Integration
- Clustering (K-Means, Hierarchical)
- PCA (Dimensionality Reduction)
- Decision Trees
- Hypothesis Testing
- Domain-based analysis (Movies, Music, Retail)
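As a rough illustration of how two of the topics above (PCA and K-Means) can fit together, here is a minimal sketch. It is not taken from the notebooks: the data is synthetic, and the variable names (`blob_a`, `blob_b`, `X_reduced`) and parameter choices (two components, two clusters) are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT one of the repository's datasets):
# two well-separated groups of 50 points each, in 5 dimensions.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=1.0, size=(50, 5))
blob_b = rng.normal(loc=8.0, scale=1.0, size=(50, 5))
X = np.vstack([blob_a, blob_b])

# Standardize features so no single column dominates the PCA.
X_scaled = StandardScaler().fit_transform(X)

# Reduce to 2 principal components (an arbitrary choice for plotting).
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

# Cluster the reduced data; k=2 is assumed because we built two groups.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_reduced)

print("Reduced shape:", X_reduced.shape)
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Cluster sizes:", np.bincount(labels))
```

On real data the number of clusters is usually not known in advance, so a notebook would typically compare several values of `n_clusters` (for example with an elbow plot or silhouette score) before settling on one.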
- Basic Data Visualization 👉 Open in Colab
- Central Tendency & Data Dispersion 👉 Open in Colab
- Missing Data Handling 👉 Open in Colab
- Data Integration 👉 Open in Colab
- Probability-Based Sampling 👉 Open in Colab

- Clustering (K-Means & Hierarchical) 👉 Open in Colab
- Decision Tree – Data Mining Algorithm 👉 Open in Colab
- Hypothesis Testing 👉 Open in Colab
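For a flavor of what the missing-data and central-tendency notebooks deal with, here is a small sketch using pandas. The toy table and its column names (`price`, `units`) are hypothetical and not drawn from the repository's datasets, and median imputation is just one of several possible strategies.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps (not from the repository's datasets).
df = pd.DataFrame({
    "price": [10.0, 12.0, np.nan, 11.0, 50.0],
    "units": [3, np.nan, 4, 5, 2],
})

# Count missing values per column before deciding how to handle them.
missing_per_column = df.isna().sum()

# Fill each gap with that column's median, which is less sensitive
# to outliers (such as the price of 50.0) than the mean.
filled = df.fillna(df.median(numeric_only=True))

# Central tendency and dispersion after imputation.
summary = filled.agg(["mean", "median", "std"])

print(missing_per_column)
print(summary)
```

Dropping incomplete rows with `df.dropna()` is the simpler alternative, at the cost of discarding data; which approach is appropriate depends on how much is missing and why.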
Includes:
- Movie datasets
- Music datasets (ragas, emotions, mental health tags)
- Retail & transactional datasets
- Classification & clustering datasets
These datasets are used across the notebooks for realistic analysis.
- Python
- NumPy
- Pandas
- Matplotlib
- Seaborn
- Scikit-learn
- Jupyter Notebook / Google Colab
✔ Practical Data Science skills
✔ Clean analytical workflows
✔ ML concepts applied on real data
✔ Portfolio-ready notebooks
✔ Industry-style experimentation
All notebooks can be opened and executed directly in Google Colab using the links above.
Ashish Kumar
GitHub: https://github.com/Ashishbadal-source
⭐ If this repository helped you or inspired you, consider giving it a star!