Machine Learning Learning Guide

Overview

Machine learning is a collection of algorithms that learn patterns from data to make predictions or decisions. This guide works systematically from the basic concepts of machine learning, through the key algorithms, to practical applications.

Learning Roadmap

ML Overview → Linear Regression → Logistic Regression → Model Evaluation → Cross-Validation/Hyperparameters
                                                ↓
                Practical Projects ← Pipelines ← Dimensionality Reduction ← Clustering ← k-NN/Naive Bayes
                        ↓                                                                  ↑
                Feature Engineering → Explainability → Imbalanced Data        Decision Trees → Ensemble(Bagging)
                        ↓                                                         → Ensemble(Boosting) → SVM ──┘
                Time Series ML → AutoML → Anomaly Detection → Advanced Ensemble
                        ↓
                Production ML Serving → A/B Testing for ML → Symbolic Regression

File List

File	Topic	Key Content
01_ML_Overview.md	ML Overview	Supervised/Unsupervised/Reinforcement Learning, ML Workflow, Bias-Variance Tradeoff
02_Linear_Regression.md	Linear Regression	Simple/Multiple Regression, Gradient Descent, Regularization (Ridge/Lasso)
03_Logistic_Regression.md	Logistic Regression	Binary Classification, Sigmoid Function, Multiclass (Softmax)
04_Model_Evaluation.md	Model Evaluation	Accuracy, Precision, Recall, F1-score, ROC-AUC
05_Cross_Validation_Hyperparameters.md	Cross-Validation & Hyperparameters	K-Fold CV, GridSearchCV, RandomizedSearchCV
06_Decision_Trees.md	Decision Trees	CART, Entropy, Gini Impurity, Pruning
07_Ensemble_Bagging.md	Ensemble - Bagging	Random Forest, Feature Importance, OOB Error
08_Ensemble_Boosting.md	Ensemble - Boosting	AdaBoost, Gradient Boosting, XGBoost, LightGBM
09_SVM.md	SVM	Support Vectors, Margin, Kernel Trick
10_kNN_and_Naive_Bayes.md	k-NN & Naive Bayes	Distance-based Classification, Probability-based Classification
11_Clustering.md	Clustering	K-Means, DBSCAN, Hierarchical Clustering
12_Dimensionality_Reduction.md	Dimensionality Reduction	PCA, t-SNE, Feature Selection
13_Pipelines_and_Practice.md	Pipelines & Practice	sklearn Pipeline, ColumnTransformer, Model Saving
14_Practical_Projects.md	Practical Projects	Kaggle Problem Solving, Classification/Regression Practice
15_Feature_Engineering.md	Feature Engineering	Numerical/Categorical/Temporal Transforms, Feature Selection, Featuretools
16_Model_Explainability.md	Model Explainability	SHAP, LIME, PDP/ICE, Fairness Metrics
17_Imbalanced_Data.md	Imbalanced Data	SMOTE/ADASYN, Cost-sensitive Learning, Threshold Optimization
18_Time_Series_ML.md	Time Series ML	Lag/Rolling Features, TimeSeriesSplit, Prophet, Tree-based Forecasting
19_AutoML_Hyperparameter_Optimization.md	AutoML & Hyperparameter Optimization	Optuna, Auto-sklearn, FLAML, H2O AutoML
20_Anomaly_Detection.md	Anomaly Detection	Isolation Forest, LOF, One-Class SVM, PyOD
21_Advanced_Ensemble.md	Advanced Ensemble	Stacking, Blending, Meta-Learner, Diverse Base Learners, Competition Strategies
22_Production_ML_Serving.md	Production ML Serving	Model Optimization, Serving Patterns, Training-Serving Skew, Drift Detection
23_AB_Testing_for_ML.md	A/B Testing for ML	Power Analysis, Hypothesis Testing, Sequential Testing, Multi-Armed Bandits, Interleaving
24_Symbolic_Regression.md	Symbolic Regression	Expression Trees, Genetic Programming, Pareto Front, PySR, gplearn, SINDy

Environment Setup

Install Required Libraries

# Using pip
pip install numpy pandas matplotlib seaborn scikit-learn

# Additional libraries (boosting)
pip install xgboost lightgbm catboost

# Jupyter Notebook (recommended)
pip install jupyter
jupyter notebook

Version Check

import sklearn
import xgboost
import lightgbm

print(f"scikit-learn: {sklearn.__version__}")
print(f"XGBoost: {xgboost.__version__}")
print(f"LightGBM: {lightgbm.__version__}")

Recommended Versions

Python: 3.9+
scikit-learn: 1.2+
XGBoost: 1.7+
LightGBM: 3.3+
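
The recommended minimums can also be checked programmatically. The following is a minimal sketch using only the standard library. The helper names (`parse_version`, `meets_minimum`, `check_environment`) and the `MINIMUMS` table are illustrative and not part of this guide, and the keys are assumed to be the PyPI distribution names. Only the leading numeric parts of each version string are compared, so pre-release suffixes are ignored.

```python
import re
import sys
from importlib import metadata

# Minimum versions taken from the "Recommended Versions" list.
# Keys are assumed to be PyPI distribution names.
MINIMUMS = {
    "scikit-learn": "1.2",
    "xgboost": "1.7",
    "lightgbm": "3.3",
}


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. "dev0"
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if `installed` is at least `minimum`, comparing numeric parts only."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment(minimums=MINIMUMS):
    """Return {package: (installed_version_or_None, ok_flag)}."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    print(f"Python >= 3.9: {sys.version_info >= (3, 9)}")
    for name, (installed, ok) in check_environment().items():
        status = "OK" if ok else "missing or too old"
        print(f"{name}: {installed} ({status})")
```

A package that is not installed is reported as `None` rather than raising, so the script can run before the install step as well as after it.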

Recommended Learning Order

Stage 1: Basic Theory (01-04)

Understand machine learning concepts
Basics of regression and classification
Model evaluation methods
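
As a preview of the evaluation methods in this stage, the classification metrics listed for file 04 (accuracy, precision, recall, F1-score) can all be derived from the four confusion-matrix counts. The sketch below is pure Python. The function name `binary_metrics` and the example labels are made up for illustration, and returning 0.0 when a denominator is zero is one possible convention, not the only one.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for 0/1 labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 3 TP, 3 TN, 1 FP, 1 FN.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In practice `sklearn.metrics` provides these as `accuracy_score`, `precision_score`, `recall_score`, and `f1_score`; writing them out once makes the definitions concrete.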

Stage 2: Model Tuning (05)

Cross-validation
Hyperparameter optimization
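
The two topics in this stage fit together: cross-validation estimates how well a model generalizes, and a hyperparameter search repeats that estimate for each candidate setting. The following is a minimal sketch with scikit-learn. The Iris dataset, the logistic regression model, and the grid of `C` values are arbitrary choices made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold stratified CV with a fixed seed so the folds are reproducible.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Baseline: cross-validated accuracy with default hyperparameters.
baseline = LogisticRegression(max_iter=1000)
baseline_scores = cross_val_score(baseline, X, y, cv=cv, scoring="accuracy")

# Grid search: evaluate each candidate C on the same folds.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}  # illustrative grid
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=cv,
    scoring="accuracy",
)
search.fit(X, y)

print(f"Baseline mean accuracy: {baseline_scores.mean():.3f}")
print(f"Best params: {search.best_params_}")
print(f"Best CV accuracy: {search.best_score_:.3f}")
```

`RandomizedSearchCV` takes the same estimator and `cv` arguments but samples a fixed number of candidates from distributions, which tends to scale better when the grid is large.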

Stage 3: Tree-based Models (06-08)

Decision trees
Ensemble techniques

Stage 4: Other Algorithms (09-10)

SVM
k-NN, Naive Bayes

Stage 5: Unsupervised Learning (11-12)

Clustering
Dimensionality reduction
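
Clustering and dimensionality reduction are often used together: reduce the features first, then group the points in the reduced space. The following is a minimal sketch with scikit-learn on synthetic data. The blob dataset, the choice of two components, and the choice of three clusters are assumptions made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data: 300 points in 5 dimensions drawn around 3 centers.
X, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)

# PCA is sensitive to feature scale, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# Project onto the two directions of greatest variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

# Cluster in the reduced space; n_init is set explicitly for reproducibility.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_2d)

print(f"Variance explained by 2 components: {pca.explained_variance_ratio_.sum():.2f}")
print(f"Cluster sizes: {[int((labels == k).sum()) for k in range(3)]}")
```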

Stage 6: Practice & Projects (13-14)

Building pipelines
Real-world problem solving
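
A pipeline bundles preprocessing and the model into one object, so the same transformations are applied at training and prediction time. The following is a minimal sketch using the `Pipeline` and `ColumnTransformer` classes named in the file list. The tiny DataFrame, its column names, and the choice of imputation strategy and classifier are all hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy dataset with missing values and a categorical column.
df = pd.DataFrame({
    "age": [25, 32, 47, None, 51, 38, 29, 44],
    "income": [40.0, 52.0, 88.0, 61.0, None, 70.0, 45.0, 83.0],
    "city": ["A", "B", "A", "C", "B", "C", "A", "B"],
    "purchased": [0, 0, 1, 0, 1, 1, 0, 1],
})
X = df.drop(columns="purchased")
y = df["purchased"]

# Numeric columns: fill missing values with the median, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: one-hot encode, ignoring categories unseen in training.
categorical = OneHotEncoder(handle_unknown="ignore")

preprocess = ColumnTransformer([
    ("num", numeric, ["age", "income"]),
    ("cat", categorical, ["city"]),
])

# Full pipeline: preprocessing followed by the classifier.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression()),
])
model.fit(X, y)
predictions = model.predict(X)
print(predictions)
```

Because the fitted pipeline is a single estimator, it can be passed directly to `cross_val_score` or `GridSearchCV`, which keeps preprocessing inside each fold and avoids leaking information from validation data.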

Stage 7: Advanced Topics (15-21)

Feature engineering and model explainability
Handling imbalanced data and time series
AutoML, hyperparameter optimization, anomaly detection
Advanced ensemble methods (stacking, blending)

Stage 8: Production (22-23)

Model optimization and serving patterns
A/B testing and online experimentation

Stage 9: Interpretable Discovery (24)

Symbolic regression: discovering equations from data

Algorithm Selection Guide

Identify Problem Type
    │
    ├── Has Labels (Supervised Learning)
    │       ├── Continuous Target → Regression
    │       │       ├── Linear Relationship → Linear Regression
    │       │       ├── Non-linear → Trees, Ensemble
    │       │       └── Interpretability Important → Linear Regression, Decision Trees
    │       │
    │       └── Categorical Target → Classification
    │               ├── Binary Classification → Logistic Regression, SVM, Trees
    │               ├── Multiclass → Logistic Regression (softmax), Trees
    │               └── Need Probabilities → Logistic Regression, Naive Bayes
    │
    └── No Labels (Unsupervised Learning)
            ├── Grouping → Clustering
            │       ├── Spherical Clusters → K-Means
            │       └── Arbitrary Shapes → DBSCAN
            │
            └── Dimensionality Reduction → PCA, t-SNE
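
The decision tree above can also be written as a small lookup, which makes the branching explicit. The following is a minimal sketch in pure Python. The function names, the task and trait keys, and the table layout are illustrative; the suggestions themselves are copied from the diagram and are starting points, not rules.

```python
# Suggestions copied from the selection diagram; keys are illustrative names.
SELECTION_GUIDE = {
    "regression": {
        "linear": ["Linear Regression"],
        "nonlinear": ["Trees", "Ensemble"],
        "interpretable": ["Linear Regression", "Decision Trees"],
    },
    "classification": {
        "binary": ["Logistic Regression", "SVM", "Trees"],
        "multiclass": ["Logistic Regression (softmax)", "Trees"],
        "probabilities": ["Logistic Regression", "Naive Bayes"],
    },
    "clustering": {
        "spherical": ["K-Means"],
        "arbitrary": ["DBSCAN"],
    },
    "dimensionality_reduction": {
        "any": ["PCA", "t-SNE"],
    },
}


def identify_task(has_labels, target_is_continuous=None, goal=None):
    """Map the first two questions of the diagram to a task name."""
    if has_labels:
        if target_is_continuous is None:
            raise ValueError("supervised problems need target_is_continuous")
        return "regression" if target_is_continuous else "classification"
    if goal not in ("clustering", "dimensionality_reduction"):
        raise ValueError("unsupervised goal must be 'clustering' or "
                         "'dimensionality_reduction'")
    return goal


def suggest_algorithms(task, trait="any"):
    """Return the diagram's suggestions for a task and a data/requirement trait."""
    options = SELECTION_GUIDE[task]
    if trait not in options:
        raise ValueError(f"unknown trait {trait!r}; choose from {sorted(options)}")
    return options[trait]


task = identify_task(has_labels=True, target_is_continuous=False)
print(task, suggest_algorithms(task, "probabilities"))

task = identify_task(has_labels=False, goal="clustering")
print(task, suggest_algorithms(task, "arbitrary"))
```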

References

Official Documentation

Recommended Datasets

Recommended Books

"Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow" - Aurélien Géron
"An Introduction to Statistical Learning" - James et al.
