A walkthrough of exploratory data analysis (EDA), feature engineering, and model evaluation on the classic Titanic dataset.
This project is designed as a hands‑on learning resource for understanding how preprocessing choices and algorithm selection impact predictive performance.
The goal is to predict passenger survival on the Titanic using machine learning.
This matters because Titanic survival prediction is a standard benchmark for binary classification, widely used to practice end‑to‑end ML workflows: data cleaning, feature engineering, model training, evaluation, and interpretability.
- Source: Kaggle Titanic Dataset
- Key Features:
  - Passenger demographics (Age, Sex, SibSp, Parch)
  - Ticket and cabin information
  - Socio‑economic indicators (Fare, Pclass)
  - Survival outcome (target variable)
- EDA → Inspect missing values, distributions, correlations
- Feature Engineering → Encode categorical variables, impute missing data, create derived features
- Modeling → Train multiple ML algorithms (Logistic Regression, Decision Trees, Random Forests, Gradient Boosting)
- Evaluation → Compare models using accuracy, precision, recall, F1 score, and ROC‑AUC
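The four steps above can be sketched in a few dozen lines of scikit-learn. This is a minimal illustration, not the project's actual code. The helper names (`make_placeholder_data`, `build_preprocessor`, `evaluate_models`), the `FamilySize` derived feature, median imputation, and all hyperparameters are assumptions chosen for the example. To keep the block runnable without downloading anything, it uses a synthetic stand-in with Titanic-like column names; with the real Kaggle file you would load `train.csv` instead.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

NUMERIC = ["Age", "Fare", "FamilySize"]
CATEGORICAL = ["Sex", "Pclass"]


def make_placeholder_data(n=400, seed=0):
    """Synthetic stand-in with Titanic-like columns; NOT the real Kaggle data."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Pclass": rng.integers(1, 4, n),
        "Sex": rng.choice(["male", "female"], n),
        "Age": rng.uniform(1, 80, n),
        "SibSp": rng.integers(0, 4, n),
        "Parch": rng.integers(0, 3, n),
        "Fare": rng.uniform(5, 250, n),
    })
    df.loc[rng.random(n) < 0.2, "Age"] = np.nan  # mimic missing ages
    # Arbitrary labelling rule so the models have some signal to learn.
    score = (df["Sex"] == "female") * 2.0 - 0.5 * df["Pclass"] + rng.normal(0, 1, n)
    df["Survived"] = (score > 0).astype(int)
    return df


def build_preprocessor():
    """Impute and scale numeric columns; one-hot encode categorical ones."""
    numeric_steps = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    return ColumnTransformer([
        ("num", numeric_steps, NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])


def evaluate_models(df, seed=0):
    """Train four classifiers and return one row of test metrics per model."""
    df = df.copy()
    df["FamilySize"] = df["SibSp"] + df["Parch"] + 1  # derived feature (assumed)
    X, y = df[NUMERIC + CATEGORICAL], df["Survived"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)
    models = {
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "DecisionTree": DecisionTreeClassifier(max_depth=4, random_state=seed),
        "RandomForest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "GradientBoosting": GradientBoostingClassifier(random_state=seed),
    }
    rows = {}
    for name, model in models.items():
        # Fitting the preprocessor inside the pipeline avoids test-set leakage.
        clf = Pipeline([("prep", build_preprocessor()), ("model", model)])
        clf.fit(X_train, y_train)
        pred = clf.predict(X_test)
        proba = clf.predict_proba(X_test)[:, 1]
        rows[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "precision": precision_score(y_test, pred, zero_division=0),
            "recall": recall_score(y_test, pred, zero_division=0),
            "f1": f1_score(y_test, pred, zero_division=0),
            "roc_auc": roc_auc_score(y_test, proba),
        }
    return pd.DataFrame(rows).T


data = make_placeholder_data()
print(data.isna().sum())  # EDA: count missing values per column
results = evaluate_models(data)
print(results.round(3))
```

Wrapping preprocessing and the estimator in a single `Pipeline` means the imputer, scaler, and encoder are fitted on the training split only, so the reported metrics are not inflated by information from the test rows. Numbers obtained on the placeholder data say nothing about performance on the real dataset.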
Install the dependencies with `pip install -r requirements.txt`.