Titanic ML Pipeline

A walkthrough of exploratory data analysis (EDA), feature engineering, and model evaluation on the classic Titanic dataset.
This project is designed as a hands‑on learning resource for understanding how preprocessing choices and algorithm selection impact predictive performance.

Project Overview

The goal is to predict passenger survival on the Titanic using machine learning.
The dataset is a standard benchmark for binary classification and is widely used to practice end‑to‑end ML workflows: data cleaning, feature engineering, model training, evaluation, and interpretability.

Dataset Description

Source: Kaggle Titanic Dataset
Key Features:
- Passenger demographics (Age, Sex, SibSp, Parch)
- Ticket and cabin information
- Socio‑economic indicators (Fare, Pclass)
- Survival outcome (target variable)
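
As a rough orientation, the columns above can be inspected with pandas. The sketch below is not taken from this repository: it uses a handful of made-up rows in place of the Kaggle `train.csv`, with column names following the Kaggle data dictionary (`Survived`, `Pclass`, `Cabin`, and so on). The variable names `sample`, `missing_counts`, and `survival_by_sex` are illustrative.

```python
import pandas as pd

# Made-up rows standing in for Kaggle's train.csv; values are illustrative only.
sample = pd.DataFrame(
    {
        "Survived": [0, 1, 0, 0, 1, 1],
        "Pclass": [3, 1, 3, 1, 2, 3],
        "Sex": ["male", "female", "female", "male", "female", "male"],
        "Age": [22.0, 38.0, None, 54.0, 27.0, None],
        "SibSp": [1, 1, 0, 0, 0, 0],
        "Parch": [0, 0, 0, 0, 2, 0],
        "Fare": [7.25, 71.28, 7.92, 51.86, 11.13, 8.05],
        "Cabin": [None, "C85", None, "E46", None, None],
    }
)

# Count missing values per column (Age and Cabin have gaps in this sample).
missing_counts = sample.isna().sum()

# Mean of the 0/1 target per group gives the survival rate for that group.
survival_by_sex = sample.groupby("Sex")["Survived"].mean()

print(missing_counts[missing_counts > 0])
print(survival_by_sex)
```

On the real data, the same two calls would be applied to the frame returned by `pd.read_csv` on the downloaded training file.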

Pipeline Summary

EDA → Inspect missing values, distributions, correlations
Feature Engineering → Encode categorical variables, impute missing data, create derived features
Modeling → Train multiple ML algorithms (Logistic Regression, Decision Trees, Random Forests, Gradient Boosting)
Evaluation → Compare models using accuracy, precision, recall, F1 score, and ROC‑AUC
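
The four stages above could be wired together roughly as follows with scikit-learn. This is a minimal sketch under stated assumptions, not the code in `src`: the data frame is synthetic (random values, not real passengers), `FamilySize` is a hypothetical derived feature, and the hyperparameters are arbitrary defaults.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Kaggle training data (assumption: random values).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n),
    "Sex": rng.choice(["male", "female"], n),
    "Age": rng.uniform(1, 70, n),
    "SibSp": rng.integers(0, 3, n),
    "Parch": rng.integers(0, 3, n),
    "Fare": rng.uniform(5, 100, n),
})
df.loc[rng.choice(n, 30, replace=False), "Age"] = np.nan  # simulate missing ages
# Synthetic target tied to Sex with 20% label noise, so models have a signal.
df["Survived"] = ((df["Sex"] == "female") ^ (rng.random(n) < 0.2)).astype(int)

# Feature engineering: hypothetical derived feature (family size incl. passenger).
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
numeric = ["Age", "Fare", "FamilySize"]
categorical = ["Sex", "Pclass"]

# Impute and scale numeric columns; one-hot encode categorical columns.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

X_train, X_test, y_train, y_test = train_test_split(
    df[numeric + categorical], df["Survived"],
    test_size=0.25, stratify=df["Survived"], random_state=0,
)

# Fit each model behind its own copy of the preprocessing step, then score it.
results = {}
for name, model in models.items():
    clf = Pipeline([("preprocess", clone(preprocess)), ("model", model)])
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    proba = clf.predict_proba(X_test)[:, 1]  # probability of class 1 (survived)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
        "roc_auc": roc_auc_score(y_test, proba),
    }

print(pd.DataFrame(results).T.round(3))
```

Keeping imputation and encoding inside the `Pipeline` means they are fitted on the training split only, which avoids leaking test-set statistics into preprocessing. Scores on this synthetic frame say nothing about performance on the real dataset.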

⚙️ How to Run

Setup

pip install -r requirements.txt
