🚢 Titanic Survival Prediction using Machine Learning Pipeline

This project demonstrates how to use scikit-learn pipelines in Python to predict passenger survival on the Titanic dataset. The pipeline bundles all preprocessing steps with a classifier, so training and inference stay clean and reproducible.

📁 Project Structure

├── Machine pipeline.ipynb             # Main notebook that builds and trains the pipeline
├── predict using pipeline.ipynb       # Notebook that uses the trained pipeline for predictions
├── pipe.pkl                           # Trained pipeline saved as a pickle file
├── train.csv                          # Titanic training data
└── README.md                          # Project documentation

🚀 Features

Complete ML pipeline including preprocessing and model training
Handling missing values and categorical encoding
Pipeline serialization using joblib
Inference using the saved pipeline
Simple and extendable structure

📦 Requirements

Install dependencies using:

pip install -r requirements.txt

You’ll need:

scikit-learn
pandas
numpy
joblib
matplotlib (optional, for visualizations)

📊 Dataset

This project uses the classic Titanic dataset from Kaggle.
It includes features such as Pclass, Sex, Age, and Fare, along with the survival label (Survived).

🧠 Model Pipeline

The pipeline includes the following steps:

Imputation: Fills missing values (e.g., Age, Embarked).
Encoding: Converts categorical variables (Sex, Embarked) with OneHotEncoder.
Feature Scaling: Applies StandardScaler to numeric features.
Feature Selection (optional): Uses SelectKBest.
Classification: Uses RandomForestClassifier.
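
The steps above could be wired together roughly as follows. This is a minimal sketch, not the notebook's actual code: the column split, the imputation strategies (median and most frequent), the f_classif scoring function, the hyperparameters, and the toy data are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed column split; adjust to the columns actually used in the notebook.
NUMERIC_COLS = ["Pclass", "Age", "Fare"]
CATEGORICAL_COLS = ["Sex", "Embarked"]


def build_pipeline(k_best="all"):
    """Return an unfitted pipeline: impute -> encode/scale -> select -> classify."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, NUMERIC_COLS),
        ("cat", categorical, CATEGORICAL_COLS),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        # k="all" keeps every feature; pass an integer to keep only the top k.
        ("select", SelectKBest(score_func=f_classif, k=k_best)),
        ("classify", RandomForestClassifier(n_estimators=100, random_state=42)),
    ])


# Tiny made-up sample, only to show that the pipeline fits and predicts.
toy = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 2, 3],
    "Sex": ["male", "female", "female", "female", "male", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, np.nan, 54.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.46, 51.86],
    "Embarked": ["S", "C", "S", "S", np.nan, "S"],
})
labels = [0, 1, 1, 1, 0, 0]

pipe = build_pipeline().fit(toy, labels)
predictions = pipe.predict(toy)
```

Keeping imputation, encoding, and scaling inside the pipeline means the same transformations learned during training are applied automatically at prediction time.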

🛠 How to Use

Train the model: Open Machine pipeline.ipynb and run all cells. This notebook creates the pipeline, trains it, and saves it to pipe.pkl.
Predict using the saved model: Open predict using pipeline.ipynb to load the trained model and make predictions on new or test data.
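
The save and load steps behind these two notebooks might look like the sketch below. The stand-in pipeline and the made-up data are assumptions for illustration; only joblib.dump, joblib.load, and the pipe.pkl file name come from this project.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in pipeline and made-up data; the real notebook fits on the Titanic data.
X = pd.DataFrame({"Pclass": [3, 1, 3, 1], "Age": [22.0, 38.0, 26.0, 35.0]})
y = [0, 1, 1, 1]
trained = Pipeline([
    ("scale", StandardScaler()),
    ("classify", RandomForestClassifier(n_estimators=10, random_state=0)),
]).fit(X, y)

# Write to a temporary directory so the example does not overwrite a real pipe.pkl.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "pipe.pkl")
    joblib.dump(trained, path)      # training notebook: persist the fitted pipeline
    restored = joblib.load(path)    # prediction notebook: load it back

# The restored pipeline should reproduce the original predictions.
same_predictions = bool((restored.predict(X) == trained.predict(X)).all())
```

Pickle files can execute arbitrary code when loaded, so only load a pipe.pkl from a source you trust, and load it with the same scikit-learn version that was used to save it.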

🔍 Example Prediction

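import joblib
import pandas as pd

# Load the pipeline saved by the training notebook
pipe = joblib.load("pipe.pkl")

# One passenger; the columns must match the features the pipeline was trained on
new_data = pd.DataFrame([{
    "Pclass": 3,
    "Sex": "male",
    "Age": 22,
    "Parch": 0,
    "Embarked": "S"
}])

# predict returns one label per row (1 = survived, 0 = did not survive)
prediction = pipe.predict(new_data)
print("Survived" if prediction[0] == 1 else "Did not survive")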

📚 Learn More

🙌 Acknowledgements

Kaggle for the dataset.
scikit-learn for the pipeline and modeling tools.
