atul2501/titanic_train

🚒 Titanic Survival Prediction using Machine Learning Pipeline

This project demonstrates the use of Machine Learning Pipelines in Python using scikit-learn to predict survival on the Titanic dataset. The pipeline handles all preprocessing steps and applies a classifier in a clean, reproducible way.


📁 Project Structure

├── Machine pipeline.ipynb             # Main notebook with model building pipeline
├── predict using pipeline.ipynb       # Notebook to use trained pipeline for predictions
├── pipe.pkl                           # Trained pipeline saved as a pickle file
├── README.md                          # Project documentation

🚀 Features

  • Complete ML pipeline including preprocessing and model training
  • Handling missing values and categorical encoding
  • Pipeline serialization using joblib
  • Inference using the saved pipeline
  • Simple and extendable structure

📦 Requirements

Install dependencies using:

pip install -r requirements.txt

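The project structure above does not list a requirements.txt; if the file is missing, install these packages directly with pip. You’ll need: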

  • scikit-learn
  • pandas
  • numpy
  • joblib
  • matplotlib (optional for visualizations)

📊 Dataset

The project uses the classic Titanic dataset from Kaggle.
It includes features such as Pclass, Sex, Age, and Fare, plus the survival label (Survived) that the model predicts.
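
As a rough illustration of that layout, the sketch below builds a tiny hand-made frame with the columns named above, separates the features from the Survived label, and counts missing values per column. The rows are invented for illustration only and are not real passenger records; the actual notebook presumably loads the Kaggle CSV instead.

```python
import numpy as np
import pandas as pd

# Hypothetical rows that only mimic the Titanic column layout
# (not taken from the real dataset).
sample = pd.DataFrame({
    "Pclass": [3, 1, 2, 3],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, np.nan, 30.0, np.nan],
    "Fare": [7.25, 71.28, 13.0, 8.05],
    "Survived": [0, 1, 1, 0],
})

# Features (X) are every column except the label; the label (y) is Survived.
X = sample.drop(columns=["Survived"])
y = sample["Survived"]

# Count missing values per feature to see which columns need imputation.
missing_counts = X.isna().sum()
print(missing_counts)
```

In this toy frame only Age has gaps, which is the kind of column the imputation step is meant to handle.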


🧠 Model Pipeline

The pipeline includes the following steps:

  1. Imputation: Filling missing values (e.g., Age, Embarked).
  2. Encoding: Converting categorical variables (Sex, Embarked) to one-hot columns using OneHotEncoder.
  3. Feature Scaling: StandardScaler for numeric features.
  4. Feature Selection: (Optional) using SelectKBest.
  5. Classification: Using RandomForestClassifier.
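
The steps above can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: the column split, the imputation strategies, the f_classif scoring function, k=5, and the synthetic training data are all assumptions made so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed column split; the notebook may use a different feature set.
numeric_cols = ["Age", "Fare"]
categorical_cols = ["Sex", "Embarked"]

# Steps 1 and 3: impute, then scale the numeric columns.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Steps 1 and 2: impute, then one-hot encode the categorical columns.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer(
    [("num", numeric_steps, numeric_cols),
     ("cat", categorical_steps, categorical_cols)],
    remainder="passthrough",  # Pclass passes through unchanged
)

# Steps 4 and 5: optional feature selection, then the classifier.
pipe = Pipeline([
    ("preprocess", preprocess),
    ("select", SelectKBest(score_func=f_classif, k=5)),
    ("model", RandomForestClassifier(n_estimators=50, random_state=0)),
])

# Synthetic stand-in data (not the real Titanic rows), with a few gaps.
rng = np.random.default_rng(0)
n = 40
X = pd.DataFrame({
    "Pclass": rng.integers(1, 4, size=n),
    "Sex": rng.choice(["male", "female"], size=n),
    "Age": rng.uniform(1, 70, size=n),
    "Fare": rng.uniform(5, 100, size=n),
    "Embarked": rng.choice(["S", "C", "Q"], size=n),
})
X.loc[::7, "Age"] = np.nan
X.loc[3, "Embarked"] = np.nan

# Noisy toy label: mostly follows Sex, with roughly 20% of rows flipped.
is_female = (X["Sex"] == "female").to_numpy().astype(int)
flip = rng.random(n) < 0.2
y = np.where(flip, 1 - is_female, is_female)

pipe.fit(X, y)
preds = pipe.predict(X)
print(preds[:5])
```

Keeping every step inside one Pipeline object means the same imputation, encoding, and scaling fitted on the training data are reapplied automatically at prediction time.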

🛠 How to Use

  1. Train the model: Open Machine pipeline.ipynb and run all cells. This notebook creates the pipeline, trains it, and saves it to pipe.pkl.

  2. Predict using the saved model: Open predict using pipeline.ipynb to load the trained model and make predictions on new or test data.
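
The save and reload round trip behind these two steps might look like the sketch below. It assumes joblib.dump is what writes pipe.pkl (the notebook may use the pickle module directly) and uses a small stand-in pipeline trained on random numbers, written to a temporary directory so it does not overwrite the real file.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data and pipeline, only to demonstrate the round trip.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(30, 3))
y_train = (X_train[:, 0] > 0).astype(int)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", RandomForestClassifier(n_estimators=20, random_state=0)),
])
pipe.fit(X_train, y_train)

# Save the fitted pipeline, then load it back as the prediction notebook would.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "pipe.pkl")
    joblib.dump(pipe, path)
    loaded = joblib.load(path)

# The reloaded pipeline should reproduce the original predictions.
same = np.array_equal(pipe.predict(X_train), loaded.predict(X_train))
print("Predictions match after reload:", same)
```

A saved pipeline is best loaded with the same scikit-learn version that produced it, since pickled estimators are not guaranteed to work across versions.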


🔍 Example Prediction

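import joblib
import pandas as pd

# Load the trained pipeline saved by the training notebook.
pipe = joblib.load("pipe.pkl")

# One passenger as a single-row DataFrame. The columns must match the ones
# the pipeline was trained on; add any training columns that are missing here.
new_data = pd.DataFrame([{
    "Pclass": 3,
    "Sex": "male",
    "Age": 22,
    "Parch": 0,
    "Embarked": "S"
}])

# predict() returns one label per row: 1 = survived, 0 = did not survive.
prediction = pipe.predict(new_data)
print("Survived" if prediction[0] == 1 else "Did not survive")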

📚 Learn More


🙌 Acknowledgements

  • Kaggle for the dataset.
  • scikit-learn for the pipeline and modeling tools.
