This repository contains a complete data science pipeline for predicting food delivery times using Linear Regression. The project explores operational, temporal, and order-based features to build an accurate, interpretable model aimed at enhancing customer satisfaction and optimizing fleet management for Porter.
This notebook-based project is built on real-world food delivery data. It follows a structured machine learning workflow that includes:
- Data loading and cleaning
- Feature engineering (e.g., time features, one-hot encoding)
- Exploratory Data Analysis (EDA)
- Outlier handling and normalization
- Linear regression modeling and performance evaluation
- Business-focused insights and recommendations
The objective is to predict `time_taken` (in minutes) for each food delivery from features such as order size, distance, number of dashers, and time of day.
A full ML pipeline is implemented using Python and libraries such as pandas, matplotlib, seaborn, and sklearn.
Created features like:
- `hour`, `day_of_week`, `isWeekend`
- Dummy variables for `store_primary_category` and `order_protocol`
- Target variable: `time_taken` from order timestamps
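
The feature engineering described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's exact code: the timestamp column names `created_at` and `actual_delivery_time`, the toy rows, and the helper `engineer_features` are assumptions made for the example.

```python
import pandas as pd

# Toy data. `created_at` and `actual_delivery_time` are assumed column names;
# the real dataset may label its order timestamps differently.
orders = pd.DataFrame({
    "created_at": ["2015-02-06 22:24:17", "2015-02-08 11:05:00"],
    "actual_delivery_time": ["2015-02-06 23:11:17", "2015-02-08 11:35:30"],
    "store_primary_category": ["american", "indian"],
    "order_protocol": [1, 3],
})


def engineer_features(df):
    """Hypothetical helper: derive the target, time features, and dummies."""
    out = df.copy()
    created = pd.to_datetime(out["created_at"])
    delivered = pd.to_datetime(out["actual_delivery_time"])

    # Target: delivery duration in minutes.
    out["time_taken"] = (delivered - created).dt.total_seconds() / 60.0

    # Temporal features taken from the order creation time.
    out["hour"] = created.dt.hour
    out["day_of_week"] = created.dt.dayofweek  # Monday=0 ... Sunday=6
    out["isWeekend"] = (out["day_of_week"] >= 5).astype(int)

    # Drop the raw timestamps, then one-hot encode the categorical columns.
    out = out.drop(columns=["created_at", "actual_delivery_time"])
    return pd.get_dummies(
        out,
        columns=["store_primary_category", "order_protocol"],
        drop_first=True,  # assumption: drop one level to avoid collinearity
    )


features = engineer_features(orders)
print(features.head())
```

Dropping the first dummy level is a common choice for linear regression because it avoids perfectly collinear columns; whether the notebook does the same is not stated here.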
- Distribution plots, boxplots, scatter plots
- Correlation heatmap
- Insights on peak hours, order complexity, and delivery patterns
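
As a rough illustration of the correlation step, the sketch below computes a correlation matrix on synthetic data and ranks features by their correlation with the target. The column names `distance` and `total_items` and the generated values are assumptions for the example, not the project's data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; the columns and coefficients are illustrative only.
rng = np.random.default_rng(0)
n = 200
distance = rng.uniform(1, 20, n)
total_items = rng.integers(1, 10, n)
time_taken = 10 + 2.0 * distance + 1.5 * total_items + rng.normal(0, 2, n)

df = pd.DataFrame({
    "distance": distance,
    "total_items": total_items,
    "time_taken": time_taken,
})

# Pearson correlation matrix across all numeric columns.
corr = df.corr()

# Correlation of each feature with the target, strongest first.
target_corr = corr["time_taken"].drop("time_taken").sort_values(ascending=False)
print(target_corr)
```

The `corr` frame can be passed to `seaborn.heatmap` to draw the heatmap.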
- Used IQR-based clipping to retain most data while handling extreme values
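
A minimal sketch of IQR-based clipping is shown below. The helper name `clip_iqr` and the multiplier `k=1.5` (the conventional Tukey fence) are assumptions; the multiplier used in the notebook is not stated here.

```python
import pandas as pd


def clip_iqr(series, k=1.5):
    """Hypothetical helper: clip values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    # Values beyond the fences are pulled to the fence; no rows are dropped.
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


# Toy delivery times in minutes with one extreme value.
s = pd.Series([20, 25, 30, 35, 40, 45, 300])
clipped = clip_iqr(s)
print(clipped.tolist())
```

Because the values are clipped rather than filtered, the row count stays the same, which keeps the remaining columns of each affected row available for modeling.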
- Trained a Linear Regression model with R² ≈ 0.86
- Performed Recursive Feature Elimination (RFE) to identify top predictors
- Conducted residual analysis to validate model assumptions
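
The modeling steps above can be sketched with scikit-learn as follows. The data is synthetic, and the feature names, the 80/20 split, and `n_features_to_select=2` are assumptions for illustration, so the score it prints has no relation to the R² reported for the project.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; only `distance` and `total_items` drive the target.
rng = np.random.default_rng(42)
n = 500
X = pd.DataFrame({
    "distance": rng.uniform(1, 20, n),
    "total_items": rng.integers(1, 10, n).astype(float),
    "hour": rng.integers(0, 24, n).astype(float),
    "noise": rng.normal(0, 1, n),
})
y = 10 + 2.0 * X["distance"] + 1.5 * X["total_items"] + rng.normal(0, 2, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit ordinary least squares and score it on held-out data.
model = LinearRegression().fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))

# Recursive Feature Elimination: repeatedly refit and drop the feature
# with the smallest coefficient magnitude until two remain.
selector = RFE(LinearRegression(), n_features_to_select=2).fit(X_train, y_train)
selected = list(X.columns[selector.support_])

# Residuals on the training set, for checking the model assumptions.
residuals = y_train - model.predict(X_train)

print(f"R^2 on test set: {r2:.3f}")
print("Selected features:", selected)
print(f"Mean training residual: {residuals.mean():.2e}")
```

RFE with a linear model ranks features by coefficient size, so features should be on comparable scales before selection. The `residuals` series can then be plotted against the predictions or on a Q-Q plot.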
- Emphasize order size in ETA estimates
- Dynamically allocate delivery partners during peak hours
- Use complexity-based pricing for better revenue planning
- Improve restaurant operations for large orders
- Histograms & KDE plots for numeric features
- Boxplots for outlier detection
- Scatter plots for feature-target relationships
- Correlation matrix heatmap
- Residual diagnostics (Q-Q plots, residual vs. predicted)
- Feature importance bar charts
- Distance and price features are on natural scales (e.g., kilometers, INR).
- Timestamp conversions are in the dataset's original timezone.
- No rows were deleted; extreme values were clipped, not removed.
- `LR_Delivery_Time_Estimation_Starter_Manish_Atwal.ipynb` – Jupyter Notebook with full implementation
- `LR_Delivery_Time_Estimation_Starter_Manish_Atwal.pdf` – Final report with executive summary, EDA, and business insights
This project is inspired by real-world delivery optimization use cases and follows standard machine learning and EDA methodologies.
Manish Atwal