The Toronto Transit Commission (TTC) serves as a lifeline for millions of daily commuters across the city's subway, streetcar, and bus networks. However, recurring delays across various routes pose significant challenges to its reliability, which undermines public confidence and complicates efforts to promote sustainable transportation. This project addresses these issues by analyzing TTC subway delay data, building a predictive machine learning model, and providing an easy-to-use web application for real-time predictions.
The project consists of three parts:
- Exploratory Data Analysis (EDA): Visualizing key trends in TTC subway delays to identify patterns and gain business insights.
- Machine Learning Model: Using a Random Forest algorithm to predict delays based on factors like starting point, destination, and departure time.
- Web Application: A user-friendly interface that integrates the machine learning model, allowing users to input travel information and receive delay predictions.
- Authors/Contributors
- Motivation
- Problem Statement
- Features
- Technical Implementation
- Setup and Installation
- Usage Guide
- ML Model
- License
- Feedback and Contributions
- Credits and Acknowledgements
- Zhe Wang
- Virat Talan
- Daivi Shah
- Euichan Kim
Public transit delays can undermine the effectiveness of urban transportation systems, leading to frustration and decreased usage. By forecasting delays, commuters can make more informed decisions, and the TTC can improve service reliability. This project aims to provide actionable insights that contribute to enhancing the TTC’s operational efficiency and promoting sustainable transportation alternatives.
TTC’s subway network faces persistent delays that affect commuters' daily routines. Addressing these delays through predictive analytics can improve the transit experience and support the city's long-term goals of reducing carbon emissions. This project aims to develop a model that predicts delays based on a variety of factors, improving both operational efficiency and commuter satisfaction.
- Interactive Web Application: Allows users to input travel details (e.g., starting point, destination, departure time) and receive a delay prediction.
- Real-Time Predictions: Get estimated delays and probabilities of delays based on historical data.
- Comprehensive EDA: Visualizations showcasing key trends and patterns in subway delays.
- Machine Learning Model: A Random Forest model for delay prediction with an accuracy of 73%.
- Frontend: HTML, CSS, JavaScript, React, Tailwind CSS
- Backend: Flask
- Machine Learning: Random Forest, Scikit-learn, Pandas
- Database: MySQL (for storing delay data)
- Deployment: Vercel for the web application, GitHub for version control
- Pandas: Data manipulation and analysis.
- Scikit-learn: Machine learning library for building and training the model.
- NumPy: Library for numerical computations.
- Plotly: Interactive graphing library for creating visualizations.
- Matplotlib: Plotting library for static, animated, and interactive visualizations.
-
Clone the repository:
git clone https://github.com/your-username/ttc-delay-prediction.git cd ttc-delay-prediction -
Install required dependencies:
pip install -r requirements.txt
-
Run the backend server:
python app.py
-
Access the web app in your browser at
http://127.0.0.1:5000.
Navigate to the website: https://uoft-datathon.vercel.app/

Enter your travel details (starting point, destination, and departure time) in the specified format (ALL CAPITAL).
Click on Submit to view the predicted delay and probability.

We trained a Random Forest model using the scikit-learn library. The model predicts whether a delay is expected, based on input parameters like starting point, destination, and departure time.
- The model was evaluated using a validation set of 4800 samples, achieving an accuracy of 73%.
This project is licensed under the MIT License. See the LICENSE file for more details.
This project was part of a SDSS datathon competition and is no longer accepting contributions. However, feedback is welcome! If you encounter any issues or have suggestions, feel free to open an issue on GitHub.
- Special thanks to SDSS and Toronto Transit Commission (TTC) for providing the data used in this project.