Data Mining Project Searching for Influencing Factors in Motor Vehicle Collision Fatalities in NYC

swish0621/MVFatalities

Influencing Factors in Motor Vehicle Fatalities in NYC

🔎 Description

This project aims to identify the factors that influence the likelihood of a fatality in motor vehicle crashes in NYC. The analysis uses the NYC Motor Vehicle Collisions dataset, a collection of NYC police crash reports from 2012 to 2025 covering crashes that involved injury, death, or at least $1,000 in damage.

After preprocessing and feature engineering, a binary target FATAL is defined (1 if a crash involved at least one death, else 0). The analysis includes:

  • Pearson correlation to rank factor associations with fatalities
  • Relative risk analysis to measure how much each top factor increases fatality likelihood
  • Temporal aggregation to examine trends over time and seasonality
  • Logistic regression classification to test whether top factors meaningfully predict fatal crashes

❓ Research Questions

  • Which factors are most strongly associated with fatal outcomes in crashes in NYC?
  • How much does each top factor increase the likelihood of a fatal crash compared to crashes without it?
  • Do the highest risk factors show meaningful temporal or seasonal patterns?
  • Do the same factors remain important when used to predict fatalities in a supervised model?
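One way the seasonality question could be approached is to group crashes by calendar month and compare fatality rates. The sketch below is a minimal version under stated assumptions: the CRASH DATE key, its MM/DD/YYYY format, the helper name, and the toy rows are illustrative and may not match how the project performs its temporal aggregation.

```python
from collections import defaultdict
from datetime import datetime


def monthly_fatality_rate(rows, date_key="CRASH DATE", target_key="FATAL",
                          date_format="%m/%d/%Y"):
    """Return {month number: share of crashes in that month that were fatal}.

    date_key and date_format are assumptions about the CSV layout.
    """
    totals = defaultdict(lambda: [0, 0])  # month -> [fatal crashes, all crashes]
    for row in rows:
        month = datetime.strptime(row[date_key], date_format).month
        totals[month][0] += row[target_key]
        totals[month][1] += 1
    return {m: fatal / count for m, (fatal, count) in sorted(totals.items())}


# Toy rows pooled across years: four January crashes, two July crashes.
toy_rows = [
    {"CRASH DATE": "01/05/2020", "FATAL": 0},
    {"CRASH DATE": "01/20/2021", "FATAL": 1},
    {"CRASH DATE": "01/22/2022", "FATAL": 0},
    {"CRASH DATE": "01/30/2023", "FATAL": 0},
    {"CRASH DATE": "07/04/2020", "FATAL": 1},
    {"CRASH DATE": "07/15/2022", "FATAL": 0},
]

rates = monthly_fatality_rate(toy_rows)
print(rates)
```

Pooling by month number across years highlights seasonality. Grouping by year and month instead would show the long-run trend.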

Link to Paper

Influencing Factors in Motor Vehicle Fatalities in NYC

📹 Link to Presentation

Influencing Factors in Motor Vehicle Fatalities in NYC

🛠️ Setup Instructions

git clone https://github.com/swish0621/MVFatalities.git
cd MVFatalities 

# create and activate virtual environment
python -m venv venv
source venv/bin/activate    # macOS / Linux
# venv\Scripts\activate     # Windows

# install dependencies
pip install -r requirements.txt

🗂️ Data Source

This project uses the NYC Motor Vehicle Collisions dataset.

Download it manually from: https://data.cityofnewyork.us/Public-Safety/Motor-Vehicle-Collisions-Crashes/h9gi-nx95

Place the CSV in the repository root directory before running, and make sure it is named exactly:

"Motor_Vehicle_Collisions_-_Crashes.csv"
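Because the file name must match exactly, a quick check before running can save a failed start. The helper below is a hypothetical sketch and not part of the repository; it only confirms that a file with the expected name exists in a given directory.

```python
from pathlib import Path

EXPECTED_NAME = "Motor_Vehicle_Collisions_-_Crashes.csv"


def find_dataset(root="."):
    """Return the dataset path, or raise if the CSV is missing or misnamed."""
    path = Path(root) / EXPECTED_NAME
    if not path.is_file():
        raise FileNotFoundError(
            f"Expected {EXPECTED_NAME} in {Path(root).resolve()}. "
            "Download it from the NYC Open Data page and place it there."
        )
    return path


if __name__ == "__main__":
    try:
        print(f"Found dataset: {find_dataset()}")
    except FileNotFoundError as err:
        print(err)
```

Run it from the repository root. Browsers sometimes append a date or a suffix such as " (1)" to the downloaded file name, in which case the file needs to be renamed.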

▶️ Run Analysis

python -m project 

For faster processing, uncomment the following line in load.py. It restricts the run to the first 10,000 rows, so results may differ from the full analysis.

# df = df.head(10000)
