Road traffic injuries in France 🚘 🤕 🚑

Project description

Road accidents are a major public safety concern, causing injuries, fatalities, and significant economic losses. This project aims to analyze annual datasets of traffic accidents in France, sourced from data.gouv.fr. By leveraging data engineering tools, we will process, clean, and integrate these datasets to uncover key insights, such as accident hotspots, trends over time, and the impact of factors like speed limits, day of the week and season.

Disclaimer: This analysis focuses on data from 2021 to 2023, as earlier datasets have a different CSV format that would require additional adaptations.

Preview

Link to the dashboard

Technologies Used

Infrastructure as Code (IaC): Terraform
Cloud Platform: GCP (Google Cloud Platform)
Data Lake: GCS Bucket
Data Warehouse: BigQuery
Data Transformation: dbt
Dashboard Tool: Looker Studio
Containerization: Docker
Workflow Orchestration: Kestra

Data pipeline

The basic data are retrieved from data.gouv.fr. They include four csv files for each year:

details (caractéristiques in french)
users (usagers in french)
vehicles (véhicules in french)
places (lieux in french)

Kestra creates four main BigQuery tables (details.sql, users.sql, vehicles.sql, places.sql) and four related staging tables. Staging tables are used to ensure data consistency and allow incremental updates before merging with the main tables.

Each csv file from data.gouv.fr is uploaded by Kestra to a Google Cloud bucket. It is then transformed into a BigQuery staging table. The staging table is then merged with the main table, and truncated.

Then, with dbt, we perform a number of clean-ups and improvements:

keep only data from metropolitan France
rework the format of dates and identifier fields
create an actual geospatial POINT field,
transform key values into meaningful names via macros (gravity_description, gender_description etc).

We also add 2 seeds, still supplied by data.gouv.fr:

french_departments.csv: mapping between the codes and the names of the French departments
french_postal_codes.csv: mapping between the INSEE code of a municipality and its name

Finally, we create a fact_accidents.sql table which aggregates the data for a given accident as follows:

number of users involved
number of users by severity of injury (four categories: uninjured, slightly injured, hospitalized and killed)
number of vehicles involved
city name
department name
maximum speed at accident location

Running the project

Installation and Setup

Possible Improvements

Older datasets (2005–2020) follow a slightly different schema, which requires custom parsing and mapping to align with the current data model.

Acknowledgements

This project was completed as part of Data Engineering Zoomcamp - a free course organized by the DataTalks.Club community.

Name		Name	Last commit message	Last commit date
Latest commit History 51 Commits
dbt		dbt
docs		docs
images		images
kestra		kestra
terraform		terraform
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Road traffic injuries in France 🚘 🤕 🚑

Project description

Preview

Technologies Used

Data pipeline

Running the project

Possible Improvements

Acknowledgements

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Road traffic injuries in France 🚘 🤕 🚑

Project description

Preview

Technologies Used

Data pipeline

Running the project

Possible Improvements

Acknowledgements

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages