- Create a virtual environment
- Read data from PostgreSQL into a Pandas dataframe
- Transfer data from PostgreSQL to Snowflake
- Connect dbt and Snowflake
- Build dbt data pipelines
- Establish data quality tests
```
pip install virtualenv
virtualenv f1dbt
f1dbt\Scripts\activate
```

I have two tables in PostgreSQL. To read them into a Pandas dataframe, execute step 1 from migration.ipynb.
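A minimal sketch of what step 1 amounts to, assuming a local PostgreSQL instance and the raw table names rresults and qresults (the connection string is a placeholder; adjust it to your own setup):

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string: replace host, port, user, password
# and database with your own PostgreSQL settings.
engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/f1")

# Assumed table names: race results and qualifying results.
rresults = pd.read_sql("select * from rresults", engine)
qresults = pd.read_sql("select * from qresults", engine)

print(rresults.shape, qresults.shape)
```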
First of all, I created a database, schema and dbt user in Snowflake:
```sql
-- Use an admin role
USE ROLE ACCOUNTADMIN;

-- Create the `transform` role
CREATE ROLE IF NOT EXISTS transform;
GRANT ROLE transform TO ROLE ACCOUNTADMIN;

-- Create the default warehouse if necessary
CREATE WAREHOUSE IF NOT EXISTS COMPUTE_WH;
GRANT OPERATE ON WAREHOUSE COMPUTE_WH TO ROLE transform;

-- Create the `dbt` user and assign it to the role
CREATE USER IF NOT EXISTS dbt
  PASSWORD='f1dbt'
  LOGIN_NAME='hannaf1'
  MUST_CHANGE_PASSWORD=FALSE
  DEFAULT_WAREHOUSE='COMPUTE_WH'
  DEFAULT_ROLE='transform'
  DEFAULT_NAMESPACE='F1.RAW'
  COMMENT='DBT user used for data transformation';
GRANT ROLE transform TO USER dbt;

-- Create our database and schemas
CREATE DATABASE IF NOT EXISTS F1;
CREATE SCHEMA IF NOT EXISTS F1.RAW;

-- Set up permissions for role `transform`
GRANT ALL ON WAREHOUSE COMPUTE_WH TO ROLE transform;
GRANT ALL ON DATABASE F1 TO ROLE transform;
GRANT ALL ON ALL SCHEMAS IN DATABASE F1 TO ROLE transform;
GRANT ALL ON FUTURE SCHEMAS IN DATABASE F1 TO ROLE transform;
GRANT ALL ON ALL TABLES IN SCHEMA F1.RAW TO ROLE transform;
GRANT ALL ON FUTURE TABLES IN SCHEMA F1.RAW TO ROLE transform;
```

To migrate the datasets from PostgreSQL to Snowflake, complete step 2 in migration.ipynb.
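A sketch of one way step 2 can be done, using write_pandas from snowflake-connector-python (install with `pip install "snowflake-connector-python[pandas]"`). The credentials match the dbt user created above; the target table names are assumptions:

```python
import pandas as pd
from sqlalchemy import create_engine
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# Re-read the two tables from PostgreSQL (placeholders as before).
engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/f1")
rresults = pd.read_sql("select * from rresults", engine)
qresults = pd.read_sql("select * from qresults", engine)

# Connect as the dbt user created above.
conn = snowflake.connector.connect(
    account="lkhpmcc-cn69015",
    user="hannaf1",
    password="f1dbt",
    role="transform",
    warehouse="COMPUTE_WH",
    database="F1",
    schema="RAW",
)

# write_pandas creates and loads the target tables from the dataframes.
write_pandas(conn, rresults, "RRESULTS", auto_create_table=True)
write_pandas(conn, qresults, "QRESULTS", auto_create_table=True)
conn.close()
```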
```
pip install dbt-snowflake==1.7.2
dbt init dbt_f1
```

dbt init prompts for the connection details and stores them in profiles.yml (see the sketch after this list):

- account: lkhpmcc-cn69015
- user: hannaf1
- password: f1dbt
- role: transform
- warehouse: COMPUTE_WH
- database: F1
- schema: RAW
- threads: 1
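For reference, a profiles.yml assembled from the values above (the profile and target names are assumptions based on the project name):

```yaml
# ~/.dbt/profiles.yml, generated by dbt init from the answers above
dbt_f1:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: lkhpmcc-cn69015
      user: hannaf1
      password: f1dbt
      role: transform
      warehouse: COMPUTE_WH
      database: F1
      schema: RAW
      threads: 1
```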
```
cd dbt_f1
dbt debug
```

Created separate folders for the models to represent the logical layers:
```
mkdir models\staging
mkdir models\transform
mkdir models\mart
```
Modified the dbt_project.yml file to reflect the model structure. The staging layer uses view materialization, while transform and mart use table materialization:
```yaml
models:
  dbt_f1:
    # Each config block applies to all files under the matching models/ subfolder
    staging:
      schema: staging
      materialized: view
    transform:
      schema: transform
      materialized: table
    mart:
      schema: mart
      materialized: table
```

- Staging layer. In this layer I slightly modified rresults and qresults by making the column names more meaningful (see the first sketch after this list). Executed dbt run.
- Transform layer. In this step I changed the column data types and replaced the letter designations used for drivers who didn't start or finish the race with 20 in the points column and NULL in the time columns (second sketch below). Queries: cleansed_rresults and cleansed_qresults. Executed dbt run.
- Seeds. Uploaded the race order data to the seeds folder; seeds are usually small datasets in CSV format, e.g. catalogs (CSV sketch below). Executed dbt seed.
- Mart layer. This layer will be accessible to BI tools. I created two files, mart_rresults and mart_qresults, which join the race order with the transformed tables (sketch below).
- Generic tests. In schema.yml I added the generic not_null test for the track country and driver number columns of cleansed_qresults and cleansed_rresults (schema.yml sketch below).
- Singular tests. In the tests folder I created files that check that the position columns only contain values from 1 to 20 (last sketch below). Executed dbt test.
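A minimal sketch of a staging model of the kind described, renaming raw columns to more meaningful names. The raw column names and the `raw` source definition are assumptions, not the actual project code:

```sql
-- models/staging/stg_rresults.sql (illustrative; assumes a 'raw' source
-- pointing at F1.RAW is defined in a sources .yml file)
select
    racename as race_name,
    driverno as driver_number,
    pos      as finish_position,
    pts      as points,
    racetime as finish_time
from {{ source('raw', 'rresults') }}
```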
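A sketch of the cleansing step, reusing the assumed column names from the previous sketch; try_cast and regexp_like are Snowflake functions:

```sql
-- models/transform/cleansed_rresults.sql (illustrative column names)
select
    race_name,
    driver_number,
    -- letter designations (DNF, DNS, ...) become 20 in the points column
    coalesce(try_cast(points as integer), 20) as points,
    -- ... and NULL in the time columns
    case when regexp_like(finish_time, '[A-Za-z]')
         then null
         else finish_time
    end                                       as finish_time
from {{ ref('stg_rresults') }}
```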
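Seeds are plain CSV files loaded with dbt seed. The race order seed might look roughly like this (the file name seeds/race_order.csv, columns and rows are purely illustrative):

```csv
race_order,race_name
1,Bahrain
2,Saudi Arabia
3,Australia
```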
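A sketch of a mart model joining the race order seed to the cleansed results (the join key is an assumption):

```sql
-- models/mart/mart_rresults.sql (illustrative join key)
select
    ro.race_order,
    cr.*
from {{ ref('cleansed_rresults') }} as cr
join {{ ref('race_order') }} as ro   -- seeds are referenced like models
  on cr.race_name = ro.race_name
```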
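The generic tests as they might appear in schema.yml (column names follow the earlier sketches):

```yaml
# models/transform/schema.yml (illustrative column names)
version: 2
models:
  - name: cleansed_rresults
    columns:
      - name: driver_number
        tests:
          - not_null
  - name: cleansed_qresults
    columns:
      - name: track_country
        tests:
          - not_null
      - name: driver_number
        tests:
          - not_null
```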
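A singular test is simply a query that selects the rows violating the rule; dbt test fails if it returns anything. A sketch for the position check (file name and column are assumptions):

```sql
-- tests/assert_positions_between_1_and_20.sql (illustrative)
select *
from {{ ref('cleansed_rresults') }}
where finish_position not between 1 and 20
```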
In the analyses folder I created SQL files to answer the following questions:
- The average position of every driver during the championship; total points; the share of points contributed by the team's lead driver: average_in_championship (see the sketch after this list)
- A driver's championship points after each race and their position in the championship standings after each race (Ferrari team): driver_points
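A sketch of what the first analysis might look like, using the mart model and the column names assumed above:

```sql
-- analyses/average_in_championship.sql (illustrative column names)
select
    driver_number,
    avg(finish_position) as avg_position,
    sum(points)          as total_points
from {{ ref('mart_rresults') }}
group by driver_number
order by total_points desc
```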
To connect a BI tool to the mart layer, use:
- server: lkhpmcc-cn69015.snowflakecomputing.com
- warehouse: COMPUTE_WH
- user name: hannaf1
- password: f1dbt