- GitHub
- Creation of the Database
- Docker Desktop
Create and activate a virtual environment.

Windows (PowerShell):

```bash
python -m venv venv
venv\Scripts\Activate.ps1
```

macOS/Linux:

```bash
python3 -m venv venv
source venv/bin/activate
```

Install dbt for Snowflake:

```bash
python -m pip install dbt-core dbt-snowflake
```

Create a `requirements.txt` file:

```bash
pip freeze -l > requirements.txt
```

Check that dbt is installed, then initialize the project:

```bash
dbt --version
dbt init
```

We can use ENV variables to avoid putting the password into the profile.
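For example, in `~/.dbt/profiles.yml` the password can be read from an environment variable instead of being stored in plain text (the profile name and the other settings below are illustrative):

```yaml
dbtworkshop:
  target: dev
  outputs:
    dev:
      type: snowflake
      password: '{{ env_var("DBT_PASSWORD") }}'
      # ...account, user, database, warehouse, etc.
```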
Finally, we will move all content from dbtworkshop into the root directory.
We can test that dbt is working:

```bash
dbt run
```
Review the `logs` folder.
Let's delete the example models and create new ones for this exercise. Create a folder and, for each of the three models, a `.sql` and a `.yml` file.
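For instance, a staging model could look like this (the model `stg_orders` and its columns are hypothetical):

```sql
-- models/staging/stg_orders.sql
select
    o_orderkey as order_key,
    o_orderstatus as status_code,
    o_totalprice as total_price
from {{ source('tpch', 'orders') }}
```

And its `.yml` file:

```yaml
# models/staging/stg_orders.yml
version: 2
models:
  - name: stg_orders
    description: Cleaned-up order data
    columns:
      - name: order_key
        data_tests:
          - unique
          - not_null
```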
After that, modify `dbt_project.yml`.
Generic tests are simple to declare in a model's `.yml` file:

```yaml
data_tests:
  - unique
  - not_null
```

```yaml
data_tests:
  - accepted_values:
      values:
        - completed
        - shipped
        - returned
        - return_pending
        - placed
```

We can also add a singular test as an example:
Create `tests/fct_orders_negative_discount_amount.sql`:

```sql
-- If no discount is given, the amount should be zero;
-- otherwise it should be negative.
select *
from {{ ref('fct_orders') }}
where item_discount_amount > 0
```

As a result we have 3 models and their tests.
We can run the models with:

```bash
dbt run --select tag:staging
```

And test them:

```bash
dbt test --select tag:staging
```

For source models we can add a freshness test based on a specific timestamp column.
The freshness check is configured in the sources file, `_tpch_sources.yml` (the source and table names below follow the TPCH example and may differ in your project):

```yaml
version: 2
sources:
  - name: tpch
    tables:
      - name: orders
        loaded_at_field: _etl_loaded_at
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}
```
## Adding Pre-Commit
We can add a pre-commit configuration to check code quality locally.
Install the `pre-commit` package
```bash
pip install pre-commit
```

Create the file `.pre-commit-config.yaml`.

Activate pre-commit:

```bash
pre-commit install
```

Add the virtual environment folder to `.gitignore`: `venv/`.
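A minimal `.pre-commit-config.yaml` might look like this (the hook selection below is an assumption; adjust it to your project, and dbt-specific hooks such as those from dbt-checkpoint can be added the same way):

```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
```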
Let's create an actual branch for development work (usually this happens before anything else):

```bash
git checkout -b feature/new-dbt-model
git add .
git commit -m "Adding initial models"
```

The commit will run all pre-commit checks before completing.
We would also like to add a template for PRs. Create a new folder `.github` and, inside it, a new file `pull_request_template.md`.
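A simple `pull_request_template.md` might contain (contents are illustrative):

```markdown
## Description
What does this PR change and why?

## Checklist
- [ ] dbt models compile locally
- [ ] Tests added or updated
- [ ] Documentation updated
```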
We want to make sure the same checks run in GitHub, so let's create a GitHub Action for them. Add a new file `.github/workflows/dbt-ci-jobs.yml`:
```yaml
name: Run Pre-commit Hooks
on:
  pull_request:
    branches:
      - main
  push:
    branches:
      - main
jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      # Checkout the code and fetch all branches
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Ensure the full history is fetched, not just the last commit
      # Set up Python environment
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: 3.11
      # Install dependencies
      - name: Install dependencies
        run: |
          python -m venv venv
          . venv/bin/activate
          pip install --upgrade pip
          pip install -r requirements.txt
          pre-commit install --install-hooks
      # Fetch the main branch to ensure it's available for comparison
      - name: Fetch main branch
        run: git fetch origin main
      # Run pre-commit on all files changed between the current branch and main
      - name: Run pre-commit on all changed files
        run: |
          . venv/bin/activate
          # Get the list of files changed between the current branch and main
          files=$(git diff --name-only origin/main)
          if [ -n "$files" ]; then
            pre-commit run --files $files
          else
            echo "No modified files to check."
          fi
```

We leveraged the dbt package `dbt_utils`, so we need to add the package and install it.
Add the file `packages.yml`:

```yaml
packages:
  - package: dbt-labs/dbt_utils
    version: 1.3.0
  - package: calogica/dbt_expectations
    version: 0.10.4
  - package: calogica/dbt_date
    version: 0.10.1
```

And install the packages:

```bash
dbt deps
```

We also need to update the `dbt_project.yml` file:
```yaml
models:
  dbtworkshop:
    # Config indicated by + applies to all files under the corresponding folder
    staging:
      +materialized: view
      +tags: staging
    mart:
      +materialized: view
      +tags: mart
```

We can run the models:

```bash
dbt build --select tag:mart
```

When we run dbt we can specify the target environment and the profiles path:

```bash
dbt build --select fct_orders --target dev --profiles-dir ~/.dbt/
```

Add a file with a macro, `macros/add_etl_timestamp.sql`:
```sql
{% macro add_etl_timestamp() %}
CURRENT_TIMESTAMP as etl_timestamp_utc
{% endmacro %}
```

We can then add this macro to existing models.
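For example, in a model (the model `stg_orders` and its columns here are hypothetical), the macro expands into an extra timestamp column:

```sql
select
    order_key,
    status_code,
    {{ add_etl_timestamp() }}
from {{ ref('stg_orders') }}
```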
Generate and serve the documentation:

```bash
dbt docs generate
dbt docs serve --host localhost --port 8091
```

Open http://localhost:8091/#!/overview in a browser.
We need to update our GitHub Actions workflow to run the dbt checks, because they require running dbt and connecting to the database. dbt-checkpoint inspects the dbt artifacts, so we should make sure the workflow can run and compile dbt. For this we will add a new profile file, `profiles.yml`:
```yaml
default:
  outputs:
    dev:
      type: postgres
      host: '{{ env_var("DBT_HOST") }}'
      user: '{{ env_var("DBT_USER") }}'
      password: '{{ env_var("DBT_PASSWORD") }}'
      port: 5432
      dbname: '{{ env_var("DBT_DATABASE") }}'
      schema: dev
  target: dev
```

In GitHub we can assign values for these env variables in the repository under Settings -> Secrets and variables -> Actions.
We can modify our GitHub Actions workflow to include the dbt compile part:
```yaml
name: Run Pre-commit Hooks with Postgres
on:
  pull_request:
    branches:
      - main
  push:
    branches:
      - main
jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      # Checkout the code
      - name: Checkout code
        uses: actions/checkout@v3
        with:
          fetch-depth: 0
      # Set up Python environment
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: 3.11
      # Install dependencies
      - name: Install dependencies
        run: |
          python -m venv venv
          . venv/bin/activate
          pip install --upgrade pip
          pip install -r requirements.txt
      # Install dbt package dependencies
      - name: Install dbt package dependencies
        run: |
          . venv/bin/activate
          dbt deps
      # Copy the profiles.yml file to the correct location
      - name: Copy profiles.yml to /home/runner/.dbt
        run: |
          mkdir -p /home/runner/.dbt
          cp ./profiles.yml /home/runner/.dbt/profiles.yml
      # Compile the dbt project
      - name: Compile dbt project
        run: |
          . venv/bin/activate
          dbt compile
        env:
          DBT_USER: ${{ secrets.DBT_USER }}
          DBT_PASSWORD: ${{ secrets.DBT_PASSWORD }}
          DBT_HOST: ${{ secrets.DBT_HOST }}
          DBT_DATABASE: ${{ secrets.DBT_DATABASE }}
      # Run pre-commit hooks
      - name: Run pre-commit on all changed files
        run: |
          . venv/bin/activate
          files=$(git diff --name-only origin/main)
          if [ -n "$files" ]; then
            pre-commit run --files $files
          else
            echo "No modified files to check."
          fi
```

To make it pass, let's delete the singular test.
We can also add one more step to run the dbt models in another schema, CI. Let's create the CI schema in Postgres:

```sql
create schema if not exists ci;
```

We should add one more profile:
```yaml
dbtworkshop:
  outputs:
    dev:
      type: postgres
      host: '{{ env_var("DBT_HOST") }}'
      user: '{{ env_var("DBT_USER") }}'
      password: '{{ env_var("DBT_PASSWORD") }}'
      port: 5432
      dbname: '{{ env_var("DBT_DATABASE") }}'
      schema: dev
    ci:
      type: postgres
      host: '{{ env_var("DBT_HOST") }}'
      user: '{{ env_var("DBT_USER") }}'
      password: '{{ env_var("DBT_PASSWORD") }}'
      port: 5432
      dbname: '{{ env_var("DBT_DATABASE") }}'
      schema: ci
  target: dev
```