- Clone this repository on your local machine and create a branch like this: `feature-[your_first_name]-starbucks-data-models`
- Set up your local environment with Docker or a Python virtual environment.
- Solve the exercises provided during the classroom sessions by committing your changes.
- Once all exercises are completed, create a pull request to this repository. Please follow the instructions.
- Ensure your code passes the automated tests.
First, ensure you have Docker installed and running.
Ensure you have Python 3.12 installed on your machine.
- Install pre-commit hooks by running the following command:
pre-commit install
- Run pre-commit hooks against all files:
pre-commit run --all-files
- Rename the `.env_edit` file to `.env` and fill it in with the required information (available in the `starbucks_dw/` folder):
  - `GCLOUD_CREDENTIALS_FILEPATH`: points to the path of the `gcloud` default credentials. Usually available at `~/.config/gcloud/application_default_credentials.json`
  - `GCP_PROJECT_ID`: the BigQuery project identifier. It will be provided during the onboarding session.
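For reference, a filled-in `.env` might look like the sketch below. Both values are placeholders taken from this guide, not real credentials or a real project:

```shell
# Hypothetical .env contents — replace both values with your own
GCLOUD_CREDENTIALS_FILEPATH=~/.config/gcloud/application_default_credentials.json
GCP_PROJECT_ID=data-eng-dev-xxxx
```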
- Create a new environment variable called `SCHEMA_PREFIX` and set it to your first name. This variable will be used to prefix your BigQuery dataset. Make sure you store this environment variable in `~/.bashrc` or `~/.zshrc` (for macOS users).
export SCHEMA_PREFIX='your_first_name'
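The two steps above (adding the export to your shell rc file, then loading it) can be sketched as follows. This assumes bash and `~/.bashrc`; on macOS with zsh, use `~/.zshrc` instead. `your_first_name` is the placeholder from this guide:

```shell
# Sketch: persist SCHEMA_PREFIX so new shells pick it up (assumes bash)
RC_FILE="$HOME/.bashrc"            # macOS/zsh users: "$HOME/.zshrc"
echo "export SCHEMA_PREFIX='your_first_name'" >> "$RC_FILE"
source "$RC_FILE"                  # load it into the current shell
echo "SCHEMA_PREFIX=$SCHEMA_PREFIX"
```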
Navigate to the `starbucks_dw` folder and follow the instructions.
- Build and start the container:
docker compose up --build -d
- Check if the container is running:
docker ps
- Run container in interactive mode with a bash terminal:
docker compose exec dbt bash
- Now you are in an interactive terminal where you can run dbt commands. Run the following command to check that dbt is working properly:
dbt debug
- To close the terminal, type:
exit
- To stop the dbt container, run the following command:
docker compose stop dbt
Ensure you have Python 3.12 installed on your machine.
- Install `gcloud` on your computer. Please follow this guide for it.
- Activate `gcloud` authentication via the terminal by running the following command:
gcloud auth application-default login
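After logging in, you can verify that the credentials file referenced earlier actually exists. This is a sketch, assuming the default credentials location used throughout this guide:

```shell
# Sketch: confirm the application-default credentials file was created by the login
CREDS="$HOME/.config/gcloud/application_default_credentials.json"
if [ -f "$CREDS" ]; then
  echo "gcloud credentials found at $CREDS"
else
  echo "Missing $CREDS — re-run: gcloud auth application-default login"
fi
```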
Make sure you store all the environment variables in `~/.bashrc` or `~/.zshrc` (for macOS users).
- Create a new environment variable called `PROJECT_ID` and set it to the BigQuery project identifier:
export PROJECT_ID="data-eng-dev-xxxx"
- Create a new environment variable called `SCHEMA_PREFIX` and set it to your first name. This variable will be used to prefix your BigQuery dataset.
export SCHEMA_PREFIX='your_first_name'
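A quick sanity check that both variables from the steps above are set can look like this. It is a sketch that assumes bash (it uses `${!var}` indirect expansion) and reports rather than fails, so it is safe to run any time:

```shell
# Sketch (assumes bash): report which required variables are set
for var in PROJECT_ID SCHEMA_PREFIX; do
  if [ -z "${!var:-}" ]; then
    echo "Missing: $var — add its export to your shell rc file"
  else
    echo "OK: $var=${!var}"
  fi
done
```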
- Create a Python virtual environment:
python3 -m venv venv
- Activate the virtual environment. Make sure you always have the virtual environment activated when running dbt.
source ./venv/bin/activate
- Upgrade pip and install the Python dependencies:
pip install -U pip
pip install -r requirements.txt
- Navigate to the `starbucks_dw` folder and run the following command to check that dbt is working properly. All dbt commands must be executed from the `starbucks_dw` folder.
dbt debug --profiles-dir .
- Install pre-commit hooks by running the following command:
pre-commit install
- Run pre-commit hooks against all files:
pre-commit run --all-files
- Your branch must follow the pattern `feature-[your_first_name]-starbucks-data-models`.
- Change the `schema_prefix` variable in the `dbt_project.yml` file and set it to the same value you used for your environment variable `SCHEMA_PREFIX`.
Example: if your variable is `export SCHEMA_PREFIX='your_first_name'`, then your `dbt_project.yml` file must contain something similar to:
vars:
schema_prefix: "your_first_name"
This will ensure your models are materialised in your personal dataset in BigQuery.
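If you want to avoid editing `dbt_project.yml` by hand, a small hypothetical helper (not part of the repo) could rewrite the `schema_prefix` line from `$SCHEMA_PREFIX`. The sketch below runs against a throwaway demo file and uses GNU `sed -i` syntax; on macOS the equivalent is `sed -i '' ...`:

```shell
# Hypothetical helper: sync schema_prefix in dbt_project.yml with $SCHEMA_PREFIX.
# Demonstrated on a throwaway copy so nothing real is modified.
export SCHEMA_PREFIX='your_first_name'
printf 'vars:\n  schema_prefix: "placeholder"\n' > /tmp/dbt_project_demo.yml
sed -i "s/schema_prefix: \".*\"/schema_prefix: \"$SCHEMA_PREFIX\"/" /tmp/dbt_project_demo.yml
cat /tmp/dbt_project_demo.yml
```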
- Commit all the changes.
- Push your branch to the remote repository.
- Create a pull request on GitHub. If you are not sure how to do it, please follow this guide.