
paty-oliveira/edit-analytics-eng-bigquery


How to start

  1. Clone this repository to your local machine and create a branch named feature-[your_first_name]-starbucks-data-models.
  2. Set up your local environment with Docker or a Python virtual environment.
  3. Solve the exercises provided during the classroom sessions, committing your changes as you go.
  4. Once all exercises are completed, open a pull request against this repository, following the Pull Request instructions below.
  5. Ensure your code passes the automated tests.

Set up the environment with Docker

First, ensure you have Docker installed and running.

Pre-commit hooks

Ensure you have Python 3.12 installed on your machine.

  1. Install the pre-commit hooks by running the following command:
pre-commit install
  2. Run the pre-commit hooks against all the files:
pre-commit run --all-files
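For reference, pre-commit reads its hook definitions from a .pre-commit-config.yaml file in the repository root. The repository ships its own configuration, which is authoritative; the snippet below is only a hypothetical example of what such a file can look like, using the standard pre-commit/pre-commit-hooks repository:

```yaml
# Hypothetical example only - the repository's own .pre-commit-config.yaml
# is authoritative. `pre-commit install` registers these hooks with git;
# `pre-commit run --all-files` executes them against every file.
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: trailing-whitespace   # strip trailing whitespace
      - id: end-of-file-fixer     # ensure files end with a newline
```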

Add environment variables

  1. Rename the .env_edit file to .env and fill it in with the required information (the file is available in the starbucks_dw/ folder):
  • GCLOUD_CREDENTIALS_FILEPATH: the path to the gcloud application default credentials, usually ~/.config/gcloud/application_default_credentials.json

  • GCP_PROJECT_ID: the BigQuery project identifier. It will be provided during the onboarding session.

  2. Create a new environment variable called SCHEMA_PREFIX and set it to your first name. This variable will be used to prefix your BigQuery dataset. Make sure you store this environment variable in ~/.bashrc or ~/.zshrc (for macOS users).
export SCHEMA_PREFIX='your_first_name'
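As a reference, the renamed .env file would contain entries along these lines. The values below are placeholders only; use your own credentials path and the project ID from the onboarding session:

```shell
# .env - placeholder values, replace with your own
GCLOUD_CREDENTIALS_FILEPATH=~/.config/gcloud/application_default_credentials.json
GCP_PROJECT_ID=data-eng-dev-xxxx
```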

Build and start Docker container

Navigate to the starbucks_dw folder and follow the instructions.

  1. Build and start the container:
docker compose up --build -d
  2. Check that the container is running:
docker ps
  3. Run the container in interactive mode with a bash terminal:
docker compose exec dbt bash
  4. You are now in an interactive terminal where you can run dbt commands. Run the following command to check that dbt is working properly:
dbt debug
  5. To close the terminal, type:
exit
  6. To stop the dbt container, run the following command:
docker compose stop dbt
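For context, the commands above assume a docker-compose.yml in starbucks_dw/ that defines a service named dbt. The repository's actual file is authoritative; a minimal sketch of such a service could look like this:

```yaml
# Hypothetical sketch only - see starbucks_dw/docker-compose.yml for the real file.
services:
  dbt:                # service name used by `docker compose exec dbt bash`
    build: .          # build the image from the local Dockerfile
    env_file: .env    # load GCLOUD_CREDENTIALS_FILEPATH, GCP_PROJECT_ID, etc.
    volumes:
      - .:/usr/app    # mount the project so local edits are visible inside
    tty: true         # keep the container running for interactive use
```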

Set up the environment with a Python virtual environment

Ensure you have Python 3.12 installed on your machine.

Google Cloud CLI

  1. Install gcloud on your computer. Please follow this guide.

  2. Activate gcloud authentication via terminal by running the following command:

gcloud auth application-default login

Add environment variables

Make sure you store all the environment variables in ~/.bashrc or ~/.zshrc (for macOS users).

  1. Create a new environment variable called PROJECT_ID and set it to the BigQuery project identifier:
export PROJECT_ID="data-eng-dev-xxxx"
  2. Create a new environment variable called SCHEMA_PREFIX and set it to your first name. This variable will be used to prefix your BigQuery dataset.
export SCHEMA_PREFIX='your_first_name'

Create Python virtual environment

  1. Create a Python virtual environment:
python3 -m venv venv
  2. Activate the virtual environment. Make sure the virtual environment is always activated when you run dbt:
source ./venv/bin/activate
  3. Upgrade pip and install the Python dependencies:
pip install -U pip
pip install -r requirements.txt
  4. Navigate to the starbucks_dw folder and run the following command to check that dbt is working properly. All dbt commands must be executed from the starbucks_dw folder:
dbt debug --profiles-dir .
  5. Install the pre-commit hooks by running the following command:
pre-commit install
  6. Run the pre-commit hooks against all the files:
pre-commit run --all-files
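Before running dbt, it can help to confirm that the required variables are actually set in your shell. The helper below is a hypothetical sketch (not part of the repository) that reports any missing variables:

```shell
# check_env: hypothetical helper, not part of the repository.
# Reports which of the required environment variables are unset.
check_env() {
  status=0
  for name in PROJECT_ID SCHEMA_PREFIX; do
    # indirect lookup of the variable whose name is stored in $name
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      status=1
    fi
  done
  [ "$status" -eq 0 ] && echo "environment looks good"
  return "$status"
}

# Example: run the check before any dbt command.
# The `|| true` only keeps this demo from aborting when variables are unset.
check_env || true
```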

Pull Request

  1. Your branch must follow the pattern: feature-[your_first_name]-starbucks-data-models.
  2. Change the schema_prefix variable in the dbt_project.yml file and set it to the same value you used for your SCHEMA_PREFIX environment variable:

Example: if your variable is export SCHEMA_PREFIX='your_first_name', then your dbt_project.yml file must contain something like:

vars:
  schema_prefix: "your_first_name"

This will ensure your models are materialised in your personal dataset in BigQuery.
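For background, dbt projects typically apply such a prefix through an override of the generate_schema_name macro. The sketch below is only an illustration of that pattern and is not necessarily the macro this repository uses:

```sql
-- Hypothetical illustration of the generate_schema_name pattern;
-- the repository's own macros take precedence.
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- if custom_schema_name is none -%}
        {{ var('schema_prefix') }}_{{ target.schema | trim }}
    {%- else -%}
        {{ var('schema_prefix') }}_{{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
```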

  3. Commit all the changes.
  4. Push your branch to the remote repository.
  5. Create a pull request on GitHub. If you are not sure how to do it, please follow this guide.

About

Repository for Analytics Engineering Module - EDIT academy
