tikalk/data-platform-dagster

CNE Dagster Template

A comprehensive data engineering template that integrates Dagster for orchestration with dbt for data transformation, providing a robust foundation for modern data pipelines.

🏗️ Architecture Overview

This template consists of three main components:

  1. Dagster Orchestration Layer (cne_dagster/) - Asset-based pipeline orchestration
  2. dbt Transformation Layer (cne-dbt-template/) - SQL-based data transformations
  3. Custom CLI Tool - Enhanced dbt workflow management with validation and automation

Key Features

  • ✅ Dagster + dbt Integration: Seamless orchestration of dbt models as Dagster assets
  • ✅ Multi-Cloud Support: BigQuery and Snowflake connectors
  • ✅ Custom CLI: Enhanced dbt workflows with validation and automation
  • ✅ Data Quality: Built-in testing with dbt-expectations and Elementary
  • ✅ Code Quality: Pre-commit hooks, SQL formatting, and validation
  • ✅ Docker Support: Containerized deployment ready
  • ✅ Task Automation: Go-task based workflow automation

📋 Prerequisites

Before setting up the project, ensure you have:

  • Python 3.12+ (recommended 3.13)
  • Go-task (Installation Guide)
  • Git and GitHub CLI (optional but recommended)
  • Docker (for containerized deployment)
  • Access to BigQuery or Snowflake data warehouse

🚀 Quick Start

1. Environment Setup

Clone the repository and set up your development environment:

# Clone the repository
git clone <repository-url>
cd cne-dagster-template

# Option A: Automated setup (recommended)
task setup-env

# Option B: Manual setup
# Install uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create virtual environment and install dependencies
uv venv
source .venv/bin/activate
uv pip install -e .

2. Configuration

Copy the environment template and configure your settings:

cp cne-dbt-template/.env_example cne-dbt-template/.env

Edit .env with your warehouse connection details:

# BigQuery Configuration
BIGQUERY_ACCOUNT=your-project-id
BIGQUERY_DATABASE=your-dataset
BIGQUERY_KEYFILE_PATH=path/to/service-account.json
TARGET_NAME=dev

# Organization Settings
ORG_ID=your-org-id
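
A quick pre-flight check can catch a missing setting before dbt reports a connection error. The sketch below is not part of the template. It assumes the values from .env have already been exported into the process environment, and the helper name missing_env_vars is hypothetical.

```python
import os

# Variable names taken from the .env example above.
REQUIRED_VARS = [
    "BIGQUERY_ACCOUNT",
    "BIGQUERY_DATABASE",
    "BIGQUERY_KEYFILE_PATH",
    "TARGET_NAME",
    "ORG_ID",
]


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing settings: " + ", ".join(missing))
    else:
        print("All required settings are present.")
```

Passing a plain dict as env makes the check easy to exercise without touching the real environment.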

3. Verify Setup

Test your configuration:

# Verify environment setup
task test-setup

# Test dbt connection
task dbt:debug

# Launch CLI interface
task cli

4. Run Dagster

Start the Dagster web interface:

# Development mode
cd cne_dagster
dagster dev

# Or using Docker
docker build -t cne-dagster-template .
docker run -p 3000:3000 cne-dagster-template

Access Dagster UI at http://localhost:3000

🏗️ Project Structure

cne-dagster-template/
├── cne_dagster/                # Dagster orchestration layer
│   ├── cne_dagster/
│   │   ├── assets.py           # dbt assets definition
│   │   ├── definitions.py      # Dagster definitions
│   │   ├── project.py          # dbt project configuration
│   │   └── schedules.py        # Pipeline schedules
│   └── pyproject.toml          # Dagster dependencies
├── cne-dbt-template/           # dbt transformation layer
│   ├── models/                 # dbt models (staging, marts)
│   ├── macros/                 # Reusable SQL macros
│   ├── tests/                  # Data quality tests
│   ├── cli/                    # Custom CLI tool
│   │   ├── commands/           # CLI command implementations
│   │   ├── utils/              # Utility functions
│   │   └── validate/           # Validation plugins
│   ├── dbt_project.yml         # dbt project configuration
│   └── Taskfile.yml            # Task automation
├── Dockerfile                  # Container configuration
└── pyproject.toml              # Root project dependencies

💻 Usage

Dagster Operations

# Start Dagster development server
cd cne_dagster
dagster dev

# Materialize all assets
dagster asset materialize --select "*"

# Run specific dbt models through Dagster
dagster asset materialize --select "tikal_dbt_dbt_assets"

dbt Operations via CLI

The project includes a custom CLI with enhanced dbt workflows:

# Launch interactive CLI
task cli

# Available commands in CLI:
create model --name my_model --type staging
create domain --name user_analytics
validate --all
select models --pattern "staging.*"

Direct dbt Commands

# Navigate to dbt project
cd cne-dbt-template

# Install dbt packages
dbt deps

# Run models
dbt run

# Run tests
dbt test

# Generate documentation
dbt docs generate
dbt docs serve

Task Automation

Common workflows are automated using Go-task:

# View all available tasks
task --list

# Development tasks
task dbt:run          # Run dbt models
task dbt:test         # Run dbt tests
task dbt:docs         # Generate and serve docs
task validate:all     # Run all validations
task format:sql       # Format SQL files

# CI/CD tasks
task ci:test          # Run CI tests
task ci:lint          # Run linters
task ci:security      # Security checks

🧪 Data Quality & Testing

Built-in Testing Framework

  • dbt Tests: Schema tests, data tests, and custom tests
  • dbt-expectations: Great Expectations-style assertions for advanced data quality checks
  • Elementary: Data observability and monitoring

Validation Pipeline

The project includes comprehensive validation:

# Run all validations
task validate:all

# Specific validations
cli_validate --check-model-names
cli_validate --check-sql-style
cli_validate --check-yaml-exists
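
The template's own validators are not shown in this README, so the following is only a rough illustration of the kind of check that --check-yaml-exists suggests. It assumes each model's .yml file sits beside its .sql file with the same stem, as in the manual model creation steps in this README. The function name models_missing_yaml is hypothetical.

```python
from pathlib import Path


def models_missing_yaml(models_dir):
    """Return .sql model files that have no same-named .yml file beside them."""
    root = Path(models_dir)
    return sorted(
        sql_file
        for sql_file in root.rglob("*.sql")
        if not sql_file.with_suffix(".yml").exists()
    )


if __name__ == "__main__":
    # Path assumed from the project structure in this README.
    for path in models_missing_yaml("cne-dbt-template/models"):
        print(f"No YAML file found for {path}")
```

An empty result means every model found under the directory has a companion YAML file.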

Pre-commit Hooks

Automated code quality checks:

  • Security: Private key detection, branch protection
  • SQL: SQLFluff formatting and linting
  • Python: Black, isort, flake8, mypy
  • dbt: Model validation, macro documentation

🚀 Deployment

Docker Deployment

# Build container
docker build -t cne-dagster-template .

# Run container
docker run -p 3000:3000 \
  -e BIGQUERY_ACCOUNT=your-project \
  -e BIGQUERY_KEYFILE_PATH=/keys/service-account.json \
  -v /path/to/keys:/keys \
  cne-dagster-template

Environment Variables

Key environment variables for deployment:

# Dagster
DAGSTER_HOME=/opt/dagster/app

# dbt Profile
DBT_PROFILE_PROJECT=your-project
DBT_PROFILE=tikal_dbt
TARGET_NAME=prod

# BigQuery
BIGQUERY_DATABASE=your-dataset
BIGQUERY_KEYFILE_PATH=/path/to/keyfile.json
SOURCE_DATABASE=your-source-db

# Organization
ORG_ID=your-organization-id

🔧 Configuration

dbt Configuration

Key configuration in cne-dbt-template/dbt_project.yml:

name: "tikal_dbt"
profile: "tikal_dbt"

vars:
  organization_id: "{{ env_var('ORG_ID') }}"
  source_database: "SAAS_STAGING"
  enable_separate_db: False

models:
  tikal_dbt:
    +on_schema_change: "sync_all_columns"

Dagster Configuration

Dagster is configured in cne_dagster/cne_dagster/definitions.py:

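# Imports assumed from the module layout under Project Structure
from dagster import Definitions
from dagster_dbt import DbtCliResource
from .assets import tikal_dbt_dbt_assets
from .project import tikal_dbt_project
from .schedules import schedules

defs = Definitions(
    assets=[tikal_dbt_dbt_assets],
    schedules=schedules,
    resources={
        "dbt": DbtCliResource(project_dir=tikal_dbt_project),
    },
)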

📚 Development Workflows

Creating New Models

  1. Using CLI (Recommended):

    task cli
    create model --name user_metrics --type marts

  2. Manual Creation:

    # Create model file
    touch cne-dbt-template/models/marts/user_metrics.sql

    # Create corresponding YAML
    touch cne-dbt-template/models/marts/user_metrics.yml

Model Organization

Organize models in medallion-style layers:

  • Staging (models/*/staging/): Clean and standardize raw data
  • Marts (models/*/marts/): Business-defined entities for reporting
  • Gold (models/*/gold/): Aggregated, analysis-ready datasets
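
To make the directory convention concrete, here is a small sketch that reads the layer from a model's path. It assumes the models/<domain>/<layer>/ layout listed above. The helper name model_layer and the example file stg_users.sql are hypothetical.

```python
from pathlib import PurePosixPath

# Layer directory names from the list above.
LAYERS = ("staging", "marts", "gold")


def model_layer(model_path):
    """Return the layer directory a model lives in, or None if it is in none."""
    directories = PurePosixPath(model_path).parts[:-1]  # drop the file name
    for part in directories:
        if part in LAYERS:
            return part
    return None


if __name__ == "__main__":
    print(model_layer("models/user_analytics/staging/stg_users.sql"))
```

A None result flags a model that sits outside the three layers, which is usually worth a second look during review.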

Testing New Models

# Test specific model
dbt test --select user_metrics

# Test with dependencies
dbt test --select +user_metrics+

# Run in Dagster
dagster asset materialize --select "user_metrics"
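
In dbt's selection syntax, a leading + adds a model's ancestors and a trailing + adds its descendants, so +user_metrics+ covers the model plus everything upstream and downstream of it. The sketch below imitates that rule on a toy graph. The lineage in PARENTS is invented for illustration and is not taken from this project.

```python
from collections import deque

# Hypothetical lineage: each model maps to the models it depends on.
PARENTS = {
    "stg_users": [],
    "stg_events": [],
    "user_metrics": ["stg_users", "stg_events"],
    "user_dashboard": ["user_metrics"],
}


def _reachable(start, edges):
    """Breadth-first walk returning every node reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def select_with_plus(model, parents=PARENTS):
    """Nodes matched by +model+: the model, its ancestors, and its descendants."""
    children = {}
    for child, deps in parents.items():
        for dep in deps:
            children.setdefault(dep, []).append(child)
    return {model} | _reachable(model, parents) | _reachable(model, children)


if __name__ == "__main__":
    print(sorted(select_with_plus("user_metrics")))
```

With dbt test, the selection determines which nodes' tests run, so the wider +user_metrics+ form also exercises tests on upstream and downstream models.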

🔍 Monitoring & Observability

Elementary Integration

The project includes Elementary for data observability:

# Build the Elementary models that reports and alerts read from
dbt run --select elementary

# Generate the Elementary report (the Elementary CLI is invoked as edr)
edr report --project-dir cne-dbt-template

Dagster Monitoring

  • Asset Lineage: Visual representation of data dependencies
  • Run History: Track pipeline execution history
  • Alerts: Configure alerts for failed runs
  • Metrics: Monitor asset freshness and quality

🀝 Contributing

Development Setup

  1. Fork the repository
  2. Create a feature branch
  3. Set up pre-commit hooks:
    pre-commit install
  4. Make your changes
  5. Run tests and validations:
    task ci:test
    task validate:all
  6. Submit a pull request

Code Style

  • SQL: Follow SQLFluff configuration
  • Python: Black formatting, flake8 linting
  • Documentation: Update relevant docs for new features

📖 Additional Resources

Documentation

IDE Extensions

📄 License

This project is licensed under the MIT License - see the LICENSE.md file for details.

🆘 Troubleshooting

Common Issues

  1. dbt Connection Issues:

    task dbt:debug
    # Check your profiles.yml and environment variables

  2. Dagster Asset Loading:

    # Ensure dbt project is parsed
    cd cne-dbt-template && dbt parse

  3. CLI Not Working:

    # Reinstall in development mode
    uv pip install -e .
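
When Dagster shows no dbt assets, it can help to confirm that dbt parse actually produced a manifest and which models it contains. The sketch below assumes dbt's default output location, target/manifest.json inside the dbt project. The helper name manifest_model_names is hypothetical.

```python
import json
from pathlib import Path


def manifest_model_names(project_dir):
    """Return model names from target/manifest.json, or None if it is missing."""
    manifest_path = Path(project_dir) / "target" / "manifest.json"
    if not manifest_path.exists():
        return None
    manifest = json.loads(manifest_path.read_text())
    return sorted(
        node["name"]
        for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model"
    )


if __name__ == "__main__":
    names = manifest_model_names("cne-dbt-template")
    if names is None:
        print("No manifest found; run dbt parse first.")
    else:
        print(f"{len(names)} models in manifest")
```

A None result points to a missing parse step, while an empty list suggests the manifest exists but no models were discovered.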

Getting Help

  • Check the Issues page for known problems
  • Review logs in cne-dbt-template/logs/dbt.log
  • Use task --list to see all available commands
  • Run commands with --help for detailed usage
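
For the log review suggested above, a few lines of Python can pull the most recent error mentions out of dbt.log. This is a rough sketch that simply matches the word "error" in any case; it makes no assumption about dbt's exact log format. The function name recent_error_lines is hypothetical.

```python
from pathlib import Path


def recent_error_lines(log_path, limit=20):
    """Return up to `limit` of the most recent log lines that mention 'error'."""
    if limit <= 0:
        return []
    lines = Path(log_path).read_text(errors="replace").splitlines()
    matches = [line for line in lines if "error" in line.lower()]
    return matches[-limit:]


if __name__ == "__main__":
    log_file = Path("cne-dbt-template/logs/dbt.log")
    if log_file.exists():
        for line in recent_error_lines(log_file):
            print(line)
    else:
        print(f"{log_file} not found")
```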

Built with ❤️ by the Tikal CNE Team
