A comprehensive data engineering template that integrates Dagster for orchestration with dbt for data transformation, providing a robust foundation for modern data pipelines.
This template consists of three main components:
- Dagster Orchestration Layer (`cne_dagster/`) - Asset-based pipeline orchestration
- dbt Transformation Layer (`cne-dbt-template/`) - SQL-based data transformations
- Custom CLI Tool - Enhanced dbt workflow management with validation and automation
- ✅ Dagster + dbt Integration: Seamless orchestration of dbt models as Dagster assets
- ✅ Multi-Cloud Support: BigQuery and Snowflake connectors
- ✅ Custom CLI: Enhanced dbt workflows with validation and automation
- ✅ Data Quality: Built-in testing with dbt-expectations and Elementary
- ✅ Code Quality: Pre-commit hooks, SQL formatting, and validation
- ✅ Docker Support: Containerized deployment ready
- ✅ Task Automation: Go-task based workflow automation
Before setting up the project, ensure you have:
- Python 3.12+ (recommended 3.13)
- Go-task (Installation Guide)
- Git and GitHub CLI (optional but recommended)
- Docker (for containerized deployment)
- Access to BigQuery or Snowflake data warehouse
Clone the repository and set up your development environment:
```bash
# Clone the repository
git clone <repository-url>
cd cne-dagster-template
# Option A: Automated setup (recommended)
task setup-env
# Option B: Manual setup
# Install uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create virtual environment and install dependencies
uv venv
source .venv/bin/activate
uv pip install -e .
```

Copy the environment template and configure your settings:
```bash
cp cne-dbt-template/.env_example cne-dbt-template/.env
```

Edit `.env` with your warehouse connection details:
```bash
# BigQuery Configuration
BIGQUERY_ACCOUNT=your-project-id
BIGQUERY_DATABASE=your-dataset
BIGQUERY_KEYFILE_PATH=path/to/service-account.json
TARGET_NAME=dev
# Organization Settings
ORG_ID=your-org-id
```

Test your configuration:
```bash
# Verify environment setup
task test-setup
# Test dbt connection
task dbt:debug
# Launch CLI interface
task cli
```

Start the Dagster web interface:
```bash
# Development mode
cd cne_dagster
dagster dev
# Or using Docker
docker build -t cne-dagster-template .
docker run -p 3000:3000 cne-dagster-template
```

Access Dagster UI at http://localhost:3000
Project structure:

```
cne-dagster-template/
├── cne_dagster/                 # Dagster orchestration layer
│   ├── cne_dagster/
│   │   ├── assets.py            # dbt assets definition
│   │   ├── definitions.py       # Dagster definitions
│   │   ├── project.py           # dbt project configuration
│   │   └── schedules.py         # Pipeline schedules
│   └── pyproject.toml           # Dagster dependencies
├── cne-dbt-template/            # dbt transformation layer
│   ├── models/                  # dbt models (staging, marts)
│   ├── macros/                  # Reusable SQL macros
│   ├── tests/                   # Data quality tests
│   ├── cli/                     # Custom CLI tool
│   │   ├── commands/            # CLI command implementations
│   │   ├── utils/               # Utility functions
│   │   └── validate/            # Validation plugins
│   ├── dbt_project.yml          # dbt project configuration
│   └── Taskfile.yml             # Task automation
├── Dockerfile                   # Container configuration
└── pyproject.toml               # Root project dependencies
```
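For orientation, here is a minimal sketch of how `project.py` and `assets.py` typically wire a dbt project into Dagster with `dagster-dbt`; the template's actual files may differ in paths and selection details.

```python
# Illustrative sketch only -- not necessarily the template's exact code.

# project.py: point dagster-dbt at the dbt project
from pathlib import Path

from dagster_dbt import DbtProject

tikal_dbt_project = DbtProject(
    # Assumed relative location of the dbt project inside the repository
    project_dir=Path(__file__).joinpath("..", "..", "..", "cne-dbt-template").resolve(),
)
tikal_dbt_project.prepare_if_dev()  # build the dbt manifest during local development

# assets.py: expose every dbt model as a Dagster asset
from dagster import AssetExecutionContext
from dagster_dbt import DbtCliResource, dbt_assets


@dbt_assets(manifest=tikal_dbt_project.manifest_path)
def tikal_dbt_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    # Run `dbt build` and stream results back to Dagster as asset materializations
    yield from dbt.cli(["build"], context=context).stream()
```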
To run the pipeline during development:

```bash
# Start Dagster development server
cd cne_dagster
dagster dev
# Materialize all assets
dagster asset materialize --select "*"
# Run specific dbt models through Dagster
dagster asset materialize --select "tikal_dbt_dbt_assets"The project includes a custom CLI with enhanced dbt workflows:
```bash
# Launch interactive CLI
task cli
# Available commands in CLI:
create model --name my_model --type staging
create domain --name user_analytics
validate --all
select models --pattern "staging.*"
```
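As a purely hypothetical illustration of what `create model` might do under the hood (not the template's actual implementation; the stub contents, paths, and function name are assumptions):

```python
# Hypothetical sketch of a `create model` command -- the real CLI may differ.
from pathlib import Path

SQL_STUB = "select 1 as placeholder_column\n"
YAML_STUB = """version: 2
models:
  - name: {name}
    description: "TODO: describe this model"
"""


def create_model(name: str, model_type: str, project_dir: str = "cne-dbt-template") -> Path:
    """Scaffold <name>.sql and <name>.yml under models/<model_type>/."""
    target_dir = Path(project_dir) / "models" / model_type
    target_dir.mkdir(parents=True, exist_ok=True)
    (target_dir / f"{name}.sql").write_text(SQL_STUB)
    (target_dir / f"{name}.yml").write_text(YAML_STUB.format(name=name))
    return target_dir / f"{name}.sql"


# Example usage: create_model("my_model", "staging")
```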
You can also work with dbt directly:

```bash
# Navigate to dbt project
cd cne-dbt-template
# Install dbt packages
dbt deps
# Run models
dbt run
# Run tests
dbt test
# Generate documentation
dbt docs generate
dbt docs serve
```

Common workflows are automated using Go-task:

```bash
# View all available tasks
task --list
# Development tasks
task dbt:run # Run dbt models
task dbt:test # Run dbt tests
task dbt:docs # Generate and serve docs
task validate:all # Run all validations
task format:sql # Format SQL files
# CI/CD tasks
task ci:test # Run CI tests
task ci:lint # Run linters
task ci:security # Security checks
```

Data quality tooling includes:

- dbt Tests: Schema tests, data tests, and custom tests
- dbt-expectations: Great Expectations integration for advanced data quality
- Elementary: Data observability and monitoring
The project includes comprehensive validation:
```bash
# Run all validations
task validate:all
# Specific validations
cli_validate --check-model-names
cli_validate --check-sql-style
cli_validate --check-yaml-exists
```
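To give a flavour of what a validation plugin can check, here is a hypothetical sketch of a model-name check; the allowed prefixes, paths, and function name are assumptions rather than the template's actual rules:

```python
# Hypothetical sketch of a "--check-model-names" style validation.
from pathlib import Path

ALLOWED_PREFIXES = ("stg_", "int_", "dim_", "fct_")  # assumed naming convention


def check_model_names(models_dir: str = "cne-dbt-template/models") -> list[str]:
    """Return model files whose names do not start with an allowed prefix."""
    return [
        str(sql_file)
        for sql_file in Path(models_dir).rglob("*.sql")
        if not sql_file.stem.startswith(ALLOWED_PREFIXES)
    ]


if __name__ == "__main__":
    for violation in check_model_names():
        print(f"Naming convention violation: {violation}")
```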
Automated code quality checks:

- Security: Private key detection, branch protection
- SQL: SQLFluff formatting and linting
- Python: Black, isort, flake8, mypy
- dbt: Model validation, macro documentation
For containerized deployment:

```bash
# Build container
docker build -t cne-dagster-template .
# Run container
docker run -p 3000:3000 \
  -e BIGQUERY_ACCOUNT=your-project \
  -e BIGQUERY_KEYFILE_PATH=/keys/service-account.json \
  -v /path/to/keys:/keys \
  cne-dagster-template
```

Key environment variables for deployment:
```bash
# Dagster
DAGSTER_HOME=/opt/dagster/app
# dbt Profile
DBT_PROFILE_PROJECT=your-project
DBT_PROFILE=tikal_dbt
TARGET_NAME=prod
# BigQuery
BIGQUERY_DATABASE=your-dataset
BIGQUERY_KEYFILE_PATH=/path/to/keyfile.json
SOURCE_DATABASE=your-source-db
# Organization
ORG_ID=your-organization-id
```

Key configuration in `cne-dbt-template/dbt_project.yml`:
name: "tikal_dbt"
profile: "tikal_dbt"
vars:
  organization_id: "{{ env_var('ORG_ID') }}"
  source_database: "SAAS_STAGING"
  enable_separate_db: False

models:
  tikal_dbt:
    +on_schema_change: "sync_all_columns"
```

Dagster is configured in `cne_dagster/cne_dagster/definitions.py`:
```python
defs = Definitions(
    assets=[tikal_dbt_dbt_assets],
    schedules=schedules,
    resources={
        "dbt": DbtCliResource(project_dir=tikal_dbt_project),
    },
)
```
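A matching `schedules.py` is typically just a cron schedule built from the dbt asset selection. A minimal sketch with `dagster-dbt` (the job name and cron expression are assumptions):

```python
# schedules.py: illustrative sketch -- the template's schedule may differ.
from dagster_dbt import build_schedule_from_dbt_selection

from .assets import tikal_dbt_dbt_assets

schedules = [
    build_schedule_from_dbt_selection(
        [tikal_dbt_dbt_assets],
        job_name="daily_dbt_job",   # assumed job name
        cron_schedule="0 6 * * *",  # assumed: every day at 06:00
        dbt_select="fqn:*",         # select all dbt models
    ),
]
```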
To add a new model:

- Using CLI (Recommended):

  ```bash
  task cli create model --name user_metrics --type marts
  ```

- Manual Creation:

  ```bash
  # Create model file
  touch cne-dbt-template/models/marts/user_metrics.sql

  # Create corresponding YAML
  touch cne-dbt-template/models/marts/user_metrics.yml
  ```
Follow the medallion architecture:
- Staging (`models/*/staging/`): Clean and standardize raw data
- Marts (`models/*/marts/`): Business-defined entities for reporting
- Gold (`models/*/gold/`): Aggregated, analysis-ready datasets
Test the new model:

```bash
# Test specific model
dbt test --select user_metrics
# Test with dependencies
dbt test --select +user_metrics+
# Run in Dagster
dagster asset materialize --select "user_metrics"The project includes Elementary for data observability:
```bash
# Generate Elementary report
dbt run --select elementary
# Serve Elementary UI
elementary monitor --project-dir cne-dbt-template
```

Monitoring and observability features include:

- Asset Lineage: Visual representation of data dependencies
- Run History: Track pipeline execution history
- Alerts: Configure alerts for failed runs (see the sketch after this list)
- Metrics: Monitor asset freshness and quality
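For the alerting item above, one option is Dagster's built-in run-failure sensor; the notification call here is a placeholder, and the template may use a different mechanism:

```python
# Illustrative sketch: alert on failed runs with Dagster's run-failure sensor.
import logging

from dagster import DefaultSensorStatus, RunFailureSensorContext, run_failure_sensor

logger = logging.getLogger(__name__)


@run_failure_sensor(default_status=DefaultSensorStatus.RUNNING)
def notify_on_run_failure(context: RunFailureSensorContext):
    # Swap this log call for your own channel (Slack, email, PagerDuty, ...).
    logger.error(
        "Dagster run %s failed: %s",
        context.dagster_run.run_id,
        context.failure_event.message,
    )


# Enable it by passing `sensors=[notify_on_run_failure]` to Definitions.
```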
To contribute:

- Fork the repository
- Create a feature branch
- Set up pre-commit hooks: `pre-commit install`
- Make your changes
- Run tests and validations: `task ci:test` and `task validate:all`
- Submit a pull request
Code standards:

- SQL: Follow SQLFluff configuration
- Python: Black formatting, flake8 linting
- Documentation: Update relevant docs for new features
Additional resources:

- dbt: dbt Documentation
- Dagster: Dagster Documentation
- Dagster + dbt: Integration Guide
- VS Code: dbt Power User
- IntelliJ: dbt Plugin
This project is licensed under the MIT License - see the LICENSE.md file for details.
Common issues:

- dbt Connection Issues:

  ```bash
  # Check your profiles.yml and environment variables
  task dbt:debug
  ```

- Dagster Asset Loading:

  ```bash
  # Ensure the dbt project is parsed
  cd cne-dbt-template && dbt parse
  ```

- CLI Not Working:

  ```bash
  # Reinstall in development mode
  uv pip install -e .
  ```
- Check the Issues page for known problems
- Review logs in `cne-dbt-template/logs/dbt.log`
- Use `task --list` to see all available commands
- Run commands with `--help` for detailed usage
Built with ❤️ by the Tikal CNE Team