
VibeOps dbt Template

A skeleton dbt project implementing the VibeOps analytics engineering governance approach — a framework for AI-assisted analytics engineering where the environment governs agent behaviour, not the prompt.

Based on: VibeOps: A Governance and Optimisation Framework for Agentic Coding Environments — Natu Lauchande, Sanlam Fintech, 2026.


What This Is

Standard dbt projects have SQL and config. This template adds a governance layer — three files and four slash commands that ensure an AI coding assistant (Claude Code, Cursor, etc.) builds models consistently, correctly, and with awareness of your data's specific quirks.

The core insight: the quality of AI-assisted analytics work is determined by the environment the agent operates in, not the model version or prompt style. When the agent knows that your CRM deduplicates on max(modified_at) and your analytics platform's sessions can span midnight, it stops making mistakes that compile and pass tests but produce wrong numbers.


The Three Governance Files

your-dbt-project/
├── AGENTS.md                  ← What is true about THIS project
├── ANALYTICS_ENG_SKILL.md     ← How to work in ANY dbt project
├── DBT_STYLE_GUIDE.md         ← How to write SQL in THIS project

These three files are the entire governance system. Everything else follows from them.

| File | What goes in it | Who changes it |
| --- | --- | --- |
| `AGENTS.md` | Hard constraints, layer rules, tribal knowledge about your sources | You + agent (with approval) |
| `ANALYTICS_ENG_SKILL.md` | Universal methodology: explore → contract → build → grade | Rarely — it's stack-agnostic |
| `DBT_STYLE_GUIDE.md` | CTE patterns, materialization per layer, naming rules | You, when conventions change |

AGENTS.md — The Environment Contract

This is the highest-value file. It has three sections:

  1. Hard Constraints — rules the agent must never break (layer boundaries, sacred tests, build protocol)
  2. Architecture — your four-layer setup, databases, naming conventions
  3. Tribal Knowledge — data truths that would otherwise live only in your team's heads

The Tribal Knowledge section is where most of the value lives. Example entries:

### Salesforce CRM
- Contact records are soft-deleted: `is_deleted = true`, not physically removed
- `account_id` can be NULL for leads that haven't been converted — always LEFT JOIN
- Fivetran syncs create duplicate rows during schema changes; deduplicate on max(systemmodstamp)

### Google Analytics 4
- Sessions can span midnight. Never use event_date for session boundaries — use event_timestamp
- user_pseudo_id is a device identifier, not a user identifier
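The Fivetran deduplication rule above translates into a model pattern like the following sketch. It assumes a Snowflake-style `QUALIFY` clause, and the column names are illustrative, not taken from the template:

```sql
-- Deduplicate Fivetran-synced Salesforce contacts, keeping the most
-- recently modified version of each record (illustrative sketch).
select
    id             as contact_id,
    account_id,
    is_deleted,
    systemmodstamp as modified_at
from {{ source('salesforce', 'contact') }}
-- Keep only the latest row per contact_id by systemmodstamp
qualify row_number() over (
    partition by id
    order by systemmodstamp desc
) = 1
```

On warehouses without `QUALIFY`, the same dedup is done with a `row_number()` column in a CTE and a `where rn = 1` filter.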

Every time a session discovers something surprising about a source system, the agent proposes adding it here. You approve it. The next session starts with that knowledge.

ANALYTICS_ENG_SKILL.md — The Methodology

The universal workflow — same regardless of what you're building:

  1. Explore — query the source before writing any SQL
  2. Data Contract — write the spec (grain, primary key, guarantees) before writing the model
  3. Build — one model at a time, run and test before moving on
  4. Grade — AnalyticsCheck self-score at session end

You generally don't need to change this file. The methodology travels; configuration stays in AGENTS.md.

DBT_STYLE_GUIDE.md — The Blueprint

Canonical SQL patterns for this project: staging CTE structure, incremental model config, multi-platform union pattern, incremental lookback, Snowflake-specific functions in use. When the agent wants to write a fact table, it reads this file to know the exact config block format.
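As an illustration of the kind of pattern such a guide prescribes, a staging model commonly follows an import-CTE / rename-CTE / final-select shape. This sketch is representative of the convention, not a copy of the template's actual guide:

```sql
-- stg_stripe__charges.sql — illustrative staging pattern:
-- one import CTE, one renaming CTE, one final select.
with source as (

    select * from {{ source('stripe', 'charges') }}

),

renamed as (

    select
        id             as charge_id,
        customer       as customer_id,
        amount / 100.0 as amount_usd,  -- Stripe stores amounts in cents
        status,
        created        as created_at
    from source

)

select * from renamed
```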


The Four Slash Commands

.claude/commands/
├── explore-data.md      ← Understand a source. No building.
├── new-source.md        ← Integrate a new source end-to-end
├── edit-model.md        ← Modify existing models with lineage awareness
└── analytics-check.md   ← Self-grade the session

Usage in Claude Code (or any tool that supports slash commands mapped to markdown files):

/explore-data    — "What does the orders table look like? What's the grain?"
/new-source      — "Integrate the Stripe payments source"
/edit-model      — "Add refund_amount to fact_orders"
/analytics-check — run at end of session

Each command is a markdown file that loads a structured workflow. The agent reads it and follows the steps. /new-source delegates its exploration phase to /explore-data — methodology doesn't duplicate, it delegates.
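As an illustration, a command file might be structured like this. The sketch below is hypothetical — it is not the template's actual `explore-data.md`:

```markdown
# /explore-data

Goal: understand a source table before any model is written.

1. Profile the table: row count, column types, null rates.
2. Determine the grain: what does one row represent?
3. Check candidate primary keys for uniqueness.
4. Record findings in `analyses/` using the exploration template.

Do NOT create or modify any models during this command.
```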


The Data Contract

Before writing any model, the agent must produce a data contract from actual queries, not assumptions:

model: stg_stripe__charges
grain: "1 row per Stripe charge"
primary_key: charge_id
source_table: raw.stripe.charges
row_count_range: [50000, 500000]

guarantees:
  - charge_id is unique and not null
  - created_at is not null
  - amount is not null and > 0

known_issues:
  - ~2% of records have status = 'pending' indefinitely (Stripe quirk, not failures)

join_keys:
  - customer_id → dim_customers.stripe_customer_id (match rate: ~95%)

You cannot write a correct data contract without understanding the data. That's the point. The contract becomes the basis for dbt tests that enforce it on every run.
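For example, the guarantees above map naturally onto dbt schema tests. This is a sketch of that mapping; the template's exact test configuration may differ:

```yaml
# models/staging/stripe/_stg_stripe__charges.yml (illustrative)
version: 2

models:
  - name: stg_stripe__charges
    columns:
      - name: charge_id
        tests:
          - unique
          - not_null
      - name: created_at
        tests:
          - not_null
      - name: amount
        tests:
          - not_null
          # The "amount > 0" guarantee needs a package or custom test,
          # e.g. dbt_utils.accepted_range with min_value: 0 (exclusive)
```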


The AnalyticsCheck Score

AnalyticsCheck = (0.25 × env_health)
               + (0.30 × output_quality)
               + (0.30 × (1 - architecture_distance))
               + (0.15 × process_compliance)

Target: ≥ 0.90 | Acceptable: ≥ 0.80 | Review required: < 0.80

| Dimension | What it measures |
| --- | --- |
| Environment Health | Did tribal knowledge cover the sources worked on? |
| Output Quality | Do `dbt run` and `dbt test` pass? Row counts within contract? |
| Architecture Distance | Layer boundaries respected? Correct patterns per layer? |
| Process Compliance | Queried before writing? Contract first? Incremental build? |

Low scores map directly to specific improvements: a weak architecture score (high Architecture Distance) → update the style guide; low Environment Health → add tribal knowledge to AGENTS.md.
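The score itself is simple weighted arithmetic. A minimal sketch in Python, with illustrative dimension values:

```python
# Compute the AnalyticsCheck score from its four dimensions.
# Weights match the formula above; all inputs are in [0, 1].
def analytics_check(env_health, output_quality,
                    architecture_distance, process_compliance):
    return (0.25 * env_health
            + 0.30 * output_quality
            + 0.30 * (1 - architecture_distance)
            + 0.15 * process_compliance)

# A session with a healthy environment and clean output,
# but some architectural drift:
score = analytics_check(0.9, 1.0, 0.2, 1.0)
print(round(score, 3))  # 0.915 — above the 0.90 target
```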


How to Adapt This Template

Step 1 — Clone and rename

git clone <this-repo> my-dbt-project
cd my-dbt-project

Step 2 — Fill in AGENTS.md

Replace the placeholder content with your actual project details:

  • Your data warehouse (Snowflake, BigQuery, Databricks, Redshift)
  • Your source systems
  • Your database/schema names
  • Start the Tribal Knowledge section empty — it fills itself as you work

Step 3 — Update dbt_project.yml

Replace your_project_name, database names, and schema names.
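The per-layer materializations shown later in the project tree would be configured roughly like this. A sketch with placeholder names, not the template's actual file:

```yaml
# dbt_project.yml (illustrative excerpt)
name: your_project_name
profile: your_project_name

models:
  your_project_name:
    staging:
      +materialized: view
    intermediate:
      +materialized: table
    marts:
      +materialized: table   # fact_* models override to incremental
    products:
      +materialized: table
```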

Step 4 — Update DBT_STYLE_GUIDE.md

Replace the Snowflake-specific functions with your warehouse's equivalents if needed. The patterns are generic — only the functions in the "Warehouse-Specific Patterns" section need updating.

Step 5 — Leave ANALYTICS_ENG_SKILL.md alone

It's universal. The only thing you might update is the data contract format section if your use case requires additional fields.

Step 6 — Set up Claude Code (or equivalent)

Place the .claude/commands/ files where your AI tool expects slash commands. In Claude Code, these are read automatically from .claude/commands/ at the project root.


Project Structure

your-dbt-project/
│
├── AGENTS.md                          # Edit this for your project
├── ANALYTICS_ENG_SKILL.md             # Universal — leave mostly unchanged
├── DBT_STYLE_GUIDE.md                 # Edit warehouse-specific patterns
│
├── dbt_project.yml                    # Standard dbt config
├── packages.yml                       # dbt packages
├── profiles.yml                       # Local connection (gitignored)
│
├── .claude/
│   └── commands/                      # Slash commands
│       ├── explore-data.md
│       ├── new-source.md
│       ├── edit-model.md
│       └── analytics-check.md
│
├── models/
│   ├── staging/                       # views — column rename only
│   │   └── <source_name>/
│   │       ├── _sources.yml
│   │       └── stg_<source>__<table>.sql
│   │
│   ├── intermediate/                  # table/incremental — business logic
│   │   └── int_<name>.sql
│   │
│   ├── marts/                         # dim_* (table), fact_* (incremental)
│   │   ├── dim_<name>.sql
│   │   └── fact_<name>.sql
│   │
│   └── products/                      # prod_* — BI-ready aggregations
│       └── prod_<name>.sql
│
├── macros/
│   ├── generate_database_name.sql     # Multi-env database routing
│   └── get_custom_schema.sql
│
├── analyses/                          # Exploration docs (markdown + SQL)
│   └── exploration_template.md
│
├── seeds/
├── snapshots/
└── tests/
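The `generate_database_name` macro for multi-environment routing typically looks something like this — a sketch of the common dbt pattern, not necessarily the template's exact macro:

```sql
-- macros/generate_database_name.sql (illustrative)
-- Route models to the configured database in prod, and to the
-- target's database everywhere else (e.g. local dev).
{% macro generate_database_name(custom_database_name, node) -%}
    {%- if target.name == 'prod' and custom_database_name is not none -%}
        {{ custom_database_name | trim }}
    {%- else -%}
        {{ target.database }}
    {%- endif -%}
{%- endmacro %}
```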

The Self-Improving Loop

Session runs
    ↓
New data truth discovered
    ↓
Agent proposes addition to AGENTS.md Tribal Knowledge
    ↓
You approve
    ↓
Next session starts with that knowledge
    ↓
AnalyticsCheck score rises over time

The environment gets smarter with every session. That's the system.


Example Models

This template includes minimal working examples in each layer using a fictitious e-commerce scenario (orders, customers, products). They demonstrate the correct structure for each layer — not production-ready models.


Further Reading

  • VibeOps paper: VibeOps: A Governance and Optimisation Framework for Agentic Coding Environments — Natu Lauchande, Sanlam Fintech, 2026
  • Reference software implementation: github.com/nlauchande/fastapi-vibeops-template
  • dbt docs: docs.getdbt.com
