Add data-audit specification and dbt tooling infrastructure by michaelbarton · Pull Request #80 · michaelbarton/dotfiles

michaelbarton · 2026-03-10T01:45:45Z

Summary

This PR introduces a comprehensive specification for a new data-audit Python package alongside supporting dbt analysis tooling and Neovim integration. The changes establish the foundation for LLM-powered auditing of heterogeneous data artifacts (dbt models, notebooks, flat files) with autonomous orchestration capabilities.

Key Changes

Core Specification (`dbt/SPEC.md`)

Comprehensive specification for the data-audit package (a later commit in this PR, e1269e9, deletes `dbt/SPEC.md`) covering:
- Problem statement and goals for unified artifact auditing
- Support for multiple artifact types: dbt models, schema files, Jupyter notebooks, Quarto documents, and flat data files
- Architecture, including core data models (Finding, AuditResult, AuditPlan, AuditTask)
- Auditor protocol for extensible artifact handling
- LLM backend abstraction supporting multiple providers (Anthropic, cursor-agent, litellm)
- Orchestrator ("project manager") for autonomous follow-up audits with budget controls
- CLI interface and programmatic API design
- Prompt template system with conditional rendering
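The spec's actual model and protocol definitions are not reproduced in this description. As a rough illustration only, here is one way `Finding`, `AuditResult`, and a pluggable `Auditor` protocol could fit together in Python. The `Severity` levels, the field names, the `can_audit`/`audit` methods, `CsvAuditor`, and `pick_auditor` are all hypothetical. `AuditPlan` and `AuditTask` are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path
from typing import Protocol, runtime_checkable


class Severity(Enum):
    # Hypothetical severity scale; the spec's real levels are not shown in the PR.
    INFO = "info"
    WARNING = "warning"
    ERROR = "error"


@dataclass(frozen=True)
class Finding:
    artifact: str       # e.g. a dbt model name or a file path
    severity: Severity
    message: str
    evidence: str = ""  # e.g. a SQL query that reproduces the issue


@dataclass
class AuditResult:
    artifact: str
    findings: list[Finding] = field(default_factory=list)

    def worst(self) -> Severity | None:
        """Most severe finding, or None when the audit found nothing."""
        order = [Severity.INFO, Severity.WARNING, Severity.ERROR]
        if not self.findings:
            return None
        return max((f.severity for f in self.findings), key=order.index)


@runtime_checkable
class Auditor(Protocol):
    # Hypothetical protocol: one auditor per artifact type.
    def can_audit(self, path: Path) -> bool: ...
    def audit(self, path: Path) -> AuditResult: ...


class CsvAuditor:
    """Toy flat-file auditor (hypothetical): flags empty CSV files."""

    def can_audit(self, path: Path) -> bool:
        return path.suffix == ".csv"

    def audit(self, path: Path) -> AuditResult:
        result = AuditResult(artifact=str(path))
        if path.stat().st_size == 0:
            result.findings.append(Finding(str(path), Severity.ERROR, "file is empty"))
        return result


def pick_auditor(path: Path, auditors: list[Auditor]) -> Auditor | None:
    # The first registered auditor that claims the artifact wins.
    return next((a for a in auditors if a.can_audit(path)), None)
```

Keeping the protocol structural means that new artifact types (notebooks, Quarto documents) can be supported by adding a class, with no changes to the registry.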

dbt Analysis Tools

dbt_batch_audit.py: Batch auditing script that:
- Compiles dbt models and gathers context (compiled SQL, sample rows, lineage, existing tests)
- Runs audits in parallel across multiple models and LLMs
- Synthesizes findings into a consolidated report with cross-model impact analysis
- Includes downstream propagation tracing for bug impact assessment

dbt_analyse.py: Interactive single-model analysis tool that:
- Compiles a model and gathers full context
- Launches cursor-agent for interactive LLM-driven analysis
- Supports custom prompt templates

Prompt Templates:
- dbt_deep_analysis.md: Comprehensive audit checklist covering schema, joins, filters, grain, data quality, performance, and test coverage
- dbt_quick_analysis.md: Quick inline code review with improvement suggestions
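The batch script's parallel fan-out across models and LLMs is not shown in the PR. Below is a minimal sketch of the pattern. It assumes each audit is an I/O-bound call, such as a subprocess or an HTTP request, so threads are sufficient. `run_audits` and the `audit_one` callback are hypothetical names, not the script's API.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor, as_completed
from itertools import product
from typing import Callable


def run_audits(
    models: list[str],
    llms: list[str],
    audit_one: Callable[[str, str], str],
    max_workers: int = 4,
) -> dict[tuple[str, str], str]:
    """Run audit_one(model, llm) for every model x LLM pair concurrently.

    Exceptions are captured per pair, so one failing audit doesn't abort the batch.
    """
    results: dict[tuple[str, str], str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(audit_one, model, llm): (model, llm)
            for model, llm in product(models, llms)
        }
        for fut in as_completed(futures):
            key = futures[fut]
            try:
                results[key] = fut.result()
            except Exception as exc:  # record the failure and keep going
                results[key] = f"ERROR: {exc}"
    return results
```

Recording errors per (model, LLM) pair leaves the synthesis step free to report any gaps, rather than losing the whole batch to a single failed compile or hung agent.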

Neovim Integration (`nvim/lua/config/dbt.lua`)

Comprehensive dbt keymaps and helpers:
- <leader>dg: Jump to ref/source under cursor
- <leader>df: Fuzzy model picker with run/build/test actions
- <leader>do: Open compiled SQL in split
- <leader>d/: Search across models
- <leader>dr/dR: Run model (alone / with all downstream dependents)
- <leader>db: Build model (run + test)
- <leader>dc: Compile model
- <leader>dt: Test model
- <leader>ds: Preview query results with dbt show (without materializing)
- <leader>dp: Preview sample rows in split
- <leader>dv: Pipe output to visidata for interactive exploration

Configuration Updates

- nvim/lua/config/keymaps.lua: Migrate wiki search from fzf-lua to telescope, since fzf-lua is not installed and the config already uses telescope via LazyVim
- ansible/tasks/neovim.yml: Add dbt directory symlink alongside ftplugin
- .github/workflows/ansible.yml: Add CI workflow for playbook syntax checking and execution
- nvim/lua/plugins/language.lua: Add SQL to the treesitter ensure_installed list

Notable Implementation Details

- Template rendering uses a lightweight Handlebars-style syntax ({{var}}, {{#if var}}, {{^if var}}), so optional context sections such as lineage and existing tests are included only when present
- Batch audit includes a synthesis prompt that traces bug propagation through model dependency chains
- dbt commands use uv run for consistent Python environment management
- Neovim integration uses toggleterm for async command execution while keeping focus in the editor
- Budget controls in the orchestrator prevent runaway analysis (max_depth, max_tasks, max_wall_clock, max_tokens)
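Below is a sketch of how the four budget limits might gate a breadth-first follow-up loop. Only the limit names come from the description. `Budget`, `orchestrate`, the default values, and the `run_task` contract (which returns follow-up targets plus tokens used) are assumptions made for illustration.

```python
from __future__ import annotations

import time
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class Budget:
    # Limit names come from the PR description; the defaults are invented.
    max_depth: int = 2
    max_tasks: int = 20
    max_wall_clock: float = 600.0  # seconds
    max_tokens: int = 200_000


def orchestrate(
    seeds: list[str],
    run_task: Callable[[str], tuple[list[str], int]],
    budget: Budget,
    clock: Callable[[], float] = time.monotonic,
) -> list[str]:
    """Breadth-first follow-up loop bounded by a Budget (hypothetical).

    run_task(target) -> (follow_up_targets, tokens_used). Limits are checked
    before each task starts, so the final task may overshoot max_tokens.
    Returns the audited targets in order.
    """
    start = clock()
    queue = deque((t, 0) for t in seeds)
    seen = set(seeds)
    done: list[str] = []
    tokens = 0
    while queue:
        if (
            len(done) >= budget.max_tasks
            or tokens >= budget.max_tokens
            or clock() - start >= budget.max_wall_clock
        ):
            break
        target, depth = queue.popleft()
        follow_ups, used = run_task(target)
        tokens += used
        done.append(target)
        if depth < budget.max_depth:
            for t in follow_ups:
                if t not in seen:  # never audit the same target twice
                    seen.add(t)
                    queue.append((t, depth + 1))
    return done
```

Injecting `clock` keeps the wall-clock limit testable without real waiting.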

https://claude.ai/code/session_01BPxJNKYipwwrx17jvyv6rM

Removes the dependency on the community.general collection, which was producing a warning about not supporting the installed Ansible version. Uses npm install --prefix directly instead. https://claude.ai/code/session_01Vm5EEsQ5uFKoni6qWEDQd8

Runs the full playbook (minus macOS-only launch agents) on ubuntu-latest: syntax check then an actual apply. Installs Neovim, Python 3.11, and virtualenv so the neovim setup tasks (pip venv, Lazy sync, treesitter) can run too. https://claude.ai/code/session_01Vm5EEsQ5uFKoni6qWEDQd8

Add <leader>dr (dbt run), <leader>dc (dbt compile), and <leader>dt (dbt test) keybindings that format the current SQL file with sqlfmt via conform, save, then send the dbt command to toggleterm. Also add sql to treesitter ensure_installed for better SQL highlighting. https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ

- <leader>dR: dbt run -s model+ (model and all downstream dependents)
- <leader>db: dbt build (run + test in DAG order)
- <leader>ds: dbt show (preview query results without materializing)

https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ
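dbt's `model+` graph operator selects the model plus everything downstream of it. The same walk is useful for reasoning about how far a bug in one model can propagate. Below is a minimal breadth-first sketch over a hypothetical `children` mapping. In a real project, a map like this could be read from `child_map` in `target/manifest.json`, which is keyed by node unique IDs rather than bare model names. `select_with_descendants` is a hypothetical name.

```python
from __future__ import annotations

from collections import deque


def select_with_descendants(model: str, children: dict[str, list[str]]) -> list[str]:
    """Return the model plus every downstream dependent, in BFS order."""
    order = [model]
    seen = {model}
    queue = deque([model])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:  # diamonds in the DAG are visited once
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order
```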

- <leader>dg: jump to ref() or source() under cursor
- <leader>df: fzf model picker with ctrl-r/b/t to run/build/test
- <leader>do: open compiled SQL in readonly vsplit
- <leader>d/: grep across all models (find columns, CTEs, etc.)

Inspired by fzf-dbt CLI tool, adapted for neovim with fzf-lua. https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ

<leader>da: quick analysis with sonnet — reviews the current model and suggests improvements in non-interactive mode. <leader>dA: deep analysis with sonnet thinking — interrogates the duckdb database and cross-references with the current model to check data quality, joins, types, and more. https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ

The <leader>dA keymap now compiles the model first, then passes the compiled SQL and first 20 rows from dbt show as extra context to the claude agent for more informed analysis. https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ

Prompts now live in nvim/prompts/ as markdown files with {{var}} template placeholders. The quick analysis prompt is loaded and substituted in Lua; the deep analysis prompt uses sed at runtime to inject compiled SQL and sample rows before passing to claude. https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ
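With `sed`, the replacement text is interpreted: `&`, backslashes, the delimiter character, and embedded newlines all need escaping. That makes injecting arbitrary compiled SQL fragile. One alternative, which is not what this commit does, is a single-pass regex substitution in Python. It avoids escaping and also avoids placeholder injection, because substituted values are never re-scanned. `fill` is a hypothetical helper.

```python
import re

_PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill(template: str, values: dict) -> str:
    """Replace {{name}} placeholders in a single pass.

    Because re.sub is given a function, returned values are inserted literally
    (no backslash or group processing) and are never re-scanned, so a value
    that contains "{{...}}" stays verbatim. Unknown placeholders are left as-is.
    """
    return _PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), template)
```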

- <leader>da now replaces the buffer with the annotated SQL (undo with :u) - Updated prompt to allow brief explanations alongside suggestions - All dbt commands (run, build, test, compile, show) now use uv run https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ

<leader>dA now opens a new tmux window named 'dbt:<model>' that compiles the model, gathers sample rows, then starts an interactive claude session with that context so you can discuss the model. https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ

Runs dbt show --limit 20 asynchronously and displays the results in a read-only scratch buffer. Press q to close. https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ

https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ

fzf-lua is not installed; the config uses telescope via LazyVim. Converts wiki search, wiki insert link, dbt model finder (with C-r/C-b/C-t actions), and dbt grep to telescope equivalents. https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ

Pipes `dbt show --output csv --limit 500` into visidata via toggleterm. Formats and saves the file first like other dbt commands. https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ

Uses a dedicated toggleterm Terminal with close_on_exit and an on_exit callback instead of the shared terminal, so focus returns to the previous window when visidata is quit. https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ

dbt show only supports --output json/text, not csv. Use --output json --log-format json and pipe through a python script that extracts the preview data and converts to CSV. https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ
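Below is a sketch of the JSON-to-CSV step. It assumes the log event carrying the results has a `data.preview` field containing a JSON-encoded list of row objects. Verify this against your dbt version before relying on it. `preview_to_csv` is a hypothetical name, not the PR's script.

```python
import csv
import io
import json
import sys
from typing import Iterable


def preview_to_csv(log_lines: Iterable[str]) -> str:
    """Extract the first `dbt show` preview from JSON log lines and return CSV text.

    ASSUMPTION: the results event has `data.preview` holding a JSON-encoded
    list of row objects. Check this against your dbt version.
    """
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # tolerate non-JSON noise on stdout
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        data = event.get("data")
        if not isinstance(data, dict) or "preview" not in data:
            continue  # some other log event (version banner, timing, ...)
        preview = data["preview"]
        rows = json.loads(preview) if isinstance(preview, str) else preview
        if not rows:
            return ""
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()
    raise ValueError("no preview event found in dbt output")


if __name__ == "__main__":
    sys.stdout.write(preview_to_csv(sys.stdin))
```

Usage (model name is a placeholder): `uv run dbt show -s my_model --limit 500 --output json --log-format json | python preview_to_csv.py`. The resulting CSV can then be piped into visidata.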

- Close file handles properly in dbt_analyse.py (use `with` statements)
- Add PEP 723 inline metadata to dbt_analyse.py for uv compatibility
- Move `import re` to module level in dbt_batch_audit.py
- Write context to a temp file in dbt_analyse.py to avoid ARG_MAX limits
- Add subprocess timeouts (900s audit, 1200s synthesis) to prevent hangs
- Order template substitutions to avoid placeholder injection

https://claude.ai/code/session_01RHSUYqsWy6xLfAC9tFTy66
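The temp-file and timeout fixes combine naturally in one helper. This is a minimal sketch. It assumes the agent CLI accepts a context file path as its final argument, which is an assumption: `cmd` is a placeholder and `run_with_context_file` is hypothetical. Per the commit, the timeout would be 900 for an audit call and 1200 for synthesis.

```python
import subprocess
import tempfile
from pathlib import Path


def run_with_context_file(cmd: list, context: str, timeout: float) -> str:
    """Write `context` to a temp file and append its path to `cmd`.

    Passing a path instead of the text keeps argv small (avoiding ARG_MAX);
    `timeout` makes a hung call raise subprocess.TimeoutExpired; check=True
    raises CalledProcessError on a non-zero exit.
    """
    with tempfile.TemporaryDirectory() as tmp:
        ctx = Path(tmp) / "context.md"
        ctx.write_text(context, encoding="utf-8")
        proc = subprocess.run(
            [*cmd, str(ctx)],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
    return proc.stdout
```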

- Rewrite dbt_deep_analysis.md with a comprehensive 8-section audit checklist covering schema/types, join correctness, filters, grain, data quality, performance, test coverage gaps, and upstream risks
- Add structured output format (findings table, evidence queries, suggested dbt test YAML snippets)
- Add conditional template sections ({{#if lineage}}, {{#if existing_tests}}) with a lightweight Handlebars-style renderer in both scripts
- Gather model lineage (parents/children) via `dbt ls` selectors
- Scan schema.yml files for existing test definitions and include them so the LLM focuses on coverage gaps rather than redundant suggestions
- Add pyyaml dependency to both scripts' PEP 723 metadata

https://claude.ai/code/session_01RHSUYqsWy6xLfAC9tFTy66
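Below is a sketch of a lightweight Handlebars-style renderer for `{{var}}`, `{{#if var}}`, and `{{^if var}}`. The description does not say how sections close. This sketch assumes `{{/if}}`, as in Handlebars, and does not support nested sections. `render` is a hypothetical name, not the scripts' function.

```python
from __future__ import annotations

import re

# ASSUMPTION: sections close with {{/if}} and are not nested.
_SECTION = re.compile(r"\{\{([#^])if (\w+)\}\}(.*?)\{\{/if\}\}", re.DOTALL)
_VAR = re.compile(r"\{\{(\w+)\}\}")


def render(template: str, ctx: dict[str, str]) -> str:
    """Tiny Handlebars-style renderer (hypothetical).

    {{#if x}}...{{/if}} keeps the body when ctx[x] is non-empty;
    {{^if x}}...{{/if}} keeps it when ctx[x] is empty or missing;
    {{x}} is then replaced in one pass; unknown variables become "".
    """

    def section(m: re.Match) -> str:
        kind, name, body = m.groups()
        truthy = bool(ctx.get(name))
        keep = truthy if kind == "#" else not truthy
        return body if keep else ""

    without_sections = _SECTION.sub(section, template)
    return _VAR.sub(lambda m: ctx.get(m.group(1), ""), without_sections)
```

Sections are resolved before variables are filled in. As a result, a value that happens to contain `{{#if ...}}` is inserted as plain text and never interpreted as template syntax.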

- Remove unused `import json`
- Remove stub `get_data_profile` that always returned empty string
- Simplify redundant glob patterns in get_existing_tests (*.yml already covers schema.yml, _schema.yml, *_models.yml)

https://claude.ai/code/session_01RHSUYqsWy6xLfAC9tFTy66

Move all dbt helpers and keymaps from keymaps.lua into a dedicated config/dbt.lua, loaded via require("config.dbt"). Keeps keymaps.lua focused on general-purpose bindings. https://claude.ai/code/session_01RHSUYqsWy6xLfAC9tFTy66

Covers artifact types (dbt models, notebooks, flat files, Quarto docs), the PM orchestrator loop, structured findings model, LLM backend abstraction, CLI/API design, and phased implementation plan. https://claude.ai/code/session_019YdrjfrB6Lgu5QTZzrnXdb

Merge master and reformat dbt_analyse.py and dbt_batch_audit.py to pass the CI formatting checks. https://claude.ai/code/session_01BPxJNKYipwwrx17jvyv6rM

…hecks-0nymq # Conflicts: # .github/workflows/ansible.yml

claude and others added 28 commits March 1, 2026 23:55

Add <leader>dp to preview dbt model rows in a split

2038e17

Runs dbt show --limit 20 asynchronously and displays the results in a read-only scratch buffer. Press q to close. https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ

Return cursor to code buffer after dbt_cmd sends to toggleterm

0885d15

https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ

Add <leader>dv to show dbt model output in visidata

35131c3

Pipes `dbt show --output csv --limit 500` into visidata via toggleterm. Formats and saves the file first like other dbt commands. https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ

Return focus to code buffer after visidata exits

4983db0

Uses a dedicated toggleterm Terminal with close_on_exit and an on_exit callback instead of the shared terminal, so focus returns to the previous window when visidata is quit. https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ

Rename dbt prompt files

b4fe1b8

Add explicit python scripts for db analysis

054a855

Update dbt nvim keymaps

b5a745b

Split dbt nvim keymaps into their own file

9eb9e43

Move all dbt helpers and keymaps from keymaps.lua into a dedicated config/dbt.lua, loaded via require("config.dbt"). Keeps keymaps.lua focused on general-purpose bindings. https://claude.ai/code/session_01RHSUYqsWy6xLfAC9tFTy66

Merge branch 'master' into claude/fix-status-checks-0nymq

9b6993f

Fix black formatting in dbt Python scripts

01a367b

Merge master and reformat dbt_analyse.py and dbt_batch_audit.py to pass the CI formatting checks. https://claude.ai/code/session_01BPxJNKYipwwrx17jvyv6rM

Merge remote-tracking branch 'origin/master' into claude/fix-status-c…

7367bf9

…hecks-0nymq # Conflicts: # .github/workflows/ansible.yml

Delete dbt/SPEC.md

e1269e9

michaelbarton merged commit ca41eb3 into master Mar 10, 2026
2 checks passed

michaelbarton deleted the claude/fix-status-checks-0nymq branch March 10, 2026 01:58

michaelbarton merged 28 commits into master from claude/fix-status-checks-0nymq
