Add data-audit specification and dbt tooling infrastructure #80
Merged
michaelbarton merged 28 commits into master on Mar 10, 2026
Removes the dependency on the community.general collection which was producing a warning about not supporting the installed Ansible version. Uses npm install --prefix directly instead. https://claude.ai/code/session_01Vm5EEsQ5uFKoni6qWEDQd8
Runs the full playbook (minus macOS-only launch agents) on ubuntu-latest: syntax check then an actual apply. Installs Neovim, Python 3.11, and virtualenv so the neovim setup tasks (pip venv, Lazy sync, treesitter) can run too. https://claude.ai/code/session_01Vm5EEsQ5uFKoni6qWEDQd8
Add <leader>dr (dbt run), <leader>dc (dbt compile), and <leader>dt (dbt test) keybindings that format the current SQL file with sqlfmt via conform, save, then send the dbt command to toggleterm. Also add sql to treesitter ensure_installed for better SQL highlighting. https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ
- <leader>dR: dbt run -s model+ (model and all downstream dependents)
- <leader>db: dbt build (run + test in DAG order)
- <leader>ds: dbt show (preview query results without materializing)
https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ
- <leader>dg: jump to ref() or source() under cursor
- <leader>df: fzf model picker with ctrl-r/b/t to run/build/test
- <leader>do: open compiled SQL in readonly vsplit
- <leader>d/: grep across all models (find columns, CTEs, etc.)
Inspired by fzf-dbt CLI tool, adapted for neovim with fzf-lua.
https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ
- <leader>da: quick analysis with sonnet; reviews the current model and suggests improvements in non-interactive mode.
- <leader>dA: deep analysis with sonnet thinking; interrogates the duckdb database and cross-references with the current model to check data quality, joins, types, and more.
https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ
The <leader>dA keymap now compiles the model first, then passes the compiled SQL and first 20 rows from dbt show as extra context to the claude agent for more informed analysis. https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ
Prompts now live in nvim/prompts/ as markdown files with {{var}}
template placeholders. The quick analysis prompt is loaded and
substituted in Lua; the deep analysis prompt uses sed at runtime
to inject compiled SQL and sample rows before passing to claude.
https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ
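A minimal sketch of `{{var}}` substitution, written in Python for illustration only (the commit above does this in Lua for the quick prompt and with sed for the deep prompt). The function name `substitute` and the choice to leave unknown placeholders untouched are assumptions, not taken from the repository. It substitutes in a single pass, so a value that happens to contain `{{...}}` is not expanded again.

```python
import re

# Matches {{name}} placeholders; names are restricted to word characters.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def substitute(template: str, values: dict) -> str:
    """Replace {{var}} placeholders in one pass over the template.

    re.sub scans the template once, so text inserted for one placeholder
    is never re-scanned for further placeholders.
    """

    def replace(match):
        name = match.group(1)
        # Assumption: unknown placeholders are left as-is, not dropped.
        return values.get(name, match.group(0))

    return PLACEHOLDER.sub(replace, template)
```

For example, `substitute("Review {{model_name}}", {"model_name": "orders"})` returns `"Review orders"`.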
- <leader>da now replaces the buffer with the annotated SQL (undo with :u)
- Updated prompt to allow brief explanations alongside suggestions
- All dbt commands (run, build, test, compile, show) now use uv run
https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ
<leader>dA now opens a new tmux window named 'dbt:<model>' that compiles the model, gathers sample rows, then starts an interactive claude session with that context so you can discuss the model. https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ
Runs dbt show --limit 20 asynchronously and displays the results in a read-only scratch buffer. Press q to close. https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ
fzf-lua is not installed; the config uses telescope via LazyVim. Converts wiki search, wiki insert link, dbt model finder (with C-r/C-b/C-t actions), and dbt grep to telescope equivalents. https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ
Pipes `dbt show --output csv --limit 500` into visidata via toggleterm. Formats and saves the file first like other dbt commands. https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ
Uses a dedicated toggleterm Terminal with close_on_exit and an on_exit callback instead of the shared terminal, so focus returns to the previous window when visidata is quit. https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ
dbt show only supports --output json/text, not csv. Use --output json --log-format json and pipe through a python script that extracts the preview data and converts to CSV. https://claude.ai/code/session_016johXLfEd6P4umaT14YQEQ
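The JSON-to-CSV step described above could look roughly like the sketch below. It is not the script from this PR. It assumes each log line is one JSON object and that the preview rows sit under `data.preview`, possibly as a JSON-encoded string; those field names and the function name `preview_to_csv` are assumptions to check against real `dbt show` output.

```python
import csv
import io
import json
from typing import Iterable


def preview_to_csv(log_lines: Iterable[str]) -> str:
    """Return the first preview found in JSON log lines as CSV text."""
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON output mixed into the stream
        if not isinstance(event, dict):
            continue
        data = event.get("data")
        # Assumption: preview rows live under data.preview.
        preview = data.get("preview") if isinstance(data, dict) else None
        if preview is None:
            continue
        # Assumption: the preview may arrive as a JSON-encoded string.
        rows = json.loads(preview) if isinstance(preview, str) else preview
        if not rows:
            return ""
        out = io.StringIO()
        writer = csv.DictWriter(
            out, fieldnames=list(rows[0].keys()), lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
        return out.getvalue()
    return ""
```

In a pipeline, the function would be fed `sys.stdin` and its result written to `sys.stdout` before visidata reads it.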
- Close file handles properly in dbt_analyse.py (use `with` statements)
- Add PEP 723 inline metadata to dbt_analyse.py for uv compatibility
- Move `import re` to module level in dbt_batch_audit.py
- Write context to a temp file in dbt_analyse.py to avoid ARG_MAX limits
- Add subprocess timeouts (900s audit, 1200s synthesis) to prevent hangs
- Order template substitutions to avoid placeholder injection
https://claude.ai/code/session_01RHSUYqsWy6xLfAC9tFTy66
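Two of the fixes above, the temp file for large context and the subprocess timeout, can be sketched together as follows. The helper name `run_with_context` and the convention of appending the file path as the last argument are hypothetical; how the real script hands the file to claude is not shown in this PR text.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_context(command: list, context: str, timeout_s: float) -> str:
    """Write context to a temp file, pass its path to command, return stdout."""
    # The with block closes the handle before the child process reads the file.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".md", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(context)
        context_path = Path(handle.name)
    try:
        result = subprocess.run(
            [*command, str(context_path)],  # a path stays far below ARG_MAX
            capture_output=True,
            text=True,
            timeout=timeout_s,  # raises subprocess.TimeoutExpired on a hang
            check=True,
        )
        return result.stdout
    finally:
        context_path.unlink(missing_ok=True)  # always clean up the temp file
```

Passing a file path instead of the context itself keeps the argument list small regardless of how much compiled SQL or sample data is gathered.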
- Rewrite dbt_deep_analysis.md with a comprehensive 8-section audit
checklist covering schema/types, join correctness, filters, grain,
data quality, performance, test coverage gaps, and upstream risks
- Add structured output format (findings table, evidence queries,
suggested dbt test YAML snippets)
- Add conditional template sections ({{#if lineage}}, {{#if existing_tests}})
with a lightweight Handlebars-style renderer in both scripts
- Gather model lineage (parents/children) via `dbt ls` selectors
- Scan schema.yml files for existing test definitions and include them
so the LLM focuses on coverage gaps rather than redundant suggestions
- Add pyyaml dependency to both scripts' PEP 723 metadata
https://claude.ai/code/session_01RHSUYqsWy6xLfAC9tFTy66
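A lightweight Handlebars-style renderer of the kind mentioned above might be sketched like this. It is an illustration, not the code from either script. The closing tag `{{/if}}` is an assumption (only the opening tags appear in this PR), nested sections are not handled, and `render_template` is a hypothetical name.

```python
import re

# Assumption: sections close with {{/if}}; only the opening tags are documented.
SECTION = re.compile(r"\{\{([#^])if (\w+)\}\}(.*?)\{\{/if\}\}", re.DOTALL)
VARIABLE = re.compile(r"\{\{(\w+)\}\}")


def render_template(template: str, context: dict) -> str:
    """Resolve {{#if}} / {{^if}} sections, then substitute {{var}} values."""

    def section(match):
        kind, name, body = match.groups()
        truthy = bool(context.get(name))
        # "#" keeps the body when the variable is set, "^" when it is not.
        keep = truthy if kind == "#" else not truthy
        return body if keep else ""

    resolved = SECTION.sub(section, template)
    # Assumption: missing variables render as an empty string.
    return VARIABLE.sub(lambda m: str(context.get(m.group(1), "")), resolved)
```

With this sketch, a template section such as `{{#if lineage}}...{{/if}}` disappears entirely when no lineage was gathered, so the prompt does not carry empty headings.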
- Remove unused `import json`
- Remove stub `get_data_profile` that always returned empty string
- Simplify redundant glob patterns in get_existing_tests (*.yml already covers schema.yml, _schema.yml, *_models.yml)
https://claude.ai/code/session_01RHSUYqsWy6xLfAC9tFTy66
Move all dbt helpers and keymaps from keymaps.lua into a dedicated
config/dbt.lua, loaded via require("config.dbt"). Keeps keymaps.lua
focused on general-purpose bindings.
https://claude.ai/code/session_01RHSUYqsWy6xLfAC9tFTy66
Covers artifact types (dbt models, notebooks, flat files, Quarto docs), the PM orchestrator loop, structured findings model, LLM backend abstraction, CLI/API design, and phased implementation plan. https://claude.ai/code/session_019YdrjfrB6Lgu5QTZzrnXdb
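To make the "structured findings model" concrete, here is one possible shape for it. Only the class names `Finding` and `AuditResult` come from this PR; every field, the `Severity` scale, and the `worst_severity` helper are hypothetical and may differ from what the spec defines.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


class Severity(IntEnum):
    """Hypothetical severity scale; not taken from the spec."""

    INFO = 1
    WARNING = 2
    ERROR = 3


@dataclass(frozen=True)
class Finding:
    """One audit observation. All field names here are illustrative."""

    severity: Severity
    title: str
    detail: str
    evidence_query: Optional[str] = None  # e.g. SQL that reproduces the issue


@dataclass
class AuditResult:
    """Findings collected for a single artifact, such as one dbt model."""

    artifact: str
    findings: List[Finding] = field(default_factory=list)

    def worst_severity(self) -> Optional[Severity]:
        """Return the highest severity present, or None if there are no findings."""
        return max((f.severity for f in self.findings), default=None)
```

Keeping findings as plain data like this would let the same result be rendered as a table, serialized, or aggregated across artifacts.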
Merge master and reformat dbt_analyse.py and dbt_batch_audit.py to pass the CI formatting checks. https://claude.ai/code/session_01BPxJNKYipwwrx17jvyv6rM
…hecks-0nymq # Conflicts: # .github/workflows/ansible.yml
Summary
This PR introduces a comprehensive specification for a new `data-audit` Python package alongside supporting dbt analysis tooling and Neovim integration. The changes establish the foundation for LLM-powered auditing of heterogeneous data artifacts (dbt models, notebooks, flat files) with autonomous orchestration capabilities.

Key Changes

Core Specification (`dbt/SPEC.md`)
- `data-audit` package specification covering, among other items, `Finding`, `AuditResult`, `AuditPlan`, and `AuditTask`

dbt Analysis Tools
- `dbt_batch_audit.py`: Batch auditing script
- `dbt_analyse.py`: Interactive single-model analysis tool
- Prompt Templates:
  - `dbt_deep_analysis.md`: Comprehensive audit checklist covering schema, joins, filters, grain, data quality, performance, and test coverage
  - `dbt_quick_analysis.md`: Quick inline code review with improvement suggestions

Neovim Integration (`nvim/lua/config/dbt.lua`)
- `<leader>dg`: Jump to ref/source under cursor
- `<leader>df`: Fuzzy model picker with run/build/test actions
- `<leader>do`: Open compiled SQL in split
- `<leader>d/`: Search across models
- `<leader>dr` / `<leader>dR`: Run model (without/with downstream dependents)
- `<leader>db`: Build model (run + test)
- `<leader>dc`: Compile model
- `<leader>dt`: Test model
- `<leader>ds`: Show preview
- `<leader>dp`: Preview sample rows in split
- `<leader>dv`: Pipe output to visidata for interactive exploration

Configuration Updates
- `nvim/lua/config/keymaps.lua`: Migrate wiki search from fzf-lua to telescope for consistency
- `ansible/tasks/neovim.yml`: Add dbt directory symlink alongside ftplugin
- `.github/workflows/ansible.yml`: Add CI workflow for playbook syntax checking and execution
- `nvim/lua/plugins/language.lua`: Add SQL to treesitter language list

Notable Implementation Details
- Templates support `{{var}}`, `{{#if var}}`, and `{{^if var}}` for flexibility
- `uv run` is used for consistent Python environment management

https://claude.ai/code/session_01BPxJNKYipwwrx17jvyv6rM