Summary
Given existing SQL (stored procedures, views, dbt models, complex queries), the LLM reads the code and generates human-readable documentation: what each CTE does, which tables are read/written, implicit business rules, and data flow diagrams.
Problem
- Teams inherit undocumented ETL pipelines with hundreds of lines of SQL
- Understanding a complex stored procedure or dbt model requires deep reading
- Business rules are buried in CASE WHEN logic that non-technical stakeholders can't parse
- Onboarding new data engineers to existing pipelines is slow without docs
- Existing tooling does not appear to combine SQL parsing with LLM comprehension for documentation generation
Proposed Solution
Documentation Generated
- Overview: One-paragraph summary of what the SQL does
- Step-by-step breakdown: Each CTE/subquery explained in plain language
- Input/Output tables: Which tables are read, which are written/created
- Business rules extracted: CASE WHEN logic, filter conditions, join logic documented as business rules
- Data flow diagram: Mermaid flowchart showing table → CTE → output flow
- Complexity assessment: Number of joins, CTEs, aggregations, nesting depth
- Refactoring suggestions: Identify opportunities to simplify or break apart complex queries
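To illustrate the data flow diagram item, here is a minimal sketch that renders a Mermaid flowchart from a list of documented steps. `StepDoc` and `to_mermaid` are hypothetical names, not part of any existing API. The sketch assumes the LLM has already returned each step's name and inputs, and that all names are plain identifiers (no dots, quotes, or spaces).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StepDoc:
    """One documented CTE or subquery (hypothetical structure)."""
    name: str          # CTE or subquery name
    reads: List[str]   # tables or earlier CTEs this step selects from
    summary: str       # plain-language explanation of the step


def to_mermaid(steps: List[StepDoc], output: str) -> str:
    """Render table -> CTE -> output edges as a Mermaid flowchart."""
    lines = ["flowchart LR"]
    for step in steps:
        for source in step.reads:
            lines.append(f"    {source} --> {step.name}")
    # Assumption: the last step feeds the final output table or view.
    if steps:
        lines.append(f"    {steps[-1].name} --> {output}")
    return "\n".join(lines)


steps = [
    StepDoc("recent_orders", ["orders"], "Keeps orders from the last 30 days."),
    StepDoc("order_totals", ["recent_orders", "customers"], "Sums revenue per customer."),
]
print(to_mermaid(steps, "customer_revenue"))
```

Schema-qualified names such as `analytics.orders` would need to be quoted or mapped to safe node IDs before being passed to Mermaid.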
How It Works
- User pastes SQL or uploads .sql files
- LLM analyzes the code with optional schema context from connected sources
- Platform returns structured documentation with sections
- User can regenerate sections, edit descriptions, and export
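The flow above could be sketched roughly as follows. This is not the platform's implementation: the section names, the prompt wording, and the `llm` callable (any function that takes a prompt string and returns the model's text) are assumptions made for illustration, and the sketch assumes the model is asked to reply with a JSON object.

```python
import json
from typing import Callable, Dict, Optional

# Hypothetical section keys, loosely mirroring the "Documentation Generated" list.
SECTIONS = [
    "overview", "steps", "tables", "business_rules",
    "data_flow", "complexity", "refactoring",
]


def build_prompt(sql: str, schema_context: Optional[str] = None) -> str:
    """Assemble the prompt; schema context is included only when provided."""
    parts = [
        "Document the following SQL. Reply with a JSON object whose keys are: "
        + ", ".join(SECTIONS) + ".",
    ]
    if schema_context:
        parts.append("Schema context:\n" + schema_context)
    parts.append("SQL:\n" + sql)
    return "\n\n".join(parts)


def document_sql(
    sql: str,
    llm: Callable[[str], str],
    schema_context: Optional[str] = None,
) -> Dict[str, str]:
    """Call the model and return one string per section (empty if missing)."""
    raw = llm(build_prompt(sql, schema_context))
    parsed = json.loads(raw)  # raises ValueError if the reply is not valid JSON
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object with one key per section")
    return {name: str(parsed.get(name, "")) for name in SECTIONS}


# Usage with a stand-in model, so the sketch runs without any LLM client.
def fake_llm(prompt: str) -> str:
    return json.dumps({"overview": "Counts recent orders per customer."})


docs = document_sql("SELECT customer_id, count(*) FROM orders GROUP BY 1", fake_llm)
print(docs["overview"])
```

Returning a fixed set of keyed sections lets a single section be regenerated or edited without touching the others.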
Export Formats
- Markdown: For GitHub/wiki documentation
- Confluence-ready HTML: Formatted for Confluence pages
- Inline comments: Annotated SQL with -- comments explaining each section
- README section: Summary suitable for dbt model description or README
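As one example of an export format, the inline comments option could work along these lines. This is a sketch under stated assumptions: `annotate_sql` is a hypothetical helper, and it assumes the generated notes are keyed by the 1-based line number of the SQL they describe.

```python
from typing import Dict


def annotate_sql(sql: str, notes: Dict[int, str]) -> str:
    """Insert `-- ` comments above the given 1-based line numbers."""
    out = []
    for lineno, line in enumerate(sql.splitlines(), start=1):
        if lineno in notes:
            # Match the indentation of the line being annotated.
            indent = line[: len(line) - len(line.lstrip())]
            # Multi-line notes become one comment line each.
            for part in notes[lineno].splitlines():
                out.append(f"{indent}-- {part}")
        out.append(line)
    return "\n".join(out)


sql = "WITH recent AS (\n  SELECT * FROM orders\n)\nSELECT count(*) FROM recent"
notes = {1: "Recent orders CTE", 4: "Final row count"}
print(annotate_sql(sql, notes))
```

Only comment lines are added and the original SQL lines are left unchanged, so the annotated query should behave the same as the original.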
Technical Notes
- No SQL parser needed: the LLM handles SQL comprehension directly, so sqlglot or a similar parser is not required
- Schema enrichment: If referenced tables match connected sources, add column descriptions and profiling context
- Batch processing: Support uploading multiple .sql files and generating docs for each
- New endpoint: `POST /api/tools/document-sql`, accepting `sql`, `source_ids[]` (optional), and `format`
- Frontend: Split-view — SQL on the left, generated docs on the right
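A request body for the proposed endpoint might be validated along these lines. Only the field names `sql`, `source_ids`, and `format` come from the notes above. The allowed format values, the `markdown` default, and string-typed source IDs are assumptions made for this sketch, as are the names `DocumentSqlRequest` and `parse_request`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical values, one per export format listed in this proposal.
ALLOWED_FORMATS = {"markdown", "confluence_html", "inline_comments", "readme"}


@dataclass
class DocumentSqlRequest:
    sql: str
    format: str = "markdown"                              # assumed default
    source_ids: List[str] = field(default_factory=list)   # optional


def parse_request(payload: Dict[str, Any]) -> DocumentSqlRequest:
    """Validate a decoded JSON body for POST /api/tools/document-sql."""
    sql = payload.get("sql")
    if not isinstance(sql, str) or not sql.strip():
        raise ValueError("'sql' must be a non-empty string")

    fmt = payload.get("format", "markdown")
    if not isinstance(fmt, str) or fmt not in ALLOWED_FORMATS:
        raise ValueError(f"'format' must be one of {sorted(ALLOWED_FORMATS)}")

    source_ids = payload.get("source_ids", [])
    if not isinstance(source_ids, list):
        raise ValueError("'source_ids' must be a list when provided")

    return DocumentSqlRequest(
        sql=sql, format=fmt, source_ids=[str(s) for s in source_ids]
    )


request = parse_request({"sql": "SELECT 1", "format": "inline_comments"})
print(request)
```

Batch uploads could reuse the same validation by calling `parse_request` once per file.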
Acceptance Criteria