dbt Model & Schema.yml Generator from Connected Sources #109

@Empreiteiro

Description

Summary

Generate dbt project artifacts (SQL models, schema.yml, sources.yml) from connected data sources using AI-assisted code generation. The platform already parses dbt manifests (ask_dbt.py) — this closes the loop by generating dbt code back.

Problem

  • Writing dbt staging models is repetitive: SELECT + CAST + rename for every source table
  • schema.yml with column descriptions and tests requires manual effort per column
  • sources.yml must be kept in sync with actual database schemas
  • The platform already has all the metadata needed (table_infos, profiling, relationships) but doesn't generate dbt artifacts
  • Teams spend hours on boilerplate that could be auto-generated from profiling

Proposed Solution

Generated Artifacts

  1. sources.yml: Source definitions with database/schema/table references, column lists, and freshness config
  2. Staging models (stg_*.sql): SELECT from source with column renaming, type casting, and basic cleaning
  3. Intermediate models (int_*.sql): Join/union staging models based on detected relationships
  4. Mart models (fct_*.sql, dim_*.sql): Fact and dimension tables from Gold layer suggestions
  5. schema.yml: Column descriptions (LLM-generated), tests (not_null, unique, accepted_values, relationships)
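
The sources.yml generation in item 1 could be sketched as a pure data transformation from the platform's table metadata. The function name and the shape of the `tables` argument below are hypothetical, assuming table metadata is available as a mapping of table names to column lists; serialization to YAML (e.g. via PyYAML) is left out:

```python
def build_sources_entry(source_name, database, schema, tables):
    # tables (hypothetical shape): {"orders": ["id", "total"], ...}
    # Returns a dict matching the dbt sources.yml schema (version: 2),
    # ready to be dumped to YAML.
    return {
        "version": 2,
        "sources": [
            {
                "name": source_name,
                "database": database,
                "schema": schema,
                "tables": [
                    {"name": table, "columns": [{"name": col} for col in cols]}
                    for table, cols in tables.items()
                ],
            }
        ],
    }
```

Freshness config and column descriptions would be merged into the same dict before dumping.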

Naming Conventions

Follow dbt best practices:

  • stg_{source_name}__{table_name}.sql
  • int_{domain}__{description}.sql
  • fct_{business_process}.sql / dim_{entity}.sql
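
The conventions above can be enforced with a small helper. This is a minimal sketch; the function names are hypothetical, and the slugification rule (lowercase, non-alphanumerics to underscores) is an assumption:

```python
import re

def _slug(text: str) -> str:
    # Lowercase and collapse runs of non-alphanumerics into underscores.
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")

def staging_name(source: str, table: str) -> str:
    # stg_{source_name}__{table_name}.sql (double underscore per dbt convention)
    return f"stg_{_slug(source)}__{_slug(table)}.sql"

def mart_name(kind: str, name: str) -> str:
    # fct_ for facts, dim_ for dimensions
    prefix = {"fact": "fct", "dimension": "dim"}[kind]
    return f"{prefix}_{_slug(name)}.sql"
```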

How It Works

  1. User selects source(s) and target dbt layer (staging, intermediate, marts)
  2. LLM receives schema + profiling + detected relationships
  3. Platform generates .sql and .yml files as downloadable text or ZIP
  4. User copies into their dbt project
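
Step 3 (bundling generated files for download) could be a straightforward in-memory ZIP, sketched here with the standard library. The function name and the `files` mapping of relative path to contents are assumptions about the generator's output:

```python
import io
import zipfile

def bundle_artifacts(files: dict) -> bytes:
    # files (hypothetical shape): {"models/staging/stg_raw__orders.sql": "...", ...}
    # Returns the ZIP archive as bytes, suitable for a download response.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for path, content in sorted(files.items()):
            zf.writestr(path, content)
    return buf.getvalue()
```

Sorting the paths keeps the archive layout deterministic across downloads.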

Technical Notes

  • Existing infrastructure: ask_dbt.py already parses manifest.json with _extract_table_infos_from_manifest(); reuse the same table-info structures so the read path (manifest parsing) and the write path (code generation) share one representation
  • Profiling → tests: sample_profile columns with 0 nulls → not_null test; low cardinality → accepted_values; unique counts matching row count → unique test
  • Relationship → ref(): suggest_source_relationships() from sql_utils.py maps to dbt ref() and relationships tests
  • New endpoint: POST /api/sources/{id}/generate-dbt with params: layer, project_name, naming_convention
  • Output format: Individual files or bundled ZIP download
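
The profiling→tests heuristics above can be sketched per column. The function name and the shape of the `profile` dict are hypothetical, assuming sample_profile exposes null, distinct, and sample-value stats; the cardinality threshold for accepted_values is an assumption:

```python
def suggest_tests(profile: dict, row_count: int, enum_threshold: int = 10) -> list:
    # profile (hypothetical shape): one column's stats from sample_profile,
    # e.g. {"null_count": 0, "distinct_count": 42, "sample_values": [...]}
    tests = []
    if profile["null_count"] == 0:
        tests.append("not_null")           # 0 nulls observed -> not_null
    if profile["distinct_count"] == row_count:
        tests.append("unique")             # distinct count matches row count -> unique
    elif profile["distinct_count"] <= enum_threshold:
        # Low cardinality -> accepted_values over the observed values
        tests.append({"accepted_values": {"values": profile["sample_values"]}})
    return tests
```

Since these are inferred from a sample, the generated schema.yml should present them as suggestions for the user to review rather than guaranteed constraints.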

Acceptance Criteria

  • Generate sources.yml from any connected SQL source with accurate database/schema/table paths
  • Generate staging models with appropriate type casts, column renames, and basic cleaning informed by profiling
  • Generate schema.yml with LLM-written column descriptions
  • Auto-suggest dbt tests from profiling (not_null, unique, accepted_values, relationships)
  • Follow dbt naming conventions (stg_, int_, fct_, dim_)
  • Output downloadable as individual files or ZIP bundle
  • Support Jinja ref() and source() macros in generated SQL

Labels: enhancement (New feature or request)