Transformation Mapping Assistant (Source-to-Target Column Mapping)

## Summary

AI-assisted column mapping between a source schema and a target schema. Given two connected sources (or a source + a manually defined target), the LLM suggests column-by-column mappings with transformation expressions (CAST, TRIM, COALESCE, lookups). Essential for data migrations and integration projects.

## Problem

- Source-to-target mapping is one of the most time-consuming ETL tasks
- Manual mapping in spreadsheets is error-prone and doesn't leverage data profiling
- Column names often differ between source and target (e.g., `cust_nm` → `customer_name`)
- Type mismatches require transformation logic that users must figure out manually
- No tool combines schema metadata + data profiling + LLM reasoning for mapping suggestions

## Proposed Solution

### Mapping Flow

1. User selects **source** (connected data source) and **target** (another source, or manual schema definition)
2. Platform profiles both sides and feeds metadata to the LLM
3. LLM suggests mappings with confidence scores:

```json
{
  "mappings": [
    {
      "source_column": "cust_nm",
      "target_column": "customer_name",
      "confidence": 0.95,
      "transform": "TRIM(cust_nm)",
      "reasoning": "Name similarity + same data type (VARCHAR)"
    },
    {
      "source_column": "created_dt",
      "target_column": "created_at",
      "confidence": 0.90,
      "transform": "CAST(created_dt AS TIMESTAMP)",
      "reasoning": "Date column with different naming convention"
    },
    {
      "source_column": null,
      "target_column": "updated_at",
      "confidence": 0,
      "transform": "CURRENT_TIMESTAMP",
      "reasoning": "No source match — suggest default value"
    }
  ]
}
```

4. User reviews, adjusts, and exports as SQL, mapping spreadsheet, or dbt model

### Matching Strategies

- **Name similarity**: Fuzzy match on column names (Levenshtein, token overlap)
- **Type compatibility**: Match compatible data types
- **Value overlap**: Compare sample values between source and target columns
- **LLM semantic matching**: Understand that `amt` maps to `amount`, `qty` to `quantity`

### Export Formats

- SQL (SELECT with transforms)
- CSV/XLSX mapping spreadsheet (for documentation)
- dbt model (SELECT with ref())
- Informatica/Talend mapping format

## Technical Notes

- **Profiling both sides**: Use existing `_build_sample_profile()` from `crud.py` for both source and target
- **LLM context**: Send both schemas + sample values + profiling stats for informed matching
- **Manual target schema**: Allow users to paste DDL or upload CSV header to define target
- **New endpoint**: `POST /api/mapping/suggest` with `source_id`, `target_id` (or `target_schema` JSON)
- **Frontend**: Interactive mapping table with drag-drop reassignment and transform editing

## Acceptance Criteria

- [ ] Suggest column mappings between two connected sources with confidence scores
- [ ] Support manual target schema definition (paste DDL or define columns)
- [ ] Include transformation expressions (CAST, TRIM, COALESCE) in suggestions
- [ ] Confidence score per mapping (high/medium/low/unmapped)
- [ ] User can accept, reject, or modify each mapping
- [ ] Export as SQL SELECT, mapping spreadsheet (CSV/XLSX), or dbt model
- [ ] Handle unmapped columns with suggested defaults or flags

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Transformation Mapping Assistant (Source-to-Target Column Mapping) #112

Summary

Problem

Proposed Solution

Mapping Flow

Matching Strategies

Export Formats

Technical Notes

Acceptance Criteria

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Transformation Mapping Assistant (Source-to-Target Column Mapping) #112

Description

Summary

Problem

Proposed Solution

Mapping Flow

Matching Strategies

Export Formats

Technical Notes

Acceptance Criteria

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions