Skip to content

ERD & Relationship Discovery with Interactive Diagram Export #114

@Empreiteiro

Description

@Empreiteiro

Summary

Auto-detect foreign key relationships between tables across connected sources using naming conventions, value overlap analysis, and LLM inference. Generate interactive ERD diagrams exportable as Mermaid, PNG, or SQL constraints.

Problem

  • `suggest_source_relationships()` in `sql_utils.py` only matches exact column names — misses semantic relationships (`user_id` → `users.id`)
  • No visual representation of table relationships across sources
  • Users manually document ERDs in external tools (dbdiagram.io, Lucidchart)
  • Foreign keys in source databases are often missing (especially in data warehouses)
  • Without relationship visibility, users write incorrect JOINs

Proposed Solution

Detection Strategies (layered)

  1. Naming convention matching (existing): `order.customer_id` → `customer.id`
  2. Value overlap analysis (new): Compare distinct values between columns — high overlap suggests FK
  3. Type + cardinality matching (new): Integer column in table A with cardinality matching row count of table B
  4. LLM semantic inference (new): Understand that `emp_no` relates to `employees.employee_number`
  5. Explicit FK metadata (new): Read actual FK constraints from source databases when available

Confidence Scoring

Each detected relationship gets a confidence score:

  • High (0.9+): Exact name match + value overlap + type match
  • Medium (0.6-0.9): Semantic match + partial value overlap
  • Low (0.3-0.6): Name similarity only
  • Suggested (<0.3): LLM inference, needs user validation

Output Formats

  • Interactive diagram: In-app ERD with zoom, pan, click-to-explore (React Flow or similar)
  • Mermaid: For embedding in Markdown documentation
  • SQL constraints: ALTER TABLE ADD CONSTRAINT FOREIGN KEY statements
  • dbt relationships: schema.yml relationship test definitions
  • dbdiagram.io format: DBML for import into dbdiagram.io

Technical Notes

  • Extend existing: Build on `suggest_source_relationships()` in `sql_utils.py` (lines 185-209) — add value overlap and LLM layers
  • Value overlap: Sample N values from candidate FK columns, compute Jaccard similarity
  • New endpoint: `GET /api/sources/{id}/relationships/discover` and `GET /api/sources/{id}/erd?format=mermaid|dbml|sql`
  • Frontend: New ERD component using React Flow or d3-dag for interactive graph

Acceptance Criteria

  • Detect relationships beyond exact name matching (value overlap, LLM inference)
  • Confidence score per detected relationship
  • Interactive ERD diagram in the UI with zoom and table expand
  • Export as Mermaid, SQL constraints, dbt relationships, and DBML
  • Support cross-source relationship detection (e.g., CSV ↔ SQL database)
  • User can confirm, reject, or manually add relationships
  • Persist validated relationships for use in Q&A and other features

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions