Skip to content

Query Performance Analyzer & SQL Reviewer #113

@Empreiteiro

Description

@Empreiteiro

Summary

Allow users to paste existing ETL SQL (queries, stored procedures, views) and get AI-powered performance analysis: index suggestions, anti-pattern detection, partitioning recommendations, and optimization tips — all contextualized with the actual schema from connected sources.

Problem

  • Slow ETL queries are the WhatsApp Integration #1 operational issue in data pipelines
  • Users lack tools that combine SQL analysis with actual schema knowledge
  • Generic SQL linters don't know the data distribution, row counts, or column cardinality
  • EXPLAIN plans are hard to read without experience
  • Anti-patterns (SELECT *, implicit cartesian joins, N+1 patterns) go undetected until production

Proposed Solution

Analysis Capabilities

  1. Anti-pattern detection: SELECT *, implicit cross joins, OR chains instead of IN, unnecessary subqueries, non-sargable WHERE clauses
  2. Index suggestions: Based on WHERE/JOIN/ORDER BY columns + table cardinality from profiling
  3. Partitioning recommendations: Suggest partition keys based on date columns and query patterns
  4. Query rewrite suggestions: CTEs vs subqueries, window functions vs self-joins, UNION ALL vs UNION
  5. Dialect-specific advice: BigQuery (avoid SELECT *, use APPROX_COUNT_DISTINCT), Snowflake (clustering keys), PostgreSQL (partial indexes)
  6. Complexity scoring: Estimate query complexity (joins, subqueries, aggregations) as a readability metric

How It Works

  1. User pastes SQL or selects a saved query
  2. Platform identifies the target dialect and referenced tables
  3. If tables match connected sources → enrich analysis with actual profiling data (row counts, cardinality, data types)
  4. LLM analyzes the query with full context and returns structured recommendations
  5. Output: annotated SQL with inline suggestions + summary report

Technical Notes

  • Schema context: Match referenced table names against Source.metadata_.table_infos for enriched analysis
  • LLM prompt: System prompt with SQL optimization expertise + actual schema stats as context
  • No execution: The platform does not run EXPLAIN — it analyzes SQL statically with LLM reasoning
  • New endpoint: POST /api/tools/analyze-sql with sql, dialect, source_ids[] (optional for schema context)
  • Frontend: SQL editor with syntax highlighting + annotation sidebar for suggestions

Acceptance Criteria

  • Detect at least 10 common SQL anti-patterns
  • Suggest indexes based on query patterns + source profiling data
  • Support PostgreSQL, BigQuery, Snowflake, and Redshift dialects
  • Provide query rewrite suggestions with before/after examples
  • Complexity score per query (simple/moderate/complex/critical)
  • Enriched analysis when referenced tables match connected sources
  • Output as annotated SQL + exportable report (Markdown)

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions