A modular, interpretable pipeline that converts natural language questions into SQL queries using open-source LLMs via HuggingFace.
| Requirement | Implementation |
|---|---|
| Takes natural language questions as input | Web UI with question input and example queries |
| Reasons about the database schema | Chain-of-thought prompting breaks down query logic step-by-step |
| Generates safe, efficient SQL | LLM generates SQL with syntax verification and auto-correction |
| Returns human-readable answers | "In Plain English" section explains what the query does |
| Shows its reasoning | Displays numbered reasoning steps for transparency |
┌─────────────────────────────────────────────────────────────────┐
│ Flask Web UI │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Pipeline Modules │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Schema │→ │ Reasoning │→ │ SQL │ │
│ │ Processor │ │ (CoT) │ │ Generator │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Answer │← │ Verifier │ │
│ │ Generator │ │ & Corrector │ │
│ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
HuggingFace Inference API
(Qwen2.5-Coder-32B-Instruct)
- Schema Processing - Parses CREATE TABLE statements to understand database structure
- Chain-of-Thought Reasoning - Breaks down the question into logical steps (tables needed, joins, filters, etc.)
- SQL Generation - Generates SQL based on the reasoning
- Verification & Correction - Validates syntax and schema references, auto-corrects errors
- Answer Generation - Produces a human-readable explanation of what the query does
We intentionally designed this as a query generation tool rather than an execution engine for several important reasons:
-
Security - Executing arbitrary SQL on real databases poses significant security risks (SQL injection, data exposure, accidental data modification)
-
Flexibility - Users can review and modify the generated SQL before running it on their own databases with proper access controls
-
Database Agnostic - The tool works with any SQL dialect without needing database connections or credentials
-
Educational Focus - The transparent reasoning and human-readable explanations help users understand SQL, not just execute it
-
Production Safety - In real-world scenarios, generated SQL should always be reviewed by a human before execution
-
Install dependencies:
pip install -r requirements.txt
-
Configure HuggingFace API token:
- Get a free token at: https://huggingface.co/settings/tokens
- Create a
.envfile:HF_API_TOKEN=hf_your_token_here
-
Run the application:
python app.py
-
Open in browser:
http://localhost:5000
- Define Schema - Paste your CREATE TABLE statements
- Ask Question - Type your question in natural language
- Generate - Click to generate SQL with full reasoning
- Review - See the human-readable answer, SQL query, and step-by-step reasoning
- Backend: Python, Flask
- LLM: Qwen2.5-Coder-32B-Instruct (via HuggingFace Inference API)
- SQL Parsing: sqlparse
- Frontend: Vanilla HTML/CSS/JavaScript
├── app.py # Flask application
├── config.py # Configuration and prompts
├── pipeline/
│ ├── schema_processor.py # Schema parsing
│ ├── reasoning.py # Chain-of-thought reasoning
│ ├── sql_generator.py # SQL generation
│ ├── verifier.py # Syntax verification
│ └── answer_generator.py # Human-readable answers
├── utils/
│ └── hf_client.py # HuggingFace API client
├── templates/
│ └── index.html # Web UI
└── static/
└── styles.css # Styling
MIT License