A production-ready Streamlit app that converts natural language queries to SQL using OpenAI, validates them with sqlglot, and executes them against configured databases. Includes three orchestration modes: direct pipeline, LangChain multi-tool agent, and full agentic pipeline with step tracing and reasoning logs.
Features:
- 🔗 Connect to multiple databases via connection strings (PostgreSQL, MySQL, SQLite)
- 🔍 Dynamic schema introspection and semantic model generation (with table samples)
- 🧠 NLP→SQL generation via OpenAI with dialect-aware validation
- ✅ Automatic retry (up to 5 times) to fix invalid SQL
- 🛡️ SQL policy verification (read-only SELECT/DESCRIBE/EXPLAIN/SHOW only)
- 📊 Execute queries and summarize results (detailed stats or aggregations)
- 🤖 Three orchestration modes:
  - Direct Mode: simple generate→validate→execute flow
  - LangChain Agent: multi-tool orchestrator using LangChain
  - Full Agentic Pipeline: complete step tracing, reasoning logs, and insights
| File | Purpose |
|---|---|
| `app.py` | Streamlit UI with database config, semantic model builder, orchestration mode selector |
| `db.py` | DB connection, introspection, schema sampling, and query execution helpers |
| `semantic.py` | Build semantic model enriched with table samples |
| `nlp_sql.py` | OpenAI SQL generation with sqlglot validation and retry logic |
| `agent.py` | SQL verifier: enforce read-only policy, reject CRUD/DDL |
| `insights.py` | Data summarization and narrative insight generation via OpenAI |
| `agent_chain.py` | LangChain-based tool orchestrator |
| `agentic_pipeline.py` | Full end-to-end pipeline with step tracing and reasoning |
| `test_agent.py` | Unit tests for agent functions (pytest) |
| `setup_sample_db.py` | Create sample SQLite database for testing |
| `dry_run.py` | Dry-run example without Streamlit |
| `requirements.txt` | Python dependencies |
```bash
# Create virtualenv
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

```bash
# Required: OpenAI API key
export OPENAI_API_KEY="sk-..."

# Optional: OpenAI model (default: gpt-4)
export OPENAI_MODEL="gpt-4"

# Configure database URLs (JSON map)
export DB_URLS='{"default":"sqlite:////workspaces/nattu22/data.db"}'
```

```bash
# Create a sample SQLite database for testing
python setup_sample_db.py

# Run the unit tests
pytest test_agent.py -v

# Dry-run example without Streamlit
python dry_run.py

# Launch the Streamlit app
streamlit run app.py
```

Then open http://localhost:8501 in your browser.
Direct Mode:
- Introspect → Fetch database schema
- Semantic → Build model with table samples
- Generate SQL → OpenAI translates NL to SQL
- Validate → Check syntax and policy
- Execute → Run query with row limit
- Summarize → Compute statistics or aggregations
- Insights → OpenAI generates narrative
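A condensed sketch of the generate→validate portion of that flow, including the retry loop from the feature list. Function names like `generate_sql` are illustrative stand-ins for the helpers in `nlp_sql.py`, not their actual signatures:

```python
import sqlglot
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_sql(question: str, schema: str, dialect: str = "sqlite") -> str:
    """Ask the model for a single SQL statement grounded in the schema."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": f"Translate questions to {dialect} SQL. Schema:\n{schema}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content.strip()


def generate_valid_sql(question: str, schema: str,
                       dialect: str = "sqlite", retries: int = 5) -> str:
    """Retry up to `retries` times until sqlglot accepts the statement."""
    feedback = ""
    for _ in range(retries):
        sql = generate_sql(question + feedback, schema, dialect)
        try:
            sqlglot.parse_one(sql, read=dialect)  # dialect-aware validation
            return sql
        except sqlglot.errors.ParseError as e:
            feedback = f"\nPrevious attempt failed to parse: {e}"
    raise ValueError("Could not produce valid SQL")
```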
LangChain Agent:
- Uses LangChain's ZERO_SHOT_REACT_DESCRIPTION agent
- Exposes tools: generate_sql, validate_sql, execute_sql, narrative_insights
- Agent decides which tools to use and in what order
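A minimal sketch of how that wiring might look with LangChain's legacy agent API. The tool functions are assumed to come from this repo's modules, and their signatures are illustrative:

```python
from langchain.agents import AgentType, Tool, initialize_agent
from langchain_openai import ChatOpenAI

# Assumed imports from this repo; the actual names may differ.
from nlp_sql import generate_sql
from agent import validate_sql
from db import execute_sql
from insights import narrative_insights

tools = [
    Tool(name="generate_sql", func=generate_sql,
         description="Translate a natural-language question into SQL."),
    Tool(name="validate_sql", func=validate_sql,
         description="Check SQL syntax and the read-only policy."),
    Tool(name="execute_sql", func=execute_sql,
         description="Run a validated SELECT and return rows."),
    Tool(name="narrative_insights", func=narrative_insights,
         description="Summarize query results in prose."),
]

agent = initialize_agent(
    tools,
    ChatOpenAI(model="gpt-4"),
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,  # print the ReAct reasoning trace
)
agent.run("Which five customers spent the most last month?")
```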
Full Agentic Pipeline:
- Step-by-step orchestration with full tracing
- Each step includes:
  - `input_data`: what the step received
  - `output_data`: what the step produced
  - `reasoning`: why the step was executed
  - `duration_ms`: execution time
  - `error`: any error message
- All traces are stored and displayed in a collapsible UI
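One way such a trace record could be modeled; this is a sketch, and the actual structure in `agentic_pipeline.py` may differ:

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StepTrace:
    """One traced step in the agentic pipeline."""
    name: str                    # e.g. "generate_sql"
    input_data: Any              # what the step received
    output_data: Any = None      # what the step produced
    reasoning: str = ""          # why the step was executed
    duration_ms: float = 0.0     # execution time
    error: Optional[str] = None  # any error message
```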
```text
User Query (NL)
  ↓
[Introspect Schema] → Get DB structure
  ↓
[Build Semantic Model] → Enrich with samples
  ↓
[Generate SQL] → OpenAI + dialect-aware
  ↓
[Validate Syntax] → sqlglot parsing
  ↓
[Verify Policy] → Read-only check (agent.py)
  ↓
[Execute] → Run SELECT with row limit
  ↓
[Summarize] → describe() + percentiles or aggregates
  ↓
[Narrative] → OpenAI generates insights
  ↓
Result + Traces + Reasoning Logs
```
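The summarize step maps naturally onto pandas; below is a sketch of the kind of statistics it computes (the exact output of `insights.py` may differ):

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Detailed stats for numeric columns, value counts otherwise."""
    numeric = df.select_dtypes("number")
    if not numeric.empty:
        # describe() plus extra percentiles, as in the diagram above
        return numeric.describe(percentiles=[0.05, 0.25, 0.5, 0.75, 0.95]).to_dict()
    return {col: df[col].value_counts().head(10).to_dict() for col in df.columns}
```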
- ✓ SQL validation with `sqlglot` (syntax parsing)
- ✓ Policy enforcement: SELECT/DESCRIBE/EXPLAIN/SHOW only (agent.py)
- ✓ Block CRUD/DDL keywords: INSERT, UPDATE, DELETE, DROP, ALTER, etc.
- ✓ Reject multiple statements (semicolon check)
- ✓ Comment stripping to prevent injection bypass
⚠️ For production: use a read-only DB user, enforce row limits and query timeouts, and add connection pooling and audit logging.
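For illustration, a minimal read-only check built on sqlglot. This is a sketch of what `agent.py` enforces, not its actual code, and it is narrowed to SELECT only (the real policy also allows DESCRIBE/EXPLAIN/SHOW):

```python
import sqlglot
from sqlglot import exp


def verify_read_only(sql: str, dialect: str = "sqlite") -> bool:
    """Accept a single SELECT statement; reject everything else."""
    try:
        statements = sqlglot.parse(sql, read=dialect)
    except sqlglot.errors.ParseError:
        return False
    if len(statements) != 1 or statements[0] is None:
        return False  # reject multiple statements
    stmt = statements[0]
    if not isinstance(stmt, exp.Select):
        return False  # DDL/CRUD parse as other node types
    # Catch writes hidden inside CTEs or subqueries
    writes = stmt.find_all(exp.Insert, exp.Update, exp.Delete)
    return not any(writes)
```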
Example 1: Local SQLite
```bash
export DB_URLS='{"test":"sqlite:////tmp/test.db"}'
```

Example 2: PostgreSQL
```bash
export DB_URLS='{"prod":"postgresql://user:pass@localhost:5432/mydb"}'
```

Example 3: Multiple databases
```bash
export DB_URLS='{"sqlite":"sqlite:////tmp/test.db","postgres":"postgresql://user:pass@db.example.com/prod"}'
```

- Add conversation memory for multi-turn refinement
- Integrate with more LLM providers (Claude, Llama, etc.)
- Add query caching and results memoization
- Implement row-level security (RLS) filters
- Add audit logging and query history
- Deploy with authentication (OAuth2, JWT)
- Support for time-series and advanced SQL functions
- Add chart generation from results (Plotly, Matplotlib)
Prototype / Educational Use

Note: This is a prototype. Review and harden the security layer before using with production databases.

Quick start (backend)

```bash
pip install -r requirements.txt
cp .env.example .env
uvicorn main:app --host 0.0.0.0 --port 8000
```
Quick start (frontend)

1. Go to `frontend` and install dependencies (Node 18+ recommended):

```bash
cd frontend
npm install
VITE_BACKEND_URL="http://localhost:8000" npm run dev
```

Now open the frontend URL shown by Vite. Create a session, record voice or type messages, and interact with the agent.
- Add authentication, query logging, and row-level limits.