# NLP-to-SQL Streamlit Agent

A Streamlit app that converts natural-language queries to SQL using OpenAI, validates them with sqlglot, and executes them against configured databases. It includes three orchestration modes: a direct pipeline, a LangChain multi-tool agent, and a full agentic pipeline with step tracing and reasoning logs.

## Overview

Features:

- 🔗 Connect to multiple databases via connection strings (PostgreSQL, MySQL, SQLite)
- 🔍 Dynamic schema introspection and semantic model generation (with table samples)
- 🧠 NLP→SQL generation via OpenAI with dialect-aware validation
- ✅ Automatic retry (up to 5 times) to fix invalid SQL
- 🛡️ SQL policy verification (read-only SELECT/DESCRIBE/EXPLAIN/SHOW only)
- 📊 Execute queries and summarize results (detailed stats or aggregations)
- 🤖 Three orchestration modes:
  - **Direct Mode**: simple generate → validate → execute flow
  - **LangChain Agent**: multi-tool orchestrator using LangChain
  - **Full Agentic Pipeline**: complete step tracing, reasoning logs, and insights

## Files

| File | Purpose |
| --- | --- |
| `app.py` | Streamlit UI with database config, semantic model builder, orchestration mode selector |
| `db.py` | DB connection, introspection, schema sampling, and query execution helpers |
| `semantic.py` | Builds the semantic model enriched with table samples |
| `nlp_sql.py` | OpenAI SQL generation with sqlglot validation and retry logic |
| `agent.py` | SQL verifier: enforces the read-only policy, rejects CRUD/DDL |
| `insights.py` | Data summarization and narrative insight generation via OpenAI |
| `agent_chain.py` | LangChain-based tool orchestrator |
| `agentic_pipeline.py` | Full end-to-end pipeline with step tracing and reasoning |
| `test_agent.py` | Unit tests for agent functions (pytest) |
| `setup_sample_db.py` | Creates a sample SQLite database for testing |
| `dry_run.py` | Dry-run example without Streamlit |
| `requirements.txt` | Python dependencies |

## Quick Start

### 1. Setup

```bash
# Create a virtualenv
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

### 2. Environment Variables

```bash
# Required: OpenAI API key
export OPENAI_API_KEY="sk-..."

# Optional: OpenAI model (default: gpt-4)
export OPENAI_MODEL="gpt-4"

# Configure database URLs (JSON map)
export DB_URLS='{"default":"sqlite:////workspaces/nattu22/data.db"}'
```
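
`DB_URLS` is a JSON map from a connection name to a SQLAlchemy-style URL. As a minimal sketch (variable names here are illustrative, not necessarily what `db.py` uses), the map can be parsed and turned into engines like this:

```python
import json
import os

from sqlalchemy import create_engine

# Parse the {name: connection_url} JSON map from the environment.
db_urls = json.loads(os.environ.get("DB_URLS", "{}"))

# Build one SQLAlchemy engine per configured database.
engines = {name: create_engine(url) for name, url in db_urls.items()}
```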

### 3. Create Sample Database (optional)

```bash
python setup_sample_db.py
```

### 4. Run Tests

```bash
pytest test_agent.py -v
```

### 5. Dry-Run Example

```bash
python dry_run.py
```

### 6. Run Streamlit App

```bash
streamlit run app.py
```

Then open http://localhost:8501 in your browser.

## Usage Flow

### Direct Mode

  1. Introspect → Fetch database schema
  2. Semantic → Build model with table samples
  3. Generate SQL → OpenAI translates NL to SQL
  4. Validate → Check syntax and policy
  5. Execute → Run query with row limit
  6. Summarize → Compute statistics or aggregations
  7. Insights → OpenAI generates narrative
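
In code, this flow corresponds roughly to the sketch below. The function names are hypothetical stand-ins for the helpers in `db.py`, `semantic.py`, `nlp_sql.py`, `agent.py`, and `insights.py`, not the repo's exact API:

```python
# Hypothetical imports; the actual names in this repo may differ.
# from db import introspect_schema, execute_sql
# from semantic import build_semantic_model
# from nlp_sql import generate_sql, validate_sql
# from insights import summarize, narrative_insights

def direct_mode(question: str, engine, dialect: str, max_rows: int = 1000):
    schema = introspect_schema(engine)                  # 1. fetch DB structure
    model = build_semantic_model(engine, schema)        # 2. enrich with samples
    sql = generate_sql(question, model, dialect)        # 3. OpenAI NL -> SQL
    validate_sql(sql, dialect)                          # 4. syntax + policy checks
    rows = execute_sql(engine, sql, limit=max_rows)     # 5. run with a row limit
    summary = summarize(rows)                           # 6. stats or aggregations
    return narrative_insights(question, sql, summary)   # 7. OpenAI narrative
```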

### LangChain Agent Mode

- Uses LangChain's `ZERO_SHOT_REACT_DESCRIPTION` agent
- Exposes tools: `generate_sql`, `validate_sql`, `execute_sql`, `narrative_insights`
- The agent decides which tools to use and in what order
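
A sketch of how these tools might be wired up with LangChain's classic `initialize_agent` API; the stubbed tool bodies and exact descriptions are assumptions, not the contents of `agent_chain.py`:

```python
from langchain.agents import AgentType, Tool, initialize_agent
from langchain_openai import ChatOpenAI

# Stubs standing in for the real tool implementations.
def generate_sql_tool(question: str) -> str: ...
def validate_sql_tool(sql: str) -> str: ...
def execute_sql_tool(sql: str) -> str: ...
def insights_tool(results: str) -> str: ...

tools = [
    Tool(name="generate_sql", func=generate_sql_tool,
         description="Translate a natural-language question into SQL."),
    Tool(name="validate_sql", func=validate_sql_tool,
         description="Check SQL syntax and the read-only policy."),
    Tool(name="execute_sql", func=execute_sql_tool,
         description="Run a validated read-only query and return rows."),
    Tool(name="narrative_insights", func=insights_tool,
         description="Summarize query results in plain language."),
]

# The ReAct-style agent decides which tool to call at each step.
agent = initialize_agent(
    tools,
    ChatOpenAI(model="gpt-4", temperature=0),
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)
```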

### Full Agentic Pipeline Mode (Recommended)

- Step-by-step orchestration with full tracing
- Each step includes:
  - `input_data`: what the step received
  - `output_data`: what the step produced
  - `reasoning`: why the step was executed
  - `duration_ms`: execution time
  - `error`: any error message
- All traces are stored and displayed in a collapsible UI
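
Conceptually, each trace entry is a small record like the sketch below (the actual structure in `agentic_pipeline.py` may differ):

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class StepTrace:
    name: str                    # e.g. "generate_sql"
    input_data: Any              # what the step received
    output_data: Any = None      # what the step produced
    reasoning: str = ""          # why the step was executed
    duration_ms: float = 0.0     # execution time
    error: Optional[str] = None  # error message, if the step failed
```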

## Architecture

```text
User Query (NL)
    ↓
[Introspect Schema] → Get DB structure
    ↓
[Build Semantic Model] → Enrich with samples
    ↓
[Generate SQL] → OpenAI + dialect-aware
    ↓
[Validate Syntax] → sqlglot parsing
    ↓
[Verify Policy] → Read-only check (agent.py)
    ↓
[Execute] → Run SELECT with row limit
    ↓
[Summarize] → describe() + percentiles or aggregates
    ↓
[Narrative] → OpenAI generates insights
    ↓
Result + Traces + Reasoning Logs
```
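
For the summarize step, a minimal pandas sketch of "describe() + percentiles" might look like this (illustrative only; `insights.py` may compute different statistics):

```python
import pandas as pd

def summarize(rows: list[dict]) -> dict:
    """Basic stats for the numeric columns of a result set."""
    df = pd.DataFrame(rows)
    numeric = df.select_dtypes("number")
    # describe() with extra percentiles beyond the default quartiles.
    stats = numeric.describe(percentiles=[0.1, 0.25, 0.5, 0.75, 0.9])
    return {"row_count": len(df), "stats": stats.to_dict()}
```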

## Security & Safety

- ✓ SQL validation with sqlglot (syntax parsing)
- ✓ Policy enforcement: SELECT/DESCRIBE/EXPLAIN/SHOW only (`agent.py`)
- ✓ Blocks CRUD/DDL keywords: INSERT, UPDATE, DELETE, DROP, ALTER, etc.
- ✓ Rejects multiple statements (semicolon check)
- ✓ Strips comments to prevent injection bypass
- ⚠️ For production: use a read-only DB user, row limits, query timeouts, connection pooling, and audit logging
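
A minimal sketch of the kind of read-only check `agent.py` performs, using sqlglot. This illustrates the policy, not the verifier's exact code; for instance, EXPLAIN may parse as a generic command depending on dialect:

```python
import sqlglot
from sqlglot import exp

# Statement types treated as read-only in this sketch.
ALLOWED = (exp.Select, exp.Describe, exp.Show)

def verify_read_only(sql: str, dialect: str = "sqlite") -> None:
    # The check runs on the parsed AST, so keywords hidden
    # inside comments cannot change the statement type.
    statements = sqlglot.parse(sql, read=dialect)
    if len(statements) != 1:
        raise ValueError("Exactly one SQL statement is allowed.")
    stmt = statements[0]
    if not isinstance(stmt, ALLOWED):
        raise ValueError(f"Read-only statements only; got {type(stmt).__name__}.")
```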

## Environment Setup Examples

### Example 1: Local SQLite

```bash
export DB_URLS='{"test":"sqlite:////tmp/test.db"}'
```

### Example 2: PostgreSQL

```bash
export DB_URLS='{"prod":"postgresql://user:pass@localhost:5432/mydb"}'
```

### Example 3: Multiple Databases

```bash
export DB_URLS='{"sqlite":"sqlite:////tmp/test.db","postgres":"postgresql://user:pass@db.example.com/prod"}'
```

## Next Steps & Enhancements

- Add conversation memory for multi-turn refinement
- Integrate with more LLM providers (Claude, Llama, etc.)
- Add query caching and results memoization
- Implement row-level security (RLS) filters
- Add audit logging and query history
- Deploy with authentication (OAuth2, JWT)
- Support time-series and advanced SQL functions
- Add chart generation from results (Plotly, Matplotlib)

## License

Prototype / Educational Use


**Note:** This is a prototype. Review and harden the security layer before using it with production databases.

