GitHub - DylanSteinecke/nl_to_sql_to_figure: 2025 - 2026: A RAG LLM pipeline that processes a natural language query, uses RAG to curate data from an SQL database (In Progress), and should analyze the data per the user's request. Runs on transformer language models.

(Work in progress / Incomplete)

Pipeline Overview

This pipeline targets the case where a user wants a summary of data held in an SQL database. Technically, it takes a natural language query (e.g., "Which artist sold the most records last year?"), embeds it in the same vector space as automatically generated descriptions of the SQL columns, retrieves the most similar column descriptions, passes those descriptions together with the original user query to an NL-to-SQL language model, and generates an SQL query to retrieve the relevant data (done, but the generated query is unreliable). The final step, not yet implemented, is to determine which code to run or generate for the analysis needed to answer the user's original question.
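The retrieval and prompt-building steps can be sketched as follows. This is a minimal illustration, not the repository's code: the bag-of-words `embed` function is a toy stand-in for the transformer embedding model, and the column names, descriptions, and the helpers `retrieve` and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a transformer encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, column_docs: dict, k: int = 3) -> list:
    """Return the names of the k column descriptions most similar to the question."""
    q = embed(question)
    scored = [(cosine(q, embed(doc)), name) for name, doc in column_docs.items()]
    # Highest score first; ties broken by name so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:k] if score > 0]


def build_prompt(question: str, column_docs: dict, top_columns: list) -> str:
    """Combine the retrieved column descriptions with the user's question."""
    lines = [f"- {name}: {column_docs[name]}" for name in top_columns]
    return "Relevant columns:\n" + "\n".join(lines) + f"\nQuestion: {question}\nSQL:"


# Hypothetical column descriptions, for illustration only.
COLUMN_DOCS = {
    "sales.artist_name": "name of the artist who sold the record",
    "sales.units_sold": "number of records sold in the sale",
    "sales.sale_date": "date of the sale, used to filter by year",
    "customers.email": "email address of the customer",
}

question = "Which artist sold the most records last year?"
top_columns = retrieve(question, COLUMN_DOCS, k=3)
prompt = build_prompt(question, COLUMN_DOCS, top_columns)
print(prompt)
```

In the real pipeline the resulting prompt would be passed to the NL-to-SQL language model. Note that a bag-of-words similarity is sensitive to common words such as "the", which is one reason a learned embedding is used instead.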

How To Run

Currently, the best way to run the pipeline is through the scratch.ipynb notebook. Parts of the notebook are scratch work exploring the data, but it also contains a "main" for running the pipeline. Try it with your own question to see whether it generates an SQL query for it, and inspect the top retrieved columns if you want.

To Do:

Better define which documents are most relevant to retrieve (e.g., re-rank the top results, pick a better threshold, or set it dynamically)
Implement guardrails for SQL query safety and accuracy (e.g., improve the RAG process, explore an agentic approach to iterate on the query, improve document descriptions)
Implement the data-to-analysis part of the pipeline
Explicitly code some of the above first, then extend it into a LangGraph version. Use tools or skills for the analyses/plots.
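Two of the items above, a dynamically set retrieval threshold and a basic SQL safety guardrail, could look roughly like the sketch below. It is one possible approach under stated assumptions, not the repository's implementation: it assumes an SQLite database, and the function names `select_by_relative_threshold` and `check_query`, the `ratio` parameter, and the `sales` table are all hypothetical.

```python
import sqlite3


def select_by_relative_threshold(scored, ratio=0.6, max_docs=5):
    """Keep documents whose score is at least `ratio` times the best score.

    `scored` is a list of (name, similarity) pairs. The cutoff adapts to
    each query instead of being one fixed global threshold.
    """
    if not scored:
        return []
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    cutoff = ranked[0][1] * ratio
    return [name for name, score in ranked[:max_docs] if score >= cutoff]


def check_query(conn, sql):
    """Return (ok, reason) for a generated query without running it."""
    statement = sql.strip().rstrip(";").strip()
    if not statement.lower().startswith("select"):
        return False, "only SELECT statements are allowed"
    try:
        # Compiling the query plan catches syntax errors and unknown
        # tables or columns; a second statement is rejected by sqlite3.
        conn.execute("EXPLAIN QUERY PLAN " + statement)
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return False, f"query does not compile: {exc}"
    return True, "ok"


# Hypothetical in-memory table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (artist_name TEXT, units_sold INTEGER)")
conn.execute("INSERT INTO sales VALUES ('A', 10), ('B', 25)")
conn.commit()
# Second line of defence: SQLite refuses writes on this connection.
conn.execute("PRAGMA query_only = ON")

good = "SELECT artist_name FROM sales ORDER BY units_sold DESC LIMIT 1;"
candidates = [good, "DROP TABLE sales", "SELECT nope FROM sales"]
for sql in candidates:
    print(sql, "->", check_query(conn, sql))

kept = select_by_relative_threshold(
    [("a", 0.45), ("b", 0.40), ("c", 0.24), ("d", 0.16)], ratio=0.6
)
print(kept)
```

A check like this only covers safety and basic validity. Accuracy (whether the query answers the question) would still need the retrieval and iteration improvements listed above.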

Folders and files (9 commits)

data
.gitignore
README.md
__init__.py
app.log
create_sql_query.py
db_path
embed_documents_into_vector_db.py
generate_schema_documents.py
scratch.ipynb
