This repository demonstrates how to build a text-to-SQL system using Lamini's Memory Experiment framework. The system generates training data, fine-tunes a language model, and evaluates its performance at converting natural language questions to SQL queries for a bakery database.
This repository provides a complete pipeline for creating a custom text-to-SQL model that understands domain-specific terminology and database structure.
The pipeline includes:
- Data Generation: Create diverse training examples from a small set of sample questions
- Data Validation: Ensure generated SQL queries are valid and executable
- Coverage Analysis: Identify and fill gaps in SQL concept coverage
- Memory Tuning: Fine-tune a language model on the generated data
- Performance Evaluation: Test the model against an evaluation set
This example uses the Bakery dataset from the Spider benchmark collection. The dataset contains information about sales for a small bakery shop, with the following structure:
The database consists of four tables:
**customers**
- Id: Unique customer identifier
- LastName: Customer's last name
- FirstName: Customer's first name

**goods**
- Id: Unique identifier of the baked good
- Flavor: Flavor/type (e.g., "chocolate", "lemon")
- Food: Category (e.g., "cake", "tart")
- Price: Price in dollars

**items**
- Reciept: Receipt number (foreign key to receipts.RecieptNumber)
- Ordinal: Position of the purchased item on the receipt
- Item: Identifier of the item purchased (foreign key to goods.Id)

**receipts**
- RecieptNumber: Unique identifier of the receipt
- Date: Date of purchase (DD-MM-YYYY format)
- CustomerId: Customer ID (foreign key to customers.Id)
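The schema above can be sketched in SQLite as follows. This is a reconstruction from the column descriptions, not the dataset's exact DDL: the column types are assumptions, while the misspelled `Reciept`/`RecieptNumber` names are kept as they appear in the dataset.

```python
import sqlite3

# Sketch of the bakery schema inferred from the descriptions above.
# Column types are assumptions; the "Reciept" spellings are intentional.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    Id        INTEGER PRIMARY KEY,  -- unique customer identifier
    LastName  TEXT,
    FirstName TEXT
);
CREATE TABLE goods (
    Id     TEXT PRIMARY KEY,        -- unique identifier of the baked good
    Flavor TEXT,                    -- e.g. "chocolate", "lemon"
    Food   TEXT,                    -- e.g. "cake", "tart"
    Price  REAL                     -- price in dollars
);
CREATE TABLE items (
    Reciept INTEGER REFERENCES receipts(RecieptNumber),
    Ordinal INTEGER,                -- position of the item on the receipt
    Item    TEXT REFERENCES goods(Id)
);
CREATE TABLE receipts (
    RecieptNumber INTEGER PRIMARY KEY,
    Date          TEXT,             -- DD-MM-YYYY
    CustomerId    INTEGER REFERENCES customers(Id)
);
""")
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```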
- Python 3.7+
- Lamini API key (get yours at app.lamini.ai)
- SQLite database with the bakery schema
1. Clone this repository:

   ```bash
   git clone https://github.com/lamini-ai/txt2sql-examples.git
   cd txt2sql-examples
   ```

2. Install the required packages:

   ```bash
   pip install lamini
   ```

3. Fill in the `config.yml` file.
The repository contains the following core scripts:
- `generate_data.py` — Generates training questions and their corresponding SQL queries based on your evaluation set.
- `analyze_generated_data.py` — Analyzes concept coverage and generates additional questions for missing SQL concepts. This step is optional but recommended for comprehensive coverage.
- `memory_tuning.py` — Fine-tunes a language model on the generated data using Lamini's Memory Tuning feature.
- `run_inference.py` — Evaluates the fine-tuned model by comparing generated queries against gold standard queries.
- A shared utilities module containing functions used across the scripts:
  - Reading and writing JSONL files
  - Getting user input with default values
  - Formatting the database schema and glossary
  - Processing variations and saving results
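The JSONL and input helpers might look roughly like this. This is a minimal sketch with assumed function names, not the repository's exact implementation:

```python
import json

def read_jsonl(path):
    """Read a JSONL file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def write_jsonl(path, records):
    """Write a list of dicts as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

def get_input(prompt, default):
    """Ask the user for a value, falling back to a default."""
    answer = input(f"{prompt} [{default}]: ").strip()
    return answer or default
```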
1. Create a glossary file that defines domain-specific terms and provides additional context about the database.
2. Prepare an evaluation set with example questions and their corresponding SQL queries.
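For instance, an evaluation-set record and a glossary record might look like the following. The field names here are assumptions for illustration; match whatever keys the repository's scripts actually expect:

```python
import json

# Hypothetical JSONL records — field names are assumptions, not the
# repository's exact schema.
evaluation_example = {
    "question": "How many customers are there?",
    "sql": "SELECT COUNT(*) FROM customers;",
}
glossary_example = {
    "term": "best seller",
    "definition": "The good appearing most often in the items table.",
}
print(json.dumps(evaluation_example))
print(json.dumps(glossary_example))
```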
```bash
python generate_data.py
```

This script:
- Takes a test set of questions and their corresponding SQL queries
- Generates pattern-based variations (similar questions phrased with different surface patterns)
- Creates structural variations (questions with different structures but similar intent)
- Decomposes complex questions into simpler sub-questions
- Ensures all generated variations include their corresponding SQL queries
- Outputs two files:
- flattened.jsonl (only the VALID question and query pairs)
- nested_results.jsonl (detailed information including question, query, validation status, original question, and original query)
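The validation step — checking that a generated query actually parses and executes against the database — can be sketched like this (a minimal sketch, not the repository's exact logic):

```python
import sqlite3

def is_executable(db_path, query):
    """Return True if the query parses and runs against the SQLite database."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(query)
        return True
    except sqlite3.Error:
        # Covers syntax errors, missing tables/columns, etc.
        return False
    finally:
        conn.close()
```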
```bash
python analyze_generated_data.py
```

This script (optional but recommended):
- Takes the flattened.jsonl and nested_results.jsonl as input
- Identifies SQL concepts not covered in the generated data
- Generates 2 additional questions for each missing concept
- Validates the SQL queries and passes them through a debugger if needed
- Collects all failed queries for analysis
- Provides a detailed analysis of why certain queries failed
- Outputs additional_questions.jsonl for enhancing your training data. NOTE: Copy this file and append it to flattened.jsonl if you would like to use the additional questions
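Since both files are plain JSONL (one JSON object per line), merging them is a simple line-by-line append. A small helper for that (the paths in the usage comment assume the repository's default file names):

```python
def append_jsonl(src_path, dest_path):
    """Append every non-blank line of one JSONL file to another."""
    with open(src_path, encoding="utf-8") as src, \
         open(dest_path, "a", encoding="utf-8") as dest:
        for line in src:
            if line.strip():
                dest.write(line if line.endswith("\n") else line + "\n")

# Usage: append_jsonl("additional_questions.jsonl", "flattened.jsonl")
```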
```bash
python memory_tuning.py
```

This script:
- Takes the flattened JSONL file of question-SQL pairs
- Configures training parameters like max_steps, learning_rate, etc.
- Submits the tuning job to Lamini's API
- Displays the tuning job information
NOTE: After the job completes, you'll need to get the model ID from the Lamini app environment for the next step.
```bash
python run_inference.py
```

This script:
- Reads test questions from your evaluation JSONL file
- Runs inference using your fine-tuned model to generate SQL queries
- Executes both the generated queries and gold standard queries
- Compares the results to determine functional equivalence
- Generates a detailed performance report with metrics and error analysis
- Produces three output files:
- inference_results.json: Raw inference outputs
- analysis_results_with_data.json: Detailed comparison data
- analysis_report.md: Human-readable performance summary
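Functional equivalence here means the generated and gold queries return the same rows, not that their SQL text matches. That comparison can be sketched as follows (a simplified version treating results as unordered multisets, not the repository's exact logic):

```python
import sqlite3
from collections import Counter

def functionally_equivalent(db_path, generated_sql, gold_sql):
    """Compare two queries by their result rows, ignoring row order."""
    conn = sqlite3.connect(db_path)
    try:
        generated = Counter(conn.execute(generated_sql).fetchall())
        gold = Counter(conn.execute(gold_sql).fetchall())
        return generated == gold
    except sqlite3.Error:
        # A query that fails to execute cannot match the gold result.
        return False
    finally:
        conn.close()
```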
To use this pipeline, you'll need to prepare:
- Database: SQLite database with the bakery schema
- Test/Evaluation Set: JSONL file with example questions and their gold SQL queries
- Glossary: JSONL file with term definitions to help the model understand domain-specific terminology
Running the pipeline will generate:
- Nested Results: Generated questions and SQL stored in a JSONL file along with the original questions, SQL queries, and validation information
- Flattened Results: Generated questions and SQL pairs in a simpler JSONL format for training
- Tuned Model: A model ID that can be used for inference
- Evaluation Results: JSON file with detailed comparison of generated queries vs. gold standard
- Analysis Report: Markdown report summarizing the model's performance