Self-improving SQL query optimizer using ReSTEM (Reinforced Self-Training with Expectation-Maximization).
Generate a large training dataset using the ReSTEM loop:

- Sample seed examples for few-shot prompting
- Generate optimization candidates with an LLM (GPT-4o-mini)
- Execute and evaluate each candidate on a real SQLite database
- Filter candidates by reward threshold (correctness + speedup)
- Add successful examples back to the training set
- Repeat with the improved few-shot pool

Fine-tune Qwen on the generated dataset:

- Input: schema + slow_query
- Output: fast_query + explanation

Use the ReSTEM rewards to RL-train Qwen via PPO/DPO.
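The loop above can be sketched in a few lines. Note that `generate_candidates` and `evaluate` here are illustrative stand-ins for the GPT-4o-mini call and the SQLite evaluator, not the project's actual API:

```python
import random

def restem_iteration(seed_examples, generate_candidates, evaluate,
                     reward_threshold=0.5, k=5):
    """One ReSTEM loop: sample few-shot seeds, generate candidates,
    score them, and keep only those above the reward threshold."""
    few_shot = random.sample(seed_examples, min(3, len(seed_examples)))
    candidates = generate_candidates(few_shot, k)   # sample from the model
    accepted = [c for c in candidates if evaluate(c) >= reward_threshold]
    seed_examples.extend(accepted)                  # grow the few-shot pool
    return accepted

# Toy stand-ins for the LLM and the SQL evaluator, just to exercise the loop.
def fake_generate(few_shot, k):
    return [f"candidate-{i}" for i in range(k)]

def fake_evaluate(candidate):
    return 0.9 if candidate.endswith(("0", "2")) else 0.1

seeds = ["seed-a", "seed-b", "seed-c"]
kept = restem_iteration(seeds, fake_generate, fake_evaluate)
print(len(kept), len(seeds))  # 2 accepted, pool grows from 3 to 5
```

Each accepted example feeds back into the few-shot pool, so later iterations prompt the model with progressively better demonstrations.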
```
quill/
├── evaluator.py           # SQL performance evaluator (correctness + speedup)
├── llm_judge.py           # LLM-as-Judge for readability scoring
└── restem_optimizer.py    # ReSTEM self-improving loop
scripts/
├── seed_collector.py      # Generate test database (10k users, 50k orders)
├── train_restem.py        # Multi-iteration training with metrics
└── analyze_training.py    # Analyze metrics and export for fine-tuning
examples/
├── test_evaluation.py     # Test evaluator on seed data
└── test_llm_judge.py      # Test readability judge
data/
├── seed_data.json         # 20 hand-crafted seed examples
├── test.db                # SQLite test database (gitignored)
└── restem_training_data.json  # Generated training data (gitignored)
```
```
pip install -r requirements.txt
cp .env.example .env  # Add your OPENAI_API_KEY
```

Generate the test database and run the quick checks:

```
PYTHONPATH=. python3 scripts/seed_collector.py
PYTHONPATH=. python3 examples/test_evaluation.py
PYTHONPATH=. python3 quill/restem_optimizer.py
```

Run the full training loop:

```
PYTHONPATH=. python3 scripts/train_restem.py
```

This will:
- Run 50 iterations
- Generate 5 candidates per iteration
- Track metrics (success rate, rewards, diversity)
- Save checkpoints every 10 iterations
- Output: `data/restem_training_data.json` (100+ examples)
Analyze the metrics:

```
PYTHONPATH=. python3 scripts/analyze_training.py
```

Export for fine-tuning:

```
PYTHONPATH=. python3 scripts/analyze_training.py export
```

Output: `data/finetuning_dataset.jsonl` in OpenAI fine-tuning format.
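For reference, a record in OpenAI's chat fine-tuning format is one JSON object per line with a `messages` array. The prompt contents below are illustrative assumptions, not the exact strings `analyze_training.py` emits:

```python
import json

# One illustrative training record in OpenAI chat fine-tuning format.
example = {
    "messages": [
        {"role": "system", "content": "You are a SQL query optimizer."},
        {"role": "user", "content": "Schema: users(id, name)\n"
                                    "Slow query: SELECT * FROM users WHERE name = 'a'"},
        {"role": "assistant", "content": "CREATE INDEX idx_users_name ON users(name);\n"
                                         "SELECT id, name FROM users WHERE name = 'a'"},
    ]
}

# Each line of the .jsonl file is one such object.
with open("finetuning_dataset.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")

loaded = json.loads(open("finetuning_dataset.jsonl").read())
print(loaded["messages"][0]["role"])  # system
```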
```
reward = speedup_reward + readability_bonus
speedup_reward = min(1.0, log(speedup + 1) / log(10))
```

- 1x speedup → ~0.3 reward
- 10x speedup → 1.0 reward
- 100x speedup → 1.0 reward (capped)

readability_bonus ∈ [-0.2, +0.2]:

- The LLM judge compares the original and optimized queries for readability
- Bonus if the optimized query is more readable
- Penalty if it is less readable
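The reward shaping above translates directly to code. This is a minimal sketch of the formula as stated; the function names are chosen for illustration:

```python
import math

def speedup_reward(speedup: float) -> float:
    """Log-scaled speedup reward, capped at 1.0."""
    return min(1.0, math.log(speedup + 1) / math.log(10))

def total_reward(speedup: float, readability_bonus: float = 0.0) -> float:
    """readability_bonus comes from the LLM judge, in [-0.2, +0.2]."""
    return speedup_reward(speedup) + readability_bonus

print(round(speedup_reward(1), 2))  # 0.3  (log(2)/log(10) ≈ 0.301)
print(speedup_reward(10))           # 1.0  (log(11)/log(10) ≈ 1.04, capped)
print(speedup_reward(100))          # 1.0  (capped)
```

The log scale means most of the reward is earned in the first order of magnitude of speedup, so candidates are not pushed toward exotic rewrites chasing 100x gains.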
- indexing: Add indexes (CREATE INDEX)
- join: Replace subqueries with JOINs
- projection: SELECT specific columns instead of *
- limit: Add LIMIT for top-N queries
- redundancy: Remove duplicate computations
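As a concrete example of the indexing strategy, SQLite's `EXPLAIN QUERY PLAN` shows the planner switching from a full table scan to an index search once the index exists (table and index names here are illustrative, not the project's test schema):

```python
import sqlite3

# Build a small in-memory table to query against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders (user_id, total) VALUES (?, ?)",
                 [(i % 100, i * 1.5) for i in range(10_000)])

query = "SELECT id, total FROM orders WHERE user_id = 42"

# Before: the planner must scan the whole table.
before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(before[0][-1])  # e.g. "SCAN orders"

# After CREATE INDEX: the planner uses an index search instead.
conn.execute("CREATE INDEX idx_orders_user_id ON orders(user_id)")
after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(after[0][-1])   # e.g. "SEARCH orders USING INDEX idx_orders_user_id (user_id=?)"
```

This is essentially what the evaluator measures end to end: the same query, before and after the candidate optimization, timed on the real database.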
Edit `scripts/train_restem.py` to customize:

- `num_iterations`: Number of ReSTEM loops (default: 50)
- `candidates_per_iteration`: Candidates per loop (default: 5)
- `reward_threshold`: Minimum reward to accept (default: 0.5)
- `timeout_seconds`: Query timeout (default: 10s)