Generation of database schemas from natural language prompts using the validation loop pattern.
schema_gen is a Python library that uses Large Language Models (LLMs) to generate SQLAlchemy database schemas from natural language descriptions. It employs a validation loop pattern to ensure the generated schemas are valid and can be directly converted to working SQLAlchemy ORM classes.
- Natural Language to Schema: Describe your database structure in plain English and get a complete schema
- Validation Loop Pattern: Automatically retries LLM generation if validation fails, ensuring correct output
- Type Safety: Uses Pydantic v2 models for schema representation with full type checking
- SQLAlchemy Integration: Seamless conversion to SQLAlchemy ORM classes and Table objects
- Comprehensive Validation: Multi-level validation including:
  - Pydantic field validation
  - Column type parameter validation
  - Foreign key reference validation
  - Circular dependency detection
  - Primary key requirements
- Database Support: Works with PostgreSQL, MySQL, SQLite, and other SQLAlchemy-supported databases
- Round-trip Conversion: Convert between Pydantic schema models and SQLAlchemy ORM classes
```bash
cd schema_gen
poetry install
```

For PostgreSQL support with additional features:

```bash
poetry install --extras postgres
```

Or install in editable mode with pip:

```bash
pip install -e .
```

```python
from langchain_openai import ChatOpenAI

from schema_gen import generate_schema_from_prompt

# Initialize LLM
llm = ChatOpenAI(model="gpt-4o")

# Generate schema from natural language
schema = generate_schema_from_prompt(
    prompt="""
    Create a blog schema with:
    - Users who can write posts
    - Posts with title, content, and timestamps
    - Comments on posts
    """,
    language_model=llm,
)

# Convert to SQLAlchemy ORM classes
orm_classes = schema.to_orm_classes()

# Use the generated classes
User = orm_classes["User"]
Post = orm_classes["Post"]
Comment = orm_classes["Comment"]
```

```python
from sqlalchemy import create_engine

# Create database
engine = create_engine("sqlite:///blog.db")

# Create tables from generated schema; all generated ORM classes share
# one declarative Base, recovered here from any of them
Base = list(orm_classes.values())[0].__bases__[0]
Base.metadata.create_all(engine)
```

See example.py for a complete working example that:
- Generates a project management schema from a natural language prompt
- Creates a SQLite database
- Populates it with sample data
- Demonstrates queries using the generated ORM classes
Run it with:

```bash
export OPENAI_API_KEY='your-api-key-here'
poetry run python example.py
```

Core models:

- `DatabaseSchema`: Top-level container for the entire database schema
- `TableDefinition`: Defines a single table with columns, constraints, and relationships
- `ColumnDefinition`: Defines a column with type, constraints, and default values
- `ForeignKeyDefinition`: Foreign key constraint with ON DELETE/UPDATE actions
- `IndexDefinition`: Index definition with type (BTREE, HASH, etc.)
- `UniqueConstraint`: Multi-column unique constraint
- `CheckConstraint`: Check constraint with a SQL expression

Enums:

- `SQLColumnType`: Supported column types (INTEGER, STRING, TIMESTAMP, JSON, etc.)
- `ForeignKeyAction`: Actions for foreign keys (CASCADE, SET NULL, RESTRICT, etc.)
- `IndexType`: Database index types (BTREE, HASH, GIST, GIN)

Generation functions:

- `generate_schema_from_prompt()`: Main function that uses the LLM with a validation loop; automatically retries if Pydantic validation or ORM conversion fails
- `post_process_schema()`: Validates and post-processes generated schemas

Database utilities:

- `create_database()`: Create a PostgreSQL database if it doesn't exist
- `create_sqlalchemy_engine()`: Create a SQLAlchemy engine from credentials
- `setup_database()`: Complete database initialization with table creation
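To show how these models nest, here is a simplified stand-in built with plain Pydantic. These classes are illustrative only and are not the library's actual definitions; the real models carry many more fields (constraints, defaults, foreign keys, indexes), and the field names used here (`type`, `primary_key`) are assumptions:

```python
from typing import List

from pydantic import BaseModel


# Illustrative stand-ins only -- schema_gen's real models have many more fields.
class ColumnDefinition(BaseModel):
    name: str
    type: str  # the library uses an SQLColumnType enum, e.g. INTEGER, STRING
    primary_key: bool = False


class TableDefinition(BaseModel):
    name: str
    columns: List[ColumnDefinition]


class DatabaseSchema(BaseModel):
    tables: List[TableDefinition]


# A DatabaseSchema is a list of tables, each holding its own columns
schema = DatabaseSchema(
    tables=[
        TableDefinition(
            name="users",
            columns=[
                ColumnDefinition(name="id", type="INTEGER", primary_key=True),
                ColumnDefinition(name="email", type="STRING"),
            ],
        )
    ]
)
```

Because every level is a Pydantic model, an invalid LLM output (missing name, wrong type) fails validation at construction time, which is what the validation loop relies on.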
The validation loop ensures generated schemas are correct:
- Prompt Construction: Natural language description is formatted for the LLM
- LLM Generation: Uses `structured_output_with_retries` from MotleyCrew
- Pydantic Validation: Automatic validation of field types and constraints
- Post-Processing: Additional validation:
- Verify foreign key references point to existing tables
- Check for circular dependencies
- Validate column type parameters
- Ensure all tables have primary keys
- ORM Conversion: Attempt to convert to SQLAlchemy ORM classes
- Retry: If any validation fails, the error message is sent back to the LLM for correction
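The steps above form a generic generate-validate-retry cycle. A minimal, library-agnostic sketch of the pattern (not schema_gen's actual implementation, which delegates to MotleyCrew's `structured_output_with_retries`) might look like:

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def validation_loop(
    generate: Callable[[Optional[str]], T],  # takes error feedback, returns a candidate
    validate: Callable[[T], None],           # raises ValueError if the candidate is invalid
    max_retries: int = 3,
) -> T:
    """Generate candidates until one passes validation, feeding errors back."""
    feedback: Optional[str] = None
    for _ in range(max_retries):
        candidate = generate(feedback)
        try:
            validate(candidate)
            return candidate
        except ValueError as exc:
            # The error message becomes feedback for the next generation attempt
            feedback = str(exc)
    raise ValueError(f"Validation failed after {max_retries} attempts: {feedback}")
```

Each failed attempt's error message is passed back into the next generation call, which is what lets the LLM correct its own output instead of blindly retrying.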
```python
def generate_schema_from_prompt(
    prompt: str,
    language_model: BaseLanguageModel,
    max_retries: int = 3,
) -> DatabaseSchema:
    """
    Generate a database schema from a natural language prompt.

    Args:
        prompt: Natural language description of the desired schema
        language_model: LangChain LLM to use for generation
        max_retries: Maximum number of retry attempts if validation fails

    Returns:
        DatabaseSchema: Validated schema that can be converted to ORM classes

    Raises:
        ValueError: If schema generation fails after max_retries
    """
```

```python
class DatabaseSchema(BaseModel):
    """Complete database schema definition."""

    tables: List[TableDefinition]

    def to_orm_classes(self) -> Dict[str, Type]:
        """Convert schema to SQLAlchemy ORM classes."""

    def to_sqlalchemy_tables(self, metadata: MetaData) -> Dict[str, Table]:
        """Convert schema to SQLAlchemy Table objects."""
```

```python
class TableDefinition(BaseModel):
    """Single table definition."""

    name: str
    columns: List[ColumnDefinition]
    foreign_keys: List[ForeignKeyDefinition] = []
    indexes: List[IndexDefinition] = []
    unique_constraints: List[UniqueConstraint] = []
    check_constraints: List[CheckConstraint] = []
```

The library works with any LangChain-compatible LLM:
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")
```

```python
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-sonnet-4-5-20250929")
```

Any LangChain `BaseLanguageModel` implementation will work with the appropriate setup.

Environment variables:

- `OPENAI_API_KEY`: Required for OpenAI models
- `ANTHROPIC_API_KEY`: Required for Anthropic models
Requires Python 3.11 or higher.
```python
from schema_gen import DatabaseSchema, generate_schema_from_prompt
from schema_gen.validators import post_process_schema

# Generate schema
schema = generate_schema_from_prompt(prompt, llm)

# Additional custom validation
schema = post_process_schema(schema)

# Your custom checks
for table in schema.tables:
    if not table.columns:
        raise ValueError(f"Table {table.name} has no columns")
```

```python
from schema_gen import DBCredentials, create_database

# Create PostgreSQL database
credentials = DBCredentials(
    user="postgres",
    password="password",
    host="localhost",
    port="5432",
    database="mydb",
    engine="postgres",
)
engine, created = create_database(credentials, delete_if_exists=False)
```

```python
prompt = """
Create an e-commerce schema with:
- Customers with contact information
- Products with SKU, price, and inventory
- Orders placed by customers
- Order items linking orders to products
- Payments for orders
"""
schema = generate_schema_from_prompt(prompt, llm)
```

```python
prompt = """
Create a multi-tenant SaaS schema with:
- Organizations (tenants)
- Users belonging to organizations
- Subscriptions for organizations
- Usage metrics tracked per organization
"""
schema = generate_schema_from_prompt(prompt, llm)
```

See LICENSE file for details.
This library was extracted from the Storyline project for standalone use. Contributions are welcome.
- pydantic (>=2.11.7) - Schema validation and type safety
- sqlalchemy (>=2.0.36) - Database ORM and query building
- langchain-core - LLM interface abstraction
- langchain-openai - OpenAI integration
- motleycrew (>=0.3.4) - Validation loop with structured output
- psycopg2-binary (optional) - PostgreSQL support