94 changes: 68 additions & 26 deletions README.md
@@ -2,6 +2,36 @@

 A Python package for generating unified JSON documentation files for database schemas by resolving JSON Schema references and handling oneOf variants. This tool processes modular database schema specifications and generates consolidated documentation for different database engines and versions.

+## User Project Structure
+
+The generated schemas are designed to validate user projects with this structure:
+
+```
+my-project/
+├── .bfloo/                          # Hidden config directory (like .git)
+│   ├── config.yml                   # All schemas configuration
+│   ├── orders/                      # Schema: "orders"
+│   │   ├── manifest.yml             # Snapshot registry
+│   │   └── 2024-01-15_v1.0.0.yml    # Snapshot files
+│   ├── users/                       # Schema: "users"
+│   │   └── manifest.yml
+│   └── analytics/                   # Schema: "analytics"
+│       └── manifest.yml
+├── schemas/                         # Custom directory (via dir: "schemas")
+│   ├── orders.yml                   # Working schema for "orders"
+│   └── users.yml                    # Working schema for "users"
+└── db-schemas/
+    └── analytics.yml                # Working schema at root (dir omitted)
+```
+
+**Key concepts:**
+
+- **Schema names are user-defined** - `orders`, `users`, `analytics`, etc.
+- **Flat structure** - Each schema is a top-level entry (no nested hierarchy)
+- **One manifest per schema** - Each schema has its own snapshot history in `.bfloo/<schema>/`
+- **Configurable working directory** - Use `dir` to specify where `<schema>.yml` is stored (default: `.db-schemas/`)
+- **Per-schema API keys** - Each schema has its own API key for sync
+
 ## 🚀 Quick Start

 ### Prerequisites
@@ -76,24 +106,28 @@ database-schema-spec/
 │   ├── project/
 │   │   ├── manifest.json              # Snapshot manifest schema
 │   │   └── config/
-│   │       ├── base.json              # Common config schema
+│   │       ├── base.json              # Common config schema (with $defs)
 │   │       └── engines/
-│   │           └── postgresql.json    # PostgreSQL connection config
+│   │           └── postgresql.json    # PostgreSQL-specific config (references base.json)
 │   └── engines/
 │       └── postgresql/
-│           └── v15.0/                 # Version-specific spec
-│               ├── spec.json
+│           └── v15.0/                 # Version-specific schemas
+│               ├── tables.json        # Tables array schema (AI-focused)
+│               ├── snapshot/
+│               │   ├── stored.json    # Stored snapshot schema
+│               │   └── working.json   # Working snapshot schema
 │               └── components/
 └── output/                            # Generated output files
     ├── smap.json                      # Schema map (discovery file)
     ├── manifest.json                  # Manifest schema with $id
     ├── config/
-    │   ├── base.json                  # Base config with $id
-    │   └── engines/
-    │       └── postgresql.json        # PostgreSQL config with $id
+    │   └── postgresql.json            # Fully-resolved PostgreSQL config (self-contained)
     └── postgresql/
         └── v15.0/
-            └── spec.json              # Fully resolved spec with $id
+            ├── tables.json            # Tables array schema (AI-focused)
+            └── snapshot/
+                ├── stored.json        # Stored snapshot schema (CLI)
+                └── working.json       # Working snapshot schema (CLI)
 ```

 ## 🧪 Development
@@ -168,33 +202,41 @@ output/
 ├── smap.json                      # Schema map for discovery
 ├── manifest.json                  # Manifest schema
 ├── config/
-│   ├── base.json                  # Base config schema
-│   └── engines/
-│       └── postgresql.json        # PostgreSQL config schema
+│   └── postgresql.json            # Fully-resolved PostgreSQL config (self-contained)
 └── postgresql/
     └── v15.0/
-        └── spec.json              # PostgreSQL 15.0 spec
+        ├── tables.json            # Tables array schema (AI-focused)
+        └── snapshot/
+            ├── stored.json        # Stored snapshot schema (CLI)
+            └── working.json       # Working snapshot schema (CLI)
 ```

+**Note:** Each engine config file (e.g., `postgresql.json`) is fully resolved with all `$ref` references inlined, making it completely self-contained. This eliminates the need for separate `base.json` and engine-specific files in the output.
+
 ### Schema Map (smap.json)

 The schema map provides a structured index of all generated schemas:

 ```json
 {
-    "project": {
-        "manifest": "https://example.com/schemas/manifest.json",
-        "config": {
-            "base": "https://example.com/schemas/config/base.json",
-            "engines": {
-                "postgresql": "https://example.com/schemas/config/engines/postgresql.json"
-            }
-        }
-    },
-    "engines": {
-        "postgresql": {
-            "v15.0": "https://example.com/schemas/postgresql/v15.0/spec.json"
-        }
-    }
+  "project": {
+    "manifest": "https://example.com/schemas/manifest.json",
+    "config": {
+      "postgresql": "https://example.com/schemas/config/postgresql.json"
+    }
+  },
+  "engines": {
+    "postgresql": {
+      "v15.0": {
+        "tables": "https://example.com/schemas/postgresql/v15.0/tables.json",
+        "snapshot": {
+          "stored": "https://example.com/schemas/postgresql/v15.0/snapshot/stored.json",
+          "working": "https://example.com/schemas/postgresql/v15.0/snapshot/working.json"
+        }
+      }
+    }
+  }
 }
 ```
+
+The `config` section maps engine names directly to their fully-resolved schema URLs, making it easy to fetch the appropriate config schema for any supported database engine.
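
To illustrate how a consumer might use the new `smap.json` layout from the README change above, here is a minimal sketch. The dictionary copies the README's example (including its `example.com` placeholder URLs); the helper names `lookup_schema_url` and `lookup_config_url` are hypothetical and are not part of this PR.

```python
from typing import Any

# Shape and URLs copied from the README example; example.com is a placeholder.
SMAP: dict[str, Any] = {
    "project": {
        "manifest": "https://example.com/schemas/manifest.json",
        "config": {
            "postgresql": "https://example.com/schemas/config/postgresql.json",
        },
    },
    "engines": {
        "postgresql": {
            "v15.0": {
                "tables": "https://example.com/schemas/postgresql/v15.0/tables.json",
                "snapshot": {
                    "stored": "https://example.com/schemas/postgresql/v15.0/snapshot/stored.json",
                    "working": "https://example.com/schemas/postgresql/v15.0/snapshot/working.json",
                },
            },
        },
    },
}


def lookup_schema_url(smap: dict[str, Any], engine: str, version: str, schema_type: str) -> str:
    """Hypothetical helper: resolve 'tables', 'snapshot/stored' or 'snapshot/working'."""
    node: Any = smap["engines"][engine][version]
    # A schema type such as "snapshot/stored" is walked one path segment at a time.
    for part in schema_type.split("/"):
        node = node[part]
    if not isinstance(node, str):
        raise KeyError(f"{schema_type!r} does not name a single schema")
    return node


def lookup_config_url(smap: dict[str, Any], engine: str) -> str:
    """Hypothetical helper: engine names map directly to config schema URLs."""
    return smap["project"]["config"][engine]
```

The slash-separated `schema_type` mirrors the `"snapshot/stored"` and `"snapshot/working"` strings used in `generate_variant`, so the same identifiers could be reused on the consumer side.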
112 changes: 59 additions & 53 deletions database_schema_spec/cli/generator.py
@@ -94,34 +94,22 @@ def generate_all_variants(self) -> list[Path]:
         # Collect unique engine names for config generation
         engines: list[str] = list({v.engine for v in variants})

-        # Generate schema for each variant
+        # Generate schemas for each variant
         generated_files: list[Path] = []
         for variant in variants:
-            logger.info("Generating schema for %s %s", variant.engine, variant.version)
-            file_path = self.generate_variant(variant)
-            generated_files.append(file_path)
+            logger.info("Generating schemas for %s %s", variant.engine, variant.version)
+            file_paths = self.generate_variant(variant)
+            generated_files.extend(file_paths)

         # Generate project schemas
         logger.info("Generating project schemas...")

-        # Generate base config schema
-        base_config_path = self.output_manager.write_project_schema(
-            config.file_names.project_config_base_schema,
-            "config/base.json",
-            config.base_url,
-        )
-        generated_files.append(base_config_path)
-        logger.info("Base config schema written to: %s", base_config_path)
-
-        # Generate engine-specific config schemas
+        # Generate fully-resolved engine config schemas
         for engine in engines:
-            engine_lower = engine.lower()
-            source_path = config.file_names.project_config_engine_pattern.format(
-                engine=engine_lower
-            )
-            output_path = f"config/engines/{engine_lower}.json"
-            engine_config_path = self.output_manager.write_project_schema(
-                source_path, output_path, config.base_url
+            engine_config_path = self.output_manager.write_resolved_engine_config(
+                engine,
+                config.file_names.project_config_base_schema,
+                config.base_url,
             )
             generated_files.append(engine_config_path)
             logger.info(
@@ -143,51 +131,69 @@ def generate_all_variants(self) -> list[Path]:

         return generated_files

-    def generate_variant(self, variant: DatabaseVariantSpec) -> Path:
-        """Generate unified schema for a specific database variant.
+    def generate_variant(self, variant: DatabaseVariantSpec) -> list[Path]:
+        """Generate unified schemas for a specific database variant.

+        Generates three schema files per variant:
+        - tables.json: Tables array schema (for AI agents)
+        - snapshot/stored.json: Stored snapshot schema (for CLI)
+        - snapshot/working.json: Working snapshot schema (for CLI)
+
         Args:
             variant: Database variant to generate schema for

         Returns:
-            Path where the schema was written
+            List of paths where schemas were written
         """
-        # Build path to engine-specific spec file
-        spec_path = config.file_names.engine_spec_pattern.format(
-            engine=variant.engine.lower(),
-            version=variant.version,
-        )
+        generated_files: list[Path] = []

+        # Schema types to generate: (source_pattern_attr, output_type)
+        schema_types = [
+            ("engine_tables_pattern", "tables"),
+            ("engine_snapshot_stored_pattern", "snapshot/stored"),
+            ("engine_snapshot_working_pattern", "snapshot/working"),
+        ]
+
+        for pattern_attr, schema_type in schema_types:
+            # Build path to source schema file
+            pattern = getattr(config.file_names, pattern_attr)
+            source_path = pattern.format(
+                engine=variant.engine.lower(),
+                version=variant.version,
+            )
+
-        # Create a variant-aware resolver and load the spec directly
-        variant_resolver = JSONRefResolver(self.docs_path, variant)
-        unified_schema = variant_resolver.resolve_file(spec_path)
+            # Create a variant-aware resolver and load the schema
+            variant_resolver = JSONRefResolver(self.docs_path, variant)
+            unified_schema = variant_resolver.resolve_file(source_path)

-        # Inject dynamic $id derived from BASE_URL for the final output
-        id_field = config.json_schema_fields.id_field
-        schema_field = config.json_schema_fields.schema_field
-        spec_url = self.output_manager._get_spec_url(
-            variant.engine, variant.version, config.base_url
-        )
+            # Inject dynamic $id derived from BASE_URL for the final output
+            id_field = config.json_schema_fields.id_field
+            schema_field = config.json_schema_fields.schema_field
+            schema_url = self.output_manager._get_engine_schema_url(
+                variant.engine, variant.version, schema_type, config.base_url
+            )

-        # Set/override $id
-        unified_schema[id_field] = spec_url
+            # Set/override $id
+            unified_schema[id_field] = schema_url

-        # Reorder top-level keys to ensure `$id` appears immediately after `$schema`
-        unified_schema = self._reorder_schema_keys(
-            unified_schema, id_field, schema_field
-        )
+            # Reorder top-level keys to ensure `$id` appears immediately after `$schema`
+            unified_schema = self._reorder_schema_keys(
+                unified_schema, id_field, schema_field
+            )

-        # Validate the resulting schema
-        validation_result = self.validator.validate_schema(unified_schema)
-        if not validation_result.is_valid:
-            raise ValidationError(validation_result.errors)
+            # Validate the resulting schema
+            validation_result = self.validator.validate_schema(unified_schema)
+            if not validation_result.is_valid:
+                raise ValidationError(validation_result.errors)

-        # Write the schema to output file
-        output_path = self.output_manager.write_schema(
-            unified_schema, variant.engine, variant.version
-        )
+            # Write the schema to output file
+            output_path = self.output_manager.write_engine_schema(
+                unified_schema, variant.engine, variant.version, schema_type
+            )
+            generated_files.append(output_path)
+            logger.info(" %s schema written to: %s", schema_type, output_path)

-        return output_path
+        return generated_files

     def _reorder_schema_keys(
         self, schema: dict, id_field: str, schema_field: str
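
The body of `_reorder_schema_keys` is collapsed in this diff, so the following is only a sketch of what the comment in `generate_variant` describes: placing `$id` immediately after `$schema` among the top-level keys. It is a standalone function rather than the project's method, and the default field names `"$id"` and `"$schema"` are assumptions about what `config.json_schema_fields` holds. One design choice in this sketch that the source does not confirm: `$schema` is moved to the front when present.

```python
def reorder_schema_keys(
    schema: dict, id_field: str = "$id", schema_field: str = "$schema"
) -> dict:
    """Return a copy of `schema` with `id_field` directly after `schema_field`.

    Illustrative only; the real `_reorder_schema_keys` is not shown in the diff.
    """
    if id_field not in schema:
        # Nothing to reposition; keep the original key order.
        return dict(schema)

    reordered: dict = {}
    if schema_field in schema:
        reordered[schema_field] = schema[schema_field]
    reordered[id_field] = schema[id_field]

    # Remaining keys keep their relative order (dicts preserve insertion order).
    for key, value in schema.items():
        if key not in (schema_field, id_field):
            reordered[key] = value
    return reordered
```

Because the function builds a new dictionary, the caller's schema is left untouched, which matches how `generate_variant` reassigns the result to `unified_schema`.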
9 changes: 8 additions & 1 deletion database_schema_spec/core/config.py
@@ -12,7 +12,14 @@ class FileNamesConfig(BaseModel):
     """Configuration for file names."""

     database_registry_file: str = "schemas/_registry_.json"
-    engine_spec_pattern: str = "schemas/engines/{engine}/{version}/spec.json"
+    # Engine schema patterns (tables for AI, snapshot schemas for CLI)
+    engine_tables_pattern: str = "schemas/engines/{engine}/{version}/tables.json"
+    engine_snapshot_stored_pattern: str = (
+        "schemas/engines/{engine}/{version}/snapshot/stored.json"
+    )
+    engine_snapshot_working_pattern: str = (
+        "schemas/engines/{engine}/{version}/snapshot/working.json"
+    )
     project_config_base_schema: str = "schemas/project/config/base.json"
     project_config_engine_pattern: str = "schemas/project/config/engines/{engine}.json"
     project_manifest_schema: str = "schemas/project/manifest.json"
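
As a quick check of how the three new patterns expand, the sketch below formats them the same way `generate_variant` does (lower-cased engine, version passed through). The pattern strings are copied from the diff; the `source_paths` helper and the sample values `"PostgreSQL"` and `"v15.0"` are assumptions for illustration, with the version spelling taken from the `v15.0/` directory in the README tree.

```python
# Pattern strings copied from FileNamesConfig in this diff.
ENGINE_PATTERNS: dict[str, str] = {
    "tables": "schemas/engines/{engine}/{version}/tables.json",
    "snapshot/stored": "schemas/engines/{engine}/{version}/snapshot/stored.json",
    "snapshot/working": "schemas/engines/{engine}/{version}/snapshot/working.json",
}


def source_paths(engine: str, version: str) -> dict[str, str]:
    """Hypothetical helper: expand every pattern for one engine/version pair."""
    return {
        schema_type: pattern.format(engine=engine.lower(), version=version)
        for schema_type, pattern in ENGINE_PATTERNS.items()
    }


if __name__ == "__main__":
    for schema_type, path in source_paths("PostgreSQL", "v15.0").items():
        print(f"{schema_type}: {path}")
```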