…he current state of what application is built around
Pull request overview
This pull request modernizes the database schema specification tool by restructuring the output from a single monolithic spec.json per engine/version to three specialized schemas: tables.json (AI-focused), snapshot/stored.json, and snapshot/working.json (CLI-focused). The changes include fully resolving engine config schemas with all $ref references inlined, eliminating the need for separate base/engine config files in output, and updating the schema map structure to directly map engines to their resolved config URLs.
Key Changes
- Splits each engine/version output into three schemas (tables, stored snapshot, working snapshot) optimized for different consumers (AI vs CLI)
- Implements fully-resolved, self-contained engine config schemas with all references inlined
- Updates schema map structure with nested URLs for tables and snapshot schemas
- Enhances JSON reference resolver with JSON pointer support for local and external references
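As a rough illustration of what inlining references through JSON pointers involves, here is a minimal sketch. It is not the code in database_schema_spec/resolution/resolver.py: the function names `resolve_pointer` and `inline_refs` are hypothetical, only local `#/...` references are handled, and external references are left untouched.

```python
def resolve_pointer(document, pointer):
    """Resolve an RFC 6901 JSON pointer such as '/$defs/envs' against document."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON pointer: {pointer!r}")
    node = document
    for token in pointer[1:].split("/"):
        # Unescape in the order RFC 6901 requires: '~1' first, then '~0'.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def inline_refs(node, root, stack=()):
    """Return a copy of node with local '#/...' references inlined.

    `stack` holds the references currently being expanded, so a reference
    that leads back to itself raises instead of recursing forever.
    Siblings of "$ref" are dropped, as draft-07 ignores them.
    """
    if isinstance(node, list):
        return [inline_refs(item, root, stack) for item in node]
    if not isinstance(node, dict):
        return node
    ref = node.get("$ref")
    if isinstance(ref, str) and ref.startswith("#"):
        if ref in stack:
            raise ValueError("circular reference: " + " -> ".join(stack + (ref,)))
        return inline_refs(resolve_pointer(root, ref[1:]), root, stack + (ref,))
    return {key: inline_refs(value, root, stack) for key, value in node.items()}


# Hypothetical schema used only for illustration.
schema = {
    "$defs": {
        "envs": {"type": "object", "additionalProperties": {"$ref": "#/$defs/env"}},
        "env": {"type": "object"},
    },
    "properties": {"envs": {"$ref": "#/$defs/envs"}},
}
resolved = inline_refs(schema, schema)
```

After this call, `resolved["properties"]["envs"]` no longer contains any `$ref` key, which is the property a self-contained output schema needs.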
Reviewed changes
Copilot reviewed 31 out of 31 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/test_resolver.py | Comprehensive test suite for JSON reference resolver with circular reference detection and JSON pointer resolution |
| tests/test_output_manager.py | Updated tests to validate new three-schema output structure and resolved engine configs |
| tests/test_integration_production.py | Integration tests updated for new file structure (tables.json, snapshot/stored.json, snapshot/working.json) |
| tests/test_config.py | Config tests updated with custom test config to avoid .env file loading |
| tests/conftest.py | Test fixtures updated to reflect new schema structure with references to engine-specific configs |
| docs/schemas/project/manifest.json | Refactored manifest schema to use map structure with snapshot IDs as keys, added sync-state and content-hash fields |
| docs/schemas/project/config/engines/postgresql.json | Simplified PostgreSQL config with fewer connection parameters, added target-schema and env-file fields |
| docs/schemas/project/config/base.json | Complete rewrite to multi-schema support with conditional engine-specific references |
| docs/schemas/engines/postgresql/v15.0/tables.json | New tables array schema for AI agent consumption |
| docs/schemas/engines/postgresql/v15.0/snapshot/*.json | New stored and working snapshot schemas for CLI operations |
| docs/schemas/engines/postgresql/v15.0/components/*.json | Added minimum: 1 constraints to ID fields, formatting improvements |
| database_schema_spec/resolution/resolver.py | Enhanced resolver with JSON pointer support, file caching, and improved circular reference detection |
| database_schema_spec/io/output_manager.py | Refactored to support new schema structure with write_engine_schema and write_resolved_engine_config methods |
| database_schema_spec/core/config.py | Updated file name patterns for tables and snapshot schemas |
| database_schema_spec/cli/generator.py | Generator updated to produce three schemas per variant and fully-resolved engine configs |
| README.md | Documentation updated to explain new user project structure and output organization |
| "properties": { | ||
| "environments": { | ||
| "envs": { | ||
| "$ref": "#/$defs/envs" | ||
| } | ||
| }, | ||
| "required": ["envs"], |
The required fields list includes "envs" but the actual property definition at line 7 wraps it in an "envs" property that references the definition. This creates a nested structure where users would need { "envs": { "prod": {...} } } instead of directly having environments. Consider whether the top-level "envs" property wrapper is intended, or if the schema should directly use the $defs/envs pattern without the wrapper to allow direct environment definitions.
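To make the nesting concrete, the sketch below checks only the "required" keyword against two candidate config shapes. The helper `missing_required` is hypothetical and stands in for a real JSON Schema validator; the "prod" environment name is taken from the comment above.

```python
def missing_required(instance, schema):
    """List the keys named in schema["required"] that instance lacks."""
    return [key for key in schema.get("required", []) if key not in instance]


# Shape of the schema under review: a top-level "envs" wrapper property.
wrapper_schema = {
    "type": "object",
    "properties": {"envs": {"$ref": "#/$defs/envs"}},
    "required": ["envs"],
}

nested_config = {"envs": {"prod": {}}}  # what the wrapper requires
flat_config = {"prod": {}}              # environments defined directly

nested_missing = missing_required(nested_config, wrapper_schema)
flat_missing = missing_required(flat_config, wrapper_schema)
```

With the wrapper in place, only the nested form satisfies "required"; a config that lists environments directly at the top level is reported as missing "envs".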
| # Create base config schema with reference to engine-specific envs | ||
| base_config = { | ||
| "$schema": "http://json-schema.org/draft-07/schema#", | ||
| "title": "Base Project Configuration", | ||
| "type": "object", | ||
| "properties": { | ||
| "schema_id": {"type": "string", "format": "uuid"}, | ||
| "database": { | ||
| "$defs": { | ||
| "schemaDefinition": { | ||
| "type": "object", | ||
| "properties": {"engine": {"type": "string"}}, | ||
| "required": ["engine"], | ||
| }, | ||
| "allOf": [ | ||
| { | ||
| "properties": { | ||
| "engine": {"type": "string"}, | ||
| } | ||
| }, | ||
| { | ||
| "if": { | ||
| "properties": {"engine": {"const": "PostgreSQL"}}, | ||
| "required": ["engine"], | ||
| }, | ||
| "then": { | ||
| "properties": { | ||
| "envs": {"$ref": "engines/postgresql.json#/$defs/envs"} | ||
| } | ||
| }, | ||
| }, | ||
| { | ||
| "if": { | ||
| "properties": {"engine": {"const": "MySQL"}}, | ||
| "required": ["engine"], | ||
| }, | ||
| "then": { | ||
| "properties": { | ||
| "envs": {"$ref": "engines/mysql.json#/$defs/envs"} | ||
| } | ||
| }, | ||
| }, | ||
| ], | ||
| } | ||
| }, |
The conftest fixture references "schemas/project/config/base.json" but the actual schema file shows it should be using the engine-specific pattern with "engines/postgresql.json#/$defs/envs". The test fixture should align with the actual schema structure to ensure tests validate the correct behavior.
| "pattern": "^[a-z][a-z0-9_]*$", | ||
| "minLength": 1, | ||
| "maxLength": 63, | ||
| "minLength": 1, "maxLength": 63, |
The line break at the end of line 83 appears to have been accidentally removed, causing lines 83-84 to run together. Restore it so that the minLength and maxLength constraints each sit on their own line.
| "minLength": 1, "maxLength": 63, | |
| "minLength": 1, | |
| "maxLength": 63, |
| { | ||
| "if": { | ||
| "properties": { | ||
| "engine": { "const": "PostgreSQL" } | ||
| }, | ||
| "required": ["engine"] | ||
| }, | ||
| "then": { | ||
| "properties": { | ||
| "envs": { | ||
| "$ref": "engines/postgresql.json#/$defs/envs" | ||
| } | ||
| } | ||
| } | ||
| } | ||
| ], |
The base config schema references engine-specific config files using conditional logic based on engine type, but there's no MySQL branch included even though MySQL is referenced elsewhere in the codebase (e.g., in tests and registry). If MySQL support is intended, add a conditional block for MySQL similar to the PostgreSQL block. If MySQL isn't supported yet, consider removing MySQL references from test fixtures to avoid confusion.
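The effect of the missing branch can be sketched by evaluating the if/then blocks by hand. This is a simplified stand-in for a JSON Schema validator, assuming each "if" uses only "required" and "const"; the function name `applicable_envs_refs` is hypothetical.

```python
def applicable_envs_refs(config, branches):
    """Return the envs $ref of every if/then branch whose condition matches."""
    refs = []
    for branch in branches:
        condition = branch.get("if")
        if condition is None:
            continue
        # "required" fails the condition when a listed key is absent.
        if any(key not in config for key in condition.get("required", [])):
            continue
        # "properties" only constrains keys that are present in the instance.
        matches = all(
            key not in config or "const" not in rule or config[key] == rule["const"]
            for key, rule in condition.get("properties", {}).items()
        )
        if matches:
            ref = branch.get("then", {}).get("properties", {}).get("envs", {}).get("$ref")
            if ref:
                refs.append(ref)
    return refs


# The single conditional block present in the reviewed base config.
postgres_only_branches = [
    {
        "if": {
            "properties": {"engine": {"const": "PostgreSQL"}},
            "required": ["engine"],
        },
        "then": {
            "properties": {"envs": {"$ref": "engines/postgresql.json#/$defs/envs"}}
        },
    }
]

postgres_refs = applicable_envs_refs({"engine": "PostgreSQL"}, postgres_only_branches)
mysql_refs = applicable_envs_refs({"engine": "MySQL"}, postgres_only_branches)
```

With only the PostgreSQL block, a MySQL config matches no branch, so its envs section would pass without any engine-specific constraint being applied.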
This pull request introduces a significant update to the schema generation and output structure for the database schema documentation tool. The main improvements include generating multiple output schemas per database engine/version (instead of a single monolithic spec), fully resolving and flattening engine config schemas for easier consumption, and updating all documentation and code to reflect the new schema organization. These changes make the tool more modular, AI/CLI-friendly, and easier to integrate with user projects.
Key changes:
1. Output Structure & Schema Generation
- Each engine/version now produces three schemas, tables.json (for AI agents), snapshot/stored.json, and snapshot/working.json (for CLI), instead of a single spec.json. The code and documentation are updated to reflect this new structure.
2. Fully-Resolved Engine Config Schemas
- Each engine now gets a single, self-contained config schema (e.g., config/postgresql.json), with all $ref references inlined. This simplifies downstream usage and eliminates the need for separate base and engine config files in the output.
3. Schema Map (smap.json) Improvements
- The schema map now maps each engine/version to its tables and snapshot schemas, and each engine to its resolved config schema, making discovery and integration more straightforward.
4. Documentation Updates
- The README.md files are updated to document the new output structure, user project structure, and the improved schema map. This includes clear explanations and examples for users on how to organize their projects and consume the generated schemas.
5. Code Refactoring & Internal Changes
These changes modernize the schema tool's output, improve modularity, and make it easier for both AI and CLI consumers to work with the generated schemas.
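For orientation, the per-variant output layout can be sketched as a small path helper. The file names and the docs/schemas/engines/postgresql/v15.0 location come from the file summary above; the function `engine_schema_paths` and the `SCHEMA_FILES` mapping are hypothetical and are not the configuration in database_schema_spec/core/config.py.

```python
from pathlib import PurePosixPath

# The three schema files produced per engine/version, relative to its directory.
SCHEMA_FILES = {
    "tables": "tables.json",
    "stored": "snapshot/stored.json",
    "working": "snapshot/working.json",
}


def engine_schema_paths(root, engine, version):
    """Build the three output paths for one engine/version variant."""
    base = PurePosixPath(root) / "engines" / engine / version
    return {name: str(base / relative) for name, relative in SCHEMA_FILES.items()}


paths = engine_schema_paths("docs/schemas", "postgresql", "v15.0")
```

Here `paths["tables"]` is the schema aimed at AI agents, while the "stored" and "working" entries are the two snapshot schemas aimed at the CLI.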