
feat: add schema and record entry types with JSON Schema validation #287

Open
cmeans-claude-dev[bot] wants to merge 24 commits into main from feat/schema-record-entry-types

@cmeans-claude-dev
Contributor

Summary

Adds two new EntryType values — schema and record — with JSON Schema Draft 2020-12 validation on write. Schemas live per-owner with a shared _system fallback namespace for built-in shapes; records pin to an exact (schema_ref, schema_version) and get re-validated on content update. Schema deletion is blocked while live records still reference the version. A new CLI tool (mcp-awareness-register-schema) gives operators a path to seed _system-owned schemas at deploy time. The internal _error_response() helper now accepts **extras, so all new structured error envelopes flow through the same path.

Closes #208.

Spec: docs/superpowers/specs/2026-04-13-schema-record-entry-types-design.md
Plan: docs/superpowers/plans/2026-04-13-schema-record-entry-types-plan.md

What's new

  • New MCP tools: register_schema, create_record
  • New CLI: mcp-awareness-register-schema --system ... — operator-only, bypasses MCP
  • New migration: idempotent _system user seed
  • New runtime dep: jsonschema>=4.26.0,<5
  • New helpers: validation.validate_schema_body, validation.validate_record_content, validation.resolve_schema, validation.assert_schema_deletable, validation.compose_schema_logical_key, SchemaInUseError
  • New Store methods: find_schema(owner_id, logical_key), with _system fallback; count_records_referencing(owner_id, schema_logical_key), which returns (count, up_to_10_ids)
  • Extended: update_entry branches on entry type: schema updates are always rejected (schema_immutable), and record content updates are re-validated against the pinned schema; delete_entry blocks schema deletion while records still reference it (schema_in_use); _error_response() now accepts **extras for structured error payloads
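
The key helpers listed above rely on a simple convention visible in the QA expectations: a schema's logical_key is its family and version joined by ':' (family schema:qa-thing plus version 1.0.0 gives schema:qa-thing:1.0.0), and count_records_referencing takes the key apart again with rpartition on the last ':'. A minimal sketch of that round trip, assuming plain string joining; the empty-value errors and the name split_schema_logical_key are hypothetical.

```python
# Hypothetical sketch of the key helpers. The PR names
# validation.compose_schema_logical_key, but its real signature and error
# handling are not shown; split_schema_logical_key is an invented name.


def compose_schema_logical_key(family: str, version: str) -> str:
    """Join a schema family and version into the stored logical_key."""
    if not family:
        raise ValueError("family must be non-empty")
    if not version:
        raise ValueError("version must be non-empty")
    return f"{family}:{version}"


def split_schema_logical_key(logical_key: str) -> tuple[str, str]:
    """Recover (family, version) by splitting on the LAST ':'.

    Families such as "schema:qa-thing" contain ':' themselves, so splitting
    on the first ':' would give ("schema", "qa-thing:1.0.0") instead.
    """
    family, sep, version = logical_key.rpartition(":")
    if not sep or not family or not version:
        raise ValueError(f"not a schema logical key: {logical_key!r}")
    return family, version


key = compose_schema_logical_key("schema:qa-thing", "1.0.0")
print(key)  # schema:qa-thing:1.0.0
print(split_schema_logical_key(key))  # ('schema:qa-thing', '1.0.0')
```

The round trip only holds while versions never contain ':'; semantic versions such as 1.0.0 satisfy that, while a family may contain any number of colons.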

What's explicitly out of scope

  • Secrets (x-secret encryption, one-time token web form, edge decrypt endpoint) — separate follow-up PR per the spec's implementation order (steps 3–6).
  • Admin-via-MCP — no is_admin column on users; system writes are CLI-only for now.
  • Cross-schema $ref resolution via referencing.Registry — deferred until a real use case demands it.
  • Validator caching — per-write construction; cache later if throughput demands it.
  • Bulk-delete paths (delete_entry by tags/source): schemas deleted this way are not protected against live records that still reference them. Single-id deletion is protected. Bulk protection is a known concern worth its own follow-up.
  • Generic create_entry — kept type-specific tools per existing convention; a future refactor across all write tools could collapse them, but that's not this PR.

Deployment

After merge + Docker image rebuild:

  1. Pull + restart holodeck LXCs (production) and the QA instance.
  2. Run mcp-awareness-migrate in each environment to apply the _system user seed.
  3. Operator runs mcp-awareness-register-schema --system ... per built-in schema, gradually as schemas are authored.
  4. No re-embed needed.

QA

Prerequisites

  • pip install -e ".[dev]"
  • Deploy to QA test instance on alternate port (AWARENESS_PORT=8421) via docker-compose.qa.yaml.
  • Run mcp-awareness-migrate against the QA DB to apply the _system user seed migration.

Manual tests (via MCP tools unless otherwise noted)

    1. Register a schema (happy path)
    register_schema(source="qa-test", tags=["qa"], description="qa test schema",
                    family="schema:qa-thing", version="1.0.0",
                    schema={"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]})
    

    Expected: {"status":"ok","id":"<uuid>","logical_key":"schema:qa-thing:1.0.0"}

    2. Reject invalid schema (meta-schema check)
    register_schema(source="qa-test", tags=[], description="bad",
                    family="schema:bad", version="1.0.0",
                    schema={"type": "strng"})
    

    Expected: structured error with code: "invalid_schema" in the payload.

    3. Reject duplicate family+version
      Re-run step 1 exactly. Expected: code: "schema_already_exists" with logical_key in the extras.
    4. Reject empty family or version
    register_schema(source="qa-test", tags=[], description="bad", family="", version="1.0.0", schema={"type":"object"})
    

    Expected: code: "invalid_parameter" with param: "family".

    5. Create a valid record
    create_record(source="qa-test", tags=[], description="a qa thing",
                  logical_key="qa-rec-1", schema_ref="schema:qa-thing", schema_version="1.0.0",
                  content={"name": "widget"})
    

    Expected: {"status":"ok","id":"<uuid>","action":"created"}

    6. Reject record with invalid content (all errors surfaced)
    create_record(source="qa-test", tags=[], description="bad record",
                  logical_key="qa-rec-bad", schema_ref="schema:qa-thing", schema_version="1.0.0",
                  content={"unexpected": 42})
    

    Expected: code: "validation_failed" with validation_errors list containing at least the required violation for the missing name field, and schema_ref/schema_version echoed.

    7. Unknown schema yields schema_not_found with searched_owners
    create_record(source="qa-test", tags=[], description="orphan",
                  logical_key="orphan-1", schema_ref="schema:does-not-exist", schema_version="1.0.0",
                  content={})
    

    Expected: code: "schema_not_found", searched_owners: [<your-owner>, "_system"].

    8. Record upsert via same logical_key
      Re-run step 5 with different content {"name": "widget-v2"}. Expected: action: "updated", same id as step 5.
    9. Record re-validation on content update (valid)
    update_entry(entry_id=<id from step 5>, content={"name": "still-valid"})
    

    Expected: success (no error). Content updated.

    10. Record re-validation on content update (invalid → rejected)
    update_entry(entry_id=<record id from step 5>, content={"name": 123})
    

    Expected: code: "validation_failed"; record content unchanged when queried via get_knowledge.

    11. Schema immutability
    update_entry(entry_id=<schema id from step 1>, description="new desc")
    

    Expected: code: "schema_immutable"; schema unchanged.

    12. Schema deletion blocked by live records
    delete_entry(entry_id=<schema id from step 1>)
    

    Expected: code: "schema_in_use", referencing_records: [...], total_count.

    13. Schema deletion allowed after records deleted
      delete_entry(entry_id=<record id from step 5>), then retry step 12.
      Expected: schema soft-deletes successfully.
    14. _system fallback via CLI + MCP
      On the QA instance shell:
    echo '{"type": "object"}' > /tmp/qa-sys.json && \
    AWARENESS_DATABASE_URL=<qa-dsn> mcp-awareness-register-schema \
      --system --family schema:qa-system --version 1.0.0 \
      --schema-file /tmp/qa-sys.json --source qa-built-in --tags qa --description "qa system schema"
    

    Then via MCP:

    create_record(source="qa-test", tags=[], description="uses system schema",
                  logical_key="qa-sys-rec", schema_ref="schema:qa-system", schema_version="1.0.0",
                  content={"any": "thing"})
    

    Expected: record creates successfully against the _system-owned schema.

    15. Cross-owner isolation
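      Step 13 soft-deletes step 1's schema, so first register a fresh schema as the original user (for example family schema:qa-iso, version 1.0.0, otherwise identical to step 1). Then, as a second authenticated user on the QA instance, attempt to resolve it via create_record with schema_ref="schema:qa-iso", schema_version="1.0.0". Expected: code: "schema_not_found", which then reflects owner isolation rather than deletion.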
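
The unknown-schema, _system fallback, and cross-owner checks above all exercise one lookup rule: the caller's own schemas first, then _system, and nobody else's. A minimal in-memory sketch of that rule, assuming a dict keyed by (owner_id, logical_key); SchemaNotFound is a hypothetical stand-in for the schema_not_found error, and its searched_owners attribute mirrors the field in the expected payload.

```python
from typing import Any

SYSTEM_OWNER = "_system"


class SchemaNotFound(Exception):
    """Hypothetical stand-in for the schema_not_found tool error."""

    def __init__(self, logical_key: str, searched_owners: list[str]) -> None:
        super().__init__(f"{logical_key} not found for owners {searched_owners}")
        self.logical_key = logical_key
        self.searched_owners = searched_owners


def resolve_schema(
    schemas: dict[tuple[str, str], Any],
    owner_id: str,
    schema_ref: str,
    schema_version: str,
) -> Any:
    """Return the caller-owned schema, else the _system one; never another owner's."""
    logical_key = f"{schema_ref}:{schema_version}"
    # Avoid listing _system twice when the caller is _system itself.
    searched = [owner_id] if owner_id == SYSTEM_OWNER else [owner_id, SYSTEM_OWNER]
    for owner in searched:
        if (owner, logical_key) in schemas:
            return schemas[(owner, logical_key)]
    raise SchemaNotFound(logical_key, searched)


schemas = {("alice", "schema:qa-thing:1.0.0"): {"type": "object"}}
try:
    resolve_schema(schemas, "bob", "schema:qa-thing", "1.0.0")
except SchemaNotFound as err:
    print(err.searched_owners)  # ['bob', '_system']
```

Because lookup stops at the first hit, a caller-owned schema with the same logical_key as a _system schema shadows it for that caller only.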

Edge cases

    • Primitive content schemas
    register_schema(source="qa-test", tags=[], description="int schema",
                    family="schema:counter", version="1.0.0", schema={"type": "integer"})
    create_record(source="qa-test", tags=[], description="a count",
                  logical_key="count-1", schema_ref="schema:counter", schema_version="1.0.0",
                  content=42)
    

    Expected: success — data.content can be any JSON value, not just objects.

    • Array content schemas
      Register {"type": "array", "items": {"type": "string"}} and write ["a","b"] as content. Expected: success.
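
Both edge cases hinge on JSON Schema's type keyword accepting any JSON value at the top level, not only objects. As a toy illustration only (the PR validates with jsonschema's Draft 2020-12 support, not a hand-rolled checker), the type keyword maps onto decoded Python values roughly like this. It handles two traps: Python's bool is a subclass of int, yet JSON true is not an integer, and Draft 2020-12 counts any number with a zero fractional part, such as 1.0, as an integer.

```python
from typing import Any

JSON_TYPES = {"null", "boolean", "object", "array", "string", "number", "integer"}


def matches_json_type(instance: Any, json_type: str) -> bool:
    """Toy check of the JSON Schema "type" keyword for one decoded JSON value."""
    if json_type not in JSON_TYPES:
        raise ValueError(f"unknown JSON Schema type: {json_type!r}")
    if json_type == "null":
        return instance is None
    if json_type == "boolean":
        return isinstance(instance, bool)
    if json_type == "object":
        return isinstance(instance, dict)
    if json_type == "array":
        return isinstance(instance, list)
    if json_type == "string":
        return isinstance(instance, str)
    # Only "number" and "integer" remain.
    if isinstance(instance, bool):
        # bool subclasses int in Python, but JSON true/false are not numbers.
        return False
    if json_type == "number":
        return isinstance(instance, (int, float))
    # "integer": any number with a zero fractional part, so 1.0 qualifies.
    return isinstance(instance, int) or (
        isinstance(instance, float) and instance.is_integer()
    )


print(matches_json_type(42, "integer"))  # True
print(matches_json_type(["a", "b"], "array"))  # True
print(matches_json_type(True, "integer"))  # False
```

The real validator also applies items, required, and every other keyword; the point here is only that content=42 and content=["a","b"] are legitimate top-level values.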

Known limitations (verify no attempt exploits these)

  • Bulk-delete paths (delete_entry by tags/source) do not currently protect schemas referenced by records. Only single-id deletion is protected. If QA has a compelling case for bulk-delete protection, file as follow-up; not blocking for this PR.

cmeans-claude-dev[bot] and others added 23 commits April 13, 2026 20:55
Full design for implementing EntryType.SCHEMA and EntryType.RECORD —
JSON Schema Draft 2020-12 validation on write, per-owner with _system
fallback, absolute schema immutability, record re-validation on update,
CLI-only writes to _system owner, structured validation error envelope.

Closes design phase for #208. Implementation plan follows.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bite-sized TDD plan covering 19 tasks: new validation module, two Store
protocol methods, two MCP tools, update_entry/delete_entry branching,
Alembic _system user seed, CLI tool, docs, pre-push verification, and PR.
Full self-review cross-checks every D1-D8 decision and error code
against a concrete task.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds find_schema(owner_id, logical_key) to the Store protocol and
PostgresStore. A single SQL query with CASE-based ORDER BY returns the
caller's own schema when present, falling back to the _system-owned
version. Soft-deleted entries are excluded. Seeds the _system user in
the test fixture and adds 5 tests covering all lookup scenarios.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
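
The commit above resolves ownership inside one query by sorting the caller's row ahead of the _system row. A sketch of that ordering trick against an in-memory SQLite database; the table layout, column names, and the deleted_at soft-delete marker are assumptions, and the real PostgresStore query lives in its own SQL file that is not reproduced here.

```python
import sqlite3

# Stand-in table: the real PostgresStore schema differs.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE entries (
        id TEXT PRIMARY KEY,
        owner_id TEXT NOT NULL,
        type TEXT NOT NULL,
        logical_key TEXT NOT NULL,
        deleted_at TEXT,  -- NULL means live; a soft delete sets it
        body TEXT NOT NULL
    )
    """
)

# The caller's row sorts first (0), the _system row second (1); LIMIT 1
# keeps only the winner. Soft-deleted rows never qualify.
FIND_SCHEMA_SQL = """
    SELECT id, owner_id, body
    FROM entries
    WHERE type = 'schema'
      AND logical_key = :logical_key
      AND owner_id IN (:owner_id, '_system')
      AND deleted_at IS NULL
    ORDER BY CASE WHEN owner_id = :owner_id THEN 0 ELSE 1 END
    LIMIT 1
"""


def find_schema(owner_id: str, logical_key: str):
    return conn.execute(
        FIND_SCHEMA_SQL, {"owner_id": owner_id, "logical_key": logical_key}
    ).fetchone()


conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("sys-1", "_system", "schema", "schema:thing:1.0.0", None, '{"type": "string"}'),
        ("alice-1", "alice", "schema", "schema:thing:1.0.0", None, '{"type": "integer"}'),
        ("alice-2", "alice", "schema", "schema:gone:1.0.0", "2026-04-13", "{}"),
    ],
)
print(find_schema("alice", "schema:thing:1.0.0")[0])  # alice-1 (own copy wins)
print(find_schema("bob", "schema:thing:1.0.0")[0])  # sys-1 (fallback)
print(find_schema("alice", "schema:gone:1.0.0"))  # None (soft-deleted)
```

Doing the preference in ORDER BY keeps resolution to a single round trip instead of a caller query followed by a fallback query.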
Adds count_records_referencing to the Store protocol and PostgresStore.
Returns (total_count, first_10_ids) of non-deleted records that reference
a given schema logical key (decomposed via rpartition on the last ':').
Backed by two SQL files following the one-operation-per-file convention.
Five tests cover zero, match, soft-delete exclusion, version isolation,
and the 10-id cap with 15 records.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
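
A pure-Python sketch of the (total_count, first_10_ids) contract described above, assuming records carry schema_ref and schema_version fields and that the count is scoped to owner_id (the signature takes an owner, but the exact scoping rule, especially for _system schemas, is not spelled out here). The Entry dataclass is hypothetical, and list order stands in for whatever ordering the real SQL uses.

```python
from dataclasses import dataclass
from typing import Optional

ID_CAP = 10  # the PR caps the returned ids at 10


@dataclass
class Entry:
    id: str
    owner_id: str
    type: str
    schema_ref: Optional[str] = None
    schema_version: Optional[str] = None
    deleted: bool = False


def count_records_referencing(
    entries: list[Entry], owner_id: str, schema_logical_key: str
) -> tuple[int, list[str]]:
    # Decompose on the LAST ':' so "schema:qa-thing:1.0.0" yields
    # ("schema:qa-thing", "1.0.0").
    schema_ref, _, schema_version = schema_logical_key.rpartition(":")
    ids = [
        e.id
        for e in entries
        if e.type == "record"
        and not e.deleted
        and e.owner_id == owner_id  # assumption: count is scoped to the owner
        and e.schema_ref == schema_ref
        and e.schema_version == schema_version
    ]
    # Mirrors the two queries: a full count plus an id list capped at 10.
    return len(ids), ids[:ID_CAP]
```

Returning the full count alongside a capped id list lets the schema_in_use error report the true scale of the problem without an unbounded payload.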
Idempotent INSERT ON CONFLICT DO NOTHING seeds the _system user row so
entries with owner_id='_system' have a valid owner. Includes idempotence
test verifying the ON CONFLICT path does not create duplicates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
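
The idempotence claim rests on ON CONFLICT DO NOTHING turning a repeated insert into a no-op. A sketch using SQLite, which accepts the same upsert clause from version 3.24 onward; the users columns and the display name are hypothetical, not the actual migration.

```python
import sqlite3

# Hypothetical users table; the real migration's columns are not shown here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, display_name TEXT)")

SEED_SYSTEM_USER = """
    INSERT INTO users (id, display_name)
    VALUES ('_system', 'System')
    ON CONFLICT (id) DO NOTHING
"""

# Running the seed twice must neither fail nor duplicate the row.
for _ in range(2):
    conn.execute(SEED_SYSTEM_USER)

row_count = conn.execute("SELECT COUNT(*) FROM users WHERE id = '_system'").fetchone()[0]
print(row_count)  # 1
```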
Implements the register_schema tool with JSON Schema Draft 2020-12
validation, duplicate detection via psycopg UniqueViolation, and
integration tests. Also registers the tool in server.py re-exports
and updates TestWriteResponseShapes to cover the new tool.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
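
Duplicate detection leans on the database's unique constraint rather than a read-before-write check, so two concurrent registrations of the same key cannot both succeed. A sketch of that pattern with sqlite3.IntegrityError standing in for psycopg's UniqueViolation; the UNIQUE (owner_id, logical_key) constraint, the integer ids, and the returned dicts (the real tool raises a ToolError) are assumptions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE entries (
        id INTEGER PRIMARY KEY,
        owner_id TEXT NOT NULL,
        logical_key TEXT NOT NULL,
        body TEXT NOT NULL,
        UNIQUE (owner_id, logical_key)
    )
    """
)


def register_schema(owner_id: str, family: str, version: str, schema: dict) -> dict:
    logical_key = f"{family}:{version}"
    try:
        with conn:  # commit on success, roll back on error
            cur = conn.execute(
                "INSERT INTO entries (owner_id, logical_key, body) VALUES (?, ?, ?)",
                (owner_id, logical_key, json.dumps(schema)),
            )
    except sqlite3.IntegrityError:
        # Stand-in for catching psycopg's UniqueViolation.
        return {"status": "error", "code": "schema_already_exists", "logical_key": logical_key}
    return {"status": "ok", "id": cur.lastrowid, "logical_key": logical_key}


first = register_schema("alice", "schema:qa-thing", "1.0.0", {"type": "object"})
again = register_schema("alice", "schema:qa-thing", "1.0.0", {"type": "object"})
print(first["status"], again["code"])  # ok schema_already_exists
```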
…allback

Resolves schema by ref+version (caller-owned first, _system fallback),
validates content via validate_record_content, and upserts on logical_key.
Raises structured ToolError with validation_errors list on schema mismatch.
Truncation sentinel from validate_record_content is promoted to top-level
envelope fields. TestWriteResponseShapes updated for create_record.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
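
The truncation hand-off described above can be sketched as two steps: the validator appends a sentinel when it caps the error list, and the tool pops that sentinel and promotes its fields into the envelope. The cap of 50, the sentinel keys (truncated, total_errors), and the error dict shape are all hypothetical; the PR does not state them.

```python
from typing import Any

MAX_ERRORS = 50  # hypothetical cap; the PR does not state the real limit


def cap_errors(errors: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Validator-side step: cap the list and append a sentinel if anything was cut."""
    if len(errors) <= MAX_ERRORS:
        return errors
    sentinel = {"truncated": True, "total_errors": len(errors)}
    return errors[:MAX_ERRORS] + [sentinel]


def validation_failed_envelope(
    errors: list[dict[str, Any]], schema_ref: str, schema_version: str
) -> dict[str, Any]:
    """Tool-side step: lift the sentinel out of the list into top-level fields."""
    extras: dict[str, Any] = {}
    if errors and errors[-1].get("truncated"):
        *errors, sentinel = errors
        extras = {"truncated": True, "total_errors": sentinel["total_errors"]}
    return {
        "status": "error",
        "code": "validation_failed",
        "schema_ref": schema_ref,
        "schema_version": schema_version,
        "validation_errors": errors,
        **extras,
    }
```

Promoting the sentinel keeps validation_errors a homogeneous list of real errors, so clients can iterate it without special-casing a marker entry.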
Adds **extras: Any kwargs to _error_response so structured fields beyond
the fixed set (schema_ref, schema_version, searched_owners,
validation_errors, etc.) can flow through the error envelope uniformly
without raw ToolError construction at call sites.

Adds TestErrorResponseExtras unit tests verifying extras appear in the
raised ToolError JSON payload and do not clobber the fixed fields.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds type-specific branching in update_entry after the updates dict is
built but before the store write:
- SCHEMA entries: always rejected with schema_immutable error
- RECORD entries + content kwarg: re-validates content against the
  registered schema (resolver uses _system fallback); rejects with
  validation_failed including structured per-error list
- RECORD entries + no content kwarg: passes through unchanged
  (description-only updates skip re-validation)

Also refactors create_record to use _error_response with **extras
instead of raw ToolError(json.dumps(...)) construction, removing the
last two raw-ToolError sites from the tools module.

Adds 4 integration tests covering all branches.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
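
The three branches above can be sketched as a guard that runs after the updates dict is built and before the store write. validate_content is a hypothetical callable standing in for resolve_schema plus validate_record_content, and UpdateRejected is a stand-in for the structured error the real tool raises.

```python
from typing import Any, Callable


class UpdateRejected(Exception):
    """Hypothetical stand-in for the structured tool error."""

    def __init__(self, code: str, **extras: Any) -> None:
        super().__init__(code)
        self.code = code
        self.extras = extras


def check_update(
    entry_type: str,
    updates: dict[str, Any],
    validate_content: Callable[[Any], list[str]],
) -> None:
    """Raise if the update must be refused; return None to let it through."""
    if entry_type == "schema":
        # Schemas are immutable: any field change is refused.
        raise UpdateRejected("schema_immutable")
    if entry_type == "record" and "content" in updates:
        errors = validate_content(updates["content"])
        if errors:
            raise UpdateRejected("validation_failed", validation_errors=errors)
    # Records without a content change (e.g. description only) and all
    # other entry types pass through untouched.
```

Because the guard raises before the write, a rejected update leaves the stored record exactly as it was, which is what the QA steps check via get_knowledge.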
Soft-deleting a schema entry is now blocked when live (non-deleted)
records still reference it. The check runs in the by-id path only;
bulk deletes by tags/source do not include schemas in scope. Adds
three integration tests covering: no-records-succeeds,
with-records-rejected, and allowed-after-records-deleted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
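
A sketch of the deletion guard, assuming assert_schema_deletable receives the (count, up_to_10_ids) pair from count_records_referencing. The PR names SchemaInUseError and assert_schema_deletable, but the signatures below are guesses; the payload fields follow the schema_in_use expectation in the QA steps.

```python
from typing import Any


class SchemaInUseError(Exception):
    def __init__(self, logical_key: str, total_count: int, referencing_records: list[str]) -> None:
        super().__init__(f"{logical_key} is referenced by {total_count} live record(s)")
        self.logical_key = logical_key
        self.total_count = total_count
        self.referencing_records = referencing_records


def assert_schema_deletable(logical_key: str, usage: tuple[int, list[str]]) -> None:
    """usage is the (count, up_to_10_ids) pair from count_records_referencing."""
    total_count, ids = usage
    if total_count > 0:
        raise SchemaInUseError(logical_key, total_count, ids)


def schema_in_use_payload(err: SchemaInUseError) -> dict[str, Any]:
    # Fields mirror the QA expectation: referencing_records plus total_count.
    return {
        "status": "error",
        "code": "schema_in_use",
        "referencing_records": err.referencing_records,
        "total_count": err.total_count,
    }
```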
Operator bootstrap tool for seeding _system-owned schema entries directly
via PostgresStore — no MCP auth or middleware involved.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
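
A sketch of the command-line surface, built only from the flags used in the QA walkthrough (--system, --family, --version, --schema-file, --source, --tags, --description). Treating --system as mandatory, accepting --tags as a space-separated list, and stopping before the database write are assumptions; the real tool goes on to write through PostgresStore using AWARENESS_DATABASE_URL.

```python
import argparse
import json
from pathlib import Path
from typing import Any, Optional


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="mcp-awareness-register-schema")
    p.add_argument("--system", action="store_true", help="write as the _system owner")
    p.add_argument("--family", required=True)
    p.add_argument("--version", required=True)  # a plain option, not argparse's version action
    p.add_argument("--schema-file", required=True, type=Path)
    p.add_argument("--source", required=True)
    p.add_argument("--tags", nargs="*", default=[])
    p.add_argument("--description", default="")
    return p


def parse(argv: Optional[list[str]] = None) -> tuple[argparse.Namespace, Any]:
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.system:
        # Assumption: the tool refuses to run without an explicit --system.
        parser.error("--system is required: this tool only writes _system schemas")
    schema = json.loads(args.schema_file.read_text())
    return args, schema
```

Invoked as in the QA step, parse(["--system", "--family", "schema:qa-system", "--version", "1.0.0", "--schema-file", "/tmp/qa-sys.json", "--source", "qa-built-in", "--tags", "qa", "--description", "qa system schema"]) yields the parsed flags plus the decoded schema body, ready for the meta-schema check and insert.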
…/record

Append three new integration tests to verify:

1. test_cross_owner_schema_invisible — Owner A registers a schema; Owner B
   cannot resolve it. Verifies owner isolation at the tool boundary.

2. test_both_owners_see_system_schema — Both owners can use a _system schema.
   Verifies _system fallback works cross-owner.

3. test_caller_schema_overrides_system — When both _system and caller have
   the same logical_key, caller's version wins. Verifies override semantics
   via schema body difference (integer vs string).

All three tests use the configured_server fixture with monkeypatch for
owner switching. Tests verify the design point from Task 6: resolve_schema
→ find_schema query prefers caller-owned over _system via SQL CASE ORDER BY.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add schema/record documentation to CHANGELOG, README (tool count 30→32,
new tool descriptions), data-dictionary (new entry type specs), and
server instructions. Include jsonschema dependency note and _system
namespace details.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add mypy override for jsonschema.* (no stubs available), remove
now-unnecessary type: ignore comments that mypy flagged as unused.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…-error paths

Also fixes two latent bugs in tools.py:
- register_schema: except Exception fallback was unreachable (lines 621-630)
  — now exercised via generic unique-violation mock
- create_record: jse.JsonSchemaException doesn't exist in jsonschema 4.x
  — replaced with except Exception; removed now-unused jse import

Brings tools.py to 100%, cli_register_schema.py to 98%, validation.py to 100%.
Total: 945 tests collected (938 pass, 7 skip).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
cmeans-claude-dev bot added the enhancement (New feature or request) and Dev Active (Developer is actively working on this PR; QA should not start) labels Apr 14, 2026
github-actions bot added the Awaiting CI (Dev complete, waiting for CI/Codecov to pass before QA) label Apr 14, 2026
Tests used @pytest.mark.asyncio (local env had pytest-asyncio) but the
repo's established pattern is @pytest.mark.anyio via anyio plugin. Also
fixed ruff I001/E501 from Task 18 coverage tests and removed a now-unused
# type: ignore in cli_register_schema.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
codecov bot commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 99.49749% with 1 line in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
src/mcp_awareness/cli_register_schema.py | 97.95% | 1 Missing ⚠️


cmeans-claude-dev bot added the Ready for QA (Dev work complete; QA can begin review) label and removed the Awaiting CI and Dev Active labels Apr 14, 2026

Labels

enhancement, Ready for QA

Development

Successfully merging this pull request may close these issues.

feat: Schema + Record entry types — JSON Schema validation on write
