diff --git a/CHANGELOG.md b/CHANGELOG.md index 31c6ba2..1560939 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,6 +8,26 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +### Added +- Two new entry types: `schema` (JSON Schema Draft 2020-12 definition) and `record` (validated payload conforming to a schema). Tools: `register_schema`, `create_record`. Schemas are absolutely immutable after registration; records re-validate on content update. Schema deletion is blocked while live records reference a version. Per-owner storage with a shared `_system` fallback namespace for built-in schemas. +- New CLI: `mcp-awareness-register-schema` for operators to seed `_system`-owned schemas at deploy time. +- New migration: `_system` user seed (idempotent). +- `_error_response()` helper now accepts `**extras` kwargs so tools can include structured fields in error envelopes beyond the fixed set (e.g., `validation_errors`, `schema_ref`, `referencing_records`). + +### Fixed +- **RLS carve-out for `_system`-owned schema reads** — migration `n9i0j1k2l3m4` alters the `entries.owner_isolation` policy: `USING` now allows reads where `owner_id = '_system' AND type = 'schema'`, while an **explicit** `WITH CHECK (owner_id = current_user)` keeps writes strictly owner-scoped. Before this change, the strict `owner_id = current_user` USING clause filtered out `_system`-owned rows for non-superuser DB roles, making the `find_schema` fallback (and the whole CLI bootstrap pattern for built-in schemas) a no-op in production. The WITH CHECK clause is explicit because a `FOR ALL` permissive policy without one reuses `USING` for INSERT/UPDATE — without it the read carve-out would leak into the write path and let non-privileged owners stamp `_system`-owned schema rows. 
+- **`schema_already_exists` error envelope** — register_schema now returns `logical_key` and a best-effort `existing_id` as structured fields alongside the human-readable message (matches the design-doc error-code table; callers no longer have to parse the message to locate the conflicting entry). +- **RECORD content shape preserved across `update_entry`** — previously `update_entry` stringified non-string content via `json.dumps()` before handing it to the store, causing RECORD entries to drift from a native JSON object/array/primitive (how they are stored on create) to a JSON-encoded string after any content update. `update_entry` now branches on the existing entry's type: RECORD entries persist native JSON content to match the create path, while other knowledge types (note / pattern / context / preference) retain the existing stringify-on-write behavior. +- **Bulk-delete paths (`delete_entry` by tags/source) still do not consult `schema_in_use`** — single-id schema deletion is protected; bulk paths are explicitly flagged in the code and tracked by [#288](https://github.com/cmeans/mcp-awareness/issues/288). Not changed in this PR (out of scope per the design), but documented where the gap lives. +- **`count_records_referencing` store boundary hardening** — `schema_logical_key` parsing now asserts the `ref:version` invariant (non-empty ref + non-empty version after the last `:`). Empty version is blocked at `register_schema`, but the store API is public, so we fail loudly here rather than silently running a non-matching query. +- **`_system` user downgrade no-op** — `m8h9i0j1k2l3.downgrade()` now checks for referencing entries before `DELETE`. If any exist, it logs a warning and returns rather than FK-failing the whole transaction. Operators can soft-delete or re-home the referenced entries and re-run downgrade. 
+- **CLI language resolution** — `mcp-awareness-register-schema` now runs the description through `resolve_language()` (same chain as the MCP path) instead of pinning every CLI-seeded schema to `english`. Auto-detection falls back to `simple` for short/unknown-language descriptions. +- **Dead-code cleanup in `register_schema`** — removed the string-matching fallback (`"unique"` / `"duplicate"` / `"23505"` in the exception message) in favor of the psycopg-native `UniqueViolation` branch. The fallback was unreachable under the `psycopg`-direct driver the project uses. +- **Mypy override cleanup** — dropped the no-op `ignore_errors = true` from the `jsonschema.*` override in `pyproject.toml`. `ignore_missing_imports = true` alone covers the import; there is no project code under `jsonschema.*` to silence. + +### Dependencies +- Added `jsonschema>=4.26.0` as a runtime dependency. + ## [0.17.0] - 2026-04-13 ### Added diff --git a/README.md b/README.md index d4d6f61..9b5dfd3 100644 --- a/README.md +++ b/README.md @@ -282,7 +282,7 @@ Results from the initial run (2026-03-27): HNSW query P50 stays under 4ms from 5 ## Tools -The server exposes 30 MCP tools. Clients that support MCP resources also get 6 read-only resources, but since not all clients surface resources, every resource has a tool mirror. +The server exposes 32 MCP tools. Clients that support MCP resources also get 6 read-only resources, but since not all clients surface resources, every resource has a tool mirror. ### Read tools @@ -318,6 +318,8 @@ The server exposes 30 MCP tools. Clients that support MCP resources also get 6 r | `remind` | Create a todo, reminder, or planned action. Optional `deliver_at` timestamp for time-based surfacing. Intentions have a lifecycle: pending → fired → active → completed. | | `update_intention` | Transition an intention state: pending → fired → active → completed/snoozed/cancelled. | | `acted_on` | Log that you took action because of an entry. Tags inherited from the entry. 
| +| `register_schema` | Define a typed data contract using JSON Schema Draft 2020-12. Schemas are immutable after registration; family + version become logical_key. Per-owner with `_system` fallback for shared built-in shapes. | +| `create_record` | Write a validated data entry conforming to a registered schema. Records pin the exact schema version and re-validate on content update. Validation errors include every failure in a structured envelope. | ### Data management tools diff --git a/alembic/versions/m8h9i0j1k2l3_add_system_user_for_schemas.py b/alembic/versions/m8h9i0j1k2l3_add_system_user_for_schemas.py new file mode 100644 index 0000000..15348de --- /dev/null +++ b/alembic/versions/m8h9i0j1k2l3_add_system_user_for_schemas.py @@ -0,0 +1,73 @@ +# mcp-awareness — ambient system awareness for AI agents +# Copyright (C) 2026 Chris Means +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU Affero General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU Affero General Public License for more details. +# +# You should have received a copy of the GNU Affero General Public License +# along with this program. If not, see <https://www.gnu.org/licenses/>.
+ +"""add _system user for system-owned schemas + +Revision ID: m8h9i0j1k2l3 +Revises: l7g8h9i0j1k2 +Create Date: 2026-04-13 00:00:00.000000 + +""" + +from __future__ import annotations + +import logging +from collections.abc import Sequence + +from alembic import op + +logger = logging.getLogger("alembic.runtime.migration") + +revision: str = "m8h9i0j1k2l3" +down_revision: str | Sequence[str] | None = "l7g8h9i0j1k2" +branch_labels: str | Sequence[str] | None = None +depends_on: str | Sequence[str] | None = None + + +def upgrade() -> None: + """Seed the _system user for system-owned schema entries. + + Idempotent — ON CONFLICT DO NOTHING lets the migration run multiple + times safely (e.g., after a stamp-and-reapply). + """ + op.execute( + "INSERT INTO users (id, display_name) " + "VALUES ('_system', 'System-managed schemas') " + "ON CONFLICT (id) DO NOTHING" + ) + + +def downgrade() -> None: + """Remove the _system user, if safe to do so. + + This downgrade is a no-op when `_system`-owned entries still exist (schemas + seeded via ``mcp-awareness-register-schema --system``, for example). A hard + DELETE would FK-fail and abort the entire downgrade transaction — masking + any subsequent downgrade steps from surfacing. The warning surfaces the + manual step required: operators who really want to remove `_system` must + first soft-delete or re-home the referenced entries, then re-run downgrade. + """ + conn = op.get_bind() + referenced = conn.exec_driver_sql( + "SELECT 1 FROM entries WHERE owner_id = '_system' LIMIT 1" + ).fetchone() + if referenced is not None: + logger.warning( + "Skipping delete of users._system — entries still reference it. " + "Soft-delete or re-home those entries, then re-run downgrade." 
+ ) + return + op.execute("DELETE FROM users WHERE id = '_system'") diff --git a/alembic/versions/n9i0j1k2l3m4_rls_allow_system_schema_reads.py b/alembic/versions/n9i0j1k2l3m4_rls_allow_system_schema_reads.py new file mode 100644 index 0000000..abb8425 --- /dev/null +++ b/alembic/versions/n9i0j1k2l3m4_rls_allow_system_schema_reads.py @@ -0,0 +1,82 @@ +# mcp-awareness — ambient system awareness for AI agents +# Copyright (C) 2026 Chris Means +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU Affero General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU Affero General Public License for more details. +# +# You should have received a copy of the GNU Affero General Public License +# along with this program. If not, see <https://www.gnu.org/licenses/>. + +"""RLS: allow all owners to read _system-owned schema entries + +Revision ID: n9i0j1k2l3m4 +Revises: m8h9i0j1k2l3 +Create Date: 2026-04-14 00:00:00.000000 + +The owner_isolation SELECT/UPDATE/DELETE policy on entries was + + USING (owner_id = current_setting('app.current_user', true)) + +which — under FORCE ROW LEVEL SECURITY for non-superuser roles — filters +out `_system`-owned rows. That blocks the schema-fallback design for +built-in schemas registered via ``mcp-awareness-register-schema --system`` +because the `find_schema` query's ``owner_id IN (%s, '_system')`` clause is +evaluated AFTER RLS strips the `_system` row. + +This migration adds a read carve-out limited to `_system`-owned *schema* rows 
Writes are kept strictly isolated by an explicit ``WITH CHECK`` +clause — without it, a ``FOR ALL`` permissive policy's ``USING`` is used +for INSERT/UPDATE too, and (because permissive policies combine with OR) +the read carve-out would leak into the write path, allowing non-privileged +owners to INSERT/UPDATE rows with ``owner_id = '_system' AND type = 'schema'`` +(PR #287 Round-3 QA reproduction). Keeping the write check strict ensures +the only path that can seed ``_system`` schemas is the CLI +(``mcp-awareness-register-schema --system``) which bypasses MCP entirely +and connects as whichever DB role the operator chose. + +Rationale: option 1 from the PR #287 Round-2 QA review (narrowest +change, read-only carve-out, no SECURITY DEFINER functions needed) — +plus the explicit WITH CHECK added in Round 3. +""" + +from __future__ import annotations + +from collections.abc import Sequence + +from alembic import op + +revision: str = "n9i0j1k2l3m4" +down_revision: str | Sequence[str] | None = "m8h9i0j1k2l3" +branch_labels: str | Sequence[str] | None = None +depends_on: str | Sequence[str] | None = None + + +def upgrade() -> None: + """Replace the owner_isolation policy on `entries` to allow reads of + `_system`-owned schema rows from any owner context while keeping + writes strictly isolated via an explicit WITH CHECK clause.""" + op.execute("DROP POLICY IF EXISTS owner_isolation ON entries") + op.execute(""" + CREATE POLICY owner_isolation ON entries + USING ( + owner_id = current_setting('app.current_user', true) + OR (owner_id = '_system' AND type = 'schema') + ) + WITH CHECK (owner_id = current_setting('app.current_user', true)) + """) + + +def downgrade() -> None: + """Restore the strict-isolation policy on `entries`.""" + op.execute("DROP POLICY IF EXISTS owner_isolation ON entries") + op.execute(""" + CREATE POLICY owner_isolation ON entries + USING (owner_id = current_setting('app.current_user', true)) + """) diff --git a/docs/data-dictionary.md 
b/docs/data-dictionary.md index 58fdd7f..70237c1 100644 --- a/docs/data-dictionary.md +++ b/docs/data-dictionary.md @@ -46,7 +46,7 @@ The UNIQUE constraint is on `canonical_email`, not `email`. Users see and use th |--------|------|----------|-------------| | `id` | TEXT | No | Primary key. UUID v4, generated via `uuid.uuid4()`. | | `owner_id` | TEXT | No | Owner identifier. References the user who owns this entry. All queries are scoped by `owner_id`. | -| `type` | TEXT | No | Entry type. One of: `status`, `alert`, `pattern`, `suppression`, `context`, `preference`, `note`, `intention`. | +| `type` | TEXT | No | Entry type. One of: `status`, `alert`, `pattern`, `suppression`, `context`, `preference`, `note`, `intention`, `schema`, `record`. | | `source` | TEXT | No | Origin identifier. Describes the subject, not the owner (e.g., `"personal"`, `"synology-nas"`, `"mcp-awareness-project"`). | | `created` | TIMESTAMPTZ | No | UTC timestamp. Set once when the entry is first created. | | `updated` | TIMESTAMPTZ | Yes | UTC timestamp. Updated on every upsert or `update_entry` call. `NULL` until first update. | @@ -177,6 +177,34 @@ Written by agents via `set_preference`. Keyed by `key` + `scope` (upserted). Por | `value` | string | Yes | Preference value (e.g., `"one_sentence_warnings"`, `"first_turn_only"`). | | `scope` | string | Yes | Scope of the preference. Default: `"global"`. | +### `schema` — JSON Schema definitions + +Written by operators or agents via `register_schema`. Immutable after registration. Schema body lives in `data.schema`; family + version in `data.family` + `data.version`; `logical_key` derived as `{family}:{version}`. Used by `record` entries for typed validation. + +**`data` fields:** + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `family` | string | Yes | Schema family identifier (e.g., `schema:edge-manifest`, `schema:config`). Used as the reference key. 
| +| `version` | string | Yes | Schema version (user-chosen semantic or sequential, e.g., `"1.0.0"`, `"1"`). | +| `schema` | object | Yes | JSON Schema Draft 2020-12 body. Defines the validation rules and structure. | +| `description` | string | No | Human-readable description of what this schema validates. | +| `learned_from` | string | No | Platform that registered the schema (e.g., `"claude-code"`, `"operator"`). Default: `"conversation"`. | + +### `record` — Validated data entries + +Written by agents via `create_record`. Content in `data.content`; pinned schema reference in `data.schema_ref` + `data.schema_version` (exact version, no "latest" aliasing). Re-validated on content update via `update_entry`. + +**`data` fields:** + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `schema_ref` | string | Yes | Target schema family (e.g., `schema:edge-manifest`). Used to look up the schema definition. | +| `schema_version` | string | Yes | Target schema version (exact pin, e.g., `"1.0.0"`). Pinned at write time; determines which schema is used for validation on updates. | +| `content` | any | Yes | Validated payload — any JSON value (object, array, string, number, boolean, null). Must conform to the pinned schema. | +| `description` | string | No | Human-readable description of what this record represents. | +| `learned_from` | string | No | Platform that created the record (e.g., `"claude-code"`, edge provider name). Default: `"conversation"`. | + ## Lifecycle - **Upsert behavior:** `status` entries are upserted by `source`. `alert` entries by `source` + `alert_id`. `preference` entries by `key` + `scope`. Other types always insert new rows. 
diff --git a/docs/superpowers/plans/2026-04-13-schema-record-entry-types-plan.md b/docs/superpowers/plans/2026-04-13-schema-record-entry-types-plan.md new file mode 100644 index 0000000..b94fee4 --- /dev/null +++ b/docs/superpowers/plans/2026-04-13-schema-record-entry-types-plan.md @@ -0,0 +1,2420 @@ +# Schema + Record Entry Types Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Add `EntryType.SCHEMA` and `EntryType.RECORD` with JSON Schema Draft 2020-12 validation on write, per-owner storage with a shared `_system` fallback, schema immutability, record re-validation on update, and a CLI tool for seeding system-owned schemas. Delivers steps 1–2 of the `design-schema-record-secrets` spec; secrets infrastructure is a separate follow-up. + +**Architecture:** New `validation.py` module holds pure validation functions using the `jsonschema` library (centralized, unit-testable without DB, keeps `jsonschema` out of the store layer). Two new MCP write tools — `register_schema` and `create_record` — match the existing one-tool-per-type convention. `update_entry` and `delete_entry` gain type-specific branches for the new entries. Store protocol grows exactly two methods: `find_schema` (with `_system` fallback) and `count_records_referencing` (for deletion protection). A new `mcp-awareness-register-schema` console script bypasses MCP for operator bootstrap of `_system`-owned schemas. + +**Tech Stack:** Python 3.11+, FastMCP, psycopg + pgvector-enabled Postgres 17, Alembic, `jsonschema>=4.26.0` (new dep), existing structured-error helper (`_error_response`), testcontainers for integration tests. 
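Since the tech stack leans on the structured-error helper, a minimal sketch of the `_error_response(**extras)` pattern this work extends may help. The envelope field names below are assumptions for illustration; the real helper lives in `helpers.py` and its exact shape is defined there.

```python
from typing import Any


def _error_response(code: str, message: str, **extras: Any) -> dict[str, Any]:
    """Hypothetical stand-in for the helpers.py error envelope.

    Fixed fields plus **extras mirror the changelog note: tools can attach
    structured fields (validation_errors, schema_ref, referencing_records)
    without widening the helper's signature.
    """
    return {"error": code, "message": message, **extras}


# e.g. the schema_already_exists envelope with structured fields
resp = _error_response(
    "schema_already_exists",
    "schema schema:edge-manifest:1.0.0 is already registered",
    logical_key="schema:edge-manifest:1.0.0",
)
assert resp["logical_key"] == "schema:edge-manifest:1.0.0"
```

The point of the kwargs design is that callers get machine-readable fields (no parsing the human message) while the helper stays a single chokepoint for envelope shape.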
+ +**Spec:** [`docs/superpowers/specs/2026-04-13-schema-record-entry-types-design.md`](../specs/2026-04-13-schema-record-entry-types-design.md) — all design decisions D1–D8 and error codes are authoritative there. This plan implements without re-deriving. + +**Branch:** `feat/schema-record-entry-types` (already created with the spec commit). + +--- + +## File Map + +**Files to create:** +- `src/mcp_awareness/validation.py` — pure validation functions +- `src/mcp_awareness/cli_register_schema.py` — `mcp-awareness-register-schema` console script +- `alembic/versions/_add_system_user_for_schemas.py` — `_system` user seed migration +- `tests/test_validation.py` — unit tests for validation module +- `tests/test_tools_schema_record.py` — integration tests via testcontainers Postgres +- `tests/test_cli_register_schema.py` — CLI tool tests + +**Files to modify:** +- `src/mcp_awareness/schema.py` — add `SCHEMA` and `RECORD` enum values +- `src/mcp_awareness/store.py` — `Store` protocol: add `find_schema`, `count_records_referencing` +- `src/mcp_awareness/postgres_store.py` — implement the two new methods +- `src/mcp_awareness/tools.py` — add `register_schema`, `create_record`; branch `update_entry` / `delete_entry` +- `src/mcp_awareness/instructions.md` — mention new tools in server instructions +- `pyproject.toml` — add `jsonschema>=4.26.0` dep; add `mcp-awareness-register-schema` console script +- `CHANGELOG.md` — entry under `[Unreleased]` +- `README.md` — update tool count and "Implemented" section +- `docs/data-dictionary.md` — document `schema` and `record` entry types +- `tests/test_schema.py` — add enum-value coverage +- `tests/test_store.py` — add `find_schema` / `count_records_referencing` coverage + +--- + +## Execution Notes + +- **TDD throughout:** every code task writes the failing test first, verifies it fails, implements minimal code, verifies it passes, commits. No committing of untested code. 
+- **Commit frequency:** at minimum one commit per task, often mid-task after a green test. +- **Conventional commits:** `feat:`, `fix:`, `docs:`, `test:`, `chore:`, `refactor:` as appropriate. +- **Pre-commit discipline** (per saved feedback): before first push, run `ruff format`, `ruff check`, `mypy src/`, `pytest --cov`, verify coverage of new lines, verify test count in README matches reality. +- **AGPL preamble:** every new `.py` file must start with the AGPL v3 license header (copy from any existing `src/mcp_awareness/*.py` file). +- **Structured errors only:** all new error paths use `_error_response()` from `helpers.py`. No `raise ValueError` in tool-facing paths. +- **No `pragma: no cover`** without explicit approval. + +--- + +## Task 1: Add `SCHEMA` and `RECORD` to `EntryType` enum + +**Files:** +- Modify: `src/mcp_awareness/schema.py` (class `EntryType`, line 30) +- Modify: `tests/test_schema.py` + +- [ ] **Step 1: Write failing test** + +Append to `tests/test_schema.py`: + +```python +def test_entry_type_schema_value(): + assert EntryType.SCHEMA.value == "schema" + assert EntryType("schema") is EntryType.SCHEMA + + +def test_entry_type_record_value(): + assert EntryType.RECORD.value == "record" + assert EntryType("record") is EntryType.RECORD +``` + +- [ ] **Step 2: Run test to verify it fails** + +Run: `pytest tests/test_schema.py::test_entry_type_schema_value tests/test_schema.py::test_entry_type_record_value -v` +Expected: FAIL — `AttributeError: SCHEMA`. + +- [ ] **Step 3: Add enum values** + +Edit `src/mcp_awareness/schema.py`, inside `class EntryType`: + +```python +class EntryType(str, Enum): + STATUS = "status" + ALERT = "alert" + PATTERN = "pattern" + SUPPRESSION = "suppression" + CONTEXT = "context" + PREFERENCE = "preference" + NOTE = "note" + INTENTION = "intention" + SCHEMA = "schema" + RECORD = "record" +``` + +- [ ] **Step 4: Run tests** + +Run: `pytest tests/test_schema.py -v` +Expected: all pass. 
+ +- [ ] **Step 5: Commit** + +```bash +git add src/mcp_awareness/schema.py tests/test_schema.py +git commit -m "feat: add SCHEMA and RECORD to EntryType enum" +``` + +--- + +## Task 2: Add `jsonschema` dependency + +**Files:** +- Modify: `pyproject.toml` + +- [ ] **Step 1: Add to dependencies** + +Edit `pyproject.toml` → `[project] dependencies` array (or equivalent). Add: + +``` +jsonschema>=4.26.0,<5 +``` + +- [ ] **Step 2: Install locally** + +Run: `pip install -e ".[dev]"` +Expected: installs `jsonschema`, `jsonschema-specifications`, `referencing`, `rpds-py`, `attrs`. + +- [ ] **Step 3: Verify importable** + +Run: `python -c "from jsonschema import Draft202012Validator; print(Draft202012Validator.META_SCHEMA['$id'])"` +Expected: prints `https://json-schema.org/draft/2020-12/schema`. + +- [ ] **Step 4: Commit** + +```bash +git add pyproject.toml +git commit -m "chore: add jsonschema>=4.26.0 dependency" +``` + +--- + +## Task 3: Create `validation.py` with `compose_schema_logical_key` + +Start with the smallest pure function to establish the module. + +**Files:** +- Create: `src/mcp_awareness/validation.py` +- Create: `tests/test_validation.py` + +- [ ] **Step 1: Create failing test** + +Create `tests/test_validation.py`: + +```python +# AGPL preamble here — copy from tests/test_schema.py + +"""Tests for src/mcp_awareness/validation.py.""" + +from __future__ import annotations + +import pytest + +from mcp_awareness.validation import compose_schema_logical_key + + +def test_compose_schema_logical_key_basic(): + assert compose_schema_logical_key("schema:edge-manifest", "1.0.0") == "schema:edge-manifest:1.0.0" + + +def test_compose_schema_logical_key_no_prefix(): + assert compose_schema_logical_key("tag-taxonomy", "0.1.0") == "tag-taxonomy:0.1.0" +``` + +- [ ] **Step 2: Run test to verify it fails** + +Run: `pytest tests/test_validation.py -v` +Expected: FAIL — `ModuleNotFoundError: No module named 'mcp_awareness.validation'`. 
+ +- [ ] **Step 3: Create validation module** + +Create `src/mcp_awareness/validation.py`: + +```python +# AGPL preamble here — copy from src/mcp_awareness/schema.py + +"""Validation helpers for Schema and Record entry types. + +Pure functions wrapping jsonschema Draft 2020-12 validation and schema +lookup with _system fallback. Kept out of the store layer so the Store +protocol stays swappable (no jsonschema import in store.py). +""" + +from __future__ import annotations + + +def compose_schema_logical_key(family: str, version: str) -> str: + """Derive the canonical logical_key for a schema entry. + + Single source of truth for the family+version → logical_key format. + Used by register_schema on write and by resolve_schema on lookup. + """ + return f"{family}:{version}" +``` + +- [ ] **Step 4: Run tests** + +Run: `pytest tests/test_validation.py -v` +Expected: both tests pass. + +- [ ] **Step 5: Commit** + +```bash +git add src/mcp_awareness/validation.py tests/test_validation.py +git commit -m "feat: add validation module with compose_schema_logical_key" +``` + +--- + +## Task 4: `validation.validate_schema_body` + +**Files:** +- Modify: `src/mcp_awareness/validation.py` +- Modify: `tests/test_validation.py` + +- [ ] **Step 1: Write failing tests** + +Append to `tests/test_validation.py`: + +```python +import jsonschema + +from mcp_awareness.validation import validate_schema_body + + +def test_validate_schema_body_accepts_valid_object_schema(): + schema = { + "type": "object", + "properties": {"name": {"type": "string"}}, + "required": ["name"], + } + validate_schema_body(schema) # must not raise + + +def test_validate_schema_body_rejects_bad_type(): + schema = {"type": "strng"} # typo: 'strng' is not a valid JSON Schema type + with pytest.raises(jsonschema.exceptions.SchemaError): + validate_schema_body(schema) + + +def test_validate_schema_body_accepts_empty_object(): + # Empty schema matches anything — valid per spec + validate_schema_body({}) + + +def 
test_validate_schema_body_rejects_non_dict(): + # Schemas must be objects; bare arrays fail meta-schema + with pytest.raises(jsonschema.exceptions.SchemaError): + validate_schema_body([{"type": "string"}]) # type: ignore[arg-type] +``` + +- [ ] **Step 2: Run to verify failure** + +Run: `pytest tests/test_validation.py -v` +Expected: FAIL — `ImportError: cannot import name 'validate_schema_body'`. + +- [ ] **Step 3: Implement** + +Append to `src/mcp_awareness/validation.py`: + +```python +from typing import Any + +from jsonschema import Draft202012Validator + + +def validate_schema_body(schema: Any) -> None: + """Validate a schema body against the JSON Schema Draft 2020-12 meta-schema. + + Raises jsonschema.exceptions.SchemaError on invalid schema. Callers at + the MCP boundary translate this into a structured 'invalid_schema' error + response; direct callers (CLI) format to stderr. + """ + Draft202012Validator.check_schema(schema) +``` + +- [ ] **Step 4: Run tests, verify pass** + +Run: `pytest tests/test_validation.py -v` +Expected: all pass. + +- [ ] **Step 5: Commit** + +```bash +git add src/mcp_awareness/validation.py tests/test_validation.py +git commit -m "feat: add validate_schema_body for Draft 2020-12 meta-schema check" +``` + +--- + +## Task 5: `validation.validate_record_content` + +Returns a sorted list of flattened error dicts. Callers decide how to envelope them. 
+ +**Files:** +- Modify: `src/mcp_awareness/validation.py` +- Modify: `tests/test_validation.py` + +- [ ] **Step 1: Write failing tests** + +Append to `tests/test_validation.py`: + +```python +from mcp_awareness.validation import validate_record_content + + +_PERSON_SCHEMA = { + "type": "object", + "properties": { + "name": {"type": "string"}, + "age": {"type": "integer", "minimum": 0}, + }, + "required": ["name"], +} + + +def test_validate_record_content_valid_returns_empty_list(): + assert validate_record_content(_PERSON_SCHEMA, {"name": "Alice", "age": 30}) == [] + + +def test_validate_record_content_surfaces_missing_required(): + errors = validate_record_content(_PERSON_SCHEMA, {"age": 30}) + assert len(errors) == 1 + assert errors[0]["validator"] == "required" + assert "name" in errors[0]["message"] + + +def test_validate_record_content_surfaces_all_errors(): + # Missing 'name' AND age is wrong type + errors = validate_record_content(_PERSON_SCHEMA, {"age": "thirty"}) + assert len(errors) == 2 + validators = {e["validator"] for e in errors} + assert validators == {"required", "type"} + + +def test_validate_record_content_is_sorted_by_path(): + schema = { + "type": "object", + "properties": { + "a": {"type": "integer"}, + "b": {"type": "integer"}, + "c": {"type": "integer"}, + }, + } + errors = validate_record_content(schema, {"a": "x", "b": "y", "c": "z"}) + paths = [e["path"] for e in errors] + assert paths == sorted(paths) + + +def test_validate_record_content_accepts_primitive_schema(): + schema = {"type": "integer"} + assert validate_record_content(schema, 42) == [] + errors = validate_record_content(schema, "abc") + assert len(errors) == 1 + assert errors[0]["validator"] == "type" + + +def test_validate_record_content_array_schema_with_index_paths(): + schema = {"type": "array", "items": {"type": "integer"}} + errors = validate_record_content(schema, [1, "two", 3, "four"]) + assert len(errors) == 2 + # Array indices should appear in paths + paths = 
[e["path"] for e in errors] + assert any("1" in p for p in paths) + assert any("3" in p for p in paths) + + +def test_validate_record_content_truncates_at_50(): + schema = { + "type": "array", + "items": {"type": "integer"}, + } + # 60 wrong-type items — all fail + result = validate_record_content(schema, ["x"] * 60) + assert isinstance(result, list) + # Truncation is carried via a special sentinel entry at the end; see impl + assert len(result) == 51 # 50 errors + 1 truncation marker + assert result[-1]["truncated"] is True + assert result[-1]["total_errors"] == 60 +``` + +- [ ] **Step 2: Run to verify failure** + +Run: `pytest tests/test_validation.py -v` +Expected: FAIL on missing import. + +- [ ] **Step 3: Implement** + +Append to `src/mcp_awareness/validation.py`: + +```python +from jsonschema import ValidationError + +_MAX_VALIDATION_ERRORS = 50 + + +def _flatten_error(err: ValidationError) -> dict[str, Any]: + """Flatten a jsonschema ValidationError to a structured dict for the error envelope.""" + return { + "path": err.json_path, + "message": err.message, + "validator": err.validator, + "schema_path": "/" + "/".join(str(p) for p in err.schema_path), + } + + +def validate_record_content(schema_body: dict[str, Any], content: Any) -> list[dict[str, Any]]: + """Validate content against a schema body. Returns list of structured errors. + + Empty list means valid. List truncated at _MAX_VALIDATION_ERRORS; when + truncated, final entry is {'truncated': True, 'total_errors': }. + """ + validator = Draft202012Validator(schema_body) + all_errors = sorted(validator.iter_errors(content), key=lambda e: e.path) + if len(all_errors) <= _MAX_VALIDATION_ERRORS: + return [_flatten_error(e) for e in all_errors] + kept = [_flatten_error(e) for e in all_errors[:_MAX_VALIDATION_ERRORS]] + kept.append({"truncated": True, "total_errors": len(all_errors)}) + return kept +``` + +- [ ] **Step 4: Run tests** + +Run: `pytest tests/test_validation.py -v` +Expected: all pass. 

- [ ] **Step 5: Commit**

```bash
git add src/mcp_awareness/validation.py tests/test_validation.py
git commit -m "feat: add validate_record_content with iter_errors and truncation"
```

---

## Task 6: Add `find_schema` to Store protocol and PostgresStore

**Files:**
- Modify: `src/mcp_awareness/store.py` (Store protocol)
- Modify: `src/mcp_awareness/postgres_store.py` (implementation)
- Modify: `tests/test_store.py`
- Create (if needed): `src/mcp_awareness/sql/find_schema.sql`

- [ ] **Step 1: Inspect existing Store protocol**

Read `src/mcp_awareness/store.py` to see the current Protocol signature style; mirror it.

- [ ] **Step 2: Write failing integration test**

Append to `tests/test_store.py`:

```python
from mcp_awareness.schema import Entry, EntryType, make_id, now_utc

SYSTEM_OWNER = "_system"


def _schema_entry(owner: str, logical_key: str, family: str, version: str, schema_body: dict) -> Entry:
    return Entry(
        id=make_id(),
        type=EntryType.SCHEMA,
        source="test",
        tags=[],
        created=now_utc(),
        updated=None,
        expires=None,
        data={
            "family": family,
            "version": version,
            "schema": schema_body,
            "description": "test schema",
            "learned_from": "test",
        },
        logical_key=logical_key,
        owner_id=owner,
    )


def test_find_schema_returns_caller_owned(store):
    # The store fixture (below) seeds the _system user, so owner_id FK inserts are valid
    store.save_entry(_schema_entry(TEST_OWNER, "s:test:1.0.0", "s:test", "1.0.0", {"type": "object"}))
    found = store.find_schema(TEST_OWNER, "s:test:1.0.0")
    assert found is not None
    assert found.owner_id == TEST_OWNER
    assert found.data["family"] == "s:test"


def test_find_schema_system_fallback(store):
    store.save_entry(_schema_entry(SYSTEM_OWNER, "s:test:1.0.0", "s:test", "1.0.0", {"type": "object"}))
    found = store.find_schema(TEST_OWNER, "s:test:1.0.0")
    assert found is not None
    assert found.owner_id == SYSTEM_OWNER


def test_find_schema_caller_wins_over_system(store):
    # Seed _system first
    store.save_entry(_schema_entry(SYSTEM_OWNER, "s:test:1.0.0", "s:test", "1.0.0", {"type": "object"}))
    # Then caller-owned override
    store.save_entry(_schema_entry(TEST_OWNER, "s:test:1.0.0", "s:test", "1.0.0", {"type": "string"}))
    found = store.find_schema(TEST_OWNER, "s:test:1.0.0")
    assert found is not None
    assert found.owner_id == TEST_OWNER
    # The caller-owned schema overrode the system one
    assert found.data["schema"] == {"type": "string"}


def test_find_schema_returns_none_when_missing(store):
    assert store.find_schema(TEST_OWNER, "s:nonexistent:1.0.0") is None


def test_find_schema_excludes_soft_deleted(store):
    entry = _schema_entry(TEST_OWNER, "s:test:1.0.0", "s:test", "1.0.0", {"type": "object"})
    store.save_entry(entry)
    store.delete_entry(TEST_OWNER, entry.id)
    assert store.find_schema(TEST_OWNER, "s:test:1.0.0") is None
```

Note: the `_system` user FK must exist before inserting entries with that `owner_id`. In production this is handled by the migration in Task 10, which these tests would otherwise have to wait for. Two options: reorder so Task 10 lands before Task 6, or augment `conftest.py`'s `store` fixture to pre-seed `_system` into `users` (keeps the plan order natural). **Decision: pre-seed via the fixture** — add `_system` to the `store` fixture in `conftest.py`:

```python
@pytest.fixture
def store(pg_dsn):
    """Fresh PostgresStore for each test — tables created, then cleared after."""
    s = PostgresStore(pg_dsn)
    # Ensure _system user exists for cross-owner schema tests.
+ with s._conn_pool.connection() as conn, conn.cursor() as cur: + cur.execute( + "INSERT INTO users (id, display_name) VALUES ('_system', 'System-managed schemas') " + "ON CONFLICT (id) DO NOTHING" + ) + conn.commit() + yield s + s.clear(TEST_OWNER) + s.clear(SYSTEM_OWNER) +``` + +- [ ] **Step 3: Run tests to verify failure** + +Run: `pytest tests/test_store.py -v -k find_schema` +Expected: FAIL — `AttributeError: PostgresStore has no attribute 'find_schema'`. + +- [ ] **Step 4: Add method to Store protocol** + +Edit `src/mcp_awareness/store.py`, add to the `Store` Protocol: + +```python +def find_schema(self, owner_id: str, logical_key: str) -> Entry | None: + """Look up a schema entry by logical_key, preferring caller-owned over _system. + + Returns the schema entry or None if not found or soft-deleted. + """ + ... +``` + +- [ ] **Step 5: Implement in PostgresStore** + +Edit `src/mcp_awareness/postgres_store.py`: + +```python +def find_schema(self, owner_id: str, logical_key: str) -> Entry | None: + """Look up a schema, preferring caller-owned over _system-owned. + + Single query with CASE-based ORDER BY for predictable override + semantics: caller's own version wins, _system is fallback. + """ + query = """ + SELECT id, type, source, tags, created, updated, expires, data, + logical_key, owner_id, language, deleted + FROM entries + WHERE type = 'schema' + AND logical_key = %(logical_key)s + AND owner_id IN (%(caller)s, '_system') + AND deleted IS NULL + ORDER BY CASE WHEN owner_id = %(caller)s THEN 0 ELSE 1 END + LIMIT 1 + """ + with self._conn_pool.connection() as conn, conn.cursor(row_factory=dict_row) as cur: + cur.execute(query, {"logical_key": logical_key, "caller": owner_id}) + row = cur.fetchone() + if row is None: + return None + return _row_to_entry(row) +``` + +Adjust `dict_row` / `_row_to_entry` to match existing patterns in the file (import name and helper function may differ — follow what the rest of `postgres_store.py` uses). 
+ +- [ ] **Step 6: Externalize SQL if project pattern requires** + +If the codebase follows "one SQL file per operation" (check `src/mcp_awareness/sql/`), create `sql/find_schema.sql` with the query text and load it via the existing SQL-loading helper. Otherwise, inline is fine. + +- [ ] **Step 7: Run tests** + +Run: `pytest tests/test_store.py -v -k find_schema` +Expected: all pass. + +- [ ] **Step 8: Commit** + +```bash +git add src/mcp_awareness/store.py src/mcp_awareness/postgres_store.py \ + tests/test_store.py tests/conftest.py src/mcp_awareness/sql/find_schema.sql +git commit -m "feat: add Store.find_schema with _system fallback" +``` + +--- + +## Task 7: Add `count_records_referencing` to Store and PostgresStore + +**Files:** +- Modify: `src/mcp_awareness/store.py` +- Modify: `src/mcp_awareness/postgres_store.py` +- Modify: `tests/test_store.py` +- Create (if project convention): `src/mcp_awareness/sql/count_records_referencing.sql` + +- [ ] **Step 1: Write failing tests** + +Append to `tests/test_store.py`: + +```python +def _record_entry(owner: str, logical_key: str, schema_ref: str, schema_version: str, content) -> Entry: + return Entry( + id=make_id(), + type=EntryType.RECORD, + source="test", + tags=[], + created=now_utc(), + updated=None, + expires=None, + data={ + "schema_ref": schema_ref, + "schema_version": schema_version, + "content": content, + "description": "test record", + "learned_from": "test", + }, + logical_key=logical_key, + owner_id=owner, + ) + + +def test_count_records_referencing_returns_zero_when_none(store): + count, ids = store.count_records_referencing(TEST_OWNER, "s:test:1.0.0") + assert count == 0 + assert ids == [] + + +def test_count_records_referencing_counts_matching_records(store): + # Insert 3 records referencing s:test:1.0.0 + for i in range(3): + store.save_entry(_record_entry(TEST_OWNER, f"rec-{i}", "s:test", "1.0.0", {"i": i})) + count, ids = store.count_records_referencing(TEST_OWNER, "s:test:1.0.0") + assert count == 
3 + assert len(ids) == 3 + + +def test_count_records_referencing_excludes_soft_deleted(store): + e = _record_entry(TEST_OWNER, "rec-1", "s:test", "1.0.0", {}) + store.save_entry(e) + store.delete_entry(TEST_OWNER, e.id) + count, ids = store.count_records_referencing(TEST_OWNER, "s:test:1.0.0") + assert count == 0 + assert ids == [] + + +def test_count_records_referencing_ignores_other_versions(store): + store.save_entry(_record_entry(TEST_OWNER, "rec-1", "s:test", "1.0.0", {})) + store.save_entry(_record_entry(TEST_OWNER, "rec-2", "s:test", "2.0.0", {})) + count, _ = store.count_records_referencing(TEST_OWNER, "s:test:1.0.0") + assert count == 1 + + +def test_count_records_referencing_caps_id_list_at_ten(store): + for i in range(15): + store.save_entry(_record_entry(TEST_OWNER, f"rec-{i}", "s:test", "1.0.0", {"i": i})) + count, ids = store.count_records_referencing(TEST_OWNER, "s:test:1.0.0") + assert count == 15 + assert len(ids) == 10 +``` + +- [ ] **Step 2: Verify failure** + +Run: `pytest tests/test_store.py -v -k count_records_referencing` +Expected: FAIL — method not defined. + +- [ ] **Step 3: Add to protocol and implement** + +Edit `src/mcp_awareness/store.py`: + +```python +def count_records_referencing( + self, owner_id: str, schema_logical_key: str +) -> tuple[int, list[str]]: + """Return (total_count, first_N_ids) of non-deleted records referencing a schema. + + The schema_logical_key is composed as f"{schema_ref}:{schema_version}". + Caller uses total_count for the error payload and ids for the blocker list. + """ + ... +``` + +Edit `src/mcp_awareness/postgres_store.py`: + +```python +def count_records_referencing( + self, owner_id: str, schema_logical_key: str +) -> tuple[int, list[str]]: + """Count and sample-id records referencing a schema version. + + Query splits schema_logical_key into schema_ref + version by splitting on + the last ':'. Matches data.schema_ref and data.schema_version in the + record entries' JSONB. 
+ """ + # Parse "schema_ref:schema_version" — schema_ref may itself contain ':' + # (e.g., "schema:edge-manifest:1.0.0"). Split on the LAST ':'. + ref, _, version = schema_logical_key.rpartition(":") + count_query = """ + SELECT COUNT(*) AS cnt + FROM entries + WHERE type = 'record' + AND owner_id = %(owner)s + AND data->>'schema_ref' = %(ref)s + AND data->>'schema_version' = %(version)s + AND deleted IS NULL + """ + ids_query = """ + SELECT id + FROM entries + WHERE type = 'record' + AND owner_id = %(owner)s + AND data->>'schema_ref' = %(ref)s + AND data->>'schema_version' = %(version)s + AND deleted IS NULL + ORDER BY created + LIMIT 10 + """ + params = {"owner": owner_id, "ref": ref, "version": version} + with self._conn_pool.connection() as conn, conn.cursor(row_factory=dict_row) as cur: + cur.execute(count_query, params) + count = cur.fetchone()["cnt"] + if count == 0: + return (0, []) + cur.execute(ids_query, params) + ids = [r["id"] for r in cur.fetchall()] + return (count, ids) +``` + +- [ ] **Step 4: Run tests** + +Run: `pytest tests/test_store.py -v -k count_records_referencing` +Expected: all pass. + +- [ ] **Step 5: Commit** + +```bash +git add src/mcp_awareness/store.py src/mcp_awareness/postgres_store.py tests/test_store.py +git commit -m "feat: add Store.count_records_referencing for schema deletion protection" +``` + +--- + +## Task 8: `validation.resolve_schema` + +Uses `store.find_schema()` under the hood but exists in the validation module for a uniform interface to callers. + +**Files:** +- Modify: `src/mcp_awareness/validation.py` +- Modify: `tests/test_validation.py` + +- [ ] **Step 1: Write unit tests with store stub** + +Append to `tests/test_validation.py`: + +```python +from mcp_awareness.validation import resolve_schema + + +class _StubStore: + """Minimal Store-like stub for validation unit tests. + + Records calls to find_schema and returns pre-configured results keyed by + (owner_id, logical_key). 
Only needs to implement find_schema; other Store + methods are never called by resolve_schema. + """ + + def __init__(self): + self._results: dict[tuple[str, str], object] = {} + self.calls: list[tuple[str, str]] = [] + + def set(self, owner_id: str, logical_key: str, result): + self._results[(owner_id, logical_key)] = result + + def find_schema(self, owner_id, logical_key): + self.calls.append((owner_id, logical_key)) + return self._results.get((owner_id, logical_key)) + + +def test_resolve_schema_returns_caller_owned(): + stub = _StubStore() + stub.set("alice", "s:test:1.0.0", object()) # sentinel + result = resolve_schema(stub, "alice", "s:test", "1.0.0") + assert result is stub._results[("alice", "s:test:1.0.0")] + + +def test_resolve_schema_returns_none_when_missing(): + stub = _StubStore() + assert resolve_schema(stub, "alice", "s:nope", "1.0.0") is None +``` + +Note: the underlying `find_schema` already handles `_system` fallback at the SQL level, so `resolve_schema` delegates fully. No branching in Python. + +- [ ] **Step 2: Verify failure** + +Run: `pytest tests/test_validation.py -v -k resolve_schema` +Expected: FAIL — missing import. + +- [ ] **Step 3: Implement** + +Append to `src/mcp_awareness/validation.py`: + +```python +from typing import Protocol + + +class _SchemaFinder(Protocol): + """Minimal protocol for resolve_schema's store dependency.""" + def find_schema(self, owner_id: str, logical_key: str): + ... + + +def resolve_schema(store: _SchemaFinder, owner_id: str, family: str, version: str): + """Resolve a schema by family + version, preferring caller-owned. + + Delegates to Store.find_schema (which handles the _system fallback at + the SQL level). Returns the schema Entry or None. + """ + return store.find_schema(owner_id, compose_schema_logical_key(family, version)) +``` + +- [ ] **Step 4: Run tests** + +Run: `pytest tests/test_validation.py -v` +Expected: all pass. 
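For reference, the logical-key round-trip that `resolve_schema` and the store both rely on, sketched with illustrative helpers (the real `compose_schema_logical_key` lives in an earlier task; it is assumed here to be a plain f-string join):

```python
# Composition is "family:version"; parsing splits on the LAST ':' so a family
# may itself contain colons (e.g. "schema:edge-manifest") while a version
# must not. These two functions are an inverse pair under that invariant.

def compose(family: str, version: str) -> str:
    return f"{family}:{version}"


def parse(logical_key: str) -> tuple[str, str]:
    ref, _, version = logical_key.rpartition(":")
    return ref, version


key = compose("schema:edge-manifest", "1.0.0")
assert key == "schema:edge-manifest:1.0.0"
assert parse(key) == ("schema:edge-manifest", "1.0.0")  # family keeps its colon
```

This is why `register_schema` must reject versions containing `':'`: a colon in the version would make `rpartition` split at the wrong place and break the round-trip.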
+ +- [ ] **Step 5: Commit** + +```bash +git add src/mcp_awareness/validation.py tests/test_validation.py +git commit -m "feat: add validation.resolve_schema delegating to Store.find_schema" +``` + +--- + +## Task 9: `validation.assert_schema_deletable` + +**Files:** +- Modify: `src/mcp_awareness/validation.py` +- Modify: `tests/test_validation.py` + +- [ ] **Step 1: Write failing tests** + +Append to `tests/test_validation.py`: + +```python +from mcp_awareness.validation import SchemaInUseError, assert_schema_deletable + + +class _CounterStore: + """Stub exposing count_records_referencing.""" + + def __init__(self, count: int, ids: list[str]): + self._count = count + self._ids = ids + + def count_records_referencing(self, owner_id, schema_logical_key): + return (self._count, self._ids) + + +def test_assert_schema_deletable_passes_with_zero_refs(): + assert_schema_deletable(_CounterStore(0, []), "alice", "s:test:1.0.0") + + +def test_assert_schema_deletable_raises_with_refs(): + with pytest.raises(SchemaInUseError) as excinfo: + assert_schema_deletable(_CounterStore(3, ["id1", "id2", "id3"]), "alice", "s:test:1.0.0") + assert excinfo.value.total_count == 3 + assert excinfo.value.referencing_records == ["id1", "id2", "id3"] +``` + +- [ ] **Step 2: Verify failure** + +Run: `pytest tests/test_validation.py -v -k assert_schema_deletable` +Expected: FAIL — missing import. + +- [ ] **Step 3: Implement** + +Append to `src/mcp_awareness/validation.py`: + +```python +class SchemaInUseError(Exception): + """Raised when a schema cannot be deleted because records reference it. + + Callers at the MCP boundary translate this into a structured schema_in_use + error response with the referencing_records list and total_count. 
    """

    def __init__(self, total_count: int, referencing_records: list[str]):
        self.total_count = total_count
        self.referencing_records = referencing_records
        super().__init__(
            f"Cannot delete schema: {total_count} record(s) still reference it"
        )


class _RefCounter(Protocol):
    def count_records_referencing(self, owner_id: str, schema_logical_key: str) -> tuple[int, list[str]]:
        ...


def assert_schema_deletable(
    store: _RefCounter, owner_id: str, schema_logical_key: str
) -> None:
    """Raise SchemaInUseError if any non-deleted records reference this schema."""
    count, ids = store.count_records_referencing(owner_id, schema_logical_key)
    if count > 0:
        raise SchemaInUseError(total_count=count, referencing_records=ids)
```

- [ ] **Step 4: Run tests**

Run: `pytest tests/test_validation.py -v`
Expected: all pass.

- [ ] **Step 5: Commit**

```bash
git add src/mcp_awareness/validation.py tests/test_validation.py
git commit -m "feat: add assert_schema_deletable and SchemaInUseError"
```

---

## Task 10: Alembic migration — seed `_system` user

**Files:**
- Create: `alembic/versions/m8h9i0j1k2l3_add_system_user_for_schemas.py` (revision id chosen in Step 1)

- [ ] **Step 1: Determine next revision id**

Run: `alembic current` (needs DB — or read head from `alembic/versions/` by the most recent `down_revision` chain). The latest is `l7g8h9i0j1k2_backfill_entry_language`. Pick the next id in the project's scheme — e.g., `m8h9i0j1k2l3`.
+ +- [ ] **Step 2: Create the migration file** + +Create `alembic/versions/m8h9i0j1k2l3_add_system_user_for_schemas.py`: + +```python +# AGPL preamble — copy from alembic/versions/l7g8h9i0j1k2_backfill_entry_language.py + +"""add _system user for system-owned schemas + +Revision ID: m8h9i0j1k2l3 +Revises: l7g8h9i0j1k2 +Create Date: 2026-04-13 00:00:00.000000 + +""" + +from __future__ import annotations + +from collections.abc import Sequence + +from alembic import op + +revision: str = "m8h9i0j1k2l3" +down_revision: str | Sequence[str] | None = "l7g8h9i0j1k2" +branch_labels: str | Sequence[str] | None = None +depends_on: str | Sequence[str] | None = None + + +def upgrade() -> None: + """Seed the _system user for system-owned schema entries. + + Idempotent — ON CONFLICT DO NOTHING lets the migration run multiple + times safely (e.g., after a stamp-and-reapply). + """ + op.execute( + "INSERT INTO users (id, display_name) " + "VALUES ('_system', 'System-managed schemas') " + "ON CONFLICT (id) DO NOTHING" + ) + + +def downgrade() -> None: + """Remove the _system user. + + Will fail if any entries still reference owner_id='_system'. Operators + must soft-delete or re-home such entries before downgrade. + """ + op.execute("DELETE FROM users WHERE id = '_system'") +``` + +- [ ] **Step 3: Test the migration end-to-end** + +Run: `mcp-awareness-migrate` against a local Postgres (the testcontainers instance or a scratch DB). +Expected: exits 0 with "Migrations complete."; `SELECT id FROM users WHERE id='_system'` returns a row. + +Run: `mcp-awareness-migrate --downgrade l7g8h9i0j1k2` +Expected: exits 0; `_system` row removed. + +Run: `mcp-awareness-migrate` again (re-upgrade) to confirm re-applies cleanly. 
+ +- [ ] **Step 4: Add a quick idempotence test** + +Since Alembic testing is typically integration-level, add a smoke test to `tests/test_store.py`: + +```python +def test_system_user_exists_after_migration(store): + """The conftest fixture inserts _system — verifies the migration logic is ON CONFLICT safe.""" + # Fixture already inserted; insert again to prove ON CONFLICT DO NOTHING semantics + with store._conn_pool.connection() as conn, conn.cursor() as cur: + cur.execute( + "INSERT INTO users (id, display_name) VALUES ('_system', 'Re-insert') " + "ON CONFLICT (id) DO NOTHING" + ) + conn.commit() + cur.execute("SELECT COUNT(*) FROM users WHERE id = '_system'") + assert cur.fetchone()[0] == 1 +``` + +- [ ] **Step 5: Commit** + +```bash +git add alembic/versions/m8h9i0j1k2l3_add_system_user_for_schemas.py tests/test_store.py +git commit -m "feat: add migration seeding _system user for shared schemas" +``` + +--- + +## Task 11: MCP tool — `register_schema` + +**Files:** +- Modify: `src/mcp_awareness/tools.py` +- Create: `tests/test_tools_schema_record.py` + +- [ ] **Step 1: Write failing integration tests** + +Create `tests/test_tools_schema_record.py`: + +```python +# AGPL preamble — copy from tests/test_store.py + +"""Integration tests for schema/record MCP tool handlers. + +Uses testcontainers Postgres + direct tool-function calls via the server's +contextvar-based owner resolution. 
+""" + +from __future__ import annotations + +import json + +import pytest + +from mcp_awareness.schema import EntryType + + +TEST_OWNER = "test-owner" + + +@pytest.fixture +def configured_server(store, monkeypatch): + """Wire the FastMCP server to the testcontainers store.""" + import mcp_awareness.server as srv + monkeypatch.setattr(srv, "store", store) + # Set owner contextvar for all subsequent tool calls + from mcp_awareness.server import current_owner # or wherever the contextvar lives + token = current_owner.set(TEST_OWNER) + yield srv + current_owner.reset(token) + + +@pytest.mark.asyncio +async def test_register_schema_happy_path(configured_server): + from mcp_awareness.tools import register_schema + + response = await register_schema( + source="test", + tags=["schema"], + description="test schema", + family="schema:test-thing", + version="1.0.0", + schema={"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}, + ) + body = json.loads(response) + assert body["status"] == "ok" + assert body["logical_key"] == "schema:test-thing:1.0.0" + assert "id" in body + + +@pytest.mark.asyncio +async def test_register_schema_rejects_invalid_schema(configured_server): + from mcp.server.fastmcp.exceptions import ToolError + from mcp_awareness.tools import register_schema + + with pytest.raises(ToolError) as excinfo: + await register_schema( + source="test", + tags=["schema"], + description="bad schema", + family="schema:bad", + version="1.0.0", + schema={"type": "strng"}, # typo + ) + err = json.loads(excinfo.value.args[0])["error"] + assert err["code"] == "invalid_schema" + + +@pytest.mark.asyncio +async def test_register_schema_rejects_duplicate_family_version(configured_server): + from mcp.server.fastmcp.exceptions import ToolError + from mcp_awareness.tools import register_schema + + await register_schema( + source="test", tags=[], description="v1", + family="schema:dup", version="1.0.0", + schema={"type": "object"}, + ) + with 
pytest.raises(ToolError) as excinfo: + await register_schema( + source="test", tags=[], description="v1 again", + family="schema:dup", version="1.0.0", + schema={"type": "object"}, + ) + err = json.loads(excinfo.value.args[0])["error"] + assert err["code"] == "schema_already_exists" +``` + +- [ ] **Step 2: Verify failure** + +Run: `pytest tests/test_tools_schema_record.py -v -k register_schema` +Expected: FAIL — `register_schema` does not exist in `tools.py`. + +- [ ] **Step 3: Implement the tool** + +Add to `src/mcp_awareness/tools.py` (follow the exact pattern of `remember` for decorator order, docstring shape, and use of `_srv.mcp.tool()`, `_timed`, embedding submission, etc.): + +```python +@_srv.mcp.tool() +@_timed +async def register_schema( + source: str, + tags: list[str], + description: str, + family: str, + version: str, + schema: dict[str, Any], + learned_from: str = "conversation", + language: str | None = None, +) -> str: + """Register a new JSON Schema entry for later use by records. + + Validates the schema body against JSON Schema Draft 2020-12 meta-schema + on write. Family + version are combined into the entry's logical_key + (schema:family:version); each version is a separate entry. Schemas are + absolutely immutable once registered — to change one, register a new + version and (if no records reference the old one) delete it. 

    Returns:
        JSON: {"status": "ok", "id": "<entry-id>", "logical_key": "<family>:<version>"}

    If you receive an unstructured error, the failure is in the transport
    or platform layer, not in awareness."""
    from jsonschema import exceptions as jse
    from mcp_awareness.validation import compose_schema_logical_key, validate_schema_body

    # Validate family / version. A family may itself contain ':'; a version must
    # not, because logical_key parsing splits on the LAST ':'.
    if not family:
        _error_response(
            "invalid_parameter",
            "family must be a non-empty string",
            retryable=False, param="family", value=family,
        )
    if not version or ":" in version:
        _error_response(
            "invalid_parameter", "version must be a non-empty string without ':'",
            retryable=False, param="version", value=version,
        )

    # Validate the schema body. check_schema raises jsonschema.exceptions.SchemaError;
    # anything else is unexpected and should surface as a platform-layer error.
    try:
        validate_schema_body(schema)
    except jse.SchemaError as e:
        _error_response(
            "invalid_schema",
            f"Schema does not conform to JSON Schema Draft 2020-12: {e.message}",
            retryable=False,
            schema_error_path="/" + "/".join(str(p) for p in e.absolute_path),
            detail=str(e.message),
        )

    logical_key = compose_schema_logical_key(family, version)
    now = now_utc()
    data: dict[str, Any] = {
        "family": family,
        "version": version,
        "schema": schema,
        "description": description,
        "learned_from": learned_from,
    }
    text_for_detect = compose_detection_text("schema", data)
    resolved_lang = resolve_language(explicit=language, text_for_detection=text_for_detect)
    _check_unsupported_language(text_for_detect, resolved_lang)

    entry = Entry(
        id=make_id(),
        type=EntryType.SCHEMA,
        source=source,
        tags=tags,
        created=now,
        updated=None,
        expires=None,
        data=data,
        logical_key=logical_key,
        owner_id=_srv._current_owner(),  # or existing helper
        language=resolved_lang,
    )
    try:
        _srv.store.save_entry(entry)
    except _UniqueViolation as e:  # existing pattern for
23505 translation + _error_response( + "schema_already_exists", + f"Schema {logical_key} already exists in source {source}", + retryable=False, logical_key=logical_key, existing_id=e.existing_id, + ) + + _srv._generate_embedding(entry) + return json.dumps({"status": "ok", "id": entry.id, "logical_key": logical_key}) +``` + +Match the existing unique-constraint translation pattern (check `remember` for how logical_key collisions are surfaced — it uses upsert semantics, but for schemas we want *rejection* not upsert). + +- [ ] **Step 4: Run tests** + +Run: `pytest tests/test_tools_schema_record.py -v -k register_schema` +Expected: all pass. + +- [ ] **Step 5: Commit** + +```bash +git add src/mcp_awareness/tools.py tests/test_tools_schema_record.py +git commit -m "feat: add register_schema MCP tool" +``` + +--- + +## Task 12: MCP tool — `create_record` + +**Files:** +- Modify: `src/mcp_awareness/tools.py` +- Modify: `tests/test_tools_schema_record.py` + +- [ ] **Step 1: Write failing tests** + +Append to `tests/test_tools_schema_record.py`: + +```python +@pytest.mark.asyncio +async def test_create_record_happy_path(configured_server): + from mcp_awareness.tools import create_record, register_schema + + await register_schema( + source="test", tags=[], description="s", + family="schema:thing", version="1.0.0", + schema={"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}, + ) + response = await create_record( + source="test", tags=[], description="a thing", + logical_key="thing-one", + schema_ref="schema:thing", schema_version="1.0.0", + content={"name": "widget"}, + ) + body = json.loads(response) + assert body["status"] == "ok" + assert body["action"] == "created" + assert "id" in body + + +@pytest.mark.asyncio +async def test_create_record_rejects_unknown_schema(configured_server): + from mcp.server.fastmcp.exceptions import ToolError + from mcp_awareness.tools import create_record + + with pytest.raises(ToolError) as excinfo: + await 
create_record( + source="test", tags=[], description="orphan", + logical_key="thing-one", + schema_ref="schema:does-not-exist", schema_version="1.0.0", + content={"name": "widget"}, + ) + err = json.loads(excinfo.value.args[0])["error"] + assert err["code"] == "schema_not_found" + assert err["searched_owners"] == [TEST_OWNER, "_system"] + + +@pytest.mark.asyncio +async def test_create_record_surfaces_validation_errors(configured_server): + from mcp.server.fastmcp.exceptions import ToolError + from mcp_awareness.tools import create_record, register_schema + + await register_schema( + source="test", tags=[], description="s", + family="schema:person", version="1.0.0", + schema={"type": "object", "properties": {"name": {"type": "string"}, "age": {"type": "integer"}}, "required": ["name"]}, + ) + with pytest.raises(ToolError) as excinfo: + await create_record( + source="test", tags=[], description="bad person", + logical_key="p1", + schema_ref="schema:person", schema_version="1.0.0", + content={"age": "thirty"}, # missing name; wrong age type + ) + err = json.loads(excinfo.value.args[0])["error"] + assert err["code"] == "validation_failed" + validators = {ve["validator"] for ve in err["validation_errors"]} + assert "required" in validators + assert "type" in validators + + +@pytest.mark.asyncio +async def test_create_record_upsert_on_same_logical_key(configured_server): + from mcp_awareness.tools import create_record, register_schema + + await register_schema( + source="test", tags=[], description="s", + family="schema:thing", version="1.0.0", + schema={"type": "object"}, + ) + r1 = json.loads(await create_record( + source="test", tags=[], description="v1", + logical_key="thing-one", + schema_ref="schema:thing", schema_version="1.0.0", + content={"v": 1}, + )) + assert r1["action"] == "created" + r2 = json.loads(await create_record( + source="test", tags=[], description="v2", + logical_key="thing-one", + schema_ref="schema:thing", schema_version="1.0.0", + content={"v": 
2}, + )) + assert r2["action"] == "updated" + assert r2["id"] == r1["id"] + + +@pytest.mark.asyncio +async def test_create_record_uses_system_schema_fallback(configured_server): + """A record can reference a schema owned by _system, not the caller.""" + from mcp_awareness.tools import create_record + + # Seed _system schema directly via store (not via tool, since tool always writes to caller owner) + from mcp_awareness.schema import Entry, make_id, now_utc + _srv = configured_server + _srv.store.save_entry(Entry( + id=make_id(), type=EntryType.SCHEMA, source="system", + tags=["system"], created=now_utc(), updated=None, expires=None, + data={ + "family": "schema:system-thing", "version": "1.0.0", + "schema": {"type": "object"}, + "description": "system-seeded", "learned_from": "cli-bootstrap", + }, + logical_key="schema:system-thing:1.0.0", owner_id="_system", + )) + response = await create_record( + source="test", tags=[], description="mine", + logical_key="mine-1", + schema_ref="schema:system-thing", schema_version="1.0.0", + content={"any": "thing"}, + ) + body = json.loads(response) + assert body["status"] == "ok" +``` + +- [ ] **Step 2: Verify failure** + +Run: `pytest tests/test_tools_schema_record.py -v -k create_record` +Expected: FAIL — `create_record` not defined. + +- [ ] **Step 3: Implement** + +Add to `src/mcp_awareness/tools.py` following the existing `remember` pattern (especially for logical_key upsert behavior): + +```python +@_srv.mcp.tool() +@_timed +async def create_record( + source: str, + tags: list[str], + description: str, + logical_key: str, + schema_ref: str, + schema_version: str, + content: Any, + learned_from: str = "conversation", + language: str | None = None, +) -> str: + """Create or upsert a record validated against a registered schema. + + Resolves the target schema by schema_ref + schema_version (prefers + caller-owned, falls back to _system). 
Validates content against the
    schema on write; rejects with a structured validation_failed error
    listing every validation error. Upserts on matching (source, logical_key)
    — same logical_key means update in place with changelog.

    Returns:
        JSON: {"status": "ok", "id": "<entry-id>", "action": "created" | "updated"}"""
    from mcp_awareness.validation import resolve_schema, validate_record_content

    resolved = resolve_schema(_srv.store, _srv._current_owner(), schema_ref, schema_version)
    if resolved is None:
        _error_response(
            "schema_not_found",
            f"No schema {schema_ref}:{schema_version} in your namespace or _system",
            retryable=False,
            schema_ref=schema_ref, schema_version=schema_version,
            searched_owners=[_srv._current_owner(), "_system"],
        )

    schema_body = resolved.data["schema"]
    # The schema body was meta-validated at registration, so iter_errors will not
    # raise for a well-formed schema; anything unexpected propagates as a
    # platform-layer error rather than a structured envelope.
    errors = validate_record_content(schema_body, content)
    if errors:
        n = errors[-1].get("total_errors") if errors[-1].get("truncated") else len(errors)
        extras: dict[str, Any] = {
            "schema_ref": schema_ref,
            "schema_version": schema_version,
            "validation_errors": errors,
        }
        if errors[-1].get("truncated"):
            extras["truncated"] = True
            extras["total_errors"] = errors[-1]["total_errors"]
        _error_response(
            "validation_failed",
            f"Record content does not conform to schema {schema_ref}:{schema_version} ({n} errors)",
            retryable=False, **extras,
        )

    # Existing logical_key upsert path (mirror `remember`'s approach)
    now = now_utc()
    data: dict[str, Any] = {
        "schema_ref": schema_ref,
        "schema_version": schema_version,
        "content": content,
        "description": description,
        "learned_from": learned_from,
    }
    text_for_detect = compose_detection_text("record", data)
    resolved_lang = resolve_language(explicit=language, text_for_detection=text_for_detect)
_check_unsupported_language(text_for_detect, resolved_lang) + + entry = Entry( + id=make_id(), + type=EntryType.RECORD, + source=source, + tags=tags, + created=now, + updated=None, + expires=None, + data=data, + logical_key=logical_key, + owner_id=_srv._current_owner(), + language=resolved_lang, + ) + # Upsert via existing store method that returns (entry, action) — mirror remember + saved, action = _srv.store.upsert_by_logical_key(entry) + _srv._generate_embedding(saved) + return json.dumps({"status": "ok", "id": saved.id, "action": action}) +``` + +The exact shape of `upsert_by_logical_key` is whatever `remember` calls today — copy that. + +- [ ] **Step 4: Run tests** + +Run: `pytest tests/test_tools_schema_record.py -v -k create_record` +Expected: all pass. + +- [ ] **Step 5: Commit** + +```bash +git add src/mcp_awareness/tools.py tests/test_tools_schema_record.py +git commit -m "feat: add create_record MCP tool with schema validation and _system fallback" +``` + +--- + +## Task 13: Update `update_entry` handler for schema/record branching + +**Files:** +- Modify: `src/mcp_awareness/tools.py` (function `update_entry`, around line 533) +- Modify: `tests/test_tools_schema_record.py` + +- [ ] **Step 1: Write failing tests** + +Append to `tests/test_tools_schema_record.py`: + +```python +@pytest.mark.asyncio +async def test_update_entry_rejects_schema_update(configured_server): + from mcp.server.fastmcp.exceptions import ToolError + from mcp_awareness.tools import register_schema, update_entry + + resp = json.loads(await register_schema( + source="test", tags=[], description="s", + family="schema:thing", version="1.0.0", + schema={"type": "object"}, + )) + with pytest.raises(ToolError) as excinfo: + await update_entry(entry_id=resp["id"], description="new desc") + err = json.loads(excinfo.value.args[0])["error"] + assert err["code"] == "schema_immutable" + + +@pytest.mark.asyncio +async def test_update_entry_record_content_revalidates(configured_server): + from 
mcp_awareness.tools import create_record, register_schema, update_entry + + await register_schema( + source="test", tags=[], description="s", + family="schema:thing", version="1.0.0", + schema={"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}, + ) + r = json.loads(await create_record( + source="test", tags=[], description="r", + logical_key="r1", + schema_ref="schema:thing", schema_version="1.0.0", + content={"name": "good"}, + )) + # Valid update — passes re-validation + await update_entry(entry_id=r["id"], content={"name": "still-good"}) + + +@pytest.mark.asyncio +async def test_update_entry_record_content_rejects_invalid(configured_server): + from mcp.server.fastmcp.exceptions import ToolError + from mcp_awareness.tools import create_record, register_schema, update_entry + + await register_schema( + source="test", tags=[], description="s", + family="schema:thing", version="1.0.0", + schema={"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}, + ) + r = json.loads(await create_record( + source="test", tags=[], description="r", + logical_key="r1", + schema_ref="schema:thing", schema_version="1.0.0", + content={"name": "good"}, + )) + with pytest.raises(ToolError) as excinfo: + await update_entry(entry_id=r["id"], content={"name": 123}) # wrong type + err = json.loads(excinfo.value.args[0])["error"] + assert err["code"] == "validation_failed" + + +@pytest.mark.asyncio +async def test_update_entry_record_non_content_skips_revalidation(configured_server): + from mcp_awareness.tools import create_record, register_schema, update_entry + + await register_schema( + source="test", tags=[], description="s", + family="schema:thing", version="1.0.0", + schema={"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}, + ) + r = json.loads(await create_record( + source="test", tags=[], description="orig", + logical_key="r1", + schema_ref="schema:thing", schema_version="1.0.0", + 
content={"name": "good"}, + )) + # Description-only change — no re-validation, even though pre-existing content would still pass + await update_entry(entry_id=r["id"], description="updated desc") + # No exception raised + + +@pytest.mark.asyncio +async def test_update_entry_record_pin_immutable(configured_server): + # This test only applies if update_entry exposes schema_ref/schema_version params; + # if it doesn't, the pin is already immutable by default. See Step 3 for the + # decision — we're NOT adding schema_ref/schema_version to update_entry's + # public surface, so this test is omitted. + pass +``` + +- [ ] **Step 2: Verify failure** + +Run: `pytest tests/test_tools_schema_record.py -v -k update_entry` +Expected: the `schema_immutable` and `validation_failed` tests fail (current update_entry accepts any entry without branching). + +- [ ] **Step 3: Implement branching** + +Edit `src/mcp_awareness/tools.py` inside the `update_entry` handler, after the entry is loaded by ID and before it's written back: + +```python +# --- New: type-specific branching --- +from mcp_awareness.validation import resolve_schema, validate_record_content + +if entry.type == EntryType.SCHEMA: + _error_response( + "schema_immutable", + "Schemas cannot be updated. 
Register a new version instead.", + retryable=False, + ) + +if entry.type == EntryType.RECORD and content is not None: + # content is being updated — re-resolve pinned schema and re-validate + schema_ref = entry.data["schema_ref"] + schema_version = entry.data["schema_version"] + resolved = resolve_schema(_srv.store, entry.owner_id, schema_ref, schema_version) + if resolved is None: + # The schema the record pins to has been soft-deleted — unusual, but possible + _error_response( + "schema_not_found", + f"Cannot re-validate: schema {schema_ref}:{schema_version} not found", + retryable=False, + schema_ref=schema_ref, schema_version=schema_version, + searched_owners=[entry.owner_id, "_system"], + ) + errors = validate_record_content(resolved.data["schema"], content) + if errors: + n = errors[-1].get("total_errors") if errors[-1].get("truncated") else len(errors) + extras = { + "schema_ref": schema_ref, "schema_version": schema_version, + "validation_errors": errors, + } + if errors[-1].get("truncated"): + extras["truncated"] = True + extras["total_errors"] = errors[-1]["total_errors"] + _error_response( + "validation_failed", + f"Record content does not conform to schema {schema_ref}:{schema_version} ({n} errors)", + retryable=False, **extras, + ) +# --- end branching --- +``` + +Note: `update_entry` should NOT accept `schema_ref`/`schema_version`/`family`/`version` params — those are out of scope for the update API. If any such params exist in the current signature, leave them out of the new tools' invocation paths. The test `test_update_entry_record_pin_immutable` is skipped because the pin fields aren't exposed. + +- [ ] **Step 4: Run tests** + +Run: `pytest tests/test_tools_schema_record.py -v -k update_entry` +Expected: all pass. 
+ +- [ ] **Step 5: Commit** + +```bash +git add src/mcp_awareness/tools.py tests/test_tools_schema_record.py +git commit -m "feat: update_entry enforces schema immutability and record re-validation" +``` + +--- + +## Task 14: Update `delete_entry` for schema deletion protection + +**Files:** +- Modify: `src/mcp_awareness/tools.py` (function `delete_entry`) +- Modify: `tests/test_tools_schema_record.py` + +- [ ] **Step 1: Write failing tests** + +Append to `tests/test_tools_schema_record.py`: + +```python +@pytest.mark.asyncio +async def test_delete_entry_schema_with_no_records_succeeds(configured_server): + from mcp_awareness.tools import delete_entry, register_schema + + resp = json.loads(await register_schema( + source="test", tags=[], description="s", + family="schema:thing", version="1.0.0", + schema={"type": "object"}, + )) + await delete_entry(entry_id=resp["id"]) # no records; succeeds + # Verify soft-deleted + assert configured_server.store.find_schema(TEST_OWNER, "schema:thing:1.0.0") is None + + +@pytest.mark.asyncio +async def test_delete_entry_schema_with_records_rejected(configured_server): + from mcp.server.fastmcp.exceptions import ToolError + from mcp_awareness.tools import create_record, delete_entry, register_schema + + resp = json.loads(await register_schema( + source="test", tags=[], description="s", + family="schema:thing", version="1.0.0", + schema={"type": "object"}, + )) + await create_record( + source="test", tags=[], description="r", + logical_key="r1", + schema_ref="schema:thing", schema_version="1.0.0", + content={}, + ) + with pytest.raises(ToolError) as excinfo: + await delete_entry(entry_id=resp["id"]) + err = json.loads(excinfo.value.args[0])["error"] + assert err["code"] == "schema_in_use" + assert len(err["referencing_records"]) == 1 + + +@pytest.mark.asyncio +async def test_delete_entry_schema_allowed_after_records_deleted(configured_server): + from mcp_awareness.tools import create_record, delete_entry, register_schema + + 
schema_resp = json.loads(await register_schema( + source="test", tags=[], description="s", + family="schema:thing", version="1.0.0", + schema={"type": "object"}, + )) + record_resp = json.loads(await create_record( + source="test", tags=[], description="r", + logical_key="r1", + schema_ref="schema:thing", schema_version="1.0.0", + content={}, + )) + await delete_entry(entry_id=record_resp["id"]) + await delete_entry(entry_id=schema_resp["id"]) # no live refs; succeeds +``` + +- [ ] **Step 2: Verify failure** + +Run: `pytest tests/test_tools_schema_record.py -v -k delete_entry` +Expected: `schema_with_records_rejected` fails (no protection yet). + +- [ ] **Step 3: Implement branching** + +Edit `src/mcp_awareness/tools.py` inside `delete_entry`, after the entry is loaded: + +```python +from mcp_awareness.validation import SchemaInUseError, assert_schema_deletable + +if entry.type == EntryType.SCHEMA: + try: + assert_schema_deletable(_srv.store, entry.owner_id, entry.logical_key) + except SchemaInUseError as e: + _error_response( + "schema_in_use", + f"Cannot delete schema {entry.logical_key}: {e.total_count} record(s) reference it", + retryable=False, + referencing_records=e.referencing_records, + total_count=e.total_count, + ) +# Existing soft-delete path follows +``` + +- [ ] **Step 4: Run tests** + +Run: `pytest tests/test_tools_schema_record.py -v` +Expected: all pass. 
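`assert_schema_deletable` and `SchemaInUseError` are defined in the validation module (Task 8). For reviewers reading this task standalone, here is a minimal sketch of the contract; the `list_records_referencing` store method is an assumption (the plan only names `count_records_referencing`), so substitute whatever the store actually exposes for fetching the sample of referencing records:

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


class _StoreLike(Protocol):
    """The slice of the store API this check needs (list_records_referencing is assumed)."""

    def count_records_referencing(self, owner_id: str, schema_logical_key: str) -> int: ...
    def list_records_referencing(
        self, owner_id: str, schema_logical_key: str, limit: int
    ) -> list[dict[str, Any]]: ...


@dataclass
class SchemaInUseError(Exception):
    schema_logical_key: str
    total_count: int
    referencing_records: list[dict[str, Any]] = field(default_factory=list)


def assert_schema_deletable(
    store: _StoreLike, owner_id: str, schema_logical_key: str, sample_size: int = 10
) -> None:
    """Raise SchemaInUseError when any live record still pins this schema version."""
    total = store.count_records_referencing(owner_id, schema_logical_key)
    if total > 0:
        raise SchemaInUseError(
            schema_logical_key=schema_logical_key,
            total_count=total,
            referencing_records=store.list_records_referencing(
                owner_id, schema_logical_key, sample_size
            ),
        )
```

The exception carries both the total count and a bounded sample so the tool-layer `schema_in_use` envelope never balloons with thousands of record ids.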
+ +- [ ] **Step 5: Commit** + +```bash +git add src/mcp_awareness/tools.py tests/test_tools_schema_record.py +git commit -m "feat: delete_entry protects schemas referenced by live records" +``` + +--- + +## Task 15: CLI tool — `mcp-awareness-register-schema` + +**Files:** +- Create: `src/mcp_awareness/cli_register_schema.py` +- Create: `tests/test_cli_register_schema.py` +- Modify: `pyproject.toml` + +- [ ] **Step 1: Write failing test** + +Create `tests/test_cli_register_schema.py`: + +```python +# AGPL preamble + +"""Tests for mcp-awareness-register-schema CLI.""" + +from __future__ import annotations + +import json +import subprocess +import sys +import tempfile + +import pytest + + +@pytest.fixture +def system_schema_file(): + with tempfile.NamedTemporaryFile(suffix=".json", mode="w", delete=False) as f: + json.dump({"type": "object", "properties": {"name": {"type": "string"}}}, f) + path = f.name + yield path + + +def test_cli_register_schema_happy_path(pg_dsn, system_schema_file, monkeypatch, capsys): + """End-to-end: CLI writes a _system schema via direct store access.""" + from mcp_awareness.cli_register_schema import main + + monkeypatch.setenv("AWARENESS_DATABASE_URL", pg_dsn) + sys_argv = [ + "mcp-awareness-register-schema", + "--system", + "--family", "schema:cli-test", + "--version", "1.0.0", + "--schema-file", system_schema_file, + "--source", "awareness-built-in", + "--tags", "cli,test", + "--description", "CLI-registered test schema", + ] + monkeypatch.setattr("sys.argv", sys_argv) + + main() + captured = capsys.readouterr() + body = json.loads(captured.out.strip()) + assert body["status"] == "ok" + assert body["logical_key"] == "schema:cli-test:1.0.0" + + # Verify entry exists in DB under _system owner + from mcp_awareness.postgres_store import PostgresStore + store = PostgresStore(pg_dsn) + entry = store.find_schema("any-caller", "schema:cli-test:1.0.0") + assert entry is not None + assert entry.owner_id == "_system" + assert 
entry.data["learned_from"] == "cli-bootstrap" + + +def test_cli_register_schema_rejects_invalid_schema_file(pg_dsn, monkeypatch, capsys): + from mcp_awareness.cli_register_schema import main + + with tempfile.NamedTemporaryFile(suffix=".json", mode="w", delete=False) as f: + json.dump({"type": "strng"}, f) # invalid + path = f.name + + monkeypatch.setenv("AWARENESS_DATABASE_URL", pg_dsn) + monkeypatch.setattr("sys.argv", [ + "mcp-awareness-register-schema", "--system", + "--family", "schema:bad", "--version", "1.0.0", + "--schema-file", path, "--source", "test", "--tags", "", "--description", "bad", + ]) + with pytest.raises(SystemExit) as excinfo: + main() + assert excinfo.value.code == 1 + captured = capsys.readouterr() + assert "invalid_schema" in captured.err +``` + +- [ ] **Step 2: Verify failure** + +Run: `pytest tests/test_cli_register_schema.py -v` +Expected: FAIL — module not found. + +- [ ] **Step 3: Implement** + +Create `src/mcp_awareness/cli_register_schema.py`: + +```python +# AGPL preamble + +"""CLI for registering _system-owned schema entries. + +Bypasses MCP entirely — operator tool, run once per built-in schema at +deploy/bootstrap time. No MCP auth, no middleware, direct PostgresStore +access. +""" + +from __future__ import annotations + +import argparse +import json +import os +import sys +from pathlib import Path + + +def main() -> None: + parser = argparse.ArgumentParser( + description="Register a _system-owned schema entry (operator bootstrap only).", + ) + parser.add_argument("--system", action="store_true", required=True, + help="Required. 
Confirms the caller intends to write to the _system owner.") + parser.add_argument("--family", required=True, help="Schema family (e.g., schema:edge-manifest)") + parser.add_argument("--version", required=True, help="Schema version (e.g., 1.0.0)") + parser.add_argument("--schema-file", required=True, type=Path, + help="Path to JSON file containing the Draft 2020-12 schema body") + parser.add_argument("--source", required=True, help="Source field for the entry") + parser.add_argument("--tags", default="", + help="Comma-separated tags (empty string for none)") + parser.add_argument("--description", required=True, help="Entry description") + args = parser.parse_args() + + # Read + parse schema file + if not args.schema_file.exists(): + print(json.dumps({"error": {"code": "file_not_found", "message": str(args.schema_file)}}), + file=sys.stderr) + sys.exit(1) + try: + schema_body = json.loads(args.schema_file.read_text()) + except json.JSONDecodeError as e: + print(json.dumps({"error": {"code": "invalid_json", "message": str(e)}}), file=sys.stderr) + sys.exit(1) + + # Meta-schema validation + from jsonschema import exceptions as jse + from mcp_awareness.validation import compose_schema_logical_key, validate_schema_body + try: + validate_schema_body(schema_body) + except jse.SchemaError as e: + print(json.dumps({"error": { + "code": "invalid_schema", "message": str(e.message), + "schema_error_path": "/" + "/".join(str(p) for p in e.absolute_path), + }}), file=sys.stderr) + sys.exit(1) + + # DB connection + database_url = os.environ.get("AWARENESS_DATABASE_URL", "") + if not database_url: + print(json.dumps({"error": {"code": "missing_env", "message": "AWARENESS_DATABASE_URL required"}}), + file=sys.stderr) + sys.exit(1) + + from mcp_awareness.postgres_store import PostgresStore + from mcp_awareness.schema import Entry, EntryType, make_id, now_utc + + store = PostgresStore(database_url) + logical_key = compose_schema_logical_key(args.family, args.version) + tags = 
[t.strip() for t in args.tags.split(",") if t.strip()] + + entry = Entry( + id=make_id(), + type=EntryType.SCHEMA, + source=args.source, + tags=tags, + created=now_utc(), + updated=None, + expires=None, + data={ + "family": args.family, + "version": args.version, + "schema": schema_body, + "description": args.description, + "learned_from": "cli-bootstrap", + }, + logical_key=logical_key, + owner_id="_system", + language="english", + ) + + try: + store.save_entry(entry) + except Exception as e: + print(json.dumps({"error": {"code": "store_error", "message": str(e)}}), file=sys.stderr) + sys.exit(1) + + print(json.dumps({"status": "ok", "id": entry.id, "logical_key": logical_key})) + sys.exit(0) + + +if __name__ == "__main__": + main() +``` + +- [ ] **Step 4: Register console script** + +Edit `pyproject.toml`: + +```toml +[project.scripts] +# ... existing scripts ... +mcp-awareness-register-schema = "mcp_awareness.cli_register_schema:main" +``` + +- [ ] **Step 5: Reinstall and test** + +Run: `pip install -e ".[dev]"` +Run: `pytest tests/test_cli_register_schema.py -v` +Expected: all pass. 
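A related invariant worth keeping in mind while touching this path: the CLI composes `logical_key` with `compose_schema_logical_key`, and the store later splits it back apart (the `count_records_referencing` hardening noted in the changelog). Because families themselves contain colons, the split must use the last colon and fail loudly on empty parts. A sketch, with the hypothetical inverse name `split_schema_logical_key`:

```python
def split_schema_logical_key(logical_key: str) -> tuple[str, str]:
    """Split 'family:version' on the LAST colon.

    Families contain colons themselves (e.g. 'schema:edge-manifest'), so
    'schema:edge-manifest:1.0.0' -> ('schema:edge-manifest', '1.0.0').
    Raises rather than letting a malformed key run a silently
    non-matching query downstream.
    """
    ref, sep, version = logical_key.rpartition(":")
    if not sep or not ref or not version:
        raise ValueError(f"malformed schema logical_key: {logical_key!r}")
    return ref, version
```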
+ +- [ ] **Step 6: Commit** + +```bash +git add src/mcp_awareness/cli_register_schema.py tests/test_cli_register_schema.py pyproject.toml +git commit -m "feat: add mcp-awareness-register-schema CLI for _system schemas" +``` + +--- + +## Task 16: Cross-owner isolation tests + +**Files:** +- Modify: `tests/test_tools_schema_record.py` + +- [ ] **Step 1: Add isolation tests** + +Append to `tests/test_tools_schema_record.py`: + +```python +@pytest.mark.asyncio +async def test_cross_owner_schema_invisible(configured_server, store): + """Owner A registers a schema; Owner B cannot resolve it.""" + from mcp_awareness.server import current_owner + from mcp_awareness.tools import create_record, register_schema + from mcp.server.fastmcp.exceptions import ToolError + + # Owner A (default TEST_OWNER) registers + await register_schema( + source="test", tags=[], description="A's schema", + family="schema:mine", version="1.0.0", + schema={"type": "object"}, + ) + + # Switch to Owner B + token = current_owner.set("other-owner") + try: + with pytest.raises(ToolError) as excinfo: + await create_record( + source="test", tags=[], description="B's attempt", + logical_key="r-b", schema_ref="schema:mine", schema_version="1.0.0", + content={}, + ) + err = json.loads(excinfo.value.args[0])["error"] + assert err["code"] == "schema_not_found" + finally: + current_owner.reset(token) + + +@pytest.mark.asyncio +async def test_both_owners_see_system_schema(configured_server, store): + """Both A and B can use a _system schema; their records don't cross.""" + from mcp_awareness.schema import Entry, EntryType, make_id, now_utc + from mcp_awareness.server import current_owner + from mcp_awareness.tools import create_record + + # Seed _system schema directly + store.save_entry(Entry( + id=make_id(), type=EntryType.SCHEMA, source="system", + tags=["system"], created=now_utc(), updated=None, expires=None, + data={ + "family": "schema:shared", "version": "1.0.0", + "schema": {"type": "object"}, + "description": 
"shared", "learned_from": "cli-bootstrap", + }, + logical_key="schema:shared:1.0.0", owner_id="_system", + )) + + # A writes a record + a_resp = json.loads(await create_record( + source="test", tags=[], description="A's record", + logical_key="rec-a", schema_ref="schema:shared", schema_version="1.0.0", + content={"who": "alice"}, + )) + + # Switch to B + token = current_owner.set("bob") + try: + b_resp = json.loads(await create_record( + source="test", tags=[], description="B's record", + logical_key="rec-b", schema_ref="schema:shared", schema_version="1.0.0", + content={"who": "bob"}, + )) + assert b_resp["status"] == "ok" + finally: + current_owner.reset(token) + + # A's record invisible to B — verified via the owner_id on each entry + # (the records are already isolated by owner_id on create) + a_entry = store.get_entry(TEST_OWNER, a_resp["id"]) # exists + assert a_entry is not None + # Call with bob's owner — returns None because RLS/owner filter excludes + # (if get_entry takes owner_id as arg, this is clean; otherwise use find) +``` + +- [ ] **Step 2: Run tests** + +Run: `pytest tests/test_tools_schema_record.py -v` +Expected: all pass. + +- [ ] **Step 3: Commit** + +```bash +git add tests/test_tools_schema_record.py +git commit -m "test: cross-owner isolation for schema/record tools" +``` + +--- + +## Task 17: Update CHANGELOG, README, data-dictionary, server instructions + +**Files:** +- Modify: `CHANGELOG.md` +- Modify: `README.md` +- Modify: `docs/data-dictionary.md` +- Modify: `src/mcp_awareness/instructions.md` + +- [ ] **Step 1: CHANGELOG entry** + +Add under `[Unreleased]`: + +```markdown +### Added +- Two new entry types: `schema` (JSON Schema Draft 2020-12 definition) and `record` (validated payload conforming to a schema). Tools: `register_schema`, `create_record`. Schemas are absolutely immutable after registration; records re-validate on content update. Schema deletion is blocked while live records reference a version. 
Per-owner storage with a shared `_system` fallback namespace for built-in schemas. +- New CLI: `mcp-awareness-register-schema` for operators to seed `_system`-owned schemas at deploy time. +- New migration: `_system` user seed (idempotent). + +### Dependencies +- Added `jsonschema>=4.26.0` as a runtime dependency. +``` + +- [ ] **Step 2: README updates** + +- Bump tool count in the "Implemented" section (search for "tools" to find it). +- Add a bullet to the tool list describing `register_schema` / `create_record`. +- Bump test count after the test-count check in Task 19. + +Exact text for the new tool bullet (match the style of neighbors): + +```markdown +- **`register_schema` / `create_record`** — define typed data contracts via JSON Schema Draft 2020-12; validate payloads server-side on write with structured error envelopes listing every validation failure. +``` + +- [ ] **Step 3: Data dictionary** + +Add entries to `docs/data-dictionary.md` for both types. Match existing entry format: + +```markdown +### `schema` +JSON Schema Draft 2020-12 definition. Schema body lives in `data.schema`; family + version in `data.family` + `data.version`; `logical_key` derived as `{family}:{version}`. Immutable after registration. + +**`data` fields:** +- `family` (string, required) — schema family identifier (e.g., `schema:edge-manifest`) +- `version` (string, required) — schema version (user-chosen semantic or sequential) +- `schema` (object, required) — JSON Schema Draft 2020-12 body +- `description` (string) — human-readable description +- `learned_from` (string) — platform that registered the schema + +### `record` +Validated data entry conforming to a referenced schema. Content in `data.content`; pinned schema in `data.schema_ref` + `data.schema_version`. Re-validated on content update. 
+ +**`data` fields:** +- `schema_ref` (string, required) — target schema family (e.g., `schema:edge-manifest`) +- `schema_version` (string, required) — target schema version (exact pin, no "latest") +- `content` (any JSON value, required) — validated payload +- `description` (string) — human-readable description +- `learned_from` (string) — platform that created the record +``` + +- [ ] **Step 4: Server instructions** + +Append to `src/mcp_awareness/instructions.md` (or wherever server-level guidance lives): + +```markdown +When you need typed data contracts for edge providers, tag taxonomies, or any +shape that should be validated on write: register a schema via `register_schema` +(family + version + JSON Schema body), then write records via `create_record` +referencing `schema_ref` + `schema_version`. Schemas are immutable — bump the +version to evolve. Built-in shared schemas live in the `_system` namespace +seeded by the operator. +``` + +- [ ] **Step 5: Commit** + +```bash +git add CHANGELOG.md README.md docs/data-dictionary.md src/mcp_awareness/instructions.md +git commit -m "docs: document schema/record entry types, new tools, and CLI" +``` + +--- + +## Task 18: Pre-push verification (ruff, mypy, full test suite, coverage, test count) + +**Files:** none — pure verification. + +- [ ] **Step 1: Format** + +Run: `ruff format src/ tests/` +Expected: no changes or minor formatting only. + +- [ ] **Step 2: Lint** + +Run: `ruff check src/ tests/` +Expected: 0 errors. + +- [ ] **Step 3: Type check** + +Run: `mypy src/mcp_awareness/` +Expected: 0 errors in strict mode. 
+ +- [ ] **Step 4: Full test suite with coverage** + +Run: `pytest --cov=src/mcp_awareness --cov-report=term-missing` +Expected: all tests pass; verify coverage on new modules: + +- `src/mcp_awareness/validation.py` — 100% (pure functions, all paths tested) +- `src/mcp_awareness/cli_register_schema.py` — cover happy path, invalid schema, missing env, bad JSON +- New branches in `tools.py` — cover all new error codes (`schema_immutable`, `validation_failed`, `schema_not_found`, `schema_in_use`, `record_schema_pin_immutable`, `invalid_schema`, `schema_already_exists`) + +If any line is uncovered, add a test case; never use `pragma: no cover`. + +- [ ] **Step 5: Update test count in README** + +Run: `pytest --collect-only -q | tail -3` to get exact count, then update the number in `README.md`. + +- [ ] **Step 6: Commit docs fix-up if test count changed** + +```bash +git add README.md +git commit -m "docs: update test count after schema/record tests" +``` + +- [ ] **Step 7: Push branch** + +```bash +git push -u origin feat/schema-record-entry-types +``` + +--- + +## Task 19: Open PR with QA section + +**Files:** PR body only. + +- [ ] **Step 1: Author PR body** + +Title: `feat: add schema and record entry types with JSON Schema validation` + +Body: + +```markdown +## Summary + +- Adds two new `EntryType` values (`schema`, `record`) with JSON Schema Draft 2020-12 validation on write. +- Per-owner storage with `_system` fallback for shared built-in schemas. +- Schemas are absolutely immutable after registration; records re-validate on content update. +- Schema deletion blocked while live records reference a version. +- New CLI tool `mcp-awareness-register-schema` for operator bootstrap of `_system`-owned schemas. +- Adds `jsonschema>=4.26.0` dependency. + +Closes #208. Spec: `docs/superpowers/specs/2026-04-13-schema-record-entry-types-design.md`. Plan: `docs/superpowers/plans/2026-04-13-schema-record-entry-types-plan.md`. 
+ +## QA + +### Prerequisites + +- `pip install -e ".[dev]"` +- Deploy to QA test instance on alternate port (`AWARENESS_PORT=8421`) via `docker-compose.qa.yaml`. +- Run `mcp-awareness-migrate` against the QA DB to apply the `_system` user seed. + +### Manual tests (via MCP tools) + +1. - [ ] **Register a schema** + ``` + register_schema(source="qa-test", tags=["qa"], description="qa test schema", + family="schema:qa-thing", version="1.0.0", + schema={"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}) + ``` + Expected: `{"status":"ok","id":"","logical_key":"schema:qa-thing:1.0.0"}` + +2. - [ ] **Reject invalid schema (meta-schema check)** + ``` + register_schema(source="qa-test", tags=[], description="bad", + family="schema:bad", version="1.0.0", + schema={"type": "strng"}) + ``` + Expected: structured error with `code: "invalid_schema"`, `schema_error_path`, `detail`. + +3. - [ ] **Reject duplicate family+version** + Re-run step 1 exactly. Expected: `code: "schema_already_exists"`, `logical_key`, `existing_id`. + +4. - [ ] **Create a valid record** + ``` + create_record(source="qa-test", tags=[], description="a qa thing", + logical_key="qa-rec-1", schema_ref="schema:qa-thing", schema_version="1.0.0", + content={"name": "widget"}) + ``` + Expected: `{"status":"ok","id":"","action":"created"}` + +5. - [ ] **Reject record with invalid content (shows all errors)** + ``` + create_record(source="qa-test", tags=[], description="bad record", + logical_key="qa-rec-bad", schema_ref="schema:qa-thing", schema_version="1.0.0", + content={"unexpected": 42}) # missing required 'name' + ``` + Expected: `code: "validation_failed"`, `validation_errors` list with `path`, `validator`, `schema_path`. + +6. - [ ] **Upsert record via same logical_key** + Re-run step 4 with different content. Expected: `action: "updated"`, same `id` as step 4. + +7. 
- [ ] **Re-validation on record update (valid)** + ``` + update_entry(entry_id=, content={"name": "still-valid"}) + ``` + Expected: `{"status":"ok"}` (or existing update_entry response shape). + +8. - [ ] **Re-validation on record update (invalid → rejected)** + ``` + update_entry(entry_id=, content={"name": 123}) + ``` + Expected: `code: "validation_failed"`; record content unchanged (verify via `get_knowledge`). + +9. - [ ] **Schema immutability** + ``` + update_entry(entry_id=, description="new desc") + ``` + Expected: `code: "schema_immutable"`; schema unchanged. + +10. - [ ] **Schema deletion blocked by live records** + ``` + delete_entry(entry_id=) + ``` + Expected: `code: "schema_in_use"`, `referencing_records: [...]`, `total_count`. + +11. - [ ] **Schema deletion allowed after records deleted** + Delete the record from step 4 via `delete_entry(entry_id=)`, then retry step 10. + Expected: schema soft-deletes successfully. + +12. - [ ] **`_system` fallback works** + Via QA shell: `mcp-awareness-register-schema --system --family schema:qa-system --version 1.0.0 --schema-file /tmp/qa-system-schema.json --source qa-built-in --tags qa --description "qa system schema"`. + Then via MCP: + ``` + create_record(source="qa-test", tags=[], description="uses system schema", + logical_key="qa-sys-rec", schema_ref="schema:qa-system", schema_version="1.0.0", + content={"any": "thing"}) + ``` + Expected: record created successfully. + +13. - [ ] **Cross-owner isolation** + As a second authenticated user, attempt to resolve the step-1 schema. Expected: `code: "schema_not_found"`. 
+EOF +``` + +- [ ] **Step 2: Create the PR** + +```bash +source ~/github.com/cmeans/claude-dev/github-app/activate.sh && \ + gh pr create \ + --title "feat: add schema and record entry types with JSON Schema validation" \ + --body-file <(cat <<'EOF' + +EOF +) \ + --label "enhancement" \ + --label "Dev Active" +``` + +(Exact label discipline per `feedback_label_discipline.md` — set `Dev Active` on push, let automation transition to `Awaiting CI` → `Ready for QA`.) + +- [ ] **Step 3: Poll CI and transition labels per project workflow** + +Per `feedback_poll_ci_after_push.md` — after push, run `gh pr checks ` immediately. On green, apply `Ready for QA`. Per `feedback_codecov_comment.md` — read the Codecov bot comment, fix any missing lines before marking Ready for QA. + +--- + +## Self-Review + +**Spec coverage check:** + +Walking the design doc section by section: + +- D1 (type-specific tools) → Tasks 11, 12 ✓ +- D2 (per-owner + `_system` fallback) → Task 6 (SQL-level) + Task 10 (seed) ✓ +- D3 (CLI-only `_system` writes) → Task 15 ✓ +- D4 (absolute schema immutability) → Task 13 (schema branch) ✓ +- D5 (record mutability with re-validation) → Task 13 (record branch) ✓ +- D6 (all errors via `iter_errors()`) → Task 5 ✓ +- D7 (server-derived `logical_key`) → Task 3 + used in Tasks 11/12 ✓ +- D8 (any JSON value for `content`) → Task 5 tests include primitive + array schemas ✓ + +**Architecture:** `validation.py` covered Tasks 3–5, 8, 9 ✓; Store changes covered Tasks 6, 7 ✓; Tool changes covered Tasks 11–14 ✓; CLI covered Task 15 ✓; Migration covered Task 10 ✓. + +**Error codes:** every code in the spec's error table is exercised by at least one test: `invalid_schema` (Task 11), `schema_already_exists` (Task 11), `schema_not_found` (Task 12), `validation_failed` (Tasks 12, 13), `schema_immutable` (Task 13), `schema_in_use` (Task 14). `invalid_parameter` inherited from existing helper. 
`record_schema_pin_immutable` is NOT tested — because `update_entry` doesn't expose `schema_ref`/`schema_version` params. Either keep it as a code reserved for a future API change, or drop the code from the spec. **Decision: keep as reserved; no test needed for a code that can't be triggered given the current API.** + +**Deployment:** Operator deploy sequence from the spec mapped to Task 18 (migration) + Task 15 (CLI) + PR-body QA steps. Compose files untouched; called out explicitly. + +**Testing:** Unit (Tasks 3–5, 8, 9) + integration (Tasks 6, 7, 11–14, 16) + CLI (Task 15) + coverage gate (Task 18). Cross-owner isolation explicit in Task 16. + +**Placeholder scan:** No "TBD" / "TODO" in task bodies. Each code step shows actual code. Each run step shows exact command + expected outcome. The one placeholder concession is migration revision id (`m8h9i0j1k2l3`) which depends on head-at-implementation-time — Task 10 Step 1 instructs how to pick it. + +**Type consistency:** Function names consistent throughout: `compose_schema_logical_key`, `validate_schema_body`, `validate_record_content`, `resolve_schema`, `assert_schema_deletable`, `SchemaInUseError`. Store methods: `find_schema`, `count_records_referencing`. Tool names: `register_schema`, `create_record`. Error codes match spec table exactly. + +--- + +## Execution Handoff + +Plan complete and saved to `docs/superpowers/plans/2026-04-13-schema-record-entry-types-plan.md`. Two execution options: + +1. **Subagent-Driven (recommended)** — I dispatch a fresh subagent per task, review between tasks, fast iteration. +2. **Inline Execution** — I execute tasks in this session using `executing-plans`, batch execution with checkpoints. + +Which approach do you want? 
diff --git a/docs/superpowers/specs/2026-04-13-schema-record-entry-types-design.md b/docs/superpowers/specs/2026-04-13-schema-record-entry-types-design.md new file mode 100644 index 0000000..2cc54a1 --- /dev/null +++ b/docs/superpowers/specs/2026-04-13-schema-record-entry-types-design.md @@ -0,0 +1,410 @@ +# Schema + Record entry types — Design Spec + +**Date:** 2026-04-13 +**Issue:** [#208](https://github.com/cmeans/mcp-awareness/issues/208) +**Related awareness entries:** `design-schema-record-secrets` (`53b378b2`), intention `3117644f` +**Scope:** Implementation steps 1–2 of the awareness-edge prerequisite design. Secrets infrastructure (step 3+) is a separate follow-up. + +## Problem + +mcp-awareness stores arbitrary agent-written entries. Edge providers and the future tag taxonomy layer need *typed data contracts* — schemas that define a shape, and records that conform to those shapes with server-side validation on write. Without this, the entire edge config pattern (manifests, provider preferences, target configs) rests on implicit naming conventions with no validation — typos silently fall through to defaults, flagged as the #1 practical pain point in the edge design review. + +This spec defines two new `EntryType` values — `schema` and `record` — with JSON Schema Draft 2020-12 validation on write, plus a system-owner fallback so canonical shared schemas can ship with the server. + +## Goals + +- Agents can register schemas via MCP and write records validated against them. +- Server-side enforcement: invalid schemas never stored; invalid records never stored. +- Canonical schemas (edge-manifest, edge-identity, eventually tag taxonomy) can live in a shared `_system` namespace, with per-user schemas as an override layer. +- Structured error responses listing *all* validation failures in one round trip. +- No new tool surface beyond two type-specific write tools; existing `update_entry` / `delete_entry` absorb the new types. 
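As a concrete target for that last goal, a `validation_failed` envelope might look like the following (field names follow D6 below; the outer wrapper shape and the example messages are illustrative, not taken from the implementation):

```json
{
  "error": {
    "code": "validation_failed",
    "message": "Record content does not conform to schema schema:edge-manifest:1.0.0 (2 errors)",
    "retryable": false,
    "schema_ref": "schema:edge-manifest",
    "schema_version": "1.0.0",
    "validation_errors": [
      {"path": "$.name", "message": "123 is not of type 'string'", "validator": "type", "schema_path": "/properties/name/type"},
      {"path": "$.targets", "message": "'x' is not of type 'array'", "validator": "type", "schema_path": "/properties/targets/type"}
    ]
  }
}
```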
+ +## Non-goals + +- Secrets (`x-secret` encryption, one-time token web form, edge decrypt endpoint) — separate follow-up PR. +- Admin-via-MCP authorization (`is_admin` column on users) — deferred until actually needed. +- Cross-schema `$ref` resolution via `referencing.Registry` — deferred until a real use case demands it. +- Validator caching by schema version — deferred until throughput data justifies it. +- Backwards-compatibility shims for the historical "Structure"/"Structured"/"Secret" naming — superseded names; new implementation uses `schema`/`record`/`x-secret`. + +## Design decisions + +### D1. Tool surface: type-specific write tools + +Two new MCP tools — `register_schema` and `create_record` — matching the existing convention of one type-specific tool per writable entry type (`remember` → note, `learn_pattern` → pattern, etc.). The MCP Bench audit flagged the 29-tool surface as bloated, but extending `remember` with polymorphic `entry_type` would muddy its semantic. A future PR may unify all write tools behind a generic `create_entry`; that is a separate refactor across existing tools, not scope for this work. + +### D2. Multi-tenancy: per-owner with `_system` fallback + +Schemas are scoped by `owner_id`. A reserved `_system` owner holds shared canonical schemas. Schema lookup queries `WHERE logical_key=? AND owner_id IN (caller, '_system') ORDER BY CASE WHEN owner_id=caller THEN 0 ELSE 1 END LIMIT 1` — caller's own schema wins over `_system` when both exist, giving operators predictable override semantics. + +### D3. `_system` write mechanism: CLI only + +A new console script `mcp-awareness-register-schema --system ...` writes `_system`-owned schemas, bypassing MCP. Operators (DB access + server config) seed built-in schemas at deploy/bootstrap time. No `is_admin` column, no MCP authz plumbing — bootstrap is a deploy-time concern, not agent-accessible. + +### D4. 
Schema immutability: absolute + +`update_entry` on a `schema` entry always returns `schema_immutable`. To change a schema, register a new version; if the old version has no non-deleted records, soft-delete it. Matches the spec's "new version = new entry" framing and removes state-dependent authoring behavior. + +### D5. Record mutability: re-validated on content change + +`update_entry` on a `record` entry re-resolves the pinned schema and re-validates `content` on update. Updates that fail re-validation are rejected; the record is left unchanged. Non-content field updates (tags, description, source) skip re-validation. `schema_ref` and `schema_version` are immutable on records — records pin to an exact schema version and cannot be re-targeted. + +### D6. Validation error reporting: all errors via `iter_errors()` + +The `validation_failed` envelope includes a `validation_errors` list with one entry per `iter_errors()` yield, sorted by `path`. Each entry has `path` (from `ValidationError.json_path`), `message`, `validator` (the failing JSON Schema keyword), and `schema_path`. Truncated at 50 errors with `truncated: true, total_errors: <N>` if more. + +### D7. `logical_key` derivation: server-side + +For schemas, the caller passes `family` and `version`; the server derives `logical_key = f"{family}:{version}"`. Single source of truth; impossible to end up with a mismatch. Records mirror the derivation on lookup: `resolve_schema` composes the target `logical_key` from the record's `schema_ref` + `schema_version`. + +### D8. Record `content`: any JSON value + +`data.content` on record entries accepts any JSON-serializable value (dict, list, primitive, null) — matches JSON Schema's ability to validate any value, and matches the existing polymorphic `content` parameter on `remember`. Ruling it out now would create a future migration for no real benefit.
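D6's truncation contract and D7's key derivation can be sketched as pure functions (`truncate_errors` is a hypothetical name for the envelope-shaping half of `collect_validation_errors`):

```python
def compose_schema_logical_key(family: str, version: str) -> str:
    # D7: the single place the "family:version" format lives.
    return f"{family}:{version}"


def truncate_errors(errors: list[dict], limit: int = 50) -> dict:
    # D6: sort by instance path for stable output, cap at `limit`,
    # and flag truncation with the full error count.
    payload: dict = {"validation_errors": sorted(errors, key=lambda e: e["path"])[:limit]}
    if len(errors) > limit:
        payload["truncated"] = True
        payload["total_errors"] = len(errors)
    return payload
```

Because the cap only adds `truncated`/`total_errors` when exceeded, small error lists keep the minimal envelope shape.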
+ +## Architecture + +### New module: `src/mcp_awareness/validation.py` + +Pure functions, no I/O side effects except the store-lookup helper. Keeps `jsonschema` out of the store layer (preserves Store protocol as swappable) and makes validation unit-testable without Postgres. + +| Function | Purpose | +|---|---| +| `validate_schema_body(schema: dict) -> None` | `Draft202012Validator.check_schema(schema)`. Translates `SchemaError` into structured `invalid_schema` error. | +| `resolve_schema(store, owner_id, family, version) -> Entry \| None` | Caller-owner lookup first, `_system` fallback. Excludes soft-deleted. | +| `validate_record_content(schema_body: dict, content: Any) -> list[dict]` | Runs `iter_errors()`, returns sorted list of error dicts. Empty list = valid. | +| `compose_schema_logical_key(family: str, version: str) -> str` | Single place the format lives: `f"{family}:{version}"`. | +| `assert_schema_deletable(store, owner_id, logical_key) -> None` | Queries referencing records. Raises `schema_in_use` with blocker list if any. | +| `collect_validation_errors(validator, instance) -> list[dict]` | Internal helper; handles truncation at 50. | + +### Store protocol changes (`src/mcp_awareness/store.py`, `postgres_store.py`) + +Two new methods on the `Store` protocol: + +- `find_schema(owner_id: str, logical_key: str) -> Entry | None` — single-query schema lookup honoring the `_system` fallback and soft-delete exclusion. +- `count_records_referencing(owner_id: str, schema_logical_key: str) -> tuple[int, list[str]]` — supports schema-delete protection. Returns total count and up to N (default 10) referencing record IDs for the error envelope. + +Existing `save_entry` / write paths absorb the new entry types unchanged — the `type` field is a TEXT enum value change, not a structural change. 
+ +### Tool surface changes (`src/mcp_awareness/tools.py`) + +- **New:** `register_schema(source, tags, description, family, version, schema, learned_from="conversation") -> str` +- **New:** `create_record(source, tags, description, logical_key, schema_ref, schema_version, content, learned_from="conversation") -> str` +- **Modified:** `update_entry` branches on `entry.type`: + - `SCHEMA` → always `schema_immutable`. + - `RECORD` with content change → re-resolve schema, re-validate, reject on failure. + - `RECORD` attempting to change `schema_ref`/`schema_version` → `record_schema_pin_immutable`. + - Other types → existing behavior. +- **Modified:** `delete_entry` branches on `entry.type == SCHEMA` to run deletion protection before soft-delete. + +Response payloads trimmed to only what the caller didn't provide: + +- `register_schema` returns `{"status": "ok", "id", "logical_key"}` (`logical_key` is server-derived). +- `create_record` returns `{"status": "ok", "id", "action": "created" | "updated"}`. + +### EntryType additions (`src/mcp_awareness/schema.py`) + +```python +class EntryType(str, Enum): + # ... existing eight values ... + SCHEMA = "schema" + RECORD = "record" +``` + +No DB-level CHECK constraint on `entries.type` (there isn't one today); Python-layer `_parse_entry_type` handles invalid input with structured errors. + +### CLI tool: `src/mcp_awareness/cli_register_schema.py` + +New console script `mcp-awareness-register-schema`. Registered in `pyproject.toml` as `[project.scripts]`. + +``` +mcp-awareness-register-schema --system \ + --family schema:edge-manifest \ + --version 1.0.0 \ + --schema-file edge-manifest.json \ + --source awareness-built-in \ + --tags "schema,edge" \ + --description "Edge provider manifest schema" +``` + +Argparse validation, direct `PostgresStore` construction (no MCP / middleware / auth), writes with `owner_id="_system"` and `learned_from="cli-bootstrap"`. 
Skips embedding submission — CLI bootstrap shouldn't require an embedding provider. + +## Data model + +Both new types reuse the existing `Entry` dataclass and `entries` table. Schema body and record content live in the JSONB `data` column (**not** the `content` string field — avoids the Pydantic JSON-deserialization bug in awareness entry `5bc732c1`). + +### Schema entry + +```python +Entry( + type=EntryType.SCHEMA, + source=source, + tags=tags, + data={ + "family": "schema:edge-manifest", + "version": "1.0.0", + "schema": { ... JSON Schema body as dict ... }, + "description": description, + "learned_from": learned_from, + }, + logical_key="schema:edge-manifest:1.0.0", # server-derived + owner_id=current_owner(), # _system only via CLI + language="english", +) +``` + +### Record entry + +```python +Entry( + type=EntryType.RECORD, + source=source, + tags=tags, + data={ + "schema_ref": "schema:edge-manifest", + "schema_version": "1.0.0", + "content": { ... any JSON value, validated ... }, + "description": description, + "learned_from": learned_from, + }, + logical_key=caller_chosen, # supports upsert + owner_id=current_owner(), # records never write to _system + language=resolve_language(...), +) +``` + +### Uniqueness and lookup + +The existing partial unique index `(owner_id, source, logical_key) WHERE logical_key IS NOT NULL AND deleted IS NULL` enforces: + +- Per-(owner, source) uniqueness for both types via `logical_key`. +- Natural upsert path for records via the existing `remember`-style upsert machinery. + +Cross-owner schema lookup issues a single query preferring caller-owned over `_system`: + +```sql +SELECT * FROM entries +WHERE type = 'schema' + AND logical_key = %(logical_key)s + AND owner_id IN (%(caller)s, '_system') + AND deleted IS NULL +ORDER BY CASE WHEN owner_id = %(caller)s THEN 0 ELSE 1 END +LIMIT 1 +``` + +## `jsonschema` integration + +- **Library version:** `jsonschema >= 4.26.0` (current PyPI latest, confirmed 2026-04-13). 
Added to main deps in `pyproject.toml` (not dev). Pulls `attrs`, `jsonschema-specifications`, `referencing`, `rpds-py` (wheels available for all supported platforms). +- **Meta-schema validation:** `Draft202012Validator.check_schema(schema_body)`. Raises `jsonschema.exceptions.SchemaError` on invalid schema. +- **Record validation:** `validator = Draft202012Validator(schema_body); errors = sorted(validator.iter_errors(content), key=lambda e: e.path)`. +- **Unknown keywords:** ignored by default (jsonschema v0.3+). Our future `x-secret` extension works "for free" without needing `validators.extend()` until we wire the secrets layer. +- **No validator caching** in v1 — construct per-write. Cache by `(owner_id, logical_key)` keyed on schema `id` if throughput demands later (schemas are immutable, so cache invalidation is trivial). +- **No `referencing.Registry`** in v1 — records reference schemas by our own `schema_ref`/`schema_version` pair, not JSON Schema `$ref`. +- **Belt-and-suspenders:** wrap both `check_schema()` and `iter_errors()` in try/except for `jsonschema.exceptions.JsonSchemaException` (base class); translate any unhandled exception to a generic `validation_error` structured response so raw tracebacks never reach agents. + +## Data flow + +### `register_schema` (MCP) + +1. Tool handler receives `family, version, schema, source, tags, description, learned_from`. +2. `validation.validate_schema_body(schema)` → structured `invalid_schema` on failure. +3. Compose `logical_key = f"{family}:{version}"`. +4. Build `Entry(type=SCHEMA, ..., owner_id=current_owner())`. +5. `store.save_entry(entry)` → Postgres unique-constraint violation becomes `schema_already_exists`. +6. Submit to embedding pool (existing pattern). +7. Return `{"status": "ok", "id", "logical_key"}`. + +### `create_record` (MCP) + +1. Tool handler receives `logical_key, schema_ref, schema_version, content, source, tags, description, learned_from`. +2. 
`validation.resolve_schema(store, owner_id, schema_ref, schema_version)` → None if not found or soft-deleted. + - None → `schema_not_found` structured error with `searched_owners: [caller, "_system"]`. +3. Extract `schema_body = resolved.data["schema"]`. +4. `validation.validate_record_content(schema_body, content)` → error list. + - Non-empty → `validation_failed` with full list. +5. Build `Entry(type=RECORD, ...)` with caller-chosen `logical_key`. +6. `store.save_entry(entry)` — existing upsert path handles same-logical_key updates. +7. Return `{"status": "ok", "id", "action": "created" | "updated"}`. + +### Record update (`update_entry`) + +1. Load entry by ID; branch on `entry.type`. +2. `SCHEMA` → `schema_immutable`, always. +3. `RECORD`: + - Update touches `content` → re-resolve schema, re-validate, reject on failure. + - Update touches `schema_ref` or `schema_version` → `record_schema_pin_immutable`. + - Update touches only non-content fields → no re-validation. +4. Write + append changelog per existing machinery. + +### Schema delete (`delete_entry`) + +1. Load entry; if `type == SCHEMA`, call `assert_schema_deletable`. +2. `count_records_referencing` → raise `schema_in_use` with blocker list if count > 0. +3. Soft-delete proceeds via existing machinery. + +### CLI bootstrap + +1. Argparse validates required args. +2. Read schema file as JSON. +3. `validation.validate_schema_body()` → stderr structured error + exit 1 on failure. +4. Build Entry with `owner_id="_system"`, `learned_from="cli-bootstrap"`, composed `logical_key`. +5. Construct `PostgresStore` directly (bypasses MCP, middleware, auth). +6. `save_entry()`. Skip embedding submission. +7. Print `{"status": "ok", "id", "logical_key"}` to stdout, exit 0. + +## Error handling + +All errors route through existing `_error_response()` helper (`helpers.py:214`) → structured `ToolError`. No new helper, no new envelope format. 
+ +### New error codes + +| Code | Where | Retryable | Extra fields | +|---|---|---|---| +| `invalid_schema` | `register_schema` meta-schema failure | false | `schema_error_path`, `detail` | +| `invalid_parameter` | `register_schema` malformed `family`/`version` (existing code) | false | `param`, `value`, `valid` | +| `schema_already_exists` | `register_schema` unique-constraint collision | false | `logical_key`, `existing_id` | +| `schema_not_found` | `create_record` / record update | false | `schema_ref`, `schema_version`, `searched_owners` | +| `validation_failed` | record content fails schema | false | `schema_ref`, `schema_version`, `validation_errors`, `truncated?`, `total_errors?` | +| `schema_immutable` | `update_entry` on schema | false | — | +| `record_schema_pin_immutable` | record update tries to change pin fields | false | `param` | +| `schema_in_use` | `delete_entry` on referenced schema | false | `referencing_records`, `total_count?` | +| `validation_error` | unexpected exception from the validator (belt-and-suspenders — schemas are meta-validated on register, so this should be unreachable in practice) | false | — | + +### Validation error envelope shape + +```json +{ + "error": { + "code": "validation_failed", + "retryable": false, + "message": "Record content does not conform to schema edge-manifest:1.0.0 (2 errors)", + "schema_ref": "schema:edge-manifest", + "schema_version": "1.0.0", + "validation_errors": [ + { + "path": "/providers/0/name", + "message": "'name' is a required property", + "validator": "required", + "schema_path": "/properties/providers/items/required" + } + ] + } +} +``` + +- `path` is the instance location taken from `ValidationError.json_path`, rendered JSON-Pointer-style — root is `/`, array indices included. + - `schema_path` is the JSON-Pointer-like path into the *schema* (`"/".join(str(p) for p in e.schema_path)`) — useful when the agent has the schema in hand for self-correction. + - `validator` is the failing JSON Schema keyword (`required`, `type`, `enum`, etc.)
— enables keyword-specific remediation. + - List sorted by `path` for stable output. + - Truncated at 50 errors with `truncated: true, total_errors: <N>`. + +## Deployment + +### Alembic migration + +`m8h9i0j1k2l3_add_system_user_for_schemas.py` (next sequential id; actual id assigned when authoring): + +```sql +INSERT INTO users (id, display_name, created) +VALUES ('_system', 'System-managed schemas', now()) +ON CONFLICT (id) DO NOTHING; +``` + +Single-purpose, idempotent, reversible. No DDL — leverages existing `users` table. + +### Operator deploy sequence + +1. Merge PR → Docker image rebuild on tag push (existing CI). +2. Pull + restart holodeck LXCs (production) **and** the QA instance (`docker-compose.qa.yaml`). +3. Run `mcp-awareness-migrate` in each environment — applies the `_system` user seed. **Not automatic; compose files do not run migrations at container start.** This matches the manual pattern used for all prior migrations (language/tsv backfills, OAuth columns, etc.). +4. Operator runs `mcp-awareness-register-schema --system ...` per built-in schema, gradually as schemas are authored. No requirement to seed all at deploy time. +5. No re-embed needed — existing entries unaffected. + +### Compose files + +All compose files (`docker-compose.yaml`, `docker-compose.qa.yaml`, `docker-compose.oauth.yaml`, `docker-compose.demo.yaml`) must remain coherent. **No changes required for this PR** — no new services, no new env vars, no new volumes, no new migration-at-start behavior. + +### Rollback + +`mcp-awareness-migrate --downgrade <revision>` reverses the `_system` user seed. Any `schema`/`record` entries written during the deployment window remain in the DB as orphaned data on older code (unknown `EntryType` value → `_parse_entry_type` guard returns structured error). Re-rolling forward makes them visible again. + +### Feature flag + +None. The new tools are additive and opt-in.
`_system` fallback only kicks in when a caller references a schema they don't own — opt-in by use. + +## Testing strategy + +### Unit tests: `tests/test_validation.py` + +Pure functions, no DB. Covers: + +- `validate_schema_body`: valid Draft 2020-12; invalid type value; non-object schema; empty `{}` (valid). +- `validate_record_content`: valid pass-through; multiple simultaneous failures; non-object content against non-object schema; `additionalProperties: false` behavior; truncation at 50. +- `compose_schema_logical_key`: format is `f"{family}:{version}"`. +- `resolve_schema` (with in-memory store stub): caller-owned present; `_system` fallback; caller wins over `_system`; soft-deleted excluded; neither exists. +- `assert_schema_deletable` (with store stub): passes with zero references; raises with blocker list. + +### Integration tests: `tests/test_tools_schema_record.py` + +Testcontainers Postgres. Covers: + +- `register_schema`: happy path; duplicate; invalid meta-schema; malformed `family`/`version`. +- `create_record`: happy path; against `_system` schema; schema-not-found; validation failure; upsert via same `logical_key`. +- `update_entry` on record: valid content update; invalid content update (rejected); non-content update; attempt to change `schema_ref`/`schema_version` (rejected). +- `update_entry` on schema: any update rejected. +- `delete_entry` on schema: zero refs succeeds; with refs rejected with blocker list; after refs soft-deleted succeeds. +- `delete_entry` on record: unchanged behavior. +- Cross-owner isolation: A cannot see B's schemas; both see `_system`; A's records invisible to B. + +### CLI tests: `tests/test_cli_register_schema.py` + +- Happy path: valid file → entry with `owner_id="_system"`, stdout structured response. +- Invalid schema file: stderr structured error, exit 1, no entry written. +- Missing required args: argparse error, exit 2. +- `--source`, `--tags`, `--description` flow through to stored entry. 
+- `learned_from` hardcoded to `"cli-bootstrap"`. + +### Existing tests to extend + +- `tests/test_schema.py` — add `SCHEMA`/`RECORD` enum coverage. +- `tests/test_postgres_store.py` — add `find_schema` + `count_records_referencing` coverage. +- `tests/test_tools.py` — any parametrized entry-type tests include new values. + +### Coverage discipline + +- Per `feedback_codecov_coverage.md` and `feedback_local_coverage_before_qa.md`: run `pytest --cov` locally before marking Ready for QA. +- All new lines in `validation.py`, `cli_register_schema.py`, and the tool handlers covered. No `pragma: no cover` without explicit approval. + +### Manual QA (PR body) + +Per project convention — MCP-call steps on an alternate-port test instance. Exercises: register schema; write valid record; write invalid record (verify envelope shape); update record content (valid + invalid); attempt schema update (verify immutability); delete schema with records (verify protection); delete schema without records; `_system` fallback via CLI tool. + +## PR conventions checklist + +Per `CLAUDE.md`: + +- [ ] CHANGELOG entry under `[Unreleased]`. +- [ ] README update if tool count or implemented-features sections change. +- [ ] Test count updated in README. +- [ ] `## QA` section in PR body with prerequisites + per-test checkboxes calling MCP tools. +- [ ] `QA Approved` label applied after manual QA. +- [ ] `docs/data-dictionary.md` updated with `schema`/`record` entry types and new `data` fields. +- [ ] Commit: AGPL v3 license preamble on every new `.py` file. + +## Open questions for planning phase + +None at design time. Items that will surface during planning: + +- Exact naming of the next Alembic revision id (depends on head at implementation time). +- Whether to split the PR at the CLI tool boundary if the test suite grows unwieldy — design allows it but default is a single PR. 
+- Whether to add a short `docs/schema-record-guide.md` alongside the implementation for users (can be filed as follow-up). + +## References + +- Awareness design spec: `design-schema-record-secrets` (entry `53b378b2`, 2026-03-28) +- Active intention: `3117644f` +- Historical intention cancelled in this session: `42bb92e5` (superseded) +- GitHub issue: [#208](https://github.com/cmeans/mcp-awareness/issues/208) +- Downstream consumers: Layer A/B/C tag taxonomy design (`design-tag-taxonomy-v2`), awareness-edge runtime +- `jsonschema` Python library: `/python-jsonschema/jsonschema` (context7), docs on `check_schema`, `iter_errors`, `referencing.Registry`, custom keyword extension +- MCP Bench audit: entry `1373dbd5` — tool surface concerns driving the "no generic create_entry refactor in this PR" decision +- Existing structured-error helper: `src/mcp_awareness/helpers.py:214` diff --git a/pyproject.toml b/pyproject.toml index 8a8a96e..cce2399 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -16,6 +16,7 @@ dependencies = [ "phonenumbers>=8.13,<10.0", "zxcvbn>=4.5.0,<5.0", "lingua-language-detector>=2.1.1,<3.0", + "jsonschema>=4.26.0,<5", ] [project.scripts] @@ -24,6 +25,7 @@ mcp-awareness-migrate = "mcp_awareness.migrate:main" mcp-awareness-user = "mcp_awareness.cli:user_main" mcp-awareness-token = "mcp_awareness.cli:token_main" mcp-awareness-secret = "mcp_awareness.cli:secret_main" +mcp-awareness-register-schema = "mcp_awareness.cli_register_schema:main" [project.optional-dependencies] dev = [ @@ -103,6 +105,10 @@ ignore_missing_imports = true module = ["jwt.*", "zxcvbn.*"] ignore_missing_imports = true +[[tool.mypy.overrides]] +module = ["jsonschema.*"] +ignore_missing_imports = true + # The tools/resources/prompts modules use a circular import pattern # (from . import server as _srv) to access mutable state through the server # module at call time. 
mypy cannot resolve attribute types on the partially- diff --git a/src/mcp_awareness/cli_register_schema.py b/src/mcp_awareness/cli_register_schema.py new file mode 100644 index 0000000..7b57452 --- /dev/null +++ b/src/mcp_awareness/cli_register_schema.py @@ -0,0 +1,173 @@ +# mcp-awareness — ambient system awareness for AI agents +# Copyright (C) 2026 Chris Means +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU Affero General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU Affero General Public License for more details. +# +# You should have received a copy of the GNU Affero General Public License +# along with this program. If not, see <https://www.gnu.org/licenses/>. + +"""CLI for registering _system-owned schema entries. + +Bypasses MCP entirely — operator tool, run once per built-in schema at +deploy/bootstrap time. No MCP auth, no middleware, direct PostgresStore +access. +""" + +from __future__ import annotations + +import argparse +import json +import os +import sys +from pathlib import Path + + +def main() -> None: + parser = argparse.ArgumentParser( + description="Register a _system-owned schema entry (operator bootstrap only).", + ) + parser.add_argument( + "--system", + action="store_true", + required=True, + help="Required.
Confirms the caller intends to write to the _system owner.", + ) + parser.add_argument( + "--family", + required=True, + help="Schema family (e.g., schema:edge-manifest)", + ) + parser.add_argument( + "--version", + required=True, + help="Schema version (e.g., 1.0.0)", + ) + parser.add_argument( + "--schema-file", + required=True, + type=Path, + help="Path to JSON file containing the Draft 2020-12 schema body", + ) + parser.add_argument( + "--source", + required=True, + help="Source field for the entry", + ) + parser.add_argument( + "--tags", + default="", + help="Comma-separated tags (empty string for none)", + ) + parser.add_argument( + "--description", + required=True, + help="Entry description", + ) + args = parser.parse_args() + + # Read + parse schema file + if not args.schema_file.exists(): + print( + json.dumps({"error": {"code": "file_not_found", "message": str(args.schema_file)}}), + file=sys.stderr, + ) + sys.exit(1) + try: + schema_body = json.loads(args.schema_file.read_text()) + except json.JSONDecodeError as e: + print( + json.dumps({"error": {"code": "invalid_json", "message": str(e)}}), + file=sys.stderr, + ) + sys.exit(1) + + # Meta-schema validation + from jsonschema import exceptions as jse + + from mcp_awareness.validation import compose_schema_logical_key, validate_schema_body + + try: + validate_schema_body(schema_body) + except jse.SchemaError as e: + print( + json.dumps( + { + "error": { + "code": "invalid_schema", + "message": str(e.message), + "schema_error_path": "/" + "/".join(str(p) for p in e.absolute_path), + } + } + ), + file=sys.stderr, + ) + sys.exit(1) + + # DB connection + database_url = os.environ.get("AWARENESS_DATABASE_URL", "") + if not database_url: + print( + json.dumps( + { + "error": { + "code": "missing_env", + "message": "AWARENESS_DATABASE_URL required", + } + } + ), + file=sys.stderr, + ) + sys.exit(1) + + from mcp_awareness.language import resolve_language + from mcp_awareness.postgres_store import PostgresStore + 
from mcp_awareness.schema import Entry, EntryType, make_id, now_utc + + store = PostgresStore(database_url) + logical_key = compose_schema_logical_key(args.family, args.version) + tags = [t.strip() for t in args.tags.split(",") if t.strip()] + # Match the MCP path: run the description through the standard + # language-resolution chain (lingua auto-detection, SIMPLE fallback) + # instead of pinning every CLI-seeded schema to english. + resolved_lang = resolve_language(text_for_detection=args.description) + + entry = Entry( + id=make_id(), + type=EntryType.SCHEMA, + source=args.source, + tags=tags, + created=now_utc(), + expires=None, + data={ + "family": args.family, + "version": args.version, + "schema": schema_body, + "description": args.description, + "learned_from": "cli-bootstrap", + }, + logical_key=logical_key, + language=resolved_lang, + ) + + try: + store.add("_system", entry) + except Exception as e: + print( + json.dumps({"error": {"code": "store_error", "message": str(e)}}), + file=sys.stderr, + ) + sys.exit(1) + + print(json.dumps({"status": "ok", "id": entry.id, "logical_key": logical_key})) + + +if __name__ == "__main__": + main() diff --git a/src/mcp_awareness/helpers.py b/src/mcp_awareness/helpers.py index ce742f3..73799a1 100644 --- a/src/mcp_awareness/helpers.py +++ b/src/mcp_awareness/helpers.py @@ -221,6 +221,7 @@ def _error_response( valid: list[str] | None = None, suggestion: str | None = None, help_url: str | None = None, + **extras: Any, ) -> NoReturn: """Build a structured error envelope and raise ToolError. @@ -228,6 +229,10 @@ def _error_response( so clients get proper error signaling. The JSON envelope provides structured fields for smart clients alongside a human-readable message. + Extra keyword arguments (``**extras``) are merged into the error dict + after the fixed fields, allowing structured context such as + ``schema_ref``, ``validation_errors``, ``searched_owners``, etc. + Raises: ToolError: always — this function never returns. 
""" @@ -248,6 +253,8 @@ def _error_response( error["suggestion"] = suggestion if help_url is not None: error["help_url"] = help_url + for k, v in extras.items(): + error[k] = v raise ToolError(json.dumps({"status": "error", "error": error})) diff --git a/src/mcp_awareness/instructions.md b/src/mcp_awareness/instructions.md index 0431d63..16aa02c 100644 --- a/src/mcp_awareness/instructions.md +++ b/src/mcp_awareness/instructions.md @@ -18,3 +18,11 @@ unbounded results. Use hint to re-rank by relevance so the best matches come first. Narrow with 2–3 specific tags rather than one broad tag. Use since/until for time-bounded queries. Call get_stats or get_tags first if you're unsure how much data exists. + +When you need typed data contracts for edge providers, tag taxonomies, or any +shape that should be validated on write: register a schema via `register_schema` +(family + version + JSON Schema body), then write records via `create_record` +referencing `schema_ref` + `schema_version`. Schemas are immutable after +registration — to evolve a shape, register a new version and soft-delete the +old one (only allowed when no records still reference it). Built-in shared +schemas live in the `_system` namespace, seeded by the operator. diff --git a/src/mcp_awareness/postgres_store.py b/src/mcp_awareness/postgres_store.py index 76f1739..6eb0150 100644 --- a/src/mcp_awareness/postgres_store.py +++ b/src/mcp_awareness/postgres_store.py @@ -1375,6 +1375,51 @@ def get_referencing_entries(self, owner_id: str, entry_id: str) -> list[Entry]: (json.dumps([entry_id]),), ) + def find_schema(self, owner_id: str, logical_key: str) -> Entry | None: + """Look up a schema, preferring caller-owned over _system-owned. + + Single query with CASE-based ORDER BY for predictable override + semantics: caller's own version wins, _system is fallback. + Soft-deleted entries are excluded. 
+ """ + with self._pool.connection() as conn, conn.transaction(), conn.cursor() as cur: + self._set_rls_context(cur, owner_id) + cur.execute( + _load_sql("find_schema"), + (logical_key, owner_id, owner_id), + ) + row = cur.fetchone() + return self._row_to_entry(row) if row else None + + def count_records_referencing( + self, owner_id: str, schema_logical_key: str + ) -> tuple[int, list[str]]: + """Count and sample-id records referencing a schema version. + + Splits schema_logical_key on the last ':' to obtain schema_ref and version. + schema_ref may itself contain ':' (e.g. "schema:edge-manifest:1.0.0"). + Matches data.schema_ref and data.schema_version in the record entries' JSONB. + + Invariant: schema_logical_key must be a `ref:version` composed by + ``compose_schema_logical_key``. Empty ref or empty version would split + into a non-matching query; empty version is blocked at register_schema, + but we assert here as defense-in-depth since the store API is public. + """ + ref, sep, version = schema_logical_key.rpartition(":") + assert sep == ":", f"schema_logical_key must contain ':': {schema_logical_key!r}" + assert ref, f"schema_logical_key has empty ref component: {schema_logical_key!r}" + assert version, f"schema_logical_key has empty version component: {schema_logical_key!r}" + with self._pool.connection() as conn, conn.transaction(), conn.cursor() as cur: + self._set_rls_context(cur, owner_id) + cur.execute(_load_sql("count_records_referencing"), (owner_id, ref, version)) + count_row = cur.fetchone() + count = int(count_row["cnt"]) if count_row else 0 + if count == 0: + return (0, []) + cur.execute(_load_sql("list_records_referencing_ids"), (owner_id, ref, version)) + ids = [str(r["id"]) for r in cur.fetchall()] + return (count, ids) + # ------------------------------------------------------------------ # User operations (for OAuth auto-provisioning) # ------------------------------------------------------------------ diff --git 
a/src/mcp_awareness/schema.py b/src/mcp_awareness/schema.py index 192d35a..d6e7973 100644 --- a/src/mcp_awareness/schema.py +++ b/src/mcp_awareness/schema.py @@ -36,6 +36,8 @@ class EntryType(str, Enum): PREFERENCE = "preference" NOTE = "note" INTENTION = "intention" + SCHEMA = "schema" + RECORD = "record" # Valid states for the INTENTION lifecycle diff --git a/src/mcp_awareness/server.py b/src/mcp_awareness/server.py index 4a71e42..be402e6 100644 --- a/src/mcp_awareness/server.py +++ b/src/mcp_awareness/server.py @@ -680,6 +680,7 @@ def _run() -> None: acted_on, add_context, backfill_embeddings, + create_record, delete_entry, get_actions, get_activity, @@ -696,6 +697,7 @@ def _run() -> None: get_tags, get_unread, learn_pattern, + register_schema, remember, remind, report_alert, diff --git a/src/mcp_awareness/sql/count_records_referencing.sql b/src/mcp_awareness/sql/count_records_referencing.sql new file mode 100644 index 0000000..6ce8151 --- /dev/null +++ b/src/mcp_awareness/sql/count_records_referencing.sql @@ -0,0 +1,14 @@ +/* name: count_records_referencing */ +/* mode: literal */ +/* Count records referencing a schema version (for deletion-protection checks). + schema_logical_key is decomposed at the Python layer into (schema_ref, schema_version) + via rpartition(":") — schema_ref may itself contain ':' (e.g. "schema:edge-manifest"). + Params: owner_id, schema_ref, schema_version +*/ +SELECT COUNT(*) AS cnt +FROM entries +WHERE type = 'record' + AND owner_id = %s + AND data->>'schema_ref' = %s + AND data->>'schema_version' = %s + AND deleted IS NULL diff --git a/src/mcp_awareness/sql/find_schema.sql b/src/mcp_awareness/sql/find_schema.sql new file mode 100644 index 0000000..45aee91 --- /dev/null +++ b/src/mcp_awareness/sql/find_schema.sql @@ -0,0 +1,15 @@ +/* name: find_schema */ +/* mode: literal */ +/* Look up a schema entry by logical_key, preferring caller-owned over _system. + Returns the caller's own version if present, otherwise the _system version. 
+ Soft-deleted entries are excluded. + Params: logical_key, caller (owner_id), caller (owner_id again for ORDER BY) +*/ +SELECT id, type, source, tags, created, updated, expires, data, logical_key, owner_id, language, deleted +FROM entries +WHERE type = 'schema' + AND logical_key = %s + AND owner_id IN (%s, '_system') + AND deleted IS NULL +ORDER BY CASE WHEN owner_id = %s THEN 0 ELSE 1 END +LIMIT 1 diff --git a/src/mcp_awareness/sql/list_records_referencing_ids.sql b/src/mcp_awareness/sql/list_records_referencing_ids.sql new file mode 100644 index 0000000..a2f335d --- /dev/null +++ b/src/mcp_awareness/sql/list_records_referencing_ids.sql @@ -0,0 +1,14 @@ +/* name: list_records_referencing_ids */ +/* mode: literal */ +/* Returns up to 10 record ids referencing a schema version, for deletion-blocker detail. + Params: owner_id, schema_ref, schema_version +*/ +SELECT id +FROM entries +WHERE type = 'record' + AND owner_id = %s + AND data->>'schema_ref' = %s + AND data->>'schema_version' = %s + AND deleted IS NULL +ORDER BY created +LIMIT 10 diff --git a/src/mcp_awareness/store.py b/src/mcp_awareness/store.py index d625b2f..3137f40 100644 --- a/src/mcp_awareness/store.py +++ b/src/mcp_awareness/store.py @@ -341,6 +341,24 @@ def get_referencing_entries(self, owner_id: str, entry_id: str) -> list[Entry]: """Find entries whose data.related_ids contains the given entry_id.""" ... + def find_schema(self, owner_id: str, logical_key: str) -> Entry | None: + """Look up a schema entry by logical_key, preferring caller-owned over _system. + + Returns the caller's own schema if present; falls back to the _system-owned + version if one exists. Returns None if not found or soft-deleted. + """ + ... + + def count_records_referencing( + self, owner_id: str, schema_logical_key: str + ) -> tuple[int, list[str]]: + """Return (total_count, first_N_ids) of non-deleted records referencing a schema. + + The schema_logical_key is composed as f"{schema_ref}:{schema_version}". 
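A minimal sketch of the compose/split round-trip assumed by the docstring above (standalone restatement of the `f"{schema_ref}:{schema_version}"` format; names are illustrative):

```python
def compose_schema_logical_key(family: str, version: str) -> str:
    # Single source of truth for the key format: "family:version".
    return f"{family}:{version}"

def split_schema_logical_key(key: str) -> tuple[str, str]:
    # rpartition splits on the LAST ':' so a family that itself contains
    # ':' (e.g. "schema:edge-manifest") survives the round-trip intact.
    ref, sep, version = key.rpartition(":")
    assert sep and ref and version, f"not a ref:version key: {key!r}"
    return ref, version
```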
+ Caller uses total_count for the error payload and ids for the blocker list. + """ + ... + def clear(self, owner_id: str) -> None: """Delete all entries, reads, actions, and embeddings for an owner.""" ... diff --git a/src/mcp_awareness/tools.py b/src/mcp_awareness/tools.py index ef181f7..0672d9a 100644 --- a/src/mcp_awareness/tools.py +++ b/src/mcp_awareness/tools.py @@ -528,6 +528,211 @@ async def remember( return json.dumps({"status": "ok", "id": entry.id}) +@_srv.mcp.tool() +@_timed +async def register_schema( + source: str, + tags: list[str], + description: str, + family: str, + version: str, + schema: dict[str, Any], + learned_from: str = "conversation", + language: str | None = None, +) -> str: + """Register a new JSON Schema entry for later use by records. + + Validates the schema body against JSON Schema Draft 2020-12 meta-schema + on write. Family + version are combined into the entry's logical_key + (family:version); each version is a separate entry. Schemas are + absolutely immutable once registered — to change one, register a new + version and (if no records reference the old one) delete it. 
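The structured-error contract the tool relies on — fixed fields plus `**extras`, per the `_error_response` change in this PR — can be sketched as a standalone builder (hypothetical name, not the real server helper, which raises `ToolError` rather than returning):

```python
import json

def build_error_envelope(code: str, message: str, retryable: bool, **extras) -> str:
    # Fixed fields are named parameters, so extras cannot shadow them; anything
    # else (logical_key, existing_id, validation_errors, ...) is merged
    # verbatim into the error object, as schema_already_exists now does.
    error = {"code": code, "message": message, "retryable": retryable}
    error.update(extras)
    return json.dumps({"status": "error", "error": error})
```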
+ + Returns: + JSON: {"status": "ok", "id": "", "logical_key": ""} + + If you receive an unstructured error, the failure is in the transport + or platform layer, not in awareness.""" + import psycopg.errors + from jsonschema import exceptions as jse + + from mcp_awareness.validation import compose_schema_logical_key, validate_schema_body + + # Validate inputs + if not family: + _error_response( + "invalid_parameter", + "family must be a non-empty string", + retryable=False, + param="family", + ) + if not version: + _error_response( + "invalid_parameter", + "version must be a non-empty string", + retryable=False, + param="version", + ) + + # Meta-schema validation + try: + validate_schema_body(schema) + except jse.SchemaError as e: + _error_response( + "invalid_schema", + f"Schema does not conform to JSON Schema Draft 2020-12: {e.message}", + retryable=False, + ) + + logical_key = compose_schema_logical_key(family, version) + + now = now_utc() + data: dict[str, Any] = { + "family": family, + "version": version, + "schema": schema, + "description": description, + "learned_from": learned_from, + } + text_for_detect = compose_detection_text("schema", data) + resolved_lang = resolve_language(explicit=language, text_for_detection=text_for_detect) + _check_unsupported_language(text_for_detect, resolved_lang) + + entry = Entry( + id=make_id(), + type=EntryType.SCHEMA, + source=source, + tags=tags, + created=now, + expires=None, + data=data, + logical_key=logical_key, + language=resolved_lang, + ) + + try: + _srv.store.add(_srv._owner_id(), entry) + except psycopg.errors.UniqueViolation: + # Surface the structured fields promised by the error-code table + # (design doc §Error codes): logical_key and existing_id. existing_id + # is best-effort — if the lookup fails for any reason we still return + # a useful error rather than raising over the original error. 
+ _existing = _srv.store.find_schema(_srv._owner_id(), logical_key) + _existing_id = _existing.id if _existing is not None else None + _error_response( + "schema_already_exists", + f"Schema {logical_key} already exists in source {source!r}", + retryable=False, + logical_key=logical_key, + existing_id=_existing_id, + ) + + _srv._generate_embedding(entry) + return json.dumps({"status": "ok", "id": entry.id, "logical_key": logical_key}) + + +@_srv.mcp.tool() +@_timed +async def create_record( + source: str, + tags: list[str], + description: str, + logical_key: str, + schema_ref: str, + schema_version: str, + content: Any, + learned_from: str = "conversation", + language: str | None = None, +) -> str: + """Create or upsert a record validated against a registered schema. + + Resolves the target schema by schema_ref + schema_version (prefers + caller-owned, falls back to _system). Validates content against the + schema on write; rejects with a structured validation_failed error + listing every validation error. Upserts on matching (source, logical_key) + — same logical_key means update in place with changelog. 
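The upsert decision described above can be sketched with an in-memory dict standing in for the store (`upsert_by_logical_key` itself is not shown in this diff, so this is an assumed shape):

```python
def upsert_by_logical_key(store: dict, source: str, logical_key: str, entry: dict):
    # Same (source, logical_key) -> update in place; otherwise a fresh insert.
    # Returns the saved entry and whether it was newly created.
    key = (source, logical_key)
    created = key not in store
    store[key] = entry
    return entry, created
```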
+ + Returns: + JSON: {"status": "ok", "id": "", "action": "created" | "updated"} + + If you receive an unstructured error, the failure is in the transport + or platform layer, not in awareness.""" + from mcp_awareness.validation import resolve_schema, validate_record_content + + resolved = resolve_schema(_srv.store, _srv._owner_id(), schema_ref, schema_version) + if resolved is None: + _error_response( + "schema_not_found", + f"No schema {schema_ref}:{schema_version} in your namespace or _system", + retryable=False, + schema_ref=schema_ref, + schema_version=schema_version, + searched_owners=[_srv._owner_id(), "_system"], + ) + + schema_body = resolved.data["schema"] + try: + errors = validate_record_content(schema_body, content) + except Exception as e: + _error_response( + "validation_error", + f"Unexpected content validation error: {e}", + retryable=False, + ) + + if errors: + # Detect truncation sentinel (always last item when present) + truncated = errors[-1].get("truncated") is True + total_errors = errors[-1]["total_errors"] if truncated else len(errors) + validation_errors = errors[:-1] if truncated else errors + vf_extras: dict[str, Any] = { + "schema_ref": schema_ref, + "schema_version": schema_version, + "validation_errors": validation_errors, + } + if truncated: + vf_extras["truncated"] = True + vf_extras["total_errors"] = total_errors + _error_response( + "validation_failed", + ( + f"Record content does not conform to schema" + f" {schema_ref}:{schema_version} ({total_errors} errors)" + ), + retryable=False, + **vf_extras, + ) + + now = now_utc() + data: dict[str, Any] = { + "schema_ref": schema_ref, + "schema_version": schema_version, + "content": content, + "description": description, + "learned_from": learned_from, + } + text_for_detect = compose_detection_text("record", data) + resolved_lang = resolve_language(explicit=language, text_for_detection=text_for_detect) + _check_unsupported_language(text_for_detect, resolved_lang) + + entry = Entry( + 
id=make_id(), + type=EntryType.RECORD, + source=source, + tags=tags, + created=now, + expires=None, + data=data, + logical_key=logical_key, + language=resolved_lang, + ) + + saved, created = _srv.store.upsert_by_logical_key(_srv._owner_id(), source, logical_key, entry) + _srv._generate_embedding(saved) + action = "created" if created else "updated" + return json.dumps({"status": "ok", "id": saved.id, "action": action}) + + @_srv.mcp.tool() @_timed async def update_entry( @@ -554,17 +759,16 @@ async def update_entry( updates["tags"] = tags if source is not None: updates["source"] = source - if content is not None: - if not isinstance(content, str): - content = json.dumps(content) - updates["content"] = content if content_type is not None: updates["content_type"] = content_type if language is not None: from .language import iso_to_regconfig updates["language"] = iso_to_regconfig(language) - if not updates: + # Content is normalized below once the entry type is known: RECORD entries + # keep native JSON shape (dict/list/primitive) so the wire shape matches the + # create path; other knowledge types stringify non-string content as before. + if content is None and not updates: _error_response( "invalid_parameter", "No fields to update — provide at least one of: " @@ -572,6 +776,61 @@ async def update_entry( retryable=False, param="content", ) + # --- New: type-specific branching for schema and record entries --- + from mcp_awareness.schema import EntryType as _EntryType + from mcp_awareness.validation import resolve_schema, validate_record_content + + _existing = _srv.store.get_entry_by_id(_srv._owner_id(), entry_id) + if _existing is not None: + if _existing.type == _EntryType.SCHEMA: + _error_response( + "schema_immutable", + "Schemas cannot be updated. 
Register a new version instead.", + retryable=False, + ) + if _existing.type == _EntryType.RECORD and content is not None: + _schema_ref = _existing.data["schema_ref"] + _schema_version = _existing.data["schema_version"] + _resolved = resolve_schema(_srv.store, _srv._owner_id(), _schema_ref, _schema_version) + if _resolved is None: + _error_response( + "schema_not_found", + f"Cannot re-validate: schema {_schema_ref}:{_schema_version} not found", + retryable=False, + schema_ref=_schema_ref, + schema_version=_schema_version, + searched_owners=[_srv._owner_id(), "_system"], + ) + _content_to_validate: Any = json.loads(content) if isinstance(content, str) else content + _errors = validate_record_content(_resolved.data["schema"], _content_to_validate) + if _errors: + _truncated = _errors[-1].get("truncated") is True + _total_errors = _errors[-1]["total_errors"] if _truncated else len(_errors) + _validation_errors = _errors[:-1] if _truncated else _errors + _vf_extras: dict[str, Any] = { + "schema_ref": _schema_ref, + "schema_version": _schema_version, + "validation_errors": _validation_errors, + } + if _truncated: + _vf_extras["truncated"] = True + _vf_extras["total_errors"] = _total_errors + _error_response( + "validation_failed", + ( + f"Record content does not conform to schema" + f" {_schema_ref}:{_schema_version} ({_total_errors} errors)" + ), + retryable=False, + **_vf_extras, + ) + # RECORD content is stored as native JSON to match the create path. + updates["content"] = _content_to_validate + # --- end branching --- + if content is not None and "content" not in updates: + # Non-record knowledge types (note/pattern/context/preference) persist + # content as a string; stringify non-string payloads for consistency. 
+ updates["content"] = content if isinstance(content, str) else json.dumps(content) result = _srv.store.update_entry(_srv._owner_id(), entry_id, updates) if result is None: _error_response( @@ -748,6 +1007,26 @@ async def delete_entry( Returns JSON with status and count. If you receive an unstructured error, the failure is in the transport or platform layer, not in awareness.""" if entry_id: + from mcp_awareness.schema import EntryType + from mcp_awareness.validation import SchemaInUseError, assert_schema_deletable + + _candidate = _srv.store.get_entry_by_id(_srv._owner_id(), entry_id) + if ( + _candidate is not None + and _candidate.type == EntryType.SCHEMA + and _candidate.logical_key is not None + ): + try: + assert_schema_deletable(_srv.store, _srv._owner_id(), _candidate.logical_key) + except SchemaInUseError as e: + _error_response( + "schema_in_use", + f"Cannot delete schema {_candidate.logical_key}:" + f" {e.total_count} record(s) reference it", + retryable=False, + referencing_records=e.referencing_records, + total_count=e.total_count, + ) _srv.store.soft_delete_by_id(_srv._owner_id(), entry_id) return json.dumps( { @@ -758,6 +1037,9 @@ async def delete_entry( } ) if tags: + # NOTE: bulk-delete by tags does NOT currently consult assert_schema_deletable; + # a schema referenced by live records can be soft-deleted here, unlike the + # single-id path above. Tracked as a follow-up — see issue #288. if not confirm: # Use AND logic to match soft_delete_by_tags behavior all_entries = _srv.store.get_entries(_srv._owner_id(), tags=tags) @@ -787,6 +1069,9 @@ async def delete_entry( retryable=False, ) et = _parse_entry_type(entry_type) + # NOTE: bulk-delete by source (± entry_type) does NOT consult + # assert_schema_deletable; schemas referenced by live records can still be + # soft-deleted here, unlike the single-id path above. Tracked by issue #288. 
if not confirm: entries = _srv.store.get_entries(_srv._owner_id(), entry_type=et, source=source) return json.dumps( diff --git a/src/mcp_awareness/validation.py b/src/mcp_awareness/validation.py new file mode 100644 index 0000000..0c6df5f --- /dev/null +++ b/src/mcp_awareness/validation.py @@ -0,0 +1,121 @@ +# mcp-awareness — ambient system awareness for AI agents +# Copyright (C) 2026 Chris Means +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU Affero General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU Affero General Public License for more details. +# +# You should have received a copy of the GNU Affero General Public License +# along with this program. If not, see . + +"""Validation helpers for Schema and Record entry types. + +Pure functions wrapping jsonschema Draft 2020-12 validation and schema +lookup with _system fallback. Kept out of the store layer so the Store +protocol stays swappable (no jsonschema import in store.py). +""" + +from __future__ import annotations + +from typing import TYPE_CHECKING, Any, Protocol + +if TYPE_CHECKING: + from mcp_awareness.schema import Entry + +from jsonschema import Draft202012Validator, ValidationError + + +def compose_schema_logical_key(family: str, version: str) -> str: + """Derive the canonical logical_key for a schema entry. + + Single source of truth for the family+version → logical_key format. + Used by register_schema on write and by resolve_schema on lookup. + """ + return f"{family}:{version}" + + +def validate_schema_body(schema: Any) -> None: + """Validate a schema body against the JSON Schema Draft 2020-12 meta-schema. 
+ + Raises jsonschema.exceptions.SchemaError on invalid schema. Callers at + the MCP boundary translate this into a structured 'invalid_schema' error + response; direct callers (CLI) format to stderr. + """ + Draft202012Validator.check_schema(schema) + + +_MAX_VALIDATION_ERRORS = 50 + + +def _flatten_error(err: ValidationError) -> dict[str, Any]: + """Flatten a jsonschema ValidationError to a structured dict for the error envelope.""" + return { + "path": err.json_path, + "message": err.message, + "validator": err.validator, + "schema_path": "/" + "/".join(str(p) for p in err.schema_path), + } + + +def validate_record_content(schema_body: dict[str, Any], content: Any) -> list[dict[str, Any]]: + """Validate content against a schema body. Returns list of structured errors. + + Empty list means valid. List truncated at _MAX_VALIDATION_ERRORS; when + truncated, final entry is {'truncated': True, 'total_errors': }. + """ + validator = Draft202012Validator(schema_body) + all_errors = sorted(validator.iter_errors(content), key=lambda e: e.path) + if len(all_errors) <= _MAX_VALIDATION_ERRORS: + return [_flatten_error(e) for e in all_errors] + kept = [_flatten_error(e) for e in all_errors[:_MAX_VALIDATION_ERRORS]] + kept.append({"truncated": True, "total_errors": len(all_errors)}) + return kept + + +class _SchemaFinder(Protocol): + """Minimal protocol for resolve_schema's store dependency.""" + + def find_schema(self, owner_id: str, logical_key: str) -> Entry | None: ... + + +def resolve_schema(store: _SchemaFinder, owner_id: str, family: str, version: str) -> Entry | None: + """Resolve a schema by family + version, preferring caller-owned. + + Delegates to Store.find_schema (which handles the _system fallback at + the SQL level). Returns the schema Entry or None. + """ + return store.find_schema(owner_id, compose_schema_logical_key(family, version)) + + +class SchemaInUseError(Exception): + """Raised when a schema cannot be deleted because records reference it. 
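The truncation contract of `validate_record_content` — and the consumer-side sentinel detection used in `create_record` — in miniature (cap lowered to 3 for illustration; the module's `_MAX_VALIDATION_ERRORS` is 50):

```python
MAX_ERRORS = 3  # illustration only; the real cap is _MAX_VALIDATION_ERRORS = 50

def truncate_errors(errors: list) -> list:
    # At or under the cap: return as-is. Over: keep the first MAX_ERRORS and
    # append the sentinel {'truncated': True, 'total_errors': N} last.
    if len(errors) <= MAX_ERRORS:
        return errors
    kept = errors[:MAX_ERRORS]
    kept.append({"truncated": True, "total_errors": len(errors)})
    return kept

def unpack(errors: list):
    # Consumer side: the sentinel, when present, is always the final item.
    truncated = bool(errors) and errors[-1].get("truncated") is True
    total = errors[-1]["total_errors"] if truncated else len(errors)
    details = errors[:-1] if truncated else errors
    return truncated, total, details
```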
+ + Callers at the MCP boundary translate this into a structured schema_in_use + error response with the referencing_records list and total_count. + """ + + def __init__(self, total_count: int, referencing_records: list[str]): + self.total_count = total_count + self.referencing_records = referencing_records + super().__init__(f"Cannot delete schema: {total_count} record(s) still reference it") + + +class _RefCounter(Protocol): + """Minimal protocol for assert_schema_deletable's store dependency.""" + + def count_records_referencing( + self, owner_id: str, schema_logical_key: str + ) -> tuple[int, list[str]]: ... + + +def assert_schema_deletable(store: _RefCounter, owner_id: str, schema_logical_key: str) -> None: + """Raise SchemaInUseError if any non-deleted records reference this schema.""" + count, ids = store.count_records_referencing(owner_id, schema_logical_key) + if count > 0: + raise SchemaInUseError(total_count=count, referencing_records=ids) diff --git a/tests/conftest.py b/tests/conftest.py index 4ca29ea..655f464 100644 --- a/tests/conftest.py +++ b/tests/conftest.py @@ -26,6 +26,7 @@ from mcp_awareness.postgres_store import PostgresStore TEST_OWNER = "test-owner" +SYSTEM_OWNER = "_system" # Set default owner for all tests before any module imports read it. os.environ["AWARENESS_DEFAULT_OWNER"] = TEST_OWNER @@ -61,5 +62,13 @@ def pg_dsn(pg_container): def store(pg_dsn): """Fresh PostgresStore for each test — tables created, then cleared after.""" s = PostgresStore(pg_dsn) + # Ensure _system user exists for cross-owner schema tests. 
+ with s._pool.connection() as conn, conn.cursor() as cur: + cur.execute( + "INSERT INTO users (id, display_name) VALUES ('_system', 'System-managed schemas') " + "ON CONFLICT (id) DO NOTHING" + ) + conn.commit() yield s s.clear(TEST_OWNER) + s.clear(SYSTEM_OWNER) diff --git a/tests/test_cli_register_schema.py b/tests/test_cli_register_schema.py new file mode 100644 index 0000000..d3b386f --- /dev/null +++ b/tests/test_cli_register_schema.py @@ -0,0 +1,291 @@ +# mcp-awareness — ambient system awareness for AI agents +# Copyright (C) 2026 Chris Means +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU Affero General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU Affero General Public License for more details. +# +# You should have received a copy of the GNU Affero General Public License +# along with this program. If not, see . 
+ +"""Tests for mcp-awareness-register-schema CLI.""" + +from __future__ import annotations + +import json +import tempfile +from pathlib import Path + +import pytest + + +@pytest.fixture +def system_schema_file(): + with tempfile.NamedTemporaryFile(suffix=".json", mode="w", delete=False) as f: + json.dump({"type": "object", "properties": {"name": {"type": "string"}}}, f) + path = f.name + yield path + Path(path).unlink(missing_ok=True) + + +def test_cli_register_schema_happy_path(pg_dsn, system_schema_file, monkeypatch, capsys): + """End-to-end: CLI writes a _system schema via direct store access.""" + from mcp_awareness.cli_register_schema import main + + monkeypatch.setenv("AWARENESS_DATABASE_URL", pg_dsn) + monkeypatch.setattr( + "sys.argv", + [ + "mcp-awareness-register-schema", + "--system", + "--family", + "schema:cli-test", + "--version", + "1.0.0", + "--schema-file", + system_schema_file, + "--source", + "awareness-built-in", + "--tags", + "cli,test", + "--description", + "CLI-registered test schema", + ], + ) + + # Seed _system user so insert doesn't FK-violate (conftest fixture does this for store tests; + # CLI creates its own PostgresStore so we seed manually here) + from mcp_awareness.postgres_store import PostgresStore + + tmp = PostgresStore(pg_dsn) + with tmp._pool.connection() as conn, conn.cursor() as cur: + cur.execute( + "INSERT INTO users (id, display_name) VALUES ('_system', 'System-managed schemas') " + "ON CONFLICT (id) DO NOTHING" + ) + conn.commit() + + main() + captured = capsys.readouterr() + body = json.loads(captured.out.strip()) + assert body["status"] == "ok" + assert body["logical_key"] == "schema:cli-test:1.0.0" + + # Verify entry exists in DB under _system owner + store = PostgresStore(pg_dsn) + entry = store.find_schema("any-caller", "schema:cli-test:1.0.0") + assert entry is not None + assert entry.data["learned_from"] == "cli-bootstrap" + + +def test_cli_register_schema_rejects_invalid_schema_file(pg_dsn, monkeypatch, capsys): 
+ from mcp_awareness.cli_register_schema import main + + with tempfile.NamedTemporaryFile(suffix=".json", mode="w", delete=False) as f: + json.dump({"type": "strng"}, f) # invalid + path = f.name + + monkeypatch.setenv("AWARENESS_DATABASE_URL", pg_dsn) + monkeypatch.setattr( + "sys.argv", + [ + "mcp-awareness-register-schema", + "--system", + "--family", + "schema:bad", + "--version", + "1.0.0", + "--schema-file", + path, + "--source", + "test", + "--tags", + "", + "--description", + "bad", + ], + ) + with pytest.raises(SystemExit) as excinfo: + main() + assert excinfo.value.code == 1 + captured = capsys.readouterr() + assert "invalid_schema" in captured.err + Path(path).unlink(missing_ok=True) + + +def test_cli_register_schema_missing_db_url(monkeypatch, system_schema_file, capsys): + from mcp_awareness.cli_register_schema import main + + monkeypatch.delenv("AWARENESS_DATABASE_URL", raising=False) + monkeypatch.setattr( + "sys.argv", + [ + "mcp-awareness-register-schema", + "--system", + "--family", + "schema:test", + "--version", + "1.0.0", + "--schema-file", + system_schema_file, + "--source", + "test", + "--tags", + "", + "--description", + "test", + ], + ) + with pytest.raises(SystemExit) as excinfo: + main() + assert excinfo.value.code == 1 + captured = capsys.readouterr() + assert "AWARENESS_DATABASE_URL" in captured.err or "missing_env" in captured.err + + +def test_cli_register_schema_missing_schema_file(pg_dsn, monkeypatch, capsys): + from mcp_awareness.cli_register_schema import main + + monkeypatch.setenv("AWARENESS_DATABASE_URL", pg_dsn) + monkeypatch.setattr( + "sys.argv", + [ + "mcp-awareness-register-schema", + "--system", + "--family", + "schema:test", + "--version", + "1.0.0", + "--schema-file", + "/nonexistent/path.json", + "--source", + "test", + "--tags", + "", + "--description", + "test", + ], + ) + with pytest.raises(SystemExit) as excinfo: + main() + assert excinfo.value.code == 1 + + +def test_cli_register_schema_bad_json(monkeypatch, 
capsys): + """Schema file contains invalid JSON — should exit 1 with invalid_json error.""" + import tempfile + + from mcp_awareness.cli_register_schema import main + + with tempfile.NamedTemporaryFile(suffix=".json", mode="w", delete=False) as f: + f.write("{ not valid json }") + path = f.name + + monkeypatch.delenv("AWARENESS_DATABASE_URL", raising=False) + monkeypatch.setattr( + "sys.argv", + [ + "mcp-awareness-register-schema", + "--system", + "--family", + "schema:bad-json", + "--version", + "1.0.0", + "--schema-file", + path, + "--source", + "test", + "--tags", + "", + "--description", + "bad json test", + ], + ) + with pytest.raises(SystemExit) as excinfo: + main() + assert excinfo.value.code == 1 + captured = capsys.readouterr() + err = json.loads(captured.err.strip()) + assert err["error"]["code"] == "invalid_json" + Path(path).unlink(missing_ok=True) + + +def test_cli_register_schema_store_error(pg_dsn, system_schema_file, monkeypatch, capsys): + """store.add raises — should exit 1 with store_error code.""" + from mcp_awareness.cli_register_schema import main + + monkeypatch.setenv("AWARENESS_DATABASE_URL", pg_dsn) + monkeypatch.setattr( + "sys.argv", + [ + "mcp-awareness-register-schema", + "--system", + "--family", + "schema:store-err", + "--version", + "1.0.0", + "--schema-file", + system_schema_file, + "--source", + "test", + "--tags", + "", + "--description", + "store error test", + ], + ) + + # Patch PostgresStore.add to simulate a DB error + import mcp_awareness.postgres_store as ps_mod + + original_add = ps_mod.PostgresStore.add + + def _boom(self, owner_id, entry): + raise RuntimeError("simulated DB failure") + + monkeypatch.setattr(ps_mod.PostgresStore, "add", _boom) + + with pytest.raises(SystemExit) as excinfo: + main() + assert excinfo.value.code == 1 + captured = capsys.readouterr() + err = json.loads(captured.err.strip()) + assert err["error"]["code"] == "store_error" + assert "simulated DB failure" in err["error"]["message"] + + 
monkeypatch.setattr(ps_mod.PostgresStore, "add", original_add) + + +def test_cli_register_schema_module_runs_as_main(monkeypatch): + """Cover the `if __name__ == '__main__': main()` guard via runpy.""" + import runpy + + # No DB URL → main exits before touching the store. Covers the guard cleanly. + monkeypatch.delenv("AWARENESS_DATABASE_URL", raising=False) + monkeypatch.setattr( + "sys.argv", + [ + "mcp-awareness-register-schema", + "--system", + "--family", + "s:test", + "--version", + "1.0.0", + "--schema-file", + "/nonexistent/path.json", + "--source", + "test", + "--tags", + "", + "--description", + "test", + ], + ) + with pytest.raises(SystemExit): + runpy.run_module("mcp_awareness.cli_register_schema", run_name="__main__") diff --git a/tests/test_helpers.py b/tests/test_helpers.py index ce9ff4c..6eef5db 100644 --- a/tests/test_helpers.py +++ b/tests/test_helpers.py @@ -153,6 +153,57 @@ def test_multiple_extra_params(self): assert "sslmode=verify-full" in result assert "connect_timeout=10" in result + +class TestErrorResponseExtras: + """Test that _error_response merges **extras into the error envelope.""" + + def test_extras_appear_in_payload(self): + """Extra keyword arguments must be present in the raised ToolError JSON.""" + from mcp.server.fastmcp.exceptions import ToolError + + with pytest.raises(ToolError) as excinfo: + _error_response( + "schema_not_found", + "No matching schema", + retryable=False, + schema_ref="schema:thing", + schema_version="1.0.0", + searched_owners=["alice", "_system"], + ) + payload = json.loads(str(excinfo.value)) + err = payload["error"] + assert err["code"] == "schema_not_found" + assert err["schema_ref"] == "schema:thing" + assert err["schema_version"] == "1.0.0" + assert err["searched_owners"] == ["alice", "_system"] + + def test_extras_do_not_override_fixed_fields(self): + """Extras cannot clobber the mandatory fixed fields.""" + from mcp.server.fastmcp.exceptions import ToolError + + with pytest.raises(ToolError) as 
excinfo: + _error_response( + "some_error", + "Some message", + retryable=True, + extra_field="extra_value", + ) + err = json.loads(str(excinfo.value))["error"] + assert err["code"] == "some_error" + assert err["message"] == "Some message" + assert err["retryable"] is True + assert err["extra_field"] == "extra_value" + + def test_no_extras_still_works(self): + """Calling without extras should behave as before.""" + from mcp.server.fastmcp.exceptions import ToolError + + with pytest.raises(ToolError) as excinfo: + _error_response("plain_error", "Plain message", retryable=False) + err = json.loads(str(excinfo.value))["error"] + assert err["code"] == "plain_error" + assert "schema_ref" not in err + def test_unix_socket_host(self): """Unix socket path goes in query string, not netloc.""" dsn = "host=/var/run/postgresql dbname=db user=u" diff --git a/tests/test_rls.py b/tests/test_rls.py index 4e24cd1..a5803b1 100644 --- a/tests/test_rls.py +++ b/tests/test_rls.py @@ -48,11 +48,26 @@ def rls_store(pg_dsn: str) -> PostgresStore: cur.execute(f"DROP POLICY IF EXISTS owner_isolation ON {table}") cur.execute(f"DROP POLICY IF EXISTS owner_insert ON {table}") - # Create policies - cur.execute(f""" - CREATE POLICY owner_isolation ON {table} - USING (owner_id = current_setting('app.current_user', true)) - """) + # Create policies — entries gets the _system-schema read carve-out + # added in migration n9i0j1k2l3m4 so non-privileged owners can see + # built-in schemas. The WITH CHECK clause is explicit because + # permissive policies combine with OR, and without it the USING + # clause would leak into the write path (PR #287 Round-3 finding). + # Other tables keep strict owner isolation. 
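The USING / WITH CHECK split these fixtures enforce can be modeled as two predicates (a Python model of the policy semantics, not SQL — `rls_using` gates reads, `rls_with_check` gates writes):

```python
def rls_using(row: dict, current_user: str) -> bool:
    # Read path: own rows, plus the narrow _system carve-out for schemas only.
    return row["owner_id"] == current_user or (
        row["owner_id"] == "_system" and row["type"] == "schema"
    )

def rls_with_check(row: dict, current_user: str) -> bool:
    # Write path stays strictly owner-scoped. Without an explicit WITH CHECK,
    # a FOR ALL permissive policy reuses USING for INSERT/UPDATE and the read
    # carve-out would leak into the write path.
    return row["owner_id"] == current_user
```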
+ if table == "entries": + cur.execute(f""" + CREATE POLICY owner_isolation ON {table} + USING ( + owner_id = current_setting('app.current_user', true) + OR (owner_id = '_system' AND type = 'schema') + ) + WITH CHECK (owner_id = current_setting('app.current_user', true)) + """) + else: + cur.execute(f""" + CREATE POLICY owner_isolation ON {table} + USING (owner_id = current_setting('app.current_user', true)) + """) cur.execute(f""" CREATE POLICY owner_insert ON {table} FOR INSERT @@ -195,3 +210,188 @@ def test_action_logs_isolated(self, rls_store: PostgresStore) -> None: bob_actions = rls_store.get_actions("bob", entry_id=entry.id) assert len(alice_actions) >= 1 assert len(bob_actions) == 0 + + +class TestRLSSystemSchemaFallback: + """RLS carve-out for `_system`-owned schema reads (migration n9i0j1k2l3m4). + + Regression coverage for the PR #287 Round-2 blocker: the strict + `owner_id = current_user` USING clause made `_system` schemas invisible + to every non-superuser owner, breaking the CLI bootstrap + find_schema + fallback in production. These tests run under FORCE ROW LEVEL SECURITY, + which simulates the production non-superuser role. 
+ """ + + def test_system_schema_visible_to_any_owner(self, rls_store: PostgresStore) -> None: + """A `_system`-owned schema row is readable by `alice` via find_schema.""" + from mcp_awareness.schema import Entry + + schema_entry = Entry( + id=make_id(), + type=EntryType.SCHEMA, + source="system-bootstrap", + tags=[], + created=now_utc(), + expires=None, + data={ + "family": "schema:shared-thing", + "version": "1.0.0", + "schema": {"type": "object"}, + "description": "shared", + }, + logical_key="schema:shared-thing:1.0.0", + ) + rls_store.add("_system", schema_entry) + + found = rls_store.find_schema("alice", "schema:shared-thing:1.0.0") + assert found is not None + assert found.id == schema_entry.id + assert found.data["family"] == "schema:shared-thing" + + def test_caller_schema_wins_over_system(self, rls_store: PostgresStore) -> None: + """If alice has her own copy, find_schema returns that instead of _system's.""" + from mcp_awareness.schema import Entry + + system_entry = Entry( + id=make_id(), + type=EntryType.SCHEMA, + source="system-bootstrap", + tags=[], + created=now_utc(), + expires=None, + data={"family": "schema:override", "version": "1.0.0", "schema": {"type": "object"}}, + logical_key="schema:override:1.0.0", + ) + rls_store.add("_system", system_entry) + + alice_entry = Entry( + id=make_id(), + type=EntryType.SCHEMA, + source="alice-source", + tags=[], + created=now_utc(), + expires=None, + data={"family": "schema:override", "version": "1.0.0", "schema": {"type": "string"}}, + logical_key="schema:override:1.0.0", + ) + rls_store.add("alice", alice_entry) + + found = rls_store.find_schema("alice", "schema:override:1.0.0") + assert found is not None + assert found.id == alice_entry.id + + def test_system_non_schema_rows_remain_invisible(self, rls_store: PostgresStore) -> None: + """The carve-out is narrow: only `type = 'schema'`. 
Other _system rows stay hidden.""" + from mcp_awareness.schema import Entry + + note_entry = Entry( + id=make_id(), + type=EntryType.NOTE, + source="sys-note", + tags=["rls-sys-note"], + created=now_utc(), + expires=None, + data={"description": "system-only"}, + ) + rls_store.add("_system", note_entry) + + alice_view = rls_store.get_entries("alice", tags=["rls-sys-note"]) + assert alice_view == [] + + def test_nonsuperuser_cannot_insert_as_system(self, rls_store: PostgresStore) -> None: + """Non-privileged owners must not be able to write to `_system`. + + This exercises the WITH CHECK clause against a real non-superuser role + — the production deployment target. Container superusers have + BYPASSRLS implicitly, so the raw INSERT against the default role + would silently succeed and leave the policy untested. We create a + NOSUPERUSER NOBYPASSRLS role, GRANT only what's needed, then + ``SET LOCAL ROLE`` onto it for the duration of the test transaction. + + Regression for PR #287 Round-3: the original migration omitted the + explicit WITH CHECK, so the `_system`-schema carve-out in USING + leaked into INSERT/UPDATE via the FOR ALL permissive policy. + """ + from mcp_awareness.schema import Entry + + entry = Entry( + id=make_id(), + type=EntryType.SCHEMA, + source="impostor", + tags=[], + created=now_utc(), + expires=None, + data={"family": "schema:pwned", "version": "1.0.0", "schema": {"type": "object"}}, + logical_key="schema:pwned:1.0.0", + ) + + # Provision the non-superuser role once per test (idempotent). Use a + # separate connection so the CREATE/GRANT commits regardless of the + # main test transaction's outcome. 
+ with rls_store._pool.connection() as conn, conn.cursor() as cur: + conn.autocommit = True + cur.execute( + "DO $$ BEGIN " + "IF NOT EXISTS (SELECT 1 FROM pg_roles WHERE rolname='rls_prod_sim') THEN " + " CREATE ROLE rls_prod_sim NOSUPERUSER NOBYPASSRLS NOINHERIT; " + "END IF; END $$" + ) + cur.execute("GRANT USAGE ON SCHEMA public TO rls_prod_sim") + cur.execute("GRANT SELECT, INSERT, UPDATE, DELETE ON entries TO rls_prod_sim") + + # Now run the actual test inside a transaction as the simulated prod role. + with ( + pytest.raises(psycopg.errors.InsufficientPrivilege), + rls_store._pool.connection() as conn, + conn.transaction(), + conn.cursor() as cur, + ): + cur.execute("SET LOCAL ROLE rls_prod_sim") + cur.execute("SELECT set_config('app.current_user', 'alice', true)") + cur.execute( + "INSERT INTO entries (id, owner_id, type, source, created, tags, data," + " logical_key, language) VALUES (%s, '_system', 'schema', %s, now(), '[]'," + " %s::jsonb, %s, 'english')", + (entry.id, entry.source, '{"family": "schema:pwned"}', entry.logical_key), + ) + + def test_nonsuperuser_cannot_update_system_schema(self, rls_store: PostgresStore) -> None: + """Same WITH CHECK guard — an existing `_system` schema row cannot be + tampered with by a non-privileged owner via UPDATE.""" + from mcp_awareness.schema import Entry + + seed = Entry( + id=make_id(), + type=EntryType.SCHEMA, + source="system-bootstrap", + tags=[], + created=now_utc(), + expires=None, + data={"family": "schema:readonly", "version": "1.0.0", "schema": {"type": "object"}}, + logical_key="schema:readonly:1.0.0", + ) + rls_store.add("_system", seed) + + with rls_store._pool.connection() as conn, conn.cursor() as cur: + conn.autocommit = True + cur.execute( + "DO $$ BEGIN " + "IF NOT EXISTS (SELECT 1 FROM pg_roles WHERE rolname='rls_prod_sim') THEN " + " CREATE ROLE rls_prod_sim NOSUPERUSER NOBYPASSRLS NOINHERIT; " + "END IF; END $$" + ) + cur.execute("GRANT USAGE ON SCHEMA public TO rls_prod_sim") + 
cur.execute("GRANT SELECT, INSERT, UPDATE, DELETE ON entries TO rls_prod_sim") + + with ( + pytest.raises(psycopg.errors.InsufficientPrivilege), + rls_store._pool.connection() as conn, + conn.transaction(), + conn.cursor() as cur, + ): + cur.execute("SET LOCAL ROLE rls_prod_sim") + cur.execute("SELECT set_config('app.current_user', 'alice', true)") + cur.execute( + "UPDATE entries SET data = data || '{\"tampered\": true}'::jsonb" + " WHERE owner_id = '_system' AND type = 'schema'" + ) diff --git a/tests/test_schema.py b/tests/test_schema.py index b631ce1..821a349 100644 --- a/tests/test_schema.py +++ b/tests/test_schema.py @@ -247,3 +247,13 @@ def test_to_list_dict_intention_includes_goal_state(): assert d["goal"] == "Pick up milk" assert d["state"] == "pending" assert "data" not in d + + +def test_entry_type_schema_value(): + assert EntryType.SCHEMA.value == "schema" + assert EntryType("schema") is EntryType.SCHEMA + + +def test_entry_type_record_value(): + assert EntryType.RECORD.value == "record" + assert EntryType("record") is EntryType.RECORD diff --git a/tests/test_server.py b/tests/test_server.py index 8264c7d..1e8922e 100644 --- a/tests/test_server.py +++ b/tests/test_server.py @@ -4000,6 +4000,8 @@ class TestWriteResponseShapes: # "id" here means the caller-supplied entry_id (lookup target), # NOT a server-generated entry id like other tools' responses "update_intention": {"id"}, + "register_schema": set(), # response is server-derived id + logical_key + "create_record": set(), # response contains only server-derived fields } # Tools registered on _srv.mcp that are NOT write tools — explicitly @@ -4169,6 +4171,34 @@ async def _invoke_with_sentinels(self, tool_name: str, sentinels: set[str]) -> s state="fired", reason=s(sentinels, "reason"), ) + if tool_name == "register_schema": + return await server_mod.register_schema( + source=s(sentinels, "src"), + tags=[s(sentinels, "tag")], + description=s(sentinels, "desc"), + family="schema:sentinel-test", + 
version="1.0.0", + schema={"type": "object"}, + ) + if tool_name == "create_record": + # Register a schema first (not sentinel-wrapped — it's a prerequisite) + await server_mod.register_schema( + source="setup", + tags=[], + description="setup", + family="schema:sentinel-record-test", + version="1.0.0", + schema={"type": "object"}, + ) + return await server_mod.create_record( + source=s(sentinels, "src"), + tags=[s(sentinels, "tag")], + description=s(sentinels, "desc"), + logical_key="sentinel-record-key", + schema_ref="schema:sentinel-record-test", + schema_version="1.0.0", + content={"key": "value"}, + ) raise ValueError(f"Unknown tool in registry: {tool_name}") @pytest.mark.anyio diff --git a/tests/test_store.py b/tests/test_store.py index 892f18a..18ec921 100644 --- a/tests/test_store.py +++ b/tests/test_store.py @@ -3260,3 +3260,159 @@ def test_get_all_patterns(store): result = store.get_all_patterns(TEST_OWNER) assert "nas" in result assert "" in result + + +# ------------------------------------------------------------------ +# find_schema tests +# ------------------------------------------------------------------ + +SYSTEM_OWNER = "_system" + + +def _make_schema_entry(logical_key: str, schema_body: dict) -> Entry: + return Entry( + id=make_id(), + type=EntryType.SCHEMA, + source="test", + tags=[], + created=now_utc(), + data={ + "family": logical_key.rsplit(":", 1)[0] if ":" in logical_key else logical_key, + "version": logical_key.rsplit(":", 1)[1] if ":" in logical_key else "1.0.0", + "schema": schema_body, + "description": "test schema", + "learned_from": "test", + }, + logical_key=logical_key, + ) + + +def test_find_schema_returns_caller_owned(store): + """find_schema returns an entry when caller owns it.""" + entry = _make_schema_entry("s:test:1.0.0", {"type": "object"}) + store.add(TEST_OWNER, entry) + found = store.find_schema(TEST_OWNER, "s:test:1.0.0") + assert found is not None + assert found.data["family"] == "s:test" + assert 
found.data["schema"] == {"type": "object"} + + +def test_find_schema_system_fallback(store): + """find_schema falls back to _system-owned schema when caller has none.""" + entry = _make_schema_entry("s:test:1.0.0", {"type": "object"}) + store.add(SYSTEM_OWNER, entry) + found = store.find_schema(TEST_OWNER, "s:test:1.0.0") + assert found is not None + assert found.data["schema"] == {"type": "object"} + + +def test_find_schema_caller_wins_over_system(store): + """find_schema prefers caller's schema over _system's when both exist.""" + system_entry = _make_schema_entry("s:test:1.0.0", {"type": "object"}) + caller_entry = _make_schema_entry("s:test:1.0.0", {"type": "string"}) + store.add(SYSTEM_OWNER, system_entry) + store.add(TEST_OWNER, caller_entry) + found = store.find_schema(TEST_OWNER, "s:test:1.0.0") + assert found is not None + assert found.data["schema"] == {"type": "string"} + + +def test_find_schema_returns_none_when_missing(store): + """find_schema returns None when no matching schema exists for caller or _system.""" + assert store.find_schema(TEST_OWNER, "s:nonexistent:1.0.0") is None + + +def test_find_schema_excludes_soft_deleted(store): + """find_schema does not return soft-deleted entries.""" + entry = _make_schema_entry("s:test:1.0.0", {"type": "object"}) + stored = store.add(TEST_OWNER, entry) + store.soft_delete_by_id(TEST_OWNER, stored.id) + assert store.find_schema(TEST_OWNER, "s:test:1.0.0") is None + + +# ------------------------------------------------------------------ +# count_records_referencing tests +# ------------------------------------------------------------------ + + +def _make_record_entry(logical_key: str, schema_ref: str, schema_version: str, content) -> Entry: + return Entry( + id=make_id(), + type=EntryType.RECORD, + source="test", + tags=[], + created=now_utc(), + data={ + "schema_ref": schema_ref, + "schema_version": schema_version, + "content": content, + "description": "test record", + "learned_from": "test", + }, + 
logical_key=logical_key, + ) + + +def test_count_records_referencing_returns_zero_when_none(store): + count, ids = store.count_records_referencing(TEST_OWNER, "s:test:1.0.0") + assert count == 0 + assert ids == [] + + +def test_count_records_referencing_counts_matching_records(store): + for i in range(3): + store.add(TEST_OWNER, _make_record_entry(f"rec-{i}", "s:test", "1.0.0", {"i": i})) + count, ids = store.count_records_referencing(TEST_OWNER, "s:test:1.0.0") + assert count == 3 + assert len(ids) == 3 + + +def test_count_records_referencing_excludes_soft_deleted(store): + entry = _make_record_entry("rec-1", "s:test", "1.0.0", {}) + store.add(TEST_OWNER, entry) + store.soft_delete_by_id(TEST_OWNER, entry.id) + count, ids = store.count_records_referencing(TEST_OWNER, "s:test:1.0.0") + assert count == 0 + assert ids == [] + + +def test_count_records_referencing_ignores_other_versions(store): + store.add(TEST_OWNER, _make_record_entry("rec-1", "s:test", "1.0.0", {})) + store.add(TEST_OWNER, _make_record_entry("rec-2", "s:test", "2.0.0", {})) + count, _ = store.count_records_referencing(TEST_OWNER, "s:test:1.0.0") + assert count == 1 + + +def test_count_records_referencing_caps_id_list_at_ten(store): + for i in range(15): + store.add(TEST_OWNER, _make_record_entry(f"rec-{i}", "s:test", "1.0.0", {"i": i})) + count, ids = store.count_records_referencing(TEST_OWNER, "s:test:1.0.0") + assert count == 15 + assert len(ids) == 10 + + +def test_count_records_referencing_rejects_malformed_key(store): + """The store boundary asserts the ref:version invariant for defense-in-depth.""" + import pytest + + with pytest.raises(AssertionError, match="must contain ':'"): + store.count_records_referencing(TEST_OWNER, "no-colon") + with pytest.raises(AssertionError, match="empty version"): + store.count_records_referencing(TEST_OWNER, "s:test:") + with pytest.raises(AssertionError, match="empty ref"): + store.count_records_referencing(TEST_OWNER, ":1.0.0") + + +def 
test_system_user_exists_after_migration_idempotent(store):
+    """The conftest fixture inserts _system — verifies ON CONFLICT DO NOTHING semantics."""
+    with store._pool.connection() as conn, conn.cursor() as cur:
+        cur.execute(
+            "INSERT INTO users (id, display_name) VALUES ('_system', 'Re-insert') "
+            "ON CONFLICT (id) DO NOTHING"
+        )
+        conn.commit()
+        cur.execute("SELECT COUNT(*) FROM users WHERE id = '_system'")
+        row = cur.fetchone()
+        # Cursor may be dict_row — handle both styles
+        count = row["count"] if isinstance(row, dict) else row[0]
+        assert count == 1
diff --git a/tests/test_tools_schema_record.py b/tests/test_tools_schema_record.py
new file mode 100644
index 0000000..5fded75
--- /dev/null
+++ b/tests/test_tools_schema_record.py
@@ -0,0 +1,895 @@
+# mcp-awareness — ambient system awareness for AI agents
+# Copyright (C) 2026 Chris Means
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+"""Integration tests for schema/record MCP tool handlers.
+
+Uses testcontainers Postgres + direct tool-function calls via the server's
+_owner_id / store accessors (both monkeypatched for tests).
+""" + +from __future__ import annotations + +import json + +import pytest + +from mcp_awareness.schema import EntryType + +TEST_OWNER = "test-owner" + + +@pytest.fixture +def configured_server(store, monkeypatch): + """Wire the FastMCP server-module helpers to the testcontainers store and owner.""" + import mcp_awareness.server as srv + + monkeypatch.setattr(srv, "store", store) + monkeypatch.setattr(srv, "_owner_id", lambda: TEST_OWNER) + yield srv + + +def _parse_tool_error(excinfo): + """Parse the structured JSON envelope from a ToolError.""" + return json.loads(str(excinfo.value)) + + +@pytest.mark.anyio +async def test_register_schema_happy_path(configured_server): + from mcp_awareness.tools import register_schema + + response = await register_schema( + source="test", + tags=["schema"], + description="test schema", + family="schema:test-thing", + version="1.0.0", + schema={"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}, + ) + body = json.loads(response) + assert body["status"] == "ok" + assert body["logical_key"] == "schema:test-thing:1.0.0" + assert "id" in body + + +@pytest.mark.anyio +async def test_register_schema_rejects_invalid_schema(configured_server): + from mcp.server.fastmcp.exceptions import ToolError + + from mcp_awareness.tools import register_schema + + with pytest.raises(ToolError) as excinfo: + await register_schema( + source="test", + tags=[], + description="bad schema", + family="schema:bad", + version="1.0.0", + schema={"type": "strng"}, # typo — not a valid JSON Schema type + ) + err = _parse_tool_error(excinfo)["error"] + assert err["code"] == "invalid_schema" + + +@pytest.mark.anyio +async def test_register_schema_rejects_duplicate_family_version(configured_server): + from mcp.server.fastmcp.exceptions import ToolError + + from mcp_awareness.tools import register_schema + + await register_schema( + source="test", + tags=[], + description="v1", + family="schema:dup", + version="1.0.0", + 
schema={"type": "object"}, + ) + with pytest.raises(ToolError) as excinfo: + await register_schema( + source="test", + tags=[], + description="v1 again", + family="schema:dup", + version="1.0.0", + schema={"type": "object"}, + ) + err = _parse_tool_error(excinfo)["error"] + assert err["code"] == "schema_already_exists" + # Structured extras — callers can locate the existing entry without parsing + # the human-readable message. Matches design-doc error-code table. + assert err["logical_key"] == "schema:dup:1.0.0" + assert err["existing_id"] # non-empty; first-register's entry id + + +@pytest.mark.anyio +async def test_register_schema_reraises_non_unique_exception(configured_server, monkeypatch): + """register_schema re-raises generic exceptions that are not unique violations.""" + import mcp_awareness.server as srv + from mcp_awareness.tools import register_schema + + original_add = srv.store.add + + def _fake_add(owner_id, entry): + raise RuntimeError("connection refused") + + monkeypatch.setattr(srv.store, "add", _fake_add) + + with pytest.raises(RuntimeError, match="connection refused"): + await register_schema( + source="test", + tags=[], + description="non-unique exception test", + family="schema:reraise", + version="1.0.0", + schema={"type": "object"}, + ) + + monkeypatch.setattr(srv.store, "add", original_add) + + +@pytest.mark.anyio +async def test_create_record_validate_raises_unexpected_error(configured_server, monkeypatch): + """create_record reports validation_error if validate_record_content raises unexpectedly.""" + from mcp.server.fastmcp.exceptions import ToolError + + from mcp_awareness.tools import create_record, register_schema + + await register_schema( + source="test", + tags=[], + description="s", + family="schema:except", + version="1.0.0", + schema={"type": "object"}, + ) + + import mcp_awareness.validation as validation_mod + + monkeypatch.setattr( + validation_mod, + "validate_record_content", + lambda _s, _c: (_ for _ in 
()).throw(RuntimeError("internal jsonschema error")), + ) + + with pytest.raises(ToolError) as excinfo: + await create_record( + source="test", + tags=[], + description="error case", + logical_key="except-rec", + schema_ref="schema:except", + schema_version="1.0.0", + content={}, + ) + err = json.loads(str(excinfo.value))["error"] + assert err["code"] == "validation_error" + assert "internal jsonschema error" in err["message"] + + +@pytest.mark.anyio +async def test_register_schema_rejects_empty_family(configured_server): + from mcp.server.fastmcp.exceptions import ToolError + + from mcp_awareness.tools import register_schema + + with pytest.raises(ToolError) as excinfo: + await register_schema( + source="test", + tags=[], + description="bad", + family="", + version="1.0.0", + schema={"type": "object"}, + ) + err = _parse_tool_error(excinfo)["error"] + assert err["code"] == "invalid_parameter" + + +@pytest.mark.anyio +async def test_register_schema_rejects_empty_version(configured_server): + from mcp.server.fastmcp.exceptions import ToolError + + from mcp_awareness.tools import register_schema + + with pytest.raises(ToolError) as excinfo: + await register_schema( + source="test", + tags=[], + description="bad", + family="schema:test", + version="", + schema={"type": "object"}, + ) + err = _parse_tool_error(excinfo)["error"] + assert err["code"] == "invalid_parameter" + + +@pytest.mark.anyio +async def test_create_record_happy_path(configured_server): + from mcp_awareness.tools import create_record, register_schema + + await register_schema( + source="test", + tags=[], + description="s", + family="schema:thing", + version="1.0.0", + schema={"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}, + ) + response = await create_record( + source="test", + tags=[], + description="a thing", + logical_key="thing-one", + schema_ref="schema:thing", + schema_version="1.0.0", + content={"name": "widget"}, + ) + body = json.loads(response) + assert 
body["status"] == "ok" + assert body["action"] == "created" + assert "id" in body + + +@pytest.mark.anyio +async def test_create_record_rejects_unknown_schema(configured_server): + from mcp.server.fastmcp.exceptions import ToolError + + from mcp_awareness.tools import create_record + + with pytest.raises(ToolError) as excinfo: + await create_record( + source="test", + tags=[], + description="orphan", + logical_key="thing-one", + schema_ref="schema:does-not-exist", + schema_version="1.0.0", + content={"name": "widget"}, + ) + err = json.loads(str(excinfo.value))["error"] + assert err["code"] == "schema_not_found" + assert err["searched_owners"] == [TEST_OWNER, "_system"] + + +@pytest.mark.anyio +async def test_create_record_surfaces_validation_errors(configured_server): + from mcp.server.fastmcp.exceptions import ToolError + + from mcp_awareness.tools import create_record, register_schema + + await register_schema( + source="test", + tags=[], + description="s", + family="schema:person", + version="1.0.0", + schema={ + "type": "object", + "properties": {"name": {"type": "string"}, "age": {"type": "integer"}}, + "required": ["name"], + }, + ) + with pytest.raises(ToolError) as excinfo: + await create_record( + source="test", + tags=[], + description="bad person", + logical_key="p1", + schema_ref="schema:person", + schema_version="1.0.0", + content={"age": "thirty"}, + ) + err = json.loads(str(excinfo.value))["error"] + assert err["code"] == "validation_failed" + validators = {ve["validator"] for ve in err["validation_errors"]} + assert "required" in validators + assert "type" in validators + + +@pytest.mark.anyio +async def test_create_record_upsert_on_same_logical_key(configured_server): + from mcp_awareness.tools import create_record, register_schema + + await register_schema( + source="test", + tags=[], + description="s", + family="schema:thing", + version="1.0.0", + schema={"type": "object"}, + ) + r1 = json.loads( + await create_record( + source="test", + 
tags=[], + description="v1", + logical_key="thing-one", + schema_ref="schema:thing", + schema_version="1.0.0", + content={"v": 1}, + ) + ) + assert r1["action"] == "created" + r2 = json.loads( + await create_record( + source="test", + tags=[], + description="v2", + logical_key="thing-one", + schema_ref="schema:thing", + schema_version="1.0.0", + content={"v": 2}, + ) + ) + assert r2["action"] == "updated" + assert r2["id"] == r1["id"] + + +@pytest.mark.anyio +async def test_create_record_uses_system_schema_fallback(configured_server, store): + """A record can reference a schema owned by _system, not the caller.""" + from mcp_awareness.schema import Entry, make_id, now_utc + from mcp_awareness.tools import create_record + + # Seed _system schema directly via store (not via tool — tool writes caller's owner) + store.add( + "_system", + Entry( + id=make_id(), + type=EntryType.SCHEMA, + source="system", + tags=["system"], + created=now_utc(), + expires=None, + data={ + "family": "schema:system-thing", + "version": "1.0.0", + "schema": {"type": "object"}, + "description": "system-seeded", + "learned_from": "cli-bootstrap", + }, + logical_key="schema:system-thing:1.0.0", + ), + ) + + response = await create_record( + source="test", + tags=[], + description="mine", + logical_key="mine-1", + schema_ref="schema:system-thing", + schema_version="1.0.0", + content={"any": "thing"}, + ) + body = json.loads(response) + assert body["status"] == "ok" + assert body["action"] == "created" + + +@pytest.mark.anyio +async def test_update_entry_rejects_schema_update(configured_server): + from mcp.server.fastmcp.exceptions import ToolError + + from mcp_awareness.tools import register_schema, update_entry + + resp = json.loads( + await register_schema( + source="test", + tags=[], + description="s", + family="schema:thing", + version="1.0.0", + schema={"type": "object"}, + ) + ) + with pytest.raises(ToolError) as excinfo: + await update_entry(entry_id=resp["id"], description="new desc") + 
err = json.loads(str(excinfo.value))["error"] + assert err["code"] == "schema_immutable" + + +@pytest.mark.anyio +async def test_update_entry_record_content_revalidates_valid(configured_server): + from mcp_awareness.tools import create_record, register_schema, update_entry + + await register_schema( + source="test", + tags=[], + description="s", + family="schema:thing", + version="1.0.0", + schema={"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}, + ) + r = json.loads( + await create_record( + source="test", + tags=[], + description="r", + logical_key="r1", + schema_ref="schema:thing", + schema_version="1.0.0", + content={"name": "good"}, + ) + ) + # Valid content update — passes re-validation + await update_entry(entry_id=r["id"], content={"name": "still-good"}) + # Content shape must remain native JSON (dict), not a JSON-encoded string — + # matches the create path so downstream consumers see a stable wire shape. + stored = configured_server.store.get_entry_by_id(TEST_OWNER, r["id"]) + assert stored is not None + assert stored.data["content"] == {"name": "still-good"} + assert isinstance(stored.data["content"], dict) + + +@pytest.mark.anyio +async def test_update_entry_record_preserves_primitive_content_shape(configured_server): + """Primitive (int/array) record content must also keep native JSON shape after update.""" + from mcp_awareness.tools import create_record, register_schema, update_entry + + await register_schema( + source="test", + tags=[], + description="s", + family="schema:counter", + version="1.0.0", + schema={"type": "integer"}, + ) + r = json.loads( + await create_record( + source="test", + tags=[], + description="c", + logical_key="c1", + schema_ref="schema:counter", + schema_version="1.0.0", + content=42, + ) + ) + await update_entry(entry_id=r["id"], content=99) + stored = configured_server.store.get_entry_by_id(TEST_OWNER, r["id"]) + assert stored is not None + assert stored.data["content"] == 99 + assert 
isinstance(stored.data["content"], int) + + +@pytest.mark.anyio +async def test_update_entry_record_content_rejects_invalid(configured_server): + from mcp.server.fastmcp.exceptions import ToolError + + from mcp_awareness.tools import create_record, register_schema, update_entry + + await register_schema( + source="test", + tags=[], + description="s", + family="schema:thing", + version="1.0.0", + schema={"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}, + ) + r = json.loads( + await create_record( + source="test", + tags=[], + description="r", + logical_key="r1", + schema_ref="schema:thing", + schema_version="1.0.0", + content={"name": "good"}, + ) + ) + with pytest.raises(ToolError) as excinfo: + await update_entry(entry_id=r["id"], content={"name": 123}) + err = json.loads(str(excinfo.value))["error"] + assert err["code"] == "validation_failed" + + +@pytest.mark.anyio +async def test_update_entry_record_non_content_skips_revalidation(configured_server): + from mcp_awareness.tools import create_record, register_schema, update_entry + + await register_schema( + source="test", + tags=[], + description="s", + family="schema:thing", + version="1.0.0", + schema={"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}, + ) + r = json.loads( + await create_record( + source="test", + tags=[], + description="orig", + logical_key="r1", + schema_ref="schema:thing", + schema_version="1.0.0", + content={"name": "good"}, + ) + ) + # Description-only update skips re-validation + await update_entry(entry_id=r["id"], description="updated desc") + + +@pytest.mark.anyio +async def test_delete_entry_schema_with_no_records_succeeds(configured_server): + from mcp_awareness.tools import delete_entry, register_schema + + resp = json.loads( + await register_schema( + source="test", + tags=[], + description="s", + family="schema:thing", + version="1.0.0", + schema={"type": "object"}, + ) + ) + # No records → soft-delete succeeds 
+ await delete_entry(entry_id=resp["id"]) + # Verify soft-deleted: find_schema returns None + assert configured_server.store.find_schema(TEST_OWNER, "schema:thing:1.0.0") is None + + +@pytest.mark.anyio +async def test_delete_entry_schema_with_records_rejected(configured_server): + from mcp.server.fastmcp.exceptions import ToolError + + from mcp_awareness.tools import create_record, delete_entry, register_schema + + resp = json.loads( + await register_schema( + source="test", + tags=[], + description="s", + family="schema:thing", + version="1.0.0", + schema={"type": "object"}, + ) + ) + await create_record( + source="test", + tags=[], + description="r", + logical_key="r1", + schema_ref="schema:thing", + schema_version="1.0.0", + content={}, + ) + with pytest.raises(ToolError) as excinfo: + await delete_entry(entry_id=resp["id"]) + err = json.loads(str(excinfo.value))["error"] + assert err["code"] == "schema_in_use" + assert len(err["referencing_records"]) == 1 + assert err["total_count"] == 1 + + +@pytest.mark.anyio +async def test_delete_entry_schema_allowed_after_records_deleted(configured_server): + from mcp_awareness.tools import create_record, delete_entry, register_schema + + schema_resp = json.loads( + await register_schema( + source="test", + tags=[], + description="s", + family="schema:thing", + version="1.0.0", + schema={"type": "object"}, + ) + ) + record_resp = json.loads( + await create_record( + source="test", + tags=[], + description="r", + logical_key="r1", + schema_ref="schema:thing", + schema_version="1.0.0", + content={}, + ) + ) + # Soft-delete the record first + await delete_entry(entry_id=record_resp["id"]) + # Now schema delete succeeds + await delete_entry(entry_id=schema_resp["id"]) + + +@pytest.mark.anyio +async def test_cross_owner_schema_invisible(configured_server, store, monkeypatch): + """Owner A registers a schema; Owner B cannot resolve it.""" + from mcp.server.fastmcp.exceptions import ToolError + + import mcp_awareness.server as 
srv
+    from mcp_awareness.tools import create_record, register_schema
+
+    # Owner A (default TEST_OWNER) registers a schema
+    await register_schema(
+        source="test",
+        tags=[],
+        description="A's schema",
+        family="schema:mine",
+        version="1.0.0",
+        schema={"type": "object"},
+    )
+
+    # Switch to Owner B by re-patching the _owner_id accessor
+    monkeypatch.setattr(srv, "_owner_id", lambda: "other-owner")
+    with pytest.raises(ToolError) as excinfo:
+        await create_record(
+            source="test",
+            tags=[],
+            description="B's attempt",
+            logical_key="r-b",
+            schema_ref="schema:mine",
+            schema_version="1.0.0",
+            content={},
+        )
+    err = json.loads(str(excinfo.value))["error"]
+    assert err["code"] == "schema_not_found"
+
+
+@pytest.mark.anyio
+async def test_both_owners_see_system_schema(configured_server, store, monkeypatch):
+    """Both A and B can use a _system schema."""
+    import mcp_awareness.server as srv
+    from mcp_awareness.schema import Entry, make_id, now_utc
+    from mcp_awareness.tools import create_record
+
+    # Seed _system schema directly
+    store.add(
+        "_system",
+        Entry(
+            id=make_id(),
+            type=EntryType.SCHEMA,
+            source="system",
+            tags=["system"],
+            created=now_utc(),
+            expires=None,
+            data={
+                "family": "schema:shared",
+                "version": "1.0.0",
+                "schema": {"type": "object"},
+                "description": "shared",
+                "learned_from": "cli-bootstrap",
+            },
+            logical_key="schema:shared:1.0.0",
+        ),
+    )
+
+    # A creates a record against _system schema
+    a_resp = json.loads(
+        await create_record(
+            source="test",
+            tags=[],
+            description="A's record",
+            logical_key="rec-a",
+            schema_ref="schema:shared",
+            schema_version="1.0.0",
+            content={"who": "alice"},
+        )
+    )
+    assert a_resp["status"] == "ok"
+
+    # Switch to Owner B
+    monkeypatch.setattr(srv, "_owner_id", lambda: "bob")
+    b_resp = json.loads(
+        await create_record(
+            source="test",
+            tags=[],
+            description="B's record",
+            logical_key="rec-b",
+            schema_ref="schema:shared",
+            schema_version="1.0.0",
+            content={"who": "bob"},
+        )
+    )
+    assert b_resp["status"] == "ok"
+
+
+@pytest.mark.anyio
+async def test_caller_schema_overrides_system(configured_server, store, monkeypatch):
+    """When both _system and caller have the same logical_key, caller's version wins."""
+
+    from mcp.server.fastmcp.exceptions import ToolError
+
+    from mcp_awareness.schema import Entry, make_id, now_utc
+    from mcp_awareness.tools import create_record, register_schema
+
+    # _system schema allows integer only
+    store.add(
+        "_system",
+        Entry(
+            id=make_id(),
+            type=EntryType.SCHEMA,
+            source="system",
+            tags=["system"],
+            created=now_utc(),
+            expires=None,
+            data={
+                "family": "schema:override",
+                "version": "1.0.0",
+                "schema": {"type": "integer"},
+                "description": "system strict",
+                "learned_from": "cli-bootstrap",
+            },
+            logical_key="schema:override:1.0.0",
+        ),
+    )
+
+    # Caller schema allows string only — overrides _system
+    await register_schema(
+        source="test",
+        tags=[],
+        description="caller's permissive",
+        family="schema:override",
+        version="1.0.0",
+        schema={"type": "string"},
+    )
+
+    # Caller's record with a STRING should pass (caller's schema wins)
+    resp = json.loads(
+        await create_record(
+            source="test",
+            tags=[],
+            description="caller-wins",
+            logical_key="rec-override",
+            schema_ref="schema:override",
+            schema_version="1.0.0",
+            content="string-value",
+        )
+    )
+    assert resp["status"] == "ok"
+
+    # Caller's record with an INTEGER should FAIL (caller's schema says string only)
+    with pytest.raises(ToolError) as excinfo:
+        await create_record(
+            source="test",
+            tags=[],
+            description="wrong-type",
+            logical_key="rec-override-2",
+            schema_ref="schema:override",
+            schema_version="1.0.0",
+            content=42,
+        )
+    err = json.loads(str(excinfo.value))["error"]
+    assert err["code"] == "validation_failed"
+
+
+@pytest.mark.anyio
+async def test_create_record_validation_truncation(configured_server, monkeypatch):
+    """When validate_record_content returns a truncated list, create_record reports truncation."""
+    from mcp.server.fastmcp.exceptions import ToolError
+
+    from mcp_awareness.tools import create_record, register_schema
+
+    await register_schema(
+        source="test",
+        tags=[],
+        description="s",
+        family="schema:trunc",
+        version="1.0.0",
+        schema={"type": "object"},
+    )
+
+    # Patch validate_record_content at the module level so the lazy local import picks it up
+    import mcp_awareness.validation as validation_mod
+
+    fake_errors = [
+        {"path": f"$.f{i}", "message": "err", "validator": "type", "schema_path": "/type"}
+        for i in range(50)
+    ] + [{"truncated": True, "total_errors": 99}]
+
+    monkeypatch.setattr(validation_mod, "validate_record_content", lambda _s, _c: fake_errors)
+
+    with pytest.raises(ToolError) as excinfo:
+        await create_record(
+            source="test",
+            tags=[],
+            description="truncated errors test",
+            logical_key="trunc-rec",
+            schema_ref="schema:trunc",
+            schema_version="1.0.0",
+            content={},
+        )
+    err = json.loads(str(excinfo.value))["error"]
+    assert err["code"] == "validation_failed"
+    assert err["truncated"] is True
+    assert err["total_errors"] == 99
+
+
+@pytest.mark.anyio
+async def test_update_entry_record_schema_gone(configured_server, store):
+    """update_entry re-validation returns schema_not_found if schema was deleted."""
+    from mcp.server.fastmcp.exceptions import ToolError
+
+    from mcp_awareness.tools import create_record, register_schema, update_entry
+
+    schema_resp = json.loads(
+        await register_schema(
+            source="test",
+            tags=[],
+            description="s",
+            family="schema:gone",
+            version="1.0.0",
+            schema={"type": "object", "properties": {"name": {"type": "string"}}},
+        )
+    )
+    record_resp = json.loads(
+        await create_record(
+            source="test",
+            tags=[],
+            description="r",
+            logical_key="r-gone",
+            schema_ref="schema:gone",
+            schema_version="1.0.0",
+            content={"name": "ok"},
+        )
+    )
+
+    # Soft-delete the schema directly via store (bypasses the referencing-records guard)
+    store.soft_delete_by_id(TEST_OWNER, schema_resp["id"])
+
+    # Now updating the record's content should fail with schema_not_found
+    with pytest.raises(ToolError) as excinfo:
+        await update_entry(entry_id=record_resp["id"], content={"name": "updated"})
+    err = json.loads(str(excinfo.value))["error"]
+    assert err["code"] == "schema_not_found"
+
+
+@pytest.mark.anyio
+async def test_update_entry_record_revalidation_truncation(configured_server, monkeypatch):
+    """update_entry re-validation reports truncation when many errors are returned."""
+    from mcp.server.fastmcp.exceptions import ToolError
+
+    from mcp_awareness.tools import create_record, register_schema, update_entry
+
+    await register_schema(
+        source="test",
+        tags=[],
+        description="s",
+        family="schema:trunc2",
+        version="1.0.0",
+        schema={"type": "object"},
+    )
+    record_resp = json.loads(
+        await create_record(
+            source="test",
+            tags=[],
+            description="r",
+            logical_key="r-trunc2",
+            schema_ref="schema:trunc2",
+            schema_version="1.0.0",
+            content={},
+        )
+    )
+
+    # Patch validate_record_content at the module level so the lazy local import picks it up
+    import mcp_awareness.validation as validation_mod
+
+    fake_errors = [
+        {"path": f"$.f{i}", "message": "err", "validator": "type", "schema_path": "/type"}
+        for i in range(50)
+    ] + [{"truncated": True, "total_errors": 77}]
+
+    monkeypatch.setattr(validation_mod, "validate_record_content", lambda _s, _c: fake_errors)
+
+    with pytest.raises(ToolError) as excinfo:
+        await update_entry(entry_id=record_resp["id"], content={"bad": "data"})
+    err = json.loads(str(excinfo.value))["error"]
+    assert err["code"] == "validation_failed"
+    assert err["truncated"] is True
+    assert err["total_errors"] == 77
diff --git a/tests/test_validation.py b/tests/test_validation.py
new file mode 100644
index 0000000..4d7fb37
--- /dev/null
+++ b/tests/test_validation.py
@@ -0,0 +1,209 @@
+# mcp-awareness — ambient system awareness for AI agents
+# Copyright (C) 2026 Chris Means
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+"""Tests for src/mcp_awareness/validation.py."""
+
+from __future__ import annotations
+
+import jsonschema
+import pytest
+
+from mcp_awareness.validation import (
+    SchemaInUseError,
+    assert_schema_deletable,
+    compose_schema_logical_key,
+    resolve_schema,
+    validate_record_content,
+    validate_schema_body,
+)
+
+_PERSON_SCHEMA = {
+    "type": "object",
+    "properties": {
+        "name": {"type": "string"},
+        "age": {"type": "integer", "minimum": 0},
+    },
+    "required": ["name"],
+}
+
+
+def test_compose_schema_logical_key_basic():
+    assert (
+        compose_schema_logical_key("schema:edge-manifest", "1.0.0") == "schema:edge-manifest:1.0.0"
+    )
+
+
+def test_compose_schema_logical_key_no_prefix():
+    assert compose_schema_logical_key("tag-taxonomy", "0.1.0") == "tag-taxonomy:0.1.0"
+
+
+def test_validate_schema_body_accepts_valid_object_schema():
+    schema = {
+        "type": "object",
+        "properties": {"name": {"type": "string"}},
+        "required": ["name"],
+    }
+    validate_schema_body(schema)  # must not raise
+
+
+def test_validate_schema_body_rejects_bad_type():
+    schema = {"type": "strng"}  # typo: 'strng' is not a valid JSON Schema type
+    with pytest.raises(jsonschema.exceptions.SchemaError):
+        validate_schema_body(schema)
+
+
+def test_validate_schema_body_accepts_empty_object():
+    # Empty schema matches anything — valid per spec
+    validate_schema_body({})
+
+
+def test_validate_schema_body_rejects_non_dict():
+    # Schemas must be objects; bare arrays fail meta-schema
+    with pytest.raises(jsonschema.exceptions.SchemaError):
+        validate_schema_body([{"type": "string"}])  # type: ignore[arg-type]
+
+
+def test_validate_record_content_valid_returns_empty_list():
+    assert validate_record_content(_PERSON_SCHEMA, {"name": "Alice", "age": 30}) == []
+
+
+def test_validate_record_content_surfaces_missing_required():
+    errors = validate_record_content(_PERSON_SCHEMA, {"age": 30})
+    assert len(errors) == 1
+    assert errors[0]["validator"] == "required"
+    assert "name" in errors[0]["message"]
+
+
+def test_validate_record_content_surfaces_all_errors():
+    # Missing 'name' AND age is wrong type
+    errors = validate_record_content(_PERSON_SCHEMA, {"age": "thirty"})
+    assert len(errors) == 2
+    validators = {e["validator"] for e in errors}
+    assert validators == {"required", "type"}
+
+
+def test_validate_record_content_is_sorted_by_path():
+    schema = {
+        "type": "object",
+        "properties": {
+            "a": {"type": "integer"},
+            "b": {"type": "integer"},
+            "c": {"type": "integer"},
+        },
+    }
+    errors = validate_record_content(schema, {"a": "x", "b": "y", "c": "z"})
+    paths = [e["path"] for e in errors]
+    assert paths == sorted(paths)
+
+
+def test_validate_record_content_accepts_primitive_schema():
+    schema = {"type": "integer"}
+    assert validate_record_content(schema, 42) == []
+    errors = validate_record_content(schema, "abc")
+    assert len(errors) == 1
+    assert errors[0]["validator"] == "type"
+
+
+def test_validate_record_content_array_schema_with_index_paths():
+    schema = {"type": "array", "items": {"type": "integer"}}
+    errors = validate_record_content(schema, [1, "two", 3, "four"])
+    assert len(errors) == 2
+    # Array indices should appear in paths
+    paths = [e["path"] for e in errors]
+    assert any("1" in p for p in paths)
+    assert any("3" in p for p in paths)
+
+
+def test_validate_record_content_truncates_at_50():
+    schema = {
+        "type": "array",
+        "items": {"type": "integer"},
+    }
+    # 60 wrong-type items — all fail
+    result = validate_record_content(schema, ["x"] * 60)
+    assert isinstance(result, list)
+    # Truncation is carried via a special sentinel entry at the end
+    assert len(result) == 51  # 50 errors + 1 truncation marker
+    assert result[-1]["truncated"] is True
+    assert result[-1]["total_errors"] == 60
+
+
+class _StubStore:
+    """Minimal Store-like stub for validation unit tests.
+
+    Records calls to find_schema and returns pre-configured results keyed by
+    (owner_id, logical_key). Only needs to implement find_schema; other Store
+    methods are never called by resolve_schema.
+    """
+
+    def __init__(self):
+        self._results: dict[tuple[str, str], object] = {}
+        self.calls: list[tuple[str, str]] = []
+
+    def set(self, owner_id: str, logical_key: str, result):
+        self._results[(owner_id, logical_key)] = result
+
+    def find_schema(self, owner_id, logical_key):
+        self.calls.append((owner_id, logical_key))
+        return self._results.get((owner_id, logical_key))
+
+
+def test_resolve_schema_delegates_to_find_schema():
+    stub = _StubStore()
+    sentinel = object()
+    stub.set("alice", "s:test:1.0.0", sentinel)
+    result = resolve_schema(stub, "alice", "s:test", "1.0.0")
+    assert result is sentinel
+
+
+def test_resolve_schema_returns_none_when_missing():
+    stub = _StubStore()
+    assert resolve_schema(stub, "alice", "s:nope", "1.0.0") is None
+
+
+def test_resolve_schema_composes_logical_key_correctly():
+    """Confirms family+version are composed via compose_schema_logical_key."""
+    stub = _StubStore()
+    resolve_schema(stub, "alice", "schema:edge-manifest", "2.3.4")
+    assert stub.calls == [("alice", "schema:edge-manifest:2.3.4")]
+
+
+class _CounterStore:
+    """Stub exposing count_records_referencing for assert_schema_deletable tests."""
+
+    def __init__(self, count: int, ids: list[str]):
+        self._count = count
+        self._ids = ids
+
+    def count_records_referencing(self, owner_id, schema_logical_key):
+        return (self._count, self._ids)
+
+
+def test_assert_schema_deletable_passes_with_zero_refs():
+    # Must not raise
+    assert_schema_deletable(_CounterStore(0, []), "alice", "s:test:1.0.0")
+
+
+def test_assert_schema_deletable_raises_with_refs():
+    with pytest.raises(SchemaInUseError) as excinfo:
+        assert_schema_deletable(_CounterStore(3, ["id1", "id2", "id3"]), "alice", "s:test:1.0.0")
+    assert excinfo.value.total_count == 3
+    assert excinfo.value.referencing_records == ["id1", "id2", "id3"]
+
+
+def test_schema_in_use_error_has_readable_message():
+    err = SchemaInUseError(total_count=5, referencing_records=["a", "b"])
+    assert "5" in str(err)