Open
27 commits
885ede7
docs: add schema + record entry types design spec
Apr 14, 2026
8695f62
docs: add schema + record entry types implementation plan
Apr 14, 2026
a59c76d
feat: add SCHEMA and RECORD to EntryType enum
Apr 14, 2026
20ae90c
chore: add jsonschema>=4.26.0 dependency
Apr 14, 2026
1790f74
feat: add validation module with compose_schema_logical_key
Apr 14, 2026
8d7f3a6
feat: add validate_schema_body for Draft 2020-12 meta-schema check
Apr 14, 2026
e37b523
feat: add validate_record_content with iter_errors and truncation
Apr 14, 2026
51d5a97
feat: add Store.find_schema with _system fallback
Apr 14, 2026
e2dc055
feat: add Store.count_records_referencing for schema deletion protection
Apr 14, 2026
1b7bc06
feat: add validation.resolve_schema delegating to Store.find_schema
Apr 14, 2026
524cfe3
feat: add assert_schema_deletable and SchemaInUseError
Apr 14, 2026
ab50543
feat: add migration seeding _system user for shared schemas
Apr 14, 2026
5f93485
feat: add register_schema MCP tool
Apr 14, 2026
d4effbf
feat: add create_record MCP tool with schema validation and _system f…
Apr 14, 2026
65ee9b8
refactor: extend _error_response to accept **extras
Apr 14, 2026
61197a4
feat: update_entry enforces schema immutability + record re-validation
Apr 14, 2026
9f8f4c5
feat: delete_entry protects schemas referenced by live records
Apr 14, 2026
e021e6b
feat: add mcp-awareness-register-schema CLI for _system schemas
Apr 14, 2026
2e02bd6
test: cross-owner isolation and _system override semantics for schema…
Apr 14, 2026
a1126dd
docs: document schema/record entry types, new tools, and CLI
Apr 14, 2026
f5b4d75
style: ruff format + lint fixes (SIM102, E501, I001, F401)
Apr 14, 2026
2865d01
fix: resolve mypy type errors for jsonschema import and union-attr
Apr 14, 2026
2bead4f
test: add coverage tests for truncation, schema-gone, bad-json, store…
Apr 14, 2026
33f7780
fix: ci — switch async tests to @pytest.mark.anyio; lint/type fixes
Apr 14, 2026
adfb145
test: cover __main__ guard in cli_register_schema via runpy
Apr 14, 2026
0e2e629
fix: address PR #287 QA Round 1 findings
Apr 14, 2026
e8affb9
fix: address PR #287 QA Round 2 findings
Apr 14, 2026
20 changes: 20 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,26 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added
- Two new entry types: `schema` (JSON Schema Draft 2020-12 definition) and `record` (validated payload conforming to a schema). Tools: `register_schema`, `create_record`. Schemas are immutable after registration; records re-validate on content update. Schema deletion is blocked while live records reference a version. Storage is per-owner, with a shared `_system` fallback namespace for built-in schemas.
- New CLI: `mcp-awareness-register-schema` for operators to seed `_system`-owned schemas at deploy time.
- New migration: `_system` user seed (idempotent).
- `_error_response()` helper now accepts `**extras` kwargs so tools can include structured fields in error envelopes beyond the fixed set (e.g., `validation_errors`, `schema_ref`, `referencing_records`).
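
A minimal sketch of how a `**extras`-style error helper might behave. The envelope layout (`status`, `error`, `code`, `message`) and the function name `error_response` are assumptions made for illustration; the project's actual `_error_response()` signature and fixed fields are not shown in this diff.

```python
import json
from typing import Any


def error_response(code: str, message: str, **extras: Any) -> str:
    """Build a JSON error envelope (hypothetical layout, not the project's)."""
    error: dict[str, Any] = {"code": code, "message": message}
    # Extras land beside the fixed keys. They cannot shadow `code` or
    # `message`: Python rejects a duplicate keyword argument at call time.
    error.update(extras)
    return json.dumps({"status": "error", "error": error})


if __name__ == "__main__":
    print(
        error_response(
            "schema_already_exists",
            "a schema with this family and version is already registered",
            logical_key="schema:config:1.0.0",
            existing_id=None,  # best-effort field; may be unknown
        )
    )
```

Callers can then read `logical_key` or `validation_errors` as structured fields instead of parsing the human-readable message.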

### Fixed
- **RLS carve-out for `_system`-owned schema reads** — migration `n9i0j1k2l3m4` alters the `entries.owner_isolation` policy `USING` clause to also allow reads where `owner_id = '_system' AND type = 'schema'`. Before this change, the strict `owner_id = current_user` USING clause filtered out `_system`-owned rows for non-superuser DB roles, making the `find_schema` fallback (and the whole CLI bootstrap pattern for built-in schemas) a no-op in production. Writes remain isolated — the `owner_insert` `WITH CHECK` policy still requires `owner_id = current_user`, so no one can write to `_system` via the MCP path.
- **`schema_already_exists` error envelope** — register_schema now returns `logical_key` and a best-effort `existing_id` as structured fields alongside the human-readable message (matches the design-doc error-code table; callers no longer have to parse the message to locate the conflicting entry).
- **RECORD content shape preserved across `update_entry`** — previously `update_entry` stringified non-string content via `json.dumps()` before handing it to the store, causing RECORD entries to drift from a native JSON object/array/primitive (how they are stored on create) to a JSON-encoded string after any content update. `update_entry` now branches on the existing entry's type: RECORD entries persist native JSON content to match the create path, while other knowledge types (note / pattern / context / preference) retain the existing stringify-on-write behavior.
- **Known gap: bulk-delete paths (`delete_entry` by tags/source) still do not consult `schema_in_use`.** Single-id schema deletion is protected, but the bulk paths are not. They are explicitly flagged in the code and tracked by [#288](https://github.com/cmeans/mcp-awareness/issues/288). This PR leaves them unchanged (out of scope per the design) and only documents where the gap lives.
- **`count_records_referencing` store boundary hardening** — `schema_logical_key` parsing now asserts the `ref:version` invariant (non-empty ref + non-empty version after the last `:`). Empty version is blocked at `register_schema`, but the store API is public, so we fail loudly here rather than silently running a non-matching query.
- **`_system` user downgrade no-op** — `m8h9i0j1k2l3.downgrade()` now checks for referencing entries before `DELETE`. If any exist, it logs a warning and returns rather than FK-failing the whole transaction. Operators can soft-delete or re-home the referenced entries and re-run downgrade.
- **CLI language resolution** — `mcp-awareness-register-schema` now runs the description through `resolve_language()` (same chain as the MCP path) instead of pinning every CLI-seeded schema to `english`. Auto-detection falls back to `simple` for short/unknown-language descriptions.
- **Dead-code cleanup in `register_schema`** — removed the string-matching fallback (`"unique"` / `"duplicate"` / `"23505"` in the exception message) in favor of the psycopg-native `UniqueViolation` branch. The fallback was unreachable under the `psycopg`-direct driver the project uses.
- **Mypy override cleanup** — dropped the no-op `ignore_errors = true` from the `jsonschema.*` override in `pyproject.toml`. `ignore_missing_imports = true` alone covers the import; there is no project code under `jsonschema.*` to silence.
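
The `ref:version` invariant from the `count_records_referencing` entry can be sketched as a split on the last `:`. The helper name `split_schema_logical_key` is hypothetical; only the rule itself (non-empty ref and non-empty version after the last `:`) comes from the changelog text.

```python
def split_schema_logical_key(logical_key: str) -> tuple[str, str]:
    """Split 'ref:version' on the LAST colon and fail loudly on bad input."""
    # rpartition keeps colons inside the ref intact, e.g. 'schema:config'.
    ref, sep, version = logical_key.rpartition(":")
    if not sep or not ref or not version:
        raise ValueError(
            f"schema_logical_key must look like 'ref:version', got {logical_key!r}"
        )
    return ref, version


if __name__ == "__main__":
    print(split_schema_logical_key("schema:edge-manifest:1.0.0"))
```

Splitting on the last colon matters because family names such as `schema:edge-manifest` already contain one.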

### Dependencies
- Added `jsonschema>=4.26.0` as a runtime dependency.

## [0.17.0] - 2026-04-13

### Added
4 changes: 3 additions & 1 deletion README.md
@@ -282,7 +282,7 @@ Results from the initial run (2026-03-27): HNSW query P50 stays under 4ms from 5

## Tools

-The server exposes 30 MCP tools. Clients that support MCP resources also get 6 read-only resources, but since not all clients surface resources, every resource has a tool mirror.
+The server exposes 32 MCP tools. Clients that support MCP resources also get 6 read-only resources, but since not all clients surface resources, every resource has a tool mirror.

### Read tools

@@ -318,6 +318,8 @@ The server exposes 30 MCP tools. Clients that support MCP resources also get 6 r
| `remind` | Create a todo, reminder, or planned action. Optional `deliver_at` timestamp for time-based surfacing. Intentions have a lifecycle: pending → fired → active → completed. |
| `update_intention` | Transition an intention state: pending → fired → active → completed/snoozed/cancelled. |
| `acted_on` | Log that you took action because of an entry. Tags inherited from the entry. |
| `register_schema` | Define a typed data contract using JSON Schema Draft 2020-12. Schemas are immutable after registration; family + version become logical_key. Per-owner with `_system` fallback for shared built-in shapes. |
| `create_record` | Write a validated data entry conforming to a registered schema. Records pin an exact schema version and re-validate on content update. Validation errors report every failure in a structured envelope. |

### Data management tools

73 changes: 73 additions & 0 deletions alembic/versions/m8h9i0j1k2l3_add_system_user_for_schemas.py
@@ -0,0 +1,73 @@
# mcp-awareness — ambient system awareness for AI agents
# Copyright (C) 2026 Chris Means
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Affero General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.

"""add _system user for system-owned schemas

Revision ID: m8h9i0j1k2l3
Revises: l7g8h9i0j1k2
Create Date: 2026-04-13 00:00:00.000000

"""

from __future__ import annotations

import logging
from collections.abc import Sequence

from alembic import op

logger = logging.getLogger("alembic.runtime.migration")

revision: str = "m8h9i0j1k2l3"
down_revision: str | Sequence[str] | None = "l7g8h9i0j1k2"
branch_labels: str | Sequence[str] | None = None
depends_on: str | Sequence[str] | None = None


def upgrade() -> None:
    """Seed the _system user for system-owned schema entries.

    Idempotent: ON CONFLICT DO NOTHING lets the migration run multiple
    times safely (e.g., after a stamp-and-reapply).
    """
    op.execute(
        "INSERT INTO users (id, display_name) "
        "VALUES ('_system', 'System-managed schemas') "
        "ON CONFLICT (id) DO NOTHING"
    )


def downgrade() -> None:
    """Remove the _system user, if safe to do so.

    This downgrade is a no-op when `_system`-owned entries still exist (schemas
    seeded via ``mcp-awareness-register-schema --system``, for example). A hard
    DELETE would FK-fail and abort the entire downgrade transaction, which
    would also prevent any subsequent downgrade steps from running. The
    warning names the manual step required: operators who really want to
    remove `_system` must first soft-delete or re-home the referenced entries,
    then re-run downgrade.
    """
    conn = op.get_bind()
    # Probe for any entry still owned by _system before attempting the delete.
    referenced = conn.exec_driver_sql(
        "SELECT 1 FROM entries WHERE owner_id = '_system' LIMIT 1"
    ).fetchone()
    if referenced is not None:
        logger.warning(
            "Skipping delete of users._system: entries still reference it. "
            "Soft-delete or re-home those entries, then re-run downgrade."
        )
        return
    op.execute("DELETE FROM users WHERE id = '_system'")
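
The seed-then-guarded-delete pattern in this migration can be tried outside Alembic. The sketch below uses the standard-library `sqlite3` module as a stand-in for PostgreSQL; the two-table layout is a simplified assumption, and `make_db`, `upgrade`, and `downgrade` here are illustrative functions, not the migration itself.

```python
import sqlite3

SEED_SQL = (
    "INSERT INTO users (id, display_name) "
    "VALUES ('_system', 'System-managed schemas') "
    "ON CONFLICT (id) DO NOTHING"
)


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a simplified users/entries layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, display_name TEXT)")
    conn.execute(
        "CREATE TABLE entries (id TEXT PRIMARY KEY, owner_id TEXT NOT NULL, type TEXT NOT NULL)"
    )
    return conn


def upgrade(conn: sqlite3.Connection) -> None:
    # Safe to run repeatedly: a second run hits the conflict and does nothing.
    conn.execute(SEED_SQL)


def downgrade(conn: sqlite3.Connection) -> bool:
    """Delete the _system user unless entries still reference it."""
    referenced = conn.execute(
        "SELECT 1 FROM entries WHERE owner_id = '_system' LIMIT 1"
    ).fetchone()
    if referenced is not None:
        return False  # skipped; the real migration logs a warning here
    conn.execute("DELETE FROM users WHERE id = '_system'")
    return True


if __name__ == "__main__":
    db = make_db()
    upgrade(db)
    upgrade(db)
    print(db.execute("SELECT COUNT(*) FROM users").fetchone()[0])
```

Returning early instead of letting the foreign key fail keeps the rest of a multi-step downgrade usable.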
75 changes: 75 additions & 0 deletions alembic/versions/n9i0j1k2l3m4_rls_allow_system_schema_reads.py
@@ -0,0 +1,75 @@
# mcp-awareness — ambient system awareness for AI agents
# Copyright (C) 2026 Chris Means
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Affero General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.

"""RLS: allow all owners to read _system-owned schema entries

Revision ID: n9i0j1k2l3m4
Revises: m8h9i0j1k2l3
Create Date: 2026-04-14 00:00:00.000000

The owner_isolation SELECT/UPDATE/DELETE policy on entries was

USING (owner_id = current_setting('app.current_user', true))

which — under FORCE ROW LEVEL SECURITY for non-superuser roles — filters
out `_system`-owned rows. That blocks the schema-fallback design for
built-in schemas registered via ``mcp-awareness-register-schema --system``
because the `find_schema` query's ``owner_id IN (%s, '_system')`` clause is
evaluated AFTER RLS strips the `_system` row.

This migration narrows the read carve-out to `_system`-owned *schema* rows
only. Writes remain isolated by the existing `owner_insert` WITH CHECK
policy (which still requires `owner_id = current_user`), so operators
cannot accidentally write to `_system` via the MCP path — the CLI
(`mcp-awareness-register-schema --system`) bypasses MCP and connects as
whichever DB role the operator chose.

Rationale: option 1 from the PR #287 Round-2 QA review (narrowest
change, read-only carve-out, no SECURITY DEFINER functions needed).
"""

from __future__ import annotations

from collections.abc import Sequence

from alembic import op

revision: str = "n9i0j1k2l3m4"
down_revision: str | Sequence[str] | None = "m8h9i0j1k2l3"
branch_labels: str | Sequence[str] | None = None
depends_on: str | Sequence[str] | None = None
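
As a reading aid for the policy described in the docstring above, the new `USING` predicate can be modeled as a pure function. This is only a sketch of row visibility: `Row` and `visible` are hypothetical names, and PostgreSQL's NULL semantics are approximated by treating an unset `app.current_user` as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Row:
    owner_id: str
    type: str


def visible(row: Row, current_user: Optional[str]) -> bool:
    """Mirror of: owner_id = current_user OR (owner_id = '_system' AND type = 'schema')."""
    own_row = current_user is not None and row.owner_id == current_user
    system_schema = row.owner_id == "_system" and row.type == "schema"
    return own_row or system_schema


if __name__ == "__main__":
    print(visible(Row("_system", "schema"), "alice"))
    print(visible(Row("bob", "note"), "alice"))
```

The carve-out is deliberately narrow in this model: a `_system`-owned row of any other type stays hidden from ordinary owners.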


def upgrade() -> None:
    """Replace the owner_isolation policy on `entries` to allow reads of
    `_system`-owned schema rows from any owner context."""
    op.execute("DROP POLICY IF EXISTS owner_isolation ON entries")
    op.execute("""
        CREATE POLICY owner_isolation ON entries
            USING (
                owner_id = current_setting('app.current_user', true)
                OR (owner_id = '_system' AND type = 'schema')
            )
    """)


def downgrade() -> None:
    """Restore the strict-isolation policy on `entries`."""
    op.execute("DROP POLICY IF EXISTS owner_isolation ON entries")
    op.execute("""
        CREATE POLICY owner_isolation ON entries
            USING (owner_id = current_setting('app.current_user', true))
    """)
30 changes: 29 additions & 1 deletion docs/data-dictionary.md
@@ -46,7 +46,7 @@ The UNIQUE constraint is on `canonical_email`, not `email`. Users see and use th
|--------|------|----------|-------------|
| `id` | TEXT | No | Primary key. UUID v4, generated via `uuid.uuid4()`. |
| `owner_id` | TEXT | No | Owner identifier. References the user who owns this entry. All queries are scoped by `owner_id`. |
-| `type` | TEXT | No | Entry type. One of: `status`, `alert`, `pattern`, `suppression`, `context`, `preference`, `note`, `intention`. |
+| `type` | TEXT | No | Entry type. One of: `status`, `alert`, `pattern`, `suppression`, `context`, `preference`, `note`, `intention`, `schema`, `record`. |
| `source` | TEXT | No | Origin identifier. Describes the subject, not the owner (e.g., `"personal"`, `"synology-nas"`, `"mcp-awareness-project"`). |
| `created` | TIMESTAMPTZ | No | UTC timestamp. Set once when the entry is first created. |
| `updated` | TIMESTAMPTZ | Yes | UTC timestamp. Updated on every upsert or `update_entry` call. `NULL` until first update. |
@@ -177,6 +177,34 @@ Written by agents via `set_preference`. Keyed by `key` + `scope` (upserted). Por
| `value` | string | Yes | Preference value (e.g., `"one_sentence_warnings"`, `"first_turn_only"`). |
| `scope` | string | Yes | Scope of the preference. Default: `"global"`. |

### `schema` — JSON Schema definitions

Written by operators or agents via `register_schema`. Immutable after registration. Schema body lives in `data.schema`; family + version in `data.family` + `data.version`; `logical_key` derived as `{family}:{version}`. Used by `record` entries for typed validation.

**`data` fields:**

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `family` | string | Yes | Schema family identifier (e.g., `schema:edge-manifest`, `schema:config`). Used as the reference key. |
| `version` | string | Yes | Schema version (user-chosen semantic or sequential, e.g., `"1.0.0"`, `"1"`). |
| `schema` | object | Yes | JSON Schema Draft 2020-12 body. Defines the validation rules and structure. |
| `description` | string | No | Human-readable description of what this schema validates. |
| `learned_from` | string | No | Platform that registered the schema (e.g., `"claude-code"`, `"operator"`). Default: `"conversation"`. |
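
A small sketch of how the `logical_key` and the `_system` fallback could fit together, using an in-memory dict in place of the database. `compose_schema_logical_key` and `find_schema` are names taken from the commit list, but the signatures and the lookup order shown here (caller's own schema first, then `_system`) are assumptions, not the project's implementation.

```python
from typing import Any, Optional

SYSTEM_OWNER = "_system"

# Keyed by (owner_id, logical_key); stands in for the entries table.
SchemaIndex = dict[tuple[str, str], dict[str, Any]]


def compose_schema_logical_key(family: str, version: str) -> str:
    """Derive the logical key as '{family}:{version}'."""
    return f"{family}:{version}"


def find_schema(index: SchemaIndex, owner_id: str, logical_key: str) -> Optional[dict[str, Any]]:
    """Look up the caller's schema first, then fall back to _system."""
    for owner in (owner_id, SYSTEM_OWNER):
        data = index.get((owner, logical_key))
        if data is not None:
            return data
    return None


if __name__ == "__main__":
    key = compose_schema_logical_key("schema:config", "1.0.0")
    index: SchemaIndex = {
        (SYSTEM_OWNER, key): {
            "family": "schema:config",
            "version": "1.0.0",
            "schema": {"type": "object"},
        }
    }
    print(key)
    print(find_schema(index, "alice", key))
```

The explicit `is not None` check matters because an empty schema body (`{}`) is a valid JSON Schema and would be falsy in a truthiness test.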

### `record` — Validated data entries

Written by agents via `create_record`. Content in `data.content`; pinned schema reference in `data.schema_ref` + `data.schema_version` (exact version, no "latest" aliasing). Re-validated on content update via `update_entry`.

**`data` fields:**

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `schema_ref` | string | Yes | Target schema family (e.g., `schema:edge-manifest`). Used to look up the schema definition. |
| `schema_version` | string | Yes | Target schema version (exact pin, e.g., `"1.0.0"`). Pinned at write time; determines which schema is used for validation on updates. |
| `content` | any | Yes | Validated payload — any JSON value (object, array, string, number, boolean, null). Must conform to the pinned schema. |
| `description` | string | No | Human-readable description of what this record represents. |
| `learned_from` | string | No | Platform that created the record (e.g., `"claude-code"`, edge provider name). Default: `"conversation"`. |
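
Because each record pins `schema_ref` and `schema_version`, deletion protection reduces to counting live records that point at a given pair. The sketch below works on a plain list of dicts; the `deleted` flag as the marker for soft-deleted entries and both function signatures are assumptions, while the names `SchemaInUseError`, `assert_schema_deletable`, and `count_records_referencing` come from the commit list.

```python
from typing import Any


class SchemaInUseError(Exception):
    """Raised when live records still reference the schema version."""

    def __init__(self, logical_key: str, referencing_records: int) -> None:
        super().__init__(f"{logical_key} is referenced by {referencing_records} record(s)")
        self.logical_key = logical_key
        self.referencing_records = referencing_records


def count_records_referencing(
    entries: list[dict[str, Any]], schema_ref: str, schema_version: str
) -> int:
    """Count live (not soft-deleted) records pinned to this exact version."""
    return sum(
        1
        for e in entries
        if e["type"] == "record"
        and not e.get("deleted", False)
        and e["data"]["schema_ref"] == schema_ref
        and e["data"]["schema_version"] == schema_version
    )


def assert_schema_deletable(
    entries: list[dict[str, Any]], schema_ref: str, schema_version: str
) -> None:
    count = count_records_referencing(entries, schema_ref, schema_version)
    if count:
        raise SchemaInUseError(f"{schema_ref}:{schema_version}", count)


if __name__ == "__main__":
    record = {
        "type": "record",
        "deleted": False,
        "data": {
            "schema_ref": "schema:config",
            "schema_version": "1.0.0",
            "content": {"name": "edge-1"},
        },
    }
    print(count_records_referencing([record], "schema:config", "1.0.0"))
```

Matching on the exact version means a record pinned to `1.0.0` does not block deletion of `2.0.0` in the same family.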

## Lifecycle

- **Upsert behavior:** `status` entries are upserted by `source`. `alert` entries by `source` + `alert_id`. `preference` entries by `key` + `scope`. Other types always insert new rows.