From 7cf53dc7bebc5015a47d76275056e7468fadb324 Mon Sep 17 00:00:00 2001 From: "J. Kirby Ross" Date: Sun, 16 Nov 2025 15:56:34 -0800 Subject: [PATCH 01/25] feat(adr): add ADR-0004 for hybrid privacy model --- docs/decisions/ADR-0004/DECISION.md | 167 ++++++++++++++++++ docs/decisions/README.md | 13 +- schemas/v1/privacy/opaque_pointer.schema.json | 48 +++++ 3 files changed, 221 insertions(+), 7 deletions(-) create mode 100644 docs/decisions/ADR-0004/DECISION.md create mode 100644 schemas/v1/privacy/opaque_pointer.schema.json diff --git a/docs/decisions/ADR-0004/DECISION.md b/docs/decisions/ADR-0004/DECISION.md new file mode 100644 index 00000000..00a7939c --- /dev/null +++ b/docs/decisions/ADR-0004/DECISION.md @@ -0,0 +1,167 @@ +--- +Status: Accepted +Date: 2025-11-09 +ADR: ADR-0004 +Authors: [flyingrobots] +Requires: [ADR-0001] +Related: [ADR-0002, ADR-0003] +Tags: [Privacy, Projection, Opaque Pointers] +Schemas: + - schemas/v1/privacy/opaque_pointer.schema.json +Supersedes: [] +Superseded-By: [] +--- + +# ADR‑0004: Hybrid Privacy Model (Public Projection + Private Overlay) + +## Scope + +Define a **hybrid privacy** model in which the State Plane produces: +1) a **PublicState** that is pushable and shareable, and +2) a **Private Overlay** that is stored locally (or on a private node) and referenced from the public state via **opaque pointers**. + +## Rationale + +The original model envisioned using a local, out‑of‑repo directory for private data and committing only redacted or pointerized state to Git. This ADR makes that pattern **normative** and **deterministic**: +- Public state remains globally verifiable. +- Sensitive details live in a private overlay but are **addressable** and **auditable** via content hashes. + +## Decision + +### 1. Actor‑Anchored Private Namespace (normative) + +Private overlays are rooted in an **actor identity**, not an ad‑hoc “session”. + +- **Actor ID:** `ed25519:` that resolves in the trust graph. 
+- **On‑disk refs (private):** + ``` + refs/gatos/private/// + refs/gatos/private//sessions/// # OPTIONAL ephemeral overlays + ``` +- **On‑disk refs (public):** + ``` + refs/gatos/state/public// + ``` + +> The prior “`` at namespace root” concept is deprecated. If you need per‑process isolation, use `sessions/` under the owning ``. + +### 2. Opaque Pointers (normative) + +Where private data is elided from PublicState, emit a canonical JSON **opaque pointer** envelope: + +```json +{ + "kind": "opaque_pointer", + "algo": "blake3", + "digest": "blake3:<64hex>", // blake3 hash of the raw, unencrypted private blob + "size": 12345, // OPTIONAL byte size + "location": "gatos-node://ed25519:", // where to ask + "capability": "gatos-key://v1/aes-256-gcm/" // how to authorize/decrypt +} +``` + +- `location` MUST be a URI. Reserved schemes include: + - `gatos-node://ed25519:` — resolve endpoint(s) via trust graph. + - `file:///...` — local file path (dev/test only). + - `https://...` — HTTPS object store. + - `s3://bucket/key` — S3‑style store. + - `ipfs://` — IPFS address. +- `capability` MUST be a URI. Reserved schemes include: + - `gatos-key://v1/aes-256-gcm/` + - `kms://aws//keys/` + - `age://` / `sops://` +- Canonical JSON (UTF‑8, sorted keys, no insignificant whitespace). The digest of the pointer envelope itself (its **content_id**) is `blake3(canonical_bytes)`. + +**Schema:** `schemas/v1/privacy/opaque_pointer.schema.json` (see repo changes below). + +### 3. Projection Function (normative) + +The Privacy Policy declares **redact/pointerize rules**. The projection: +- takes a **UnifiedState** (contains both public + private), +- transforms it into **PublicState** + a set of **Private Blobs**, and +- commits PublicState; persists blobs at `location` with digest = `blake3(bytes)`. 
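
The three projection steps can be sketched as a pure function. The Rust below is a minimal, hypothetical illustration, not the GATOS implementation. It makes these assumptions:

- UnifiedState is flattened into a map from dotted paths to string values.
- A `select` pattern is either an exact path or a prefix ending in `.*`.
- The first matching rule wins.
- The digest function is passed in as a parameter. Production code would compute BLAKE3 over the plaintext bytes, for example with the `blake3` crate.

It also produces the `Privacy-Redactions` / `Privacy-Pointers` trailers required by the auditability section. `Rule`, `Projection`, `project`, and `selects` are invented names.

```rust
use std::collections::BTreeMap;

/// Privacy rule actions, mirroring `redact` and `pointerize` in `.gatos/policy.yaml`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Action {
    Redact,
    Pointerize,
}

/// Hypothetical simplified rule; `location` is only used by `Pointerize`.
struct Rule {
    select: &'static str,
    action: Action,
    location: &'static str,
}

/// Assumed glob semantics: "a.b.*" matches any path strictly below "a.b";
/// anything else must match the path exactly.
fn selects(select: &str, path: &str) -> bool {
    match select.strip_suffix(".*") {
        Some(prefix) => path
            .strip_prefix(prefix)
            .map_or(false, |rest| rest.starts_with('.')),
        None => select == path,
    }
}

/// What gets committed (public) plus what stays in the private overlay.
struct Projection {
    public: BTreeMap<String, String>,         // path -> value or pointer JSON
    private_blobs: BTreeMap<String, Vec<u8>>, // digest -> plaintext bytes
    redactions: usize,
    pointers: usize,
}

impl Projection {
    /// Trailers for the public commit, as listed in the auditability section.
    fn trailers(&self) -> String {
        format!(
            "Privacy-Redactions: {}\nPrivacy-Pointers: {}",
            self.redactions, self.pointers
        )
    }
}

/// Splits a flattened UnifiedState into PublicState plus private blobs.
/// `digest_of` stands in for BLAKE3 over the plaintext and returns "blake3:<hex>".
fn project<H: Fn(&[u8]) -> String>(
    unified: &BTreeMap<String, String>,
    rules: &[Rule],
    digest_of: H,
) -> Projection {
    let mut out = Projection {
        public: BTreeMap::new(),
        private_blobs: BTreeMap::new(),
        redactions: 0,
        pointers: 0,
    };
    for (path, value) in unified {
        // First matching rule wins (an assumption; the ADR does not define precedence).
        match rules.iter().find(|r| selects(r.select, path)) {
            None => {
                out.public.insert(path.clone(), value.clone());
            }
            Some(rule) if rule.action == Action::Redact => out.redactions += 1,
            Some(rule) => {
                let bytes = value.as_bytes().to_vec();
                let digest = digest_of(&bytes);
                // Sorted keys, no whitespace. Values are not JSON-escaped here,
                // so a real implementation should use a canonical JSON serializer.
                let pointer = format!(
                    "{{\"algo\":\"blake3\",\"digest\":\"{}\",\"kind\":\"opaque_pointer\",\"location\":\"{}\"}}",
                    digest, rule.location
                );
                out.public.insert(path.clone(), pointer);
                out.private_blobs.insert(digest, bytes);
                out.pointers += 1;
            }
        }
    }
    out
}
```

Applied to the policy in section 5:

- A value under `paths.config.secrets.*` is replaced in PublicState by a pointer envelope, and its plaintext moves to `private_blobs` keyed by digest.
- Nodes under `attachments.*` are dropped and only counted.

The optional `size` field and the `capability` URI are left out to keep the sketch short.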
+ +```mermaid +flowchart LR + U[UnifiedState] -->|apply privacy rules| P[PublicState + Pointers] + U -->|extract| B[(Private Blobs)] + P -->|commit| G[Ledger Repo] + B -->|store at location| S[(Private Store)] +``` + +### 4. Pointer Resolution (normative, minimal) + +A resolver MUST: +1. Parse `location`. +2. If `gatos-node://ed25519:`: + - Read trust graph entry for `` and obtain `endpoints: [uri]`. + - Attempt `GET {endpoint}/.well-known/gatos/private/{digest}` (exact path shape MAY be extended in a future ADR). +3. If `https|s3|ipfs|file`, use the obvious client. +4. If a `capability` is present: + - Use scheme to select decryption/authorization mechanism. + - Fetch and decrypt the content. Verify that the `blake3` hash of the resulting plaintext bytes matches the `digest` from the pointer. If it does not match, the resolution **MUST FAIL**. + +> This ADR standardizes **envelopes and verification**. The `.well-known` fetch API shape is reserved for a future ADR; implementations may use compatible private APIs short‑term. + +### 5. Policy Hooks (normative) + +Extend `.gatos/policy.yaml`: + +```yaml +privacy: + rules: + - select: "paths.config.secrets.*" + action: "pointerize" + capability: "gatos-key://v1/aes-256-gcm/key-ops-01" + location: "gatos-node://ed25519:" + - select: "attachments.*" + action: "redact" # removes node entirely from PublicState +``` + +### 6. Auditability (normative) + +- Public commits MUST include trailers summarizing redactions/pointers: + ``` + Privacy-Redactions: N + Privacy-Pointers: M + ``` +- Private stores SHOULD keep an index ` -> metadata` for operator inspection. +- Verifiers MUST check that any dereferenced pointer’s content hash equals `digest`. + +### 7. Security Considerations + +- Never embed plaintext secrets in PublicState. Pointer envelopes do **not** leak bytes. +- If `location` is remote and `capability` is non‑null, deny fetch if capability can’t be resolved or verified. 
+- The trust graph entry for a node SHOULD declare endpoint URIs and allowed capability schemes. +- The `capability` mechanism implies a dependency on a robust and secure key management system. This ADR does not specify the architecture of such a system, but implementations MUST ensure that key access is strictly controlled and auditable. + +### 8. Compatibility + +- Existing per‑process “session” overlays can migrate to `refs/gatos/private//sessions//...` with no behavioral change to the projection. + +### Diagrams + +#### Sequence: Project & Resolve + +```mermaid +sequenceDiagram + participant State as State Engine (echo) + participant Policy as Policy Engine + participant Ledger as Ledger (Git) + participant Node as Private Node + participant Client + participant KMS as Key Mgmt. System + + State->>Policy: Apply privacy rules + Policy-->>State: Redact + replace with OpaquePointers + State->>Ledger: Commit PublicState (+ trailers) + State->>Node: Store Private Blobs by digest + Client->>Ledger: Read PublicState + Client->>Client: See OpaquePointer(digest, location, capability) + Client->>Node: GET /.well-known/gatos/private/{digest} + Node-->>Client: Encrypted blob + Client->>KMS: Resolve capability (e.g., fetch key) + KMS-->>Client: Decryption key + Client->>Client: Decrypt blob with key + Client->>Client: Verify blake3(plaintext) == digest ? 
OK : FAIL +``` \ No newline at end of file diff --git a/docs/decisions/README.md b/docs/decisions/README.md index cd36682d..ab9fe36d 100644 --- a/docs/decisions/README.md +++ b/docs/decisions/README.md @@ -26,10 +26,9 @@ Each ADR will have a status, typically one of the following: ## Decision Log - - -| ID | Title | Status | Date | -| :--------------------------------- | :---------------------------------------------------- | :------- | :--------- | -| [ADR-0001](./ADR-0001/DECISION.md) | Split gatos-ledger into no\_std Core and std Backends | Accepted | 2025-11-08 | -| [ADR-0002](./ADR-0002/DECISION.md) | Distributed Compute via a Job Plane | Accepted | 2025-11-08 | -| [ADR-0003](./ADR-0003/DECISION.md) | Consensus Governance for Gated Actions | Accepted | 2025-11-08 | +| ID | Title | Status | Date | +|:---|:---|:---|:---| +| [ADR-0001](./ADR-0001/DECISION.md) | Split gatos-ledger into no_std Core and std Backends | Accepted | 2025-11-08 | +| [ADR-0002](./ADR-0002/DECISION.md) | Distributed Compute via a Job Plane | Accepted | 2025-11-08 | +| [ADR-0003](./ADR-0003/DECISION.md) | Consensus Governance for Gated Actions | Accepted | 2025-11-08 | +| [ADR-0004](./ADR-0004/DECISION.md) | Hybrid Privacy Model (Public Projection + Private Overlay) | Accepted | 2025-11-09 | \ No newline at end of file diff --git a/schemas/v1/privacy/opaque_pointer.schema.json b/schemas/v1/privacy/opaque_pointer.schema.json new file mode 100644 index 00000000..fae379fb --- /dev/null +++ b/schemas/v1/privacy/opaque_pointer.schema.json @@ -0,0 +1,48 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema#", + "$id": "https://gatos.io/schemas/v1/privacy/opaque_pointer.schema.json", + "title": "Opaque Pointer", + "description": "Schema for an opaque pointer envelope, referencing private data.", + "type": "object", + "required": [ + "kind", + "algo", + "digest", + "location" + ], + "properties": { + "kind": { + "type": "string", + "description": "Indicates the type of the pointer.", + 
"const": "opaque_pointer" + }, + "algo": { + "type": "string", + "description": "The hashing algorithm used for the digest.", + "enum": ["blake3"] + }, + "digest": { + "type": "string", + "description": "Content-address of the private blob, prefixed with the algorithm (e.g., 'blake3:<64hex>').", + "pattern": "^blake3:[0-9a-fA-F]{64}$" + }, + "size": { + "type": "integer", + "description": "Optional: Byte size of the private blob.", + "minimum": 0 + }, + "location": { + "type": "string", + "description": "URI indicating where to retrieve the private blob.", + "format": "uri", + "pattern": "^(gatos-node://ed25519:[0-9a-fA-F]{64}|file:///|https://|s3://|ipfs://).*" + }, + "capability": { + "type": "string", + "description": "URI indicating how to authorize/decrypt the private blob.", + "format": "uri", + "pattern": "^(gatos-key://v1/aes-256-gcm/|kms://aws/|age://|sops://).*" + } + }, + "additionalProperties": false +} From c2e652755103dfd60055d76aee5c760391d31d7e Mon Sep 17 00:00:00 2001 From: "J. Kirby Ross" Date: Sun, 16 Nov 2025 15:58:35 -0800 Subject: [PATCH 02/25] docs(adr): enhance ADR-0004 with clarifications and alternatives considered --- docs/decisions/ADR-0004/DECISION.md | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/docs/decisions/ADR-0004/DECISION.md b/docs/decisions/ADR-0004/DECISION.md index 00a7939c..0059f5e5 100644 --- a/docs/decisions/ADR-0004/DECISION.md +++ b/docs/decisions/ADR-0004/DECISION.md @@ -26,6 +26,30 @@ The original model envisioned using a local, out‑of‑repo directory for priva - Public state remains globally verifiable. - Sensitive details live in a private overlay but are **addressable** and **auditable** via content hashes. +## Alternatives Considered + +Several alternative approaches to managing privacy and sensitive data were considered: + +* **1. 
Pure Public State (No Privacy Model):** + * **Description:** All data, regardless of sensitivity, is committed directly to the Git repository and is part of the `PublicState`. + * **Reason for Rejection:** This approach fundamentally contradicts the requirement to handle sensitive data (e.g., PII, large datasets) and maintain confidentiality. It would render GATOS unsuitable for many real-world applications that necessitate data privacy. + +* **2. Pure Private State (No Public Projection):** + * **Description:** All state is kept entirely private, with only cryptographic proofs or attestations made public. + * **Reason for Rejection:** While offering maximum privacy, this model would significantly diminish the verifiability and auditability of the system. It would impede collaboration and sharing of non-sensitive data and likely increase the complexity of proving computational correctness without revealing any underlying data. GATOS aims for a balanced approach between privacy and verifiability. + +* **3. Ad-hoc Private Data Storage (Original Model):** + * **Description:** Private data is stored in local, out-of-repo directories, and redacted/pointerized state is committed to Git, but without a normative, deterministic framework. + * **Reason for Rejection:** This represents the "original model" described in the Rationale. Its primary drawbacks are a lack of determinism, auditability, and a standardized mechanism for addressing and resolving private data. Reliance on ad-hoc solutions can lead to inconsistencies, operational errors, and makes system evolution difficult. This ADR specifically aims to formalize and standardize this pattern. + +* **4. Encrypted Blobs Directly in Git:** + * **Description:** Instead of opaque pointers, encrypted blobs containing sensitive data are committed directly to the Git repository. 
+ * **Reason for Rejection:** This approach, while technically feasible, introduces several significant issues: + * **Repository Bloat:** Committing large encrypted blobs would drastically increase repository size, leading to slow cloning, fetching, and other Git operations. + * **Key Management Complexity:** Changing encryption keys would necessitate rewriting Git history or re-encrypting and re-committing substantial portions of the repository, which is impractical. + * **Limited Access Control:** Git's native access control operates at the repository level. It would be challenging to implement granular access control for specific encrypted blobs without granting access to the entire repository. + * **Public Metadata Leakage:** Even if encrypted, the mere presence, size, and commit history of these blobs could inadvertently leak sensitive metadata. Opaque pointers offer finer-grained control over what metadata is exposed publicly. + ## Decision ### 1. Actor‑Anchored Private Namespace (normative) From 6e74200ed12a1409486dc1145b74c9e8a33b3554 Mon Sep 17 00:00:00 2001 From: "J. 
Kirby Ross" Date: Sun, 16 Nov 2025 16:08:22 -0800 Subject: [PATCH 03/25] feat(adr): add ADR-0005 for shiplog event stream --- docs/decisions/ADR-0005/DECISION.md | 74 +++++++++++++++++++ docs/decisions/README.md | 3 +- .../shiplog/consumer_checkpoint.schema.json | 23 ++++++ schemas/v1/shiplog/event_envelope.schema.json | 49 ++++++++++++ 4 files changed, 148 insertions(+), 1 deletion(-) create mode 100644 docs/decisions/ADR-0005/DECISION.md create mode 100644 schemas/v1/shiplog/consumer_checkpoint.schema.json create mode 100644 schemas/v1/shiplog/event_envelope.schema.json diff --git a/docs/decisions/ADR-0005/DECISION.md b/docs/decisions/ADR-0005/DECISION.md new file mode 100644 index 00000000..2c83c229 --- /dev/null +++ b/docs/decisions/ADR-0005/DECISION.md @@ -0,0 +1,74 @@ +--- +Status: Proposed +Date: 2025-11-09 +ADR: ADR-0005 +Authors: [flyingrobots] +Requires: [ADR-0001] +Related: [ADR-0002, ADR-0003] +Tags: [Shiplog, Event Stream, Consumers] +Schemas: + - ../../../../schemas/v1/shiplog/event_envelope.schema.json + - ../../../../schemas/v1/shiplog/consumer_checkpoint.schema.json +Supersedes: [] +Superseded-By: [] +--- + +## ADR-0005: Shiplog — A Parallel, Queryable Event Stream + +### Scope + +Introduce a **first-class, append-only event stream** ("shiplog") that runs in parallel with snapshot state folds. Provide queryability, consumer checkpoints, and causal ordering for integrations. + +### Rationale + +Problem: SPEC currently emphasizes state snapshots; many use-cases need a **stream** (integration, analytics, replay to external systems). +Context: The origin convo proposed a dedicated, queryable append-only log. + +### Decision + +1. **Shiplog namespaces** + +refs/gatos/shiplog//head # commit parent-chain per topic +refs/gatos/consumers// # checkpoints (by ULID) + +2. 
**Event envelope (normative)** +Canonical JSON with a ULID and canonical `content_id`: + +{ +“ulid”: “<26-char ULID>”, +“ns”: “”, # e.g., “governance” +“type”: “”, +“payload”: { … }, # canonical JSON +“refs”: { “state”: “blake3:…”, “proposal_id”: “blake3:…” } # OPTIONAL cross-refs +} + +Each shiplog commit message MUST include: + +Event-Id: ulid: +Content-Id: blake3: + +3. **Ordering** +- Per‑topic order is the Git parent chain order; ULIDs MUST be strictly monotonic per topic on a single node. +- Cross-topic causality is not guaranteed; consumers can join via `refs`. + +4. **Consumers** +- Consumers store per‑topic checkpoints under `refs/gatos/consumers//`. +- Checkpoint value is the last processed `ulid` (and optionally commit). + +5. **Queries** +- `gatos-mind` MUST support `shiplog.read(topic, since_ulid, limit)` returning canonical envelopes and commit ids. +- Bus bridge MAY mirror `shiplog` events onto message topics (configurable). + +6. **Interaction with Ledger** +- Ledger events MAY be mirrored into shiplog automatically. +- Governance transitions (ADR‑0003) SHOULD emit shiplog events in the `governance` topic. + +### Consequences + +**Pros**: Clean integration surface; replay; analytics; stable consumer checkpoints. +**Cons**: More refs to manage; duplication if mirroring ledger events. + +### Security Considerations + +- Don’t emit private overlay data (see ADR‑0004). +- Consumers’ checkpoints are not authoritative; they’re advisory markers. 
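
The ordering, checkpoint, and query rules above can be modelled in memory for a single topic. This is a hypothetical Rust sketch, not the `gatos-mind` API:

- `ShiplogEntry`, `append`, `read`, and `commit_message` are illustrative names.
- A `Vec` stands in for the Git parent chain.
- A consumer checkpoint is simply the last `ulid` passed back as `since_ulid`.
- The ULID check mirrors the Crockford base32 pattern in `event_envelope.schema.json`.
- `read` returns entries strictly after `since_ulid`, oldest first. When `since_ulid` is absent or unknown it starts from the beginning of the topic. That fallback is an assumption here, though a later revision of this ADR adopts it.

```rust
/// One event commit on a shiplog topic (in-memory stand-in for the parent chain).
#[derive(Debug, Clone, PartialEq)]
struct ShiplogEntry {
    ulid: String,
    commit: String,   // Git commit id of the event
    envelope: String, // canonical JSON of the event envelope
}

/// Mirrors the schema pattern ^[0-9A-HJKMNP-TV-Z]{26}$ (Crockford base32, no I/L/O/U).
fn is_valid_ulid(s: &str) -> bool {
    s.len() == 26
        && s.bytes().all(|b| {
            b.is_ascii_digit()
                || (b.is_ascii_uppercase() && !matches!(b, b'I' | b'L' | b'O' | b'U'))
        })
}

/// Appends to a topic, enforcing strictly increasing ULIDs. Canonical ULIDs have a
/// fixed length and the Crockford alphabet is in ASCII order, so string order
/// matches timestamp-then-randomness order.
fn append(topic: &mut Vec<ShiplogEntry>, entry: ShiplogEntry) -> Result<(), String> {
    if !is_valid_ulid(&entry.ulid) {
        return Err(format!("invalid ULID: {}", entry.ulid));
    }
    if let Some(last) = topic.last() {
        if entry.ulid <= last.ulid {
            return Err(format!("non-monotonic ULID {} after {}", entry.ulid, last.ulid));
        }
    }
    topic.push(entry);
    Ok(())
}

/// Sketch of read(topic, since_ulid, limit): entries strictly after `since_ulid`,
/// oldest to newest, at most `limit` of them.
fn read<'a>(topic: &'a [ShiplogEntry], since_ulid: Option<&str>, limit: usize) -> &'a [ShiplogEntry] {
    let start = since_ulid
        .and_then(|u| topic.iter().position(|e| e.ulid == u))
        .map_or(0, |i| i + 1);
    let end = start.saturating_add(limit).min(topic.len());
    &topic[start..end]
}

/// Commit message carrying the required Event-Id and Content-Id lines.
fn commit_message(subject: &str, ulid: &str, content_id_hex: &str) -> String {
    format!("{subject}\n\nEvent-Id: ulid:{ulid}\nContent-Id: blake3:{content_id_hex}\n")
}
```

Checkpoints are advisory. A consumer that loses its checkpoint would call `read(topic, None, limit)` and absorb the replay through idempotent processing.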
\ No newline at end of file diff --git a/docs/decisions/README.md b/docs/decisions/README.md index ab9fe36d..8a6aad78 100644 --- a/docs/decisions/README.md +++ b/docs/decisions/README.md @@ -31,4 +31,5 @@ Each ADR will have a status, typically one of the following: | [ADR-0001](./ADR-0001/DECISION.md) | Split gatos-ledger into no_std Core and std Backends | Accepted | 2025-11-08 | | [ADR-0002](./ADR-0002/DECISION.md) | Distributed Compute via a Job Plane | Accepted | 2025-11-08 | | [ADR-0003](./ADR-0003/DECISION.md) | Consensus Governance for Gated Actions | Accepted | 2025-11-08 | -| [ADR-0004](./ADR-0004/DECISION.md) | Hybrid Privacy Model (Public Projection + Private Overlay) | Accepted | 2025-11-09 | \ No newline at end of file +| [ADR-0004](./ADR-0004/DECISION.md) | Hybrid Privacy Model (Public Projection + Private Overlay) | Accepted | 2025-11-09 | +| [ADR-0005](./ADR-0005/DECISION.md) | Shiplog — A Parallel, Queryable Event Stream | Proposed | 2025-11-09 | \ No newline at end of file diff --git a/schemas/v1/shiplog/consumer_checkpoint.schema.json b/schemas/v1/shiplog/consumer_checkpoint.schema.json new file mode 100644 index 00000000..79bb3e14 --- /dev/null +++ b/schemas/v1/shiplog/consumer_checkpoint.schema.json @@ -0,0 +1,23 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema#", + "$id": "https://gatos.io/schemas/v1/shiplog/consumer_checkpoint.schema.json", + "title": "Shiplog Consumer Checkpoint", + "description": "Schema for a shiplog consumer checkpoint.", + "type": "object", + "required": [ + "ulid" + ], + "properties": { + "ulid": { + "type": "string", + "description": "The ULID of the last processed event.", + "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" + }, + "commit": { + "type": "string", + "description": "The Git commit hash of the last processed event.", + "pattern": "^[0-9a-f]{40}$" + } + }, + "additionalProperties": false +} diff --git a/schemas/v1/shiplog/event_envelope.schema.json b/schemas/v1/shiplog/event_envelope.schema.json new file 
mode 100644 index 00000000..6ef1c649 --- /dev/null +++ b/schemas/v1/shiplog/event_envelope.schema.json @@ -0,0 +1,49 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema#", + "$id": "https://gatos.io/schemas/v1/shiplog/event_envelope.schema.json", + "title": "Shiplog Event Envelope", + "description": "Schema for a shiplog event envelope.", + "type": "object", + "required": [ + "ulid", + "ns", + "type", + "payload" + ], + "properties": { + "ulid": { + "type": "string", + "description": "The 26-character ULID of the event.", + "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" + }, + "ns": { + "type": "string", + "description": "The namespace of the event (e.g., 'governance')." + }, + "type": { + "type": "string", + "description": "The type of the event (e.g., 'proposal.created')." + }, + "payload": { + "type": "object", + "description": "The event payload, which should be canonical JSON.", + "additionalProperties": true + }, + "refs": { + "type": "object", + "description": "Optional cross-references to other objects.", + "properties": { + "state": { + "type": "string", + "pattern": "^blake3:[0-9a-fA-F]{64}$" + }, + "proposal_id": { + "type": "string", + "pattern": "^blake3:[0-9a-fA-F]{64}$" + } + }, + "additionalProperties": true + } + }, + "additionalProperties": false +} From e6b1fc2e73ef520b358de964e229b8e38e27a259 Mon Sep 17 00:00:00 2001 From: "J. 
Kirby Ross" Date: Sun, 16 Nov 2025 16:10:33 -0800 Subject: [PATCH 04/25] docs(adr): enhance ADR-0005 with clarifications and alternatives --- docs/decisions/ADR-0005/DECISION.md | 50 +++++++++++++++++++++++++---- 1 file changed, 44 insertions(+), 6 deletions(-) diff --git a/docs/decisions/ADR-0005/DECISION.md b/docs/decisions/ADR-0005/DECISION.md index 2c83c229..cd3cab37 100644 --- a/docs/decisions/ADR-0005/DECISION.md +++ b/docs/decisions/ADR-0005/DECISION.md @@ -24,15 +24,31 @@ Introduce a **first-class, append-only event stream** ("shiplog") that runs in p Problem: SPEC currently emphasizes state snapshots; many use-cases need a **stream** (integration, analytics, replay to external systems). Context: The origin convo proposed a dedicated, queryable append-only log. +### Alternatives Considered + +* **1. Using Git Notes:** + * **Description:** Attach event data to existing Git commits using `git notes`. + * **Reason for Rejection:** Git notes are not first-class citizens in the Git object model and are not replicated by default. This would make them difficult to query, replicate, and manage, and would not provide a clean, parallel stream of events. + +* **2. External Message Queue (e.g., Kafka, NATS):** + * **Description:** Use an external message queue as the primary event stream. + * **Reason for Rejection:** This would introduce a significant external dependency, increasing operational complexity and cost. It would also move a critical piece of the system's data model outside of the core Git repository, potentially compromising the project's goal of being self-contained and Git-native. + +* **3. No Shiplog (Consumers Parse Git History):** + * **Description:** Do not create a dedicated shiplog. Require consumers to parse the entire Git history of the main ledger to extract the events they need. + * **Reason for Rejection:** This would be highly inefficient and complex for consumers. 
It would require each consumer to implement its own logic for traversing the Git history, filtering commits, and managing its own state. A dedicated shiplog provides a much cleaner and more efficient integration surface. + ### Decision 1. **Shiplog namespaces** -refs/gatos/shiplog//head # commit parent-chain per topic -refs/gatos/consumers// # checkpoints (by ULID) +Each event in the shiplog corresponds to a single Git commit. The shiplog is organized into topics using the following ref structure: + +refs/gatos/shiplog//head # commit parent-chain per topic +refs/gatos/consumers// # checkpoints (by ULID) 2. **Event envelope (normative)** -Canonical JSON with a ULID and canonical `content_id`: +Canonical JSON with a ULID and canonical `content_id`. The `content_id` is the `blake3` hash of the canonical JSON of the event envelope itself. { “ulid”: “<26-char ULID>”, @@ -53,10 +69,10 @@ Content-Id: blake3: 4. **Consumers** - Consumers store per‑topic checkpoints under `refs/gatos/consumers//`. -- Checkpoint value is the last processed `ulid` (and optionally commit). +- Checkpoint value is the last processed `ulid` (and optionally commit). Storing the commit hash allows for faster lookups and can help resolve ordering if ULIDs are not strictly monotonic across distributed nodes. 5. **Queries** -- `gatos-mind` MUST support `shiplog.read(topic, since_ulid, limit)` returning canonical envelopes and commit ids. +- `gatos-mind` MUST support `shiplog.read(topic, since_ulid, limit)` returning canonical envelopes and commit ids, ordered by the Git parent chain (oldest to newest). If `since_ulid` is not found, the stream SHOULD start from the beginning of the topic. - Bus bridge MAY mirror `shiplog` events onto message topics (configurable). 6. **Interaction with Ledger** @@ -71,4 +87,26 @@ Content-Id: blake3: ### Security Considerations - Don’t emit private overlay data (see ADR‑0004). -- Consumers’ checkpoints are not authoritative; they’re advisory markers. 
\ No newline at end of file +- Consumers’ checkpoints are not authoritative; they’re advisory markers. + +### Diagrams + +```mermaid +graph TD + subgraph Main Ledger + L1[Commit 1] --> L2[Commit 2] + end + + subgraph Shiplog (topic: governance) + S1[Event A
ulid: 01...A] --> S2[Event B
ulid: 01...B] + end + + subgraph Consumers + C1[Consumer Group 1
refs/gatos/consumers/group1/governance
Value: 01...B] + C2[Consumer Group 2
refs/gatos/consumers/group2/governance
Value: 01...A] + end + + L2 -- "Mirrors event" --> S1 + S2 -- "Processed by" --> C1 + S1 -- "Processed by" --> C2 +``` \ No newline at end of file From d4fcd8834eb1f0f5a76cb756ba111d3b9d78a2bb Mon Sep 17 00:00:00 2001 From: "J. Kirby Ross" Date: Sun, 16 Nov 2025 22:40:19 -0800 Subject: [PATCH 05/25] docs ADR-0005 udpates --- Cargo.lock | 9 +- Cargo.toml | 2 +- README.md | 2 +- ROADMAP.md | 74 +++---- .../Cargo.toml | 9 +- .../README.md | 20 +- crates/gatos-message-plane/src/lib.rs | 208 ++++++++++++++++++ crates/gatos-mind/src/lib.rs | 7 - crates/gatosd/Cargo.toml | 2 +- crates/gatosd/src/main.rs | 8 +- crates/gatosd/src/message_plane.rs | 73 ++++++ docs/FAQ.md | 8 +- docs/FEATURES.md | 2 +- docs/ROADMAP.md | 17 +- docs/SPEC.md | 99 ++++----- docs/TECH-SPEC.md | 32 ++- docs/USE-CASES.md | 4 +- docs/decisions/ADR-0001/flyingrobots.md | 4 +- docs/decisions/ADR-0005/DECISION.md | 78 ++++--- docs/decisions/README.md | 2 +- docs/diagrams/architecture.md | 2 +- docs/diagrams/data_flow.md | 2 +- .../docs_SPEC__0679d036ea__mermaid_19.svg | 2 +- .../docs_SPEC__0679d036ea__mermaid_2.svg | 2 +- .../docs_TECH-SPEC__15850d53f4__mermaid_1.svg | 2 +- .../docs_TECH-SPEC__15850d53f4__mermaid_2.svg | 2 +- .../generated/docs__SPEC__mermaid_19.svg | 2 +- .../generated/docs__SPEC__mermaid_2.svg | 2 +- .../generated/docs__TECH-SPEC__mermaid_1.svg | 2 +- .../generated/docs__TECH-SPEC__mermaid_2.svg | 2 +- ...ocs__diagrams__architecture__mermaid_1.svg | 2 +- .../docs__diagrams__data_flow__mermaid_1.svg | 2 +- ...ms_architecture__105fc24d87__mermaid_1.svg | 2 +- ...grams_data_flow__559f0c180d__mermaid_1.svg | 2 +- ...ide_CHAPTER-001__d51d557e71__mermaid_1.svg | 2 +- ...cs_guide_README__5e9e6b7c1c__mermaid_1.svg | 2 +- docs/guide/CHAPTER-001.md | 6 +- docs/guide/CHAPTER-006.md | 18 +- docs/guide/CHAPTER-007.md | 2 +- docs/guide/CHAPTER-010.md | 11 +- docs/guide/CHAPTER-011.md | 2 +- docs/guide/HELLO-PRIVACY.md | 12 +- docs/guide/README.md | 4 +- docs/opaque-pointers.md | 31 +-- 
docs/opaque-resolver.md | 8 +- docs/research-profile.md | 9 +- schemas/README.md | 6 + .../consumer_checkpoint.schema.json | 6 +- .../event_envelope.schema.json | 6 +- 49 files changed, 575 insertions(+), 238 deletions(-) rename crates/{gatos-mind => gatos-message-plane}/Cargo.toml (53%) rename crates/{gatos-mind => gatos-message-plane}/README.md (69%) create mode 100644 crates/gatos-message-plane/src/lib.rs delete mode 100644 crates/gatos-mind/src/lib.rs create mode 100644 crates/gatosd/src/message_plane.rs rename schemas/v1/{shiplog => message-plane}/consumer_checkpoint.schema.json (69%) rename schemas/v1/{shiplog => message-plane}/event_envelope.schema.json (86%) diff --git a/Cargo.lock b/Cargo.lock index 807b86c0..4fba9fab 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -498,8 +498,13 @@ dependencies = [ ] [[package]] -name = "gatos-mind" +name = "gatos-message-plane" version = "0.1.0" +dependencies = [ + "blake3", + "hex", + "serde_json", +] [[package]] name = "gatos-policy" @@ -522,7 +527,7 @@ dependencies = [ "clap", "gatos-echo", "gatos-ledger", - "gatos-mind", + "gatos-message-plane", "gatos-policy", "serde_json", "tokio", diff --git a/Cargo.toml b/Cargo.toml index 59c6d34a..25afcc98 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -5,7 +5,7 @@ members = [ "crates/gatos-ledger-core", "crates/gatos-ledger-git", "crates/gatos-ledger", - "crates/gatos-mind", + "crates/gatos-message-plane", "crates/gatos-echo", "crates/gatos-policy", "crates/gatos-kv", diff --git a/README.md b/README.md index 585c6b44..991dba4c 100644 --- a/README.md +++ b/README.md @@ -117,7 +117,7 @@ GATOS organizes the repository into five distinct planes using standard Git refe | **2. Policy/Trust** | `refs/gatos/policies/*` | Executable policy (Lua/WASM), capabilities, quorum; **deny-audit** on violations. | | | `refs/gatos/trust/*` | Keys, groups, grants, revocations. | | **3. State** | `refs/gatos/state/*` | Deterministic checkpoints derived from the ledger (**Proof-of-Fold**). | -| **4. 
Message** | `refs/gatos/mbus/*` | Commit-backed pub/sub (at-least-once + idempotency). | +| **4. Message** | `refs/gatos/messages/*` | Commit-backed message plane (topics served via `messages.read`). | | **5. Job** | `refs/gatos/jobs/*` | Jobs and **Proofs-of-Execution (PoE)**; exclusive claim via CAS. | ----- diff --git a/ROADMAP.md b/ROADMAP.md index 52dc8be1..5891f819 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -16,7 +16,7 @@ It follows a strict **proof-first** philosophy: - **Proof-first Design** — Every claim is verifiable from first principles. - **Deterministic by Construction** — Same history + same policy = same state, bit-for-bit. -- **Git as History, not Database** — Git stores events, checkpoints, and proofs; bulk data lives behind Opaque Pointers; heavy analytics via Explorer off-ramp. +- **Git as History, not Database** — Git stores Message Plane events, checkpoints, and proofs; bulk data lives behind Opaque Pointers; heavy analytics via Explorer off-ramp. - **Research Profile Defaults** — A conservative profile for scientific reproducibility (PoF required, policy FF-only, anchored audit refs). - **At-Least-Once + Idempotency** — Delivery is at-least-once; consumers dedupe idempotently. No “exactly-once” fairy tales. @@ -28,7 +28,7 @@ These are explicit non-goals until after the core truth machine is working: - A fully featured **multi-peer networking layer** (start single-node). - A **cluster scheduler** or full-blown job orchestration system. -- A replacement for **Kafka** or high-throughput brokers. +- A replacement for **Kafka** or high-throughput brokers (Message Plane stays Git-native, not a hosted queue). - A hosted “GATOS Cloud” product. 
- Strong isolation / capability-based sandboxing beyond basic VM guarantees @@ -43,7 +43,7 @@ These are explicit non-goals until after the core truth machine is working: | **M0** | Repo, scaffolding, canonicalization, ADR process | | **M1** | EchoLua fold engine + Proof-of-Fold (PoF) | | **M2** | Push-gate, .rgs policy, DENY-audit, grants | -| **M3** | Commit-backed Message Bus (segmented, TTL, summaries) | +| **M3** | Message Plane (Git-native append-only stream + queries) | | **M4** | Job Plane + Proof-of-Execution (PoE) | | **M5** | Opaque Pointers + privacy-preserving projection | | **M6** | Explorer off-ramp + Explorer-Root verification | @@ -139,31 +139,29 @@ These are explicit non-goals until after the core truth machine is working: --- -## M3 — Message Bus (Commit-backed Pub/Sub) +## M3 — Message Plane (ADR-0005) -**Goal:** A usable event bus that lives in Git without melting Git. +**Goal:** Land the Git-native Message Plane so integrations can consume ordered events without parsing the entire ledger. **Deliverables:** -- Namespaced mbus: - - `refs/gatos/mbus////
/` -- QoS: - - At-least-once delivery. - - Idempotency keys + content hashes. - - Subscriber-side dedupe. -- Rotation and retention: - - Segment rotation based on `max_messages_per_segment` OR `max_segment_bytes`. - - TTL-based pruning for old segments. - - Summary commits capturing: - - Merkle root of message bodies. - - count, min/max offsets. -- Observability: - - Metrics: messages per segment, pack sizes, TTL age, rotation suggestions. +- Refs & checkpoints: + - `refs/gatos/messages//head` per-topic parent chains. + - `refs/gatos/consumers//` storing last processed `ulid` (+ optional commit) for each consumer group. +- Event envelope: + - Canonical JSON payload with `ulid`, `ns`, `type`, `payload`, `refs`, and `content_id` (BLAKE3 of the canonical envelope). + - Enforce `Event-Id` and `Content-Id` headers in Message Plane commit messages. +- APIs & tooling: + - `gatos-message-plane messages.read(topic, since_ulid, limit)` returning canonical envelopes + commit ids, oldest → newest. + - Consumer checkpoint helpers (list, advance, reset) plus tests for ULID monotonicity. +- Integration: + - Automatically emit Message Plane events for ledger folds and governance transitions (e.g., `governance` topic). + - Optional bridge mirroring Message Plane topics to external brokers (Kafka/NATS) without breaking Git-native ownership. **Done when:** -- Duplicate messages do not cause duplicate effects if consumers obey idempotency. -- Git repos remain manageable under expected message load. +- Consumers can resume from checkpoints and replay Message Plane topics deterministically on fresh clones. +- Governance transitions and ledger mirrors emit Message Plane events discoverable via `messages.read`. --- @@ -176,7 +174,7 @@ These are explicit non-goals until after the core truth machine is working: - Job claims: - Exclusive CAS lock ref `refs/gatos/jobs//claim`. - Worker: - - Subscribe to mbus. + - Subscribe to the Message Plane `jobs` topic (`messages.read` helper). 
- Claim jobs. - Run configured program/container. - Commit results. @@ -199,9 +197,10 @@ These are explicit non-goals until after the core truth machine is working: **Deliverables:** - Public pointer schema: - - Commitments and ciphertext digests. - - Bucketed sizes (e.g., 1k/4k/16k/64k). - - No plaintext digest for low-entropy classes. + - Canonical JSON envelope with `kind: "opaque_pointer"`, `algo`, `digest`, and optional bucketed `size` (e.g., 1k/4k/16k/64k). + - `location` URI for retrieval (e.g., `gatos-node://`, `https://`, `s3://`, `ipfs://`). + - `capability` URI describing how to authorize/decrypt (e.g., `gatos-key://`, `kms://`, `age://`). + - `digest` is the BLAKE3 hash of the raw plaintext blob; no ciphertext hash is tracked in Git. - Resolver service: - Auth (Bearer JWT; optional HTTP signatures/mTLS). - Returns bytes + `Digest` headers. @@ -300,7 +299,8 @@ These are explicit non-goals until after the core truth machine is working: - `git gatos doctor`: - Checks for misconfigurations in: - profiles, - - mbus rotation/TTL, + - Message Plane head continuity & retention, + - consumer checkpoint drift, - anchors, - PoF presence, - export consistency. @@ -323,7 +323,7 @@ These are explicit non-goals until after the core truth machine is working: - DAG-CBOR parsing, - EchoLua interpreter, - .rgs compiler, - - mbus dedupe, + - Message Plane consumer dedupe/resume logic, - pointer resolver. 
- External cryptography review: - PoF and PoE signing, @@ -401,15 +401,15 @@ M2 – Push-Gate & Policy - Implement proposal/approval/grant flow - Add policy verify -M3 – Message Bus +M3 – Message Plane -- Implement refs/gatos/mbus/* structure -- Add message publish + subscribe RPC -- Add at-least-once + idempotency support -- Implement segmented topics -- Implement TTL pruning -- Add summary commits with Merkle roots -- Add observability metrics +- Implement `refs/gatos/messages//head` parent chains +- Implement `refs/gatos/consumers//` checkpoints (ULID + commit) +- Define canonical Message Plane envelope + commit annotations (Event-Id/Content-Id) +- Add `gatos-message-plane messages.read` RPC + CLI helper +- Add consumer checkpoint management commands/tests (ULID monotonicity) +- Auto-emit ledger & governance events into appropriate Message Plane topics +- Add optional bridge to mirror Message Plane topics to external brokers M4 – Job Plane + PoE @@ -421,8 +421,8 @@ M4 – Job Plane + PoE M5 – Opaque Pointers + Privacy -- Implement pointer format (ciphertext_digest + bucketed size) -- Implement encrypted meta store +- Implement pointer envelope (kind/algo/digest/size/location/capability) +- Implement private overlay store wired into capability URIs - Implement pointer resolver (JWT + Digest headers) - Integrate privacy projection into fold pipeline - Add projection determinism tests diff --git a/crates/gatos-mind/Cargo.toml b/crates/gatos-message-plane/Cargo.toml similarity index 53% rename from crates/gatos-mind/Cargo.toml rename to crates/gatos-message-plane/Cargo.toml index 9af24687..971ab1bf 100644 --- a/crates/gatos-mind/Cargo.toml +++ b/crates/gatos-message-plane/Cargo.toml @@ -1,13 +1,16 @@ [package] -name = "gatos-mind" +name = "gatos-message-plane" version.workspace = true edition.workspace = true authors.workspace = true license.workspace = true repository.workspace = true -description = "Placeholder for GATOS Message Bus implementation" +description = 
"Placeholder for the GATOS Message Plane (message bus) implementation" # NOTE: Placeholder metadata until implementation lands; do not publish. -keywords = ["gatos", "message-bus", "git"] +keywords = ["gatos", "message-plane", "git"] categories = ["network-programming", "data-structures"] [dependencies] +serde_json = { workspace = true } +blake3 = { workspace = true } +hex = { workspace = true } diff --git a/crates/gatos-mind/README.md b/crates/gatos-message-plane/README.md similarity index 69% rename from crates/gatos-mind/README.md rename to crates/gatos-message-plane/README.md index 179ce308..f263fb98 100644 --- a/crates/gatos-mind/README.md +++ b/crates/gatos-message-plane/README.md @@ -1,6 +1,6 @@ -# GATOS Mind (Message Bus) +# GATOS Message Plane (Message Bus) -This crate implements the GATOS Message Bus (GMB), an asynchronous, commit-backed publish/subscribe +This crate implements the GATOS Message Plane (GMP), an asynchronous, commit-backed publish/subscribe system. It handles topics, sharding, and different Quality of Service (QoS) guarantees for distributed communication between GATOS components. @@ -36,7 +36,7 @@ in [ADR-0001](../../docs/decisions/ADR-0001/DECISION.md) and protocol details in > For the evolving design and protocol, see [TECH-SPEC.md](../../docs/TECH-SPEC.md). ```text -// use gatos_mind::{Publisher, Subscriber}; +// use gatos_message_plane::{Publisher, Subscriber}; // #[tokio::main] // async fn main() { /* publish/subscribe */ } ``` @@ -45,7 +45,7 @@ Examples are coming once the API lands. ## Integration -GMB is the Message Plane in the GATOS hexagonal architecture. It coordinates messaging across: +GMP is the Message Plane in the GATOS hexagonal architecture. It coordinates messaging across: - `crates/gatos-ledger-core` and `crates/gatos-ledger-git`: ledger state events - `crates/gatos-policy`: policy decision events @@ -54,12 +54,22 @@ GMB is the Message Plane in the GATOS hexagonal architecture. 
It coordinates mes ### Usage (API Sketch) -- Depend on `gatos-mind` in your crate. +- Depend on `gatos-message-plane` in your crate. - Use a `Publisher` to publish messages to a topic; use a `Subscriber` to consume. - Messages are persisted as Git commits to provide auditability and coordinate exactly-once when combined with acknowledgements/commitments. > Note: This section reflects the intended usage; concrete APIs will be added as implementation proceeds. +## Current API Skeleton + +The crate currently exports lightweight traits and structs so downstream crates can start wiring integrations: + +- `TopicRef` — identifies the repository + logical topic (`refs/gatos/messages/`). +- `MessageEnvelope` — holds the canonical JSON bytes (per `schemas/v1/message-plane/event_envelope.schema.json`); `MessageEnvelope::from_json_str` recursively sorts object keys and checks that `ulid` (26-char Crockford base32), `ns`, `type`, and `payload` are present. +- `MessagePublisher`, `MessageSubscriber`, `CheckpointStore` — traits that `gatosd` currently implements as placeholder stubs (`read` returns an empty page; `publish` and `persist_checkpoint` return errors) until the Message Plane RPC lands. They expose `publish`, `read`, and checkpoint persistence hooks mapped to ADR-0005’s `messages.read` contract. + +These types intentionally omit concrete transport plumbing; they document the expected shape so ADR work and downstream SDKs can evolve in parallel. + For protocol details, architecture rationale, and design patterns, see [ADR-0001](../../docs/decisions/ADR-0001/DECISION.md) and [TECH-SPEC.md](../../docs/TECH-SPEC.md). diff --git a/crates/gatos-message-plane/src/lib.rs b/crates/gatos-message-plane/src/lib.rs new file mode 100644 index 00000000..49fb4c70 --- /dev/null +++ b/crates/gatos-message-plane/src/lib.rs @@ -0,0 +1,208 @@ +//! GATOS Message Plane — commit-backed message bus primitives. +//! +//! The real transport lives inside `gatosd`, but this crate defines the +//! public-facing types and traits that publishers/subscribers will use once +//! ADR-0005 is fully implemented. Keeping these definitions here lets other
crates depend on the semantics without needing the daemon. + +use std::path::PathBuf; + +use blake3::Hasher; +use serde_json::{Map, Value}; + +/// Placeholder export so downstream builds keep working while the real API +/// is filled in. Remove once the Message Plane lands. +#[allow(clippy::must_use_candidate)] +pub const fn hello_message_plane() -> &'static str { + "Hello from gatos-message-plane!" +} + +/// Canonical reference to a message topic. +#[derive(Debug, Clone, PartialEq, Eq, Hash)] +pub struct TopicRef { + /// Repository path housing the `refs/gatos/messages/` namespace. + pub repo: PathBuf, + /// Logical topic name (e.g., `governance`, `jobs/pending`). + pub name: String, +} + +impl TopicRef { + /// Creates a new topic reference rooted at the provided repository path. + pub fn new<P: Into<PathBuf>, S: Into<String>>(repo: P, name: S) -> Self { + Self { + repo: repo.into(), + name: name.into(), + } + } +} + +/// Canonical envelope payload conforming to +/// `schemas/v1/message-plane/event_envelope.schema.json`. +#[derive(Debug, Clone, PartialEq, Eq)] +pub struct MessageEnvelope { + /// ULID from the envelope body (used for ordering + dedupe). + pub ulid: String, + /// Namespace string (e.g., `governance`). + pub namespace: String, + /// Type string (e.g., `proposal.created`). + pub event_type: String, + /// Canonical JSON bytes written into `message/envelope.json`. + pub canonical_bytes: Vec<u8>, +} + +impl MessageEnvelope { + /// Convenience constructor for callers that already produced canonical JSON. + pub fn new<U: Into<String>, N: Into<String>, T: Into<String>, B: Into<Vec<u8>>>( + ulid: U, + namespace: N, + event_type: T, + canonical_bytes: B, + ) -> Self { + Self { + ulid: ulid.into(), + namespace: namespace.into(), + event_type: event_type.into(), + canonical_bytes: canonical_bytes.into(), + } + } + + /// Build an envelope from raw JSON and canonicalize it.
+ pub fn from_json_str(raw: &str) -> Result<Self, MessagePlaneError> { + let value: Value = serde_json::from_str(raw) + .map_err(|e| MessagePlaneError::InvalidEnvelope(format!("parse error: {e}")))?; + Self::from_value(value) + } + + /// Build an envelope from an already parsed JSON value. + pub fn from_value(value: Value) -> Result<Self, MessagePlaneError> { + let ulid = value + .get("ulid") + .and_then(Value::as_str) + .ok_or_else(|| MessagePlaneError::InvalidEnvelope("missing 'ulid'".into()))?; + validate_ulid_str(ulid)?; + let namespace = value + .get("ns") + .and_then(Value::as_str) + .ok_or_else(|| MessagePlaneError::InvalidEnvelope("missing 'ns'".into()))?; + let event_type = value + .get("type") + .and_then(Value::as_str) + .ok_or_else(|| MessagePlaneError::InvalidEnvelope("missing 'type'".into()))?; + if value.get("payload").is_none() { + return Err(MessagePlaneError::InvalidEnvelope( + "missing 'payload'".into(), + )); + } + let canonical = canonicalize_json(value.clone()); // clone: `ulid`, `namespace`, and `event_type` still borrow `value` + let canonical_bytes = serde_json::to_vec(&canonical) + .map_err(|e| MessagePlaneError::InvalidEnvelope(format!("serialize error: {e}")))?; + Ok(Self { + ulid: ulid.to_string(), + namespace: namespace.to_string(), + event_type: event_type.to_string(), + canonical_bytes, + }) + } + + /// Returns `blake3:` digest of the canonical bytes. + pub fn content_id(&self) -> String { + let mut hasher = Hasher::new(); + hasher.update(&self.canonical_bytes); + format!("blake3:{}", hex::encode(hasher.finalize().as_bytes())) + } +} + +/// Result of writing a message commit to the ledger/repo. +#[derive(Debug, Clone, PartialEq, Eq)] +pub struct PublishReceipt { + /// Git commit id (`oid`) of the message commit. + pub commit_id: String, + /// Canonical `content_id` (`blake3:`-prefixed hex digest of the canonical envelope bytes). + pub content_id: String, +} + +/// Errors encountered during publish/subscribe workflows. +#[derive(Debug)] +pub enum MessagePlaneError { + /// Repository IO or libgit2 failure. + Repo(String), + /// Provided envelope failed schema/canonical validation.
+ InvalidEnvelope(String), + /// CAS violation while appending to a topic. + HeadConflict, + /// Subscriber checkpoint could not be stored. + Checkpoint(String), +} + +impl std::fmt::Display for MessagePlaneError { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + match self { + Self::Repo(e) => write!(f, "repository error: {}", e), + Self::InvalidEnvelope(e) => write!(f, "invalid envelope: {}", e), + Self::HeadConflict => write!(f, "topic head moved while publishing"), + Self::Checkpoint(e) => write!(f, "checkpoint error: {}", e), + } + } +} + +impl std::error::Error for MessagePlaneError {} + +/// Publish interface implemented by the daemon. +pub trait MessagePublisher { + /// Append a message to `topic`, returning the resulting commit + content ids. + fn publish(&self, topic: &TopicRef, envelope: MessageEnvelope) + -> Result<PublishReceipt, MessagePlaneError>; +} + +/// Subscriber interface for streaming messages off a topic. +pub trait MessageSubscriber { + /// Fetch up to `limit` messages newer than `since_ulid`. + fn read( + &self, + topic: &TopicRef, + since_ulid: Option<&str>, + limit: usize, + ) -> Result, MessagePlaneError>; +} + +/// Persistence for consumer checkpoints (refs/gatos/consumers/**). +pub trait CheckpointStore { + /// Record `ulid`/`commit` as the last-seen event for `topic` and `group`. + fn persist_checkpoint( + &self, + group: &str, + topic: &TopicRef, + ulid: &str, + commit: &str, + ) -> Result<(), MessagePlaneError>; +} + +/// Validates that `input` is a 26-char ULID using uppercase Crockford base32.
+pub fn validate_ulid_str(input: &str) -> Result<(), MessagePlaneError> { + if input.len() != 26 { + return Err(MessagePlaneError::InvalidEnvelope("ulid must be 26 chars".into())); + } + if !input.chars().all(|c| matches!(c, '0'..='9' | 'A'..='H' | 'J' | 'K' | 'M' | 'N' | 'P'..='T' | 'V'..='Z')) // Crockford base32 omits I, L, O, U + { + return Err(MessagePlaneError::InvalidEnvelope( + "ulid must be uppercase Crockford base32".into(), + )); + } + Ok(()) +} + +fn canonicalize_json(value: Value) -> Value { + match value { + Value::Object(map) => { + let mut entries: Vec<_> = map.into_iter().collect(); + entries.sort_by(|a, b| a.0.cmp(&b.0)); + let mut new_map = Map::with_capacity(entries.len()); + for (k, v) in entries { + new_map.insert(k, canonicalize_json(v)); + } + Value::Object(new_map) + } + Value::Array(arr) => Value::Array(arr.into_iter().map(canonicalize_json).collect()), + other => other, + } +} diff --git a/crates/gatos-mind/src/lib.rs b/crates/gatos-mind/src/lib.rs deleted file mode 100644 index 058baef2..00000000 --- a/crates/gatos-mind/src/lib.rs +++ /dev/null @@ -1,7 +0,0 @@ -//! TODO: Message bus implementation per TECH-SPEC. This is a placeholder API. -//! See docs/TECH-SPEC.md (Message Bus) for the execution plan. - -#[allow(clippy::must_use_candidate)] -pub const fn hello_mind() -> &'static str { - "Hello from gatos-mind!" -} diff --git a/crates/gatosd/Cargo.toml b/crates/gatosd/Cargo.toml index 44e1aa9a..e7345f79 100644 --- a/crates/gatosd/Cargo.toml +++ b/crates/gatosd/Cargo.toml @@ -11,7 +11,7 @@ categories = ["command-line-utilities"] [dependencies] gatos-ledger = { path = "../gatos-ledger" } -gatos-mind = { path = "../gatos-mind" } +gatos-message-plane = { path = "../gatos-message-plane" } gatos-echo = { path = "../gatos-echo" } gatos-policy = { path = "../gatos-policy" } serde_json = { workspace = true } diff --git a/crates/gatosd/src/main.rs b/crates/gatosd/src/main.rs index c3b06542..9616a53d 100644 --- a/crates/gatosd/src/main.rs +++ b/crates/gatosd/src/main.rs @@ -5,7 +5,10 @@ //!
an async loop that waits for shutdown signals. The JSONL RPC server //! will be implemented in a subsequent iteration. +mod message_plane; + use clap::Parser; +use message_plane::MessagePlaneService; use tracing::{error, info}; #[derive(Parser, Debug)] @@ -22,7 +25,10 @@ async fn main() -> anyhow::Result<()> { let args = Args::parse(); info!(?args, "starting gatosd"); - // TODO: wire up JSONL RPC server (stdio or TCP) per TECH-SPEC + // TODO: wire up JSONL RPC server (stdio or TCP) per TECH-SPEC, including + // `messages.read` handlers backed by crates/gatos-message-plane (ADR-0005). + let mp = MessagePlaneService::new(); + info!(max_page_size = mp.max_page_size(), "message plane stub ready"); // Placeholder: run until Ctrl-C if let Err(e) = tokio::signal::ctrl_c().await { error!(?e, "failed to install Ctrl-C handler"); diff --git a/crates/gatosd/src/message_plane.rs b/crates/gatosd/src/message_plane.rs new file mode 100644 index 00000000..4400cf4e --- /dev/null +++ b/crates/gatosd/src/message_plane.rs @@ -0,0 +1,73 @@ +use gatos_message_plane::{ + validate_ulid_str, CheckpointStore, MessagePlaneError, MessagePublisher, MessageSubscriber, + TopicRef, +}; + +use gatos_message_plane::{MessageEnvelope, PublishReceipt}; + +const MAX_PAGE_SIZE: usize = 512; + +/// Placeholder service that will eventually wrap the real Git-backed Message Plane. +pub struct MessagePlaneService; + +impl MessagePlaneService { + pub fn new() -> Self { + Self + } + + pub fn max_page_size(&self) -> usize { + MAX_PAGE_SIZE + } + + /// Stub entry point for the upcoming RPC server integration. 
+ pub fn messages_read( + &self, + topic: &TopicRef, + since_ulid: Option<&str>, + limit: usize, + ) -> Result, MessagePlaneError> { + self.read(topic, since_ulid, limit) + } +} + +impl MessagePublisher for MessagePlaneService { + fn publish( + &self, + _topic: &TopicRef, + _envelope: MessageEnvelope, + ) -> Result { + Err(MessagePlaneError::Repo( + "Message Plane publish not implemented (see ADR-0005)".into(), + )) + } +} + +impl MessageSubscriber for MessagePlaneService { + fn read( + &self, + _topic: &TopicRef, + since_ulid: Option<&str>, + limit: usize, + ) -> Result, MessagePlaneError> { + if let Some(cursor) = since_ulid { + validate_ulid_str(cursor)?; + } + let _clamped = limit.clamp(1, MAX_PAGE_SIZE); + Ok(Vec::new()) + } +} + +impl CheckpointStore for MessagePlaneService { + fn persist_checkpoint( + &self, + _group: &str, + _topic: &TopicRef, + ulid: &str, + _commit: &str, + ) -> Result<(), MessagePlaneError> { + validate_ulid_str(ulid)?; + Err(MessagePlaneError::Checkpoint( + "checkpoint persistence not implemented (see ADR-0005)".into(), + )) + } +} diff --git a/docs/FAQ.md b/docs/FAQ.md index 6597e3e9..896bccc6 100644 --- a/docs/FAQ.md +++ b/docs/FAQ.md @@ -300,7 +300,7 @@ Add `fold_root` to proof envelope and make proof field type-tagged: Use a versioned shard map + dual-write migration. -- Store topic config at `refs/gatos/mbus-config/.json`: +- Store topic config at `refs/gatos/messages-config/.json`: ```json { @@ -647,7 +647,7 @@ Grant chain fields (prev, revokes) and a rotation checklist in spec. - Needs: signed events, multi-sig policy changes, human/JSON logs, offline verify. -- We meet: Shiplog DNA + proof envelopes v1. +- We meet: Message Plane DNA + proof envelopes v1. - Add: “evidence pack” command that bundles logs + proof → ✅ audit-ready. ### 3. Air-gapped ML registry @@ -930,8 +930,8 @@ Grant chain fields (prev, revokes) and a rotation checklist in spec. 
- **Gate**: - finalize `.rgs` grammar + deterministic interpreter; - emit rule ids in Deny. -- **Bus**: - - `mbus-config/.json` with versioned shard maps + dual-write migration. +- **Message Plane**: + - `messages-config/.json` with versioned shard maps + dual-write migration. - **Proofs**: - implement commitment proofs today; - leave ZK behind a trait. diff --git a/docs/FEATURES.md b/docs/FEATURES.md index 74c4a735..da7b2783 100644 --- a/docs/FEATURES.md +++ b/docs/FEATURES.md @@ -274,7 +274,7 @@ Each feature includes user stories per relevant stakeholders (format requested), -- [ ] Pointer includes plaintext hash, ciphertext hash, cipher meta +- [ ] Pointer envelope is Canonical JSON with `kind`, `algo`, `digest`, bucketed `size`, `location`, and `capability` - [ ] Rekey operation available #### F6 Test Plan diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md index a7718d94..907bbe15 100644 --- a/docs/ROADMAP.md +++ b/docs/ROADMAP.md @@ -335,11 +335,11 @@ These are explicit non-goals until after the core truth machine is working: -- Namespaced mbus: +- Namespaced Message Plane topics: - `refs/gatos/mbus////
/` + `refs/gatos/messages////
/` with `refs/gatos/messages//head` pointing at the latest segment. -- QoS: **at-least-once + idempotency + ack/dedupe**. +- QoS: **at-least-once + ULID/idempotency + checkpoint dedupe** via `refs/gatos/consumers//`. - Rotation thresholds: - max 100k messages **or** 192MB per segment @@ -394,7 +394,7 @@ These are explicit non-goals until after the core truth machine is working: -- Exclusive CAS lock ref: `refs/gatos/jobs//claim`. +- Exclusive CAS lock refs: `refs/gatos/jobs//claims/` (expected old = zero OID; policy enforces exclusivity + retries). - Worker loop: - subscribe → claim → run → commit result. - PoE envelope: inputs\_root, program\_id, outputs\_root. @@ -445,10 +445,11 @@ These are explicit non-goals until after the core truth machine is working: - Public pointer schema: - - **ciphertext\_digest** REQUIRED - - NO plaintext digest for low-entropy data - - size bucketed (1k/4k/16k/64k) -- Resolver: JWT + Digests; logs under audit. + - Canonical JSON envelope with `kind: "opaque_pointer"`, `algo`, `digest`, and optional bucketed `size` (1k/4k/16k/64k). + - `location` URI describing where to fetch private bytes (`gatos-node://`, `https://`, `s3://`, `ipfs://`, etc.). + - `capability` URI describing how to authorize/decrypt (`gatos-key://`, `kms://`, `age://`, etc.). + - `digest` is the BLAKE3 hash of the raw plaintext blob committed out-of-band; ciphertext hashes are not recorded in Git. +- Resolver: JWT + Digest verification; logs under audit. - Projection engine performs pointerization deterministically. ### Done When diff --git a/docs/SPEC.md b/docs/SPEC.md index 89755d08..7f4208a9 100644 --- a/docs/SPEC.md +++ b/docs/SPEC.md @@ -29,7 +29,7 @@ - [6. Policy & Decision Audit](#6-policy--decision-audit) - [6.1 Gate Contract](#61-gate-contract) - [7. Blob Pointers & Opaque Storage (Hybrid Privacy)](#7-blob-pointers--opaque-storage-hybrid-privacy) -- [8. Message Bus (Commit-Backed Pub/Sub)](#8-message-bus-commit-backed-pubsub) +- [8. 
Message Plane (Commit-Backed Pub/Sub)](#8-message-plane-commit-backed-pubsub) - [9. Sessions (Working Branches)](#9-sessions-working-branches) - [10. Proofs (Commitments / ZK)](#10-proofs-commitments--zk) - [10.x Proof-of-Experiment (PoX)](#10x-proof-of-experiment-pox) @@ -157,7 +157,7 @@ graph TD end subgraph "Message Plane" - Mind("gatos-mind"); + Mind("gatos-message-plane"); end subgraph "Job Plane" @@ -224,7 +224,7 @@ graph TD A(refs) --> A1(gatos) A1 --> B1(journal) A1 --> B2(state) - A1 --> B3(mbus) + A1 --> B3(messages) A1 --> B4(jobs) A1 --> B5(sessions) A1 --> B6(audit) @@ -251,11 +251,12 @@ The normative layout is as follows: │ └── gatos/ │ ├── journal/ │ ├── state/ -│ ├── mbus/ -│ ├── mbus-ack/ +│ ├── messages/ +│ ├── consumers/ │ ├── jobs/ │ │ └── / -│ │ └── claim +│ │ └── claims/ +│ │ └── │ ├── proposals/ │ ├── approvals/ │ ├── grants/ @@ -711,63 +712,47 @@ classDiagram +Number size } class OpaquePointer { - +String kind: "opaque" - +String algo // content cipher, e.g., "aes-256-gcm" or "chacha20poly1305" - +String ciphertext_hash // BLAKE3 hex of ciphertext bytes (integrity check) - +Object encrypted_meta // see schema below (REQUIRED for new pointers) + +String kind: "opaque_pointer" + +String algo // hash algo for plaintext bytes (BLAKE3) + +String digest // blake3: over plaintext + +Number size // OPTIONAL bucketed size (1k/4k/16k/64k) + +String location // URI describing where to fetch bytes + +String capability // URI describing how to authorize/decrypt } ``` -Pointers **MUST** refer to bytes in `gatos/objects//`. For opaque -objects, no plaintext **MAY** be stored in Git. Public pointers in low-entropy -classes **MUST NOT** reveal a plaintext digest; they **MUST** include a -ciphertext digest. Pointer `size` **SHOULD** be bucketed (e.g., -1 KB/4 KB/16 KB/64 KB). If a plaintext commitment is required, use a hiding -commitment and store it inside `encrypted_meta`. 
- -`encrypted_meta` (Normative) — object fields: - -- `enc`: string — cipher suite id, e.g., `aes-256-gcm` or `chacha20poly1305`. -- `iv`: base64url — initialization vector/nonce bytes. -- `salt`: base64url — KDF salt when deriving keys (RECOMMENDED when using passphrase KDFs). -- `aad`: base64url — additional authenticated data bound to encryption (OPTIONAL; empty if unused). -- `tag`: base64url — AEAD tag if not appended to ciphertext (omit when suite appends tag). - -`ciphertext_hash` is an integrity hint only. Authenticity comes from the -envelope signature and repository trust rules (see Sections 3 and 6). Verifiers -**MUST** treat `ciphertext_hash` as unauthenticated unless covered by a -signature. - -Compatibility note: previous drafts used `cipher_meta`. Parsers **SHOULD** -accept inputs with `cipher_meta`; emit `encrypted_meta` going forward. A simple -migration heuristic is: if `cipher_meta` exists and `encrypted_meta` is absent, -rename `cipher_meta` → `encrypted_meta` (see `scripts/migrate_opaque_pointers.py`). - -Before/after example: - -```jsonc -// legacy -{ "kind":"opaque", "algo":"aes-256-gcm", "ciphertext_hash":"blake3:…", - "cipher_meta": {"enc":"aes-256-gcm","iv":"…","salt":"…"} } - -// normalized -{ "kind":"opaque", "algo":"aes-256-gcm", "ciphertext_hash":"blake3:…", - "encrypted_meta": {"enc":"aes-256-gcm","iv":"…","salt":"…"} } -``` +Pointers **MUST** refer to bytes addressed by `digest = blake3(plaintext)`; the plaintext bytes themselves never enter Git. 
Canonical opaque pointer envelopes follow ADR-0004 and MUST: + +- use RFC 8785 Canonical JSON (UTF-8, sorted keys, no insignificant whitespace), +- set `kind = "opaque_pointer"`, +- include `algo` (currently `"blake3"`) and a `digest` computed with that algorithm over the plaintext bytes, +- include `location` as a URI (`gatos-node://`, `https://`, `s3://`, `ipfs://`, `file:///` dev/test), +- include `capability` as a URI describing how to authorize/decrypt (`gatos-key://`, `kms://`, `age://`, `sops://`, etc.), +- optionally set `size` using coarse buckets to limit metadata leakage. + +`digest` is the BLAKE3 hash of the raw plaintext blob; ciphertext hashes are intentionally omitted from Git history. Implementations store encrypted bytes off-repo according to the `location` scheme and enforce authorization using the declared `capability`. A pointer envelope’s own `content_id` is `blake3(canonical_bytes)`. + +Pointer resolution flow: + +1. Parse `location` and fetch the encrypted blob (e.g., `GET /.well-known/gatos/private/{digest}` for `gatos-node://` targets, or the scheme's native client for `https://`, `s3://`, and `ipfs://` locations). +2. Use `capability` to locate or derive the decryption key or credential (KMS key, shared secret, age identity, etc.). +3. Decrypt and verify that `blake3(plaintext) == digest`. Resolution MUST fail if the digest does not match. + +Public state MAY include simple `blobptr` entries when plaintext data is safe to replicate. Opaque pointers are REQUIRED for low-entropy or private data. All pointer `size` metadata SHOULD use coarse buckets (1 KB/4 KB/16 KB/64 KB) to prevent leaking exact sizes. --- -## 8. Message Bus (Commit-Backed Pub/Sub) +## 8. Message Plane (Commit-Backed Pub/Sub) - + - + - + - + - + The message bus provides a pub/sub system built on Git commits. @@ -785,7 +770,7 @@ sequenceDiagram GATOS->>GATOS: Create gmb.commit Event ``` -Messages are appended to `refs/gatos/mbus//`.
Delivery is **at-least-once**; consumers **MUST** dedupe on read using the message `ULID` (or content hash) as an idempotency key and **MAY** write `ack`s to `refs/gatos/mbus-ack/`. Producers **SHOULD** set idempotency keys. +Messages are appended to `refs/gatos/messages//head` (optionally sharded by date/size). Delivery is **at-least-once**; consumers **MUST** dedupe using the message `ULID` and persist checkpoints under `refs/gatos/consumers//`. Producers **SHOULD** derive `ULID`s deterministically when mirroring external buses, so that re-mirroring the same upstream message yields the same `ULID` and is deduplicated. `messages.read` (ADR-0005) exposes canonical envelopes plus commit ids for RPC/CLI clients. Retention and compaction: @@ -954,7 +939,7 @@ Defaults (normative for this profile): - Fast-forward-only refs: `refs/gatos/policies/**`, `refs/gatos/state/**`, and `refs/gatos/audit/**`. - GC anchors: `refs/gatos/audit/**` and the latest `refs/gatos/state/**` checkpoints. - Message bus segmentation and TTL: rotate segments at 100k messages or \~192 MB; TTL 30 days; write summary commits for pruned windows. -- Public pointer hardening: low-entropy classes MUST NOT expose plaintext digests; public pointers MUST include a ciphertext digest; sizes SHOULD be bucketed (e.g., 1 KB, 4 KB, 16 KB, 64 KB). +- Public pointer hardening: low-entropy classes MUST use `opaque_pointer` envelopes with bucketed `size`, `location`/`capability` URIs, and `digest = blake3(plaintext)`; verifiers MUST reject a pointer when the decrypted plaintext does not hash to its `digest`. Nodes advertising the `research` profile MUST expose diagnostics for the above and SHOULD surface violations in `gatos doctor`. @@ -1151,8 +1136,8 @@ graph TD - PoF required: state pushes to `refs/gatos/state/**` MUST include a verifiable Proof-of-Fold. - Policies FF-only: `refs/gatos/policies/**` MUST be fast-forward only. -- Exclusive job claim: exactly one worker MUST succeed in creating `refs/gatos/jobs//claim` via compare-and-swap.
-- Pointer privacy: public pointers for low-entropy classes MUST NOT expose plaintext digests; ciphertext digest present; sizes bucketed. +- Exclusive job claim: exactly one worker MUST succeed in creating `refs/gatos/jobs//claims/` via compare-and-swap (expected old zero; policy denies duplicates). +- Pointer privacy: opaque pointer envelopes MUST include `location` + `capability` URIs, bucketed `size`, and a BLAKE3 `digest` over plaintext; verifiers MUST fetch/decrypt and recompute the digest before trusting data. - Exports: exporters MUST emit `Explorer-Root = blake3(ledger_head || policy_root || extractor_version)` and `gatos export verify` MUST validate it. --- @@ -1224,7 +1209,7 @@ sequenceDiagram participant Client participant Daemon as gatosd participant Ledger as gatos-ledger - participant Bus as gatos-mind + participant Bus as gatos-message-plane participant State as gatos-echo Client->>Daemon: 1. Enqueue Job (Event) diff --git a/docs/TECH-SPEC.md b/docs/TECH-SPEC.md index 5afd12c8..3449aa74 100644 --- a/docs/TECH-SPEC.md +++ b/docs/TECH-SPEC.md @@ -75,7 +75,7 @@ graph TD subgraph gatos A(crates) --> A1(gatos-ledger-core) A --> A2(gatos-ledger-git) - A --> A3(gatos-mind) + A --> A3(gatos-message-plane) A --> A4(gatos-echo) A --> A5(gatos-policy) A --> A6(gatos-kv) @@ -143,7 +143,7 @@ graph TD end subgraph "Message Plane" - Mind("gatos-mind"); + Mind("gatos-message-plane"); end subgraph "Job Plane" @@ -189,7 +189,7 @@ Note: “Policy/Trust Plane” includes the policy engine and trust artifacts (k | `gatos-ledger-core` | `no_std` core logic, data structures, and traits for the ledger. | | `gatos-ledger-git` | `std`-dependent storage backend using `libgit2`. | | `gatos-ledger` | Composes ledger components via feature flags. | -| `gatos-mind` | Asynchronous, commit-backed message bus (pub/sub). | +| `gatos-message-plane` | Asynchronous, commit-backed message bus (pub/sub). | | `gatos-echo` | Deterministic state engine for processing events ("folds"). 
| | `gatos-policy` | Deterministic policy engine for executing compiled rules and managing the Consensus Governance lifecycle. | | `gatos-kv` | Git-backed key-value state cache. | @@ -649,7 +649,7 @@ sequenceDiagram GATOS (Ledger)->>Bus (Message Plane): 2. Publish Job message Worker->>Bus (Message Plane): 3. Subscribe to job topic Bus (Message Plane)->>Worker: 4. Receive Job message - Worker->>GATOS (Ledger): 5. Atomically create claim ref (update-ref zero→claim) + Worker->>GATOS (Ledger): 5. Atomically create claim ref (update-ref zero→claims/) GATOS (Ledger)-->>Worker: 6. Claim successful Worker->>Worker: 7. Execute Job Worker->>GATOS (Ledger): 8. Create Result commit @@ -663,8 +663,8 @@ sequenceDiagram -1. **Subscription:** The worker will use `gatos-mind` to subscribe to job topics. -2. **Claiming:** The worker will use `gatos-ledger` to atomically claim a job via compare-and-swap on a single lock ref `refs/gatos/jobs//claim`. +1. **Subscription:** The worker will use `gatos-message-plane` to subscribe to job topics. +2. **Claiming:** The worker will use `gatos-ledger` to atomically claim a job via compare-and-swap on `refs/gatos/jobs//claims/`. Policy rejects duplicate claims or unauthorized workers. 3. **Execution:** The worker will execute the job's `command` in a sandboxed environment. 4. **Result & Proof:** The worker will create a `Result` commit containing output artifacts and a `Proof-Of-Execution`. 5. **Lifecycle Management:** The worker will handle timeouts, retries, and failures. @@ -783,3 +783,23 @@ sequenceDiagram Ledger-->>Client: Pending (partial) end ``` +#### Message Plane RPC: `messages.read` + +The daemon exposes `messages.read` over the JSONL RPC channel so workers and bridges can page through commit-backed topics without parsing Git directly. + +- **Input envelope** + - `method`: `messages.read` + - `params`: + - `topic` (string, required) — logical topic name (e.g., `governance`). 
+ - `since_ulid` (string, optional) — resume cursor; server starts from oldest when omitted or unknown. + - `limit` (integer, default 128) — max messages to return (1–512; values above 512 are clamped, `limit < 1` is rejected). +- **Output envelope** + - `messages`: ordered list (oldest→newest). Each entry contains `ulid`, `commit`, `content_id`, `envelope_path` (always `message/envelope.json` unless overridden), and `canonical_json` (base64 of the canonical envelope bytes). + - `next_since`: ULID to use for the next page (empty when fewer than `limit` messages remain, i.e. there is no further page). + - `checkpoint_hint` (optional): `{ "group": , "topic": }` so automated consumers can persist progress without issuing a second RPC. +- **Errors** + - `topic_not_found` (404) — topic ref missing. + - `invalid_ulid` (400) — malformed resume cursor. + - `range_exceeded` (409) — `limit < 1`. + +`gatos-message-plane` is responsible for translating RPC calls to actual Git ref walks and enforcing ULID monotonicity per ADR-0005. diff --git a/docs/USE-CASES.md b/docs/USE-CASES.md index 2983d231..62a03a02 100644 --- a/docs/USE-CASES.md +++ b/docs/USE-CASES.md @@ -52,7 +52,7 @@ This document illustrates practical scenarios where GATOS provides unique value. | | | | ------------- | --------------------------------------------------------------------------------------- | | **Goal** | Multi-agent orchestration with exactly-once semantics and audit. | -| **How** | Git message bus (`refs/gatos/mbus/**`) with acks/commitments; capabilities gate topics. | +| **How** | Message Plane (`refs/gatos/messages//head` + `messages.read`) with at-least-once delivery, ULID dedupe, and checkpoint refs; capabilities gate topics. | | **Why GATOS** | Works without Kafka; merges cleanly; persists forever. | --- @@ -76,7 +76,7 @@ This document illustrates practical scenarios where GATOS provides unique value.
| | | | ------------- | ---------------------------------------------------------------------------------------- | | **Goal** | Version large models/datasets with provenance and selective export. | -| **How** | Opaque pointers for ciphertext artifacts; policies for export; epochs bound repo growth. | +| **How** | Opaque pointer envelopes (digest+location+capability) guard private artifacts; policies for export; epochs bound repo growth. | | **Why GATOS** | Portable archives; verifiable lineage; offline friendly. | --- diff --git a/docs/decisions/ADR-0001/flyingrobots.md b/docs/decisions/ADR-0001/flyingrobots.md index a189870a..720a1b11 100644 --- a/docs/decisions/ADR-0001/flyingrobots.md +++ b/docs/decisions/ADR-0001/flyingrobots.md @@ -24,7 +24,7 @@ Here’s a breakdown of how the new structure maps to the original goals: **Policy Plane**: This remains the clear responsibility of `gatos-policy`. -**Message Plane**: This is now the clear responsibility of `gatos-mind`. +**Message Plane**: This is now the clear responsibility of `gatos-message-plane`. ## It Strengthens the `no_std` and Portability Goal @@ -41,7 +41,7 @@ The most significant change is splitting `gatos-ledger` into `gatos-ledger-core` `gatos-core` has evolved into the more sophisticated `gatos-ledger-*` structure. -`gatos-bus` is now `gatos-mind`. +`gatos-bus` is now `gatos-message-plane`. `gatos-session` is now `gatos-echo`, clarifying its implementation with the deterministic DPO engine. 
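A minimal Rust sketch of the `messages.read` parameter checks described in the TECH-SPEC hunk above: `limit` defaults to 128, values above 512 are clamped (as ADR-0005 requires), `limit < 1` is rejected, and a malformed `since_ulid` is refused. `ReadParams`, `RpcError`, `validate`, and `is_valid_ulid` are hypothetical names, not the `gatos-message-plane` API. The ULID check only verifies the 26-character Crockford base32 shape, and `topic_not_found` is left out because it needs a ref lookup.

```rust
/// Validated `messages.read` parameters (hypothetical type, sketch only).
#[derive(Debug, PartialEq)]
pub struct ReadParams {
    pub topic: String,
    pub since_ulid: Option<String>,
    pub limit: u32,
}

/// Error names follow the ADR-0005 contract; status codes are noted in comments.
/// `topic_not_found` (404) is omitted because it needs a ref lookup.
#[derive(Debug, PartialEq)]
pub enum RpcError {
    InvalidUlid,   // 400 invalid_ulid
    RangeExceeded, // 409 range_exceeded
}

const DEFAULT_LIMIT: i64 = 128; // default from the TECH-SPEC
const MAX_LIMIT: i64 = 512; // larger requests are clamped, per ADR-0005

/// Crockford base32 alphabet used by ULIDs (excludes I, L, O, U).
const CROCKFORD: &str = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

/// Shape check only: 26 Crockford characters whose first character is at
/// most '7' (26 characters carry 130 bits, but a ULID is 128 bits).
pub fn is_valid_ulid(s: &str) -> bool {
    let upper = s.to_ascii_uppercase();
    upper.len() == 26
        && upper.chars().all(|c| CROCKFORD.contains(c))
        && upper.as_bytes()[0] <= b'7'
}

pub fn validate(
    topic: &str,
    since_ulid: Option<&str>,
    limit: Option<i64>,
) -> Result<ReadParams, RpcError> {
    let requested = limit.unwrap_or(DEFAULT_LIMIT);
    if requested < 1 {
        return Err(RpcError::RangeExceeded);
    }
    if let Some(cursor) = since_ulid {
        if !is_valid_ulid(cursor) {
            return Err(RpcError::InvalidUlid);
        }
    }
    Ok(ReadParams {
        topic: topic.to_string(),
        since_ulid: since_ulid.map(str::to_string),
        // Safe cast: the value is now within 1..=512.
        limit: requested.min(MAX_LIMIT) as u32,
    })
}
```

Clamping oversized limits instead of rejecting them follows the ADR-0005 wording, while a non-positive limit has no useful meaning and fails fast with the contract's error.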
diff --git a/docs/decisions/ADR-0005/DECISION.md b/docs/decisions/ADR-0005/DECISION.md index cd3cab37..8d3f9288 100644 --- a/docs/decisions/ADR-0005/DECISION.md +++ b/docs/decisions/ADR-0005/DECISION.md @@ -5,19 +5,19 @@ ADR: ADR-0005 Authors: [flyingrobots] Requires: [ADR-0001] Related: [ADR-0002, ADR-0003] -Tags: [Shiplog, Event Stream, Consumers] +Tags: [Message Plane, Message Bus, Consumers] Schemas: - - ../../../../schemas/v1/shiplog/event_envelope.schema.json - - ../../../../schemas/v1/shiplog/consumer_checkpoint.schema.json + - ../../../../schemas/v1/message-plane/event_envelope.schema.json + - ../../../../schemas/v1/message-plane/consumer_checkpoint.schema.json Supersedes: [] Superseded-By: [] --- -## ADR-0005: Shiplog — A Parallel, Queryable Event Stream +## ADR-0005: Message Plane — A Git-Native, Commit-Backed Message Bus ### Scope -Introduce a **first-class, append-only event stream** ("shiplog") that runs in parallel with snapshot state folds. Provide queryability, consumer checkpoints, and causal ordering for integrations. +Introduce a **first-class, append-only Message Plane** (commit-backed message bus) that runs in parallel with snapshot state folds. Provide queryability, consumer checkpoints, and causal ordering for integrations. ### Rationale @@ -34,31 +34,39 @@ Context: The origin convo proposed a dedicated, queryable append-only log. * **Description:** Use an external message queue as the primary event stream. * **Reason for Rejection:** This would introduce a significant external dependency, increasing operational complexity and cost. It would also move a critical piece of the system's data model outside of the core Git repository, potentially compromising the project's goal of being self-contained and Git-native. -* **3. No Shiplog (Consumers Parse Git History):** - * **Description:** Do not create a dedicated shiplog. Require consumers to parse the entire Git history of the main ledger to extract the events they need. 
- * **Reason for Rejection:** This would be highly inefficient and complex for consumers. It would require each consumer to implement its own logic for traversing the Git history, filtering commits, and managing its own state. A dedicated shiplog provides a much cleaner and more efficient integration surface. +* **3. No Message Plane (Consumers Parse Git History):** + * **Description:** Do not create a dedicated Message Plane. Require consumers to parse the entire Git history of the main ledger to extract the events they need. + * **Reason for Rejection:** This would be highly inefficient and complex for consumers. It would require each consumer to implement its own logic for traversing the Git history, filtering commits, and managing its own state. A dedicated Message Plane provides a much cleaner and more efficient integration surface. ### Decision -1. **Shiplog namespaces** +1. **Message Plane namespaces** -Each event in the shiplog corresponds to a single Git commit. The shiplog is organized into topics using the following ref structure: +Each message corresponds to a single Git commit. Topics are organized using the following ref structure: -refs/gatos/shiplog//head # commit parent-chain per topic +refs/gatos/messages//head # commit parent-chain per topic refs/gatos/consumers// # checkpoints (by ULID) -2. **Event envelope (normative)** -Canonical JSON with a ULID and canonical `content_id`. The `content_id` is the `blake3` hash of the canonical JSON of the event envelope itself. +2. **Message envelope & commit layout (normative)** +Each message commit MUST contain the envelope blob at `message/envelope.json` with no additional top-level files. Optional attachments MUST reside under `message/attachments/` and be referenced inside the envelope `refs` map. 
+ +- `message/envelope.json` MUST be Canonical JSON (UTF-8, sorted keys, no insignificant whitespace) conforming to [`schemas/v1/message-plane/event_envelope.schema.json`](../../../../schemas/v1/message-plane/event_envelope.schema.json). +- Attachments MUST NOT influence the canonical identifier; only the envelope bytes are hashed. +- Clients MAY include detached metadata (e.g., transport headers), but the canonical commit identifiers are derived solely from the envelope blob. + +The `content_id` is the `blake3` hash of the canonical envelope bytes. + +Example envelope payload: { -“ulid”: “<26-char ULID>”, -“ns”: “”, # e.g., “governance” -“type”: “”, -“payload”: { … }, # canonical JSON -“refs”: { “state”: “blake3:…”, “proposal_id”: “blake3:…” } # OPTIONAL cross-refs +"ulid": "<26-char ULID>", +"ns": "", # e.g., "governance" +"type": "", +"payload": { ... }, # canonical JSON +"refs": { "state": "blake3:...", "proposal_id": "blake3:..." } # OPTIONAL cross-refs } -Each shiplog commit message MUST include: +Each Message Plane commit message MUST include: Event-Id: ulid: Content-Id: blake3: @@ -72,12 +80,32 @@ Content-Id: blake3: - Checkpoint value is the last processed `ulid` (and optionally commit). Storing the commit hash allows for faster lookups and can help resolve ordering if ULIDs are not strictly monotonic across distributed nodes. 5. **Queries** -- `gatos-mind` MUST support `shiplog.read(topic, since_ulid, limit)` returning canonical envelopes and commit ids, ordered by the Git parent chain (oldest to newest). If `since_ulid` is not found, the stream SHOULD start from the beginning of the topic. -- Bus bridge MAY mirror `shiplog` events onto message topics (configurable). +- `gatos-message-plane` MUST support `messages.read(topic, since_ulid, limit)` returning canonical envelopes and commit ids, ordered by the Git parent chain (oldest to newest). If `since_ulid` is not found, the stream SHOULD start from the beginning of the topic. 
+- Bus bridge MAY mirror Message Plane topics onto external brokers (configurable). + +`messages.read` contract (normative): + +- **Request:** + - `topic` — string. Required. Matches `` portion of `refs/gatos/messages//head`. + - `since_ulid` — optional ULID string. When absent, start from the oldest message. + - `limit` — integer 1–512 (inclusive). Servers MUST clamp >512 to 512. +- **Response:** JSON object with `messages: []` ordered oldest→newest. Each entry MUST include: + - `ulid` (string) — envelope ULID. + - `commit` (string) — Git OID of the message commit. + - `content_id` (string) — `blake3:` digest of the envelope bytes. + - `envelope_path` (string) — repository-relative path to `message/envelope.json` (default `message/envelope.json`). + - `canonical_json` (string) — base64-encoded canonical JSON bytes for clients that cannot read from Git directly. +- **Checkpoint hint:** the response object (not each entry) carries `checkpoint_hint`: `{ "group": , "topic": }` when the server auto-advances checkpoints, and MAY be `null` otherwise. +- **Errors:** + - `404 topic_not_found` when the requested ref does not exist. + - `400 invalid_ulid` when `since_ulid` is malformed. + - `409 range_exceeded` when `limit < 1`. + +Servers SHOULD include the newest `ulid` in the response metadata (`next_since`) so clients can resume without re-reading. 6. **Interaction with Ledger** -- Ledger events MAY be mirrored into shiplog automatically. -- Governance transitions (ADR‑0003) SHOULD emit shiplog events in the `governance` topic. +- Ledger events MAY be mirrored into the Message Plane automatically. +- Governance transitions (ADR‑0003) SHOULD emit Message Plane events in the `governance` topic. ### Consequences @@ -97,7 +125,7 @@ graph TD L1[Commit 1] --> L2[Commit 2] end - subgraph Shiplog (topic: governance) + subgraph Message Plane (topic: governance) S1[Event A
ulid: 01...A] --> S2[Event B
ulid: 01...B] end @@ -109,4 +137,4 @@ graph TD L2 -- "Mirrors event" --> S1 S2 -- "Processed by" --> C1 S1 -- "Processed by" --> C2 -``` \ No newline at end of file +``` diff --git a/docs/decisions/README.md b/docs/decisions/README.md index 8a6aad78..675ce33f 100644 --- a/docs/decisions/README.md +++ b/docs/decisions/README.md @@ -32,4 +32,4 @@ Each ADR will have a status, typically one of the following: | [ADR-0002](./ADR-0002/DECISION.md) | Distributed Compute via a Job Plane | Accepted | 2025-11-08 | | [ADR-0003](./ADR-0003/DECISION.md) | Consensus Governance for Gated Actions | Accepted | 2025-11-08 | | [ADR-0004](./ADR-0004/DECISION.md) | Hybrid Privacy Model (Public Projection + Private Overlay) | Accepted | 2025-11-09 | -| [ADR-0005](./ADR-0005/DECISION.md) | Shiplog — A Parallel, Queryable Event Stream | Proposed | 2025-11-09 | \ No newline at end of file +| [ADR-0005](./ADR-0005/DECISION.md) | Message Plane — A Git-Native, Commit-Backed Message Bus | Proposed | 2025-11-09 | diff --git a/docs/diagrams/architecture.md b/docs/diagrams/architecture.md index 7a910f80..876642f3 100644 --- a/docs/diagrams/architecture.md +++ b/docs/diagrams/architecture.md @@ -28,7 +28,7 @@ graph TD end subgraph "Message Plane" - Mind("gatos-mind"); + Mind("gatos-message-plane"); end subgraph "Job Plane" diff --git a/docs/diagrams/data_flow.md b/docs/diagrams/data_flow.md index ab6e3b65..76fcf7f8 100644 --- a/docs/diagrams/data_flow.md +++ b/docs/diagrams/data_flow.md @@ -13,7 +13,7 @@ sequenceDiagram participant Client participant Daemon as gatosd participant Ledger as gatos-ledger - participant Bus as gatos-mind + participant Bus as gatos-message-plane participant State as gatos-echo Client->>Daemon: 1. 
Enqueue Job (Event) diff --git a/docs/diagrams/generated/docs_SPEC__0679d036ea__mermaid_19.svg b/docs/diagrams/generated/docs_SPEC__0679d036ea__mermaid_19.svg index 2e9b35d3..c50cf3fc 100644 --- a/docs/diagrams/generated/docs_SPEC__0679d036ea__mermaid_19.svg +++ b/docs/diagrams/generated/docs_SPEC__0679d036ea__mermaid_19.svg @@ -1,2 +1,2 @@ -Workergatos-echogatos-mindgatos-ledgergatosdClientWorkergatos-echogatos-mindgatos-ledgergatosdClientLater, a worker consumes the job...A fold process runs...1. Enqueue Job (Event)2. Append `jobs.enqueue` event3. Success4. Publish `gmb.msg` to topic5. Success6. Job Enqueued7. Subscribe to topic8. Delivers `gmb.msg`9. Report Result (Event)10. Append `jobs.result` event11. Success12. Write `gmb.ack`13. Result Recorded14. Read events from journal15. Compute new state (e.g., update queue view)16. Checkpoint new state +Workergatos-echogatos-message-planegatos-ledgergatosdClientWorkergatos-echogatos-message-planegatos-ledgergatosdClientLater, a worker consumes the job...A fold process runs...1. Enqueue Job (Event)2. Append `jobs.enqueue` event3. Success4. Publish `gmb.msg` to topic5. Success6. Job Enqueued7. Subscribe to topic8. Delivers `gmb.msg`9. Report Result (Event)10. Append `jobs.result` event11. Success12. Write `gmb.ack`13. Result Recorded14. Read events from journal15. Compute new state (e.g., update queue view)16. Checkpoint new state diff --git a/docs/diagrams/generated/docs_SPEC__0679d036ea__mermaid_2.svg b/docs/diagrams/generated/docs_SPEC__0679d036ea__mermaid_2.svg index 5c589a50..fdb880d3 100644 --- a/docs/diagrams/generated/docs_SPEC__0679d036ea__mermaid_2.svg +++ b/docs/diagrams/generated/docs_SPEC__0679d036ea__mermaid_2.svg @@ -1,2 +1,2 @@ -
[SVG text labels: GATOS System, User / Client, Ledger Plane, Policy/Trust Plane, State Plane, Message Plane, Job Plane, gatosd (Daemon), gatos-compute, gatos-mind, gatos-echo, gatos-kv, gatos-policy, gatos-ledger, git gatos (CLI), Client SDK]
+[SVG text labels: GATOS System, User / Client, Ledger Plane, Policy/Trust Plane, State Plane, Message Plane, Job Plane, gatosd (Daemon), gatos-compute, gatos-message-plane, gatos-echo, gatos-kv, gatos-policy, gatos-ledger, git gatos (CLI), Client SDK]
diff --git a/docs/diagrams/generated/docs_TECH-SPEC__15850d53f4__mermaid_1.svg b/docs/diagrams/generated/docs_TECH-SPEC__15850d53f4__mermaid_1.svg index d8bc3846..fd35ee00 100644 --- a/docs/diagrams/generated/docs_TECH-SPEC__15850d53f4__mermaid_1.svg +++ b/docs/diagrams/generated/docs_TECH-SPEC__15850d53f4__mermaid_1.svg @@ -1,2 +1,2 @@ -
[SVG text labels: gatos, gatos-ledger-core, crates, gatos-ledger-git, gatos-mind, gatos-echo, gatos-policy, gatos-kv, gatosd, gatos-compute, bindings, wasm, ffi]
+[SVG text labels: gatos, gatos-ledger-core, crates, gatos-ledger-git, gatos-message-plane, gatos-echo, gatos-policy, gatos-kv, gatosd, gatos-compute, bindings, wasm, ffi]
diff --git a/docs/diagrams/generated/docs_TECH-SPEC__15850d53f4__mermaid_2.svg b/docs/diagrams/generated/docs_TECH-SPEC__15850d53f4__mermaid_2.svg index 76fb1743..6ff04f15 100644 --- a/docs/diagrams/generated/docs_TECH-SPEC__15850d53f4__mermaid_2.svg +++ b/docs/diagrams/generated/docs_TECH-SPEC__15850d53f4__mermaid_2.svg @@ -1,2 +1,2 @@ -
[SVG text labels: GATOS System, User / Client, Ledger Plane, Policy/Trust Plane, State Plane, Message Plane, Job Plane, gatosd (Daemon), gatos-compute, gatos-mind, gatos-echo, gatos-kv, gatos-policy, gatos-ledger, git gatos (CLI), Client SDK]
+[SVG text labels: GATOS System, User / Client, Ledger Plane, Policy/Trust Plane, State Plane, Message Plane, Job Plane, gatosd (Daemon), gatos-compute, gatos-message-plane, gatos-echo, gatos-kv, gatos-policy, gatos-ledger, git gatos (CLI), Client SDK]
diff --git a/docs/diagrams/generated/docs__SPEC__mermaid_19.svg b/docs/diagrams/generated/docs__SPEC__mermaid_19.svg index ff976cdd..ff9f9ca5 100644 --- a/docs/diagrams/generated/docs__SPEC__mermaid_19.svg +++ b/docs/diagrams/generated/docs__SPEC__mermaid_19.svg @@ -1,2 +1,2 @@ -Workergatos-echogatos-mindgatos-ledgergatosdClientWorkergatos-echogatos-mindgatos-ledgergatosdClientLater, a worker consumes the job...A fold process runs...1. Enqueue Job (Event)2. Append `jobs.enqueue` event3. Success4. Publish `gmb.msg` to topic5. Success6. Job Enqueued7. Subscribe to topic8. Delivers `gmb.msg`9. Report Result (Event)10. Append `jobs.result` event11. Success12. Write `gmb.ack`13. Result Recorded14. Read events from journal15. Compute new state (e.g., update queue view)16. Checkpoint new state +Workergatos-echogatos-message-planegatos-ledgergatosdClientWorkergatos-echogatos-message-planegatos-ledgergatosdClientLater, a worker consumes the job...A fold process runs...1. Enqueue Job (Event)2. Append `jobs.enqueue` event3. Success4. Publish `gmb.msg` to topic5. Success6. Job Enqueued7. Subscribe to topic8. Delivers `gmb.msg`9. Report Result (Event)10. Append `jobs.result` event11. Success12. Write `gmb.ack`13. Result Recorded14. Read events from journal15. Compute new state (e.g., update queue view)16. Checkpoint new state diff --git a/docs/diagrams/generated/docs__SPEC__mermaid_2.svg b/docs/diagrams/generated/docs__SPEC__mermaid_2.svg index 0a022497..30207dee 100644 --- a/docs/diagrams/generated/docs__SPEC__mermaid_2.svg +++ b/docs/diagrams/generated/docs__SPEC__mermaid_2.svg @@ -1,2 +1,2 @@ -
[SVG text labels: GATOS System, User / Client, Policy Plane, State Plane, Message Plane, Job Plane, Ledger Plane, gatosd (Daemon), gatos-ledger, gatos-compute, gatos-mind, gatos-echo, gatos-kv, gatos-policy, gatosd (CLI), Client SDK]
+[SVG text labels: GATOS System, User / Client, Policy Plane, State Plane, Message Plane, Job Plane, Ledger Plane, gatosd (Daemon), gatos-ledger, gatos-compute, gatos-message-plane, gatos-echo, gatos-kv, gatos-policy, gatosd (CLI), Client SDK]
diff --git a/docs/diagrams/generated/docs__TECH-SPEC__mermaid_1.svg b/docs/diagrams/generated/docs__TECH-SPEC__mermaid_1.svg index 8eab40f9..0e3a0fa5 100644 --- a/docs/diagrams/generated/docs__TECH-SPEC__mermaid_1.svg +++ b/docs/diagrams/generated/docs__TECH-SPEC__mermaid_1.svg @@ -1,2 +1,2 @@ -
[SVG text labels: gatos, gatos-ledger-core, crates, gatos-ledger-git, gatos-mind, gatos-echo, gatos-policy, gatos-kv, gatosd, gatos-compute, bindings, wasm, ffi]
+[SVG text labels: gatos, gatos-ledger-core, crates, gatos-ledger-git, gatos-message-plane, gatos-echo, gatos-policy, gatos-kv, gatosd, gatos-compute, bindings, wasm, ffi]
diff --git a/docs/diagrams/generated/docs__TECH-SPEC__mermaid_2.svg b/docs/diagrams/generated/docs__TECH-SPEC__mermaid_2.svg index da82ffec..406224a2 100644 --- a/docs/diagrams/generated/docs__TECH-SPEC__mermaid_2.svg +++ b/docs/diagrams/generated/docs__TECH-SPEC__mermaid_2.svg @@ -1,2 +1,2 @@ -
[SVG text labels: GATOS System, User / Client, Policy Plane, State Plane, Message Plane, Job Plane, Ledger Plane, gatosd (Daemon), gatos-ledger, gatos-compute, gatos-mind, gatos-echo, gatos-kv, gatos-policy, gatosd (CLI), Client SDK]
+[SVG text labels: GATOS System, User / Client, Policy Plane, State Plane, Message Plane, Job Plane, Ledger Plane, gatosd (Daemon), gatos-ledger, gatos-compute, gatos-message-plane, gatos-echo, gatos-kv, gatos-policy, gatosd (CLI), Client SDK]
diff --git a/docs/diagrams/generated/docs__diagrams__architecture__mermaid_1.svg b/docs/diagrams/generated/docs__diagrams__architecture__mermaid_1.svg index 49681e7f..41d4063b 100644 --- a/docs/diagrams/generated/docs__diagrams__architecture__mermaid_1.svg +++ b/docs/diagrams/generated/docs__diagrams__architecture__mermaid_1.svg @@ -1,2 +1,2 @@ -
[SVG text labels: GATOS System, User / Client, Policy Plane, State Plane, Message Plane, Job Plane, Ledger Plane, gatosd (Daemon), gatos-ledger, gatos-compute, gatos-mind, gatos-echo, gatos-kv, gatos-policy, gatosd (CLI), Client SDK]
+[SVG text labels: GATOS System, User / Client, Policy Plane, State Plane, Message Plane, Job Plane, Ledger Plane, gatosd (Daemon), gatos-ledger, gatos-compute, gatos-message-plane, gatos-echo, gatos-kv, gatos-policy, gatosd (CLI), Client SDK]
diff --git a/docs/diagrams/generated/docs__diagrams__data_flow__mermaid_1.svg b/docs/diagrams/generated/docs__diagrams__data_flow__mermaid_1.svg index d74191af..3dfea291 100644 --- a/docs/diagrams/generated/docs__diagrams__data_flow__mermaid_1.svg +++ b/docs/diagrams/generated/docs__diagrams__data_flow__mermaid_1.svg @@ -1,2 +1,2 @@ -Workergatos-echogatos-mindgatos-ledgergatosdClientWorkergatos-echogatos-mindgatos-ledgergatosdClientLater, a worker consumes the job...A fold process runs...1. Enqueue Job (Event)2. Append `jobs.enqueue` event3. Success4. Publish `gmb.msg` to topic5. Success6. Job Enqueued7. Subscribe to topic8. Delivers `gmb.msg`9. Report Result (Event)10. Append `jobs.result` event11. Success12. Write `gmb.ack`13. Result Recorded14. Read events from journal15. Compute new state (e.g., update queue view)16. Checkpoint new state +Workergatos-echogatos-message-planegatos-ledgergatosdClientWorkergatos-echogatos-message-planegatos-ledgergatosdClientLater, a worker consumes the job...A fold process runs...1. Enqueue Job (Event)2. Append `jobs.enqueue` event3. Success4. Publish `gmb.msg` to topic5. Success6. Job Enqueued7. Subscribe to topic8. Delivers `gmb.msg`9. Report Result (Event)10. Append `jobs.result` event11. Success12. Write `gmb.ack`13. Result Recorded14. Read events from journal15. Compute new state (e.g., update queue view)16. Checkpoint new state diff --git a/docs/diagrams/generated/docs_diagrams_architecture__105fc24d87__mermaid_1.svg b/docs/diagrams/generated/docs_diagrams_architecture__105fc24d87__mermaid_1.svg index e3521b0a..a22b6729 100644 --- a/docs/diagrams/generated/docs_diagrams_architecture__105fc24d87__mermaid_1.svg +++ b/docs/diagrams/generated/docs_diagrams_architecture__105fc24d87__mermaid_1.svg @@ -1,2 +1,2 @@ -
[SVG text labels: GATOS System, User / Client, Ledger Plane, Policy/Trust Plane, State Plane, Message Plane, Job Plane, gatosd (Daemon), gatos-compute, gatos-mind, gatos-echo, gatos-kv, gatos-policy, gatos-ledger, git gatos (CLI), Client SDK]
+[SVG text labels: GATOS System, User / Client, Ledger Plane, Policy/Trust Plane, State Plane, Message Plane, Job Plane, gatosd (Daemon), gatos-compute, gatos-message-plane, gatos-echo, gatos-kv, gatos-policy, gatos-ledger, git gatos (CLI), Client SDK]
diff --git a/docs/diagrams/generated/docs_diagrams_data_flow__559f0c180d__mermaid_1.svg b/docs/diagrams/generated/docs_diagrams_data_flow__559f0c180d__mermaid_1.svg index b06d72ef..c7c9005c 100644 --- a/docs/diagrams/generated/docs_diagrams_data_flow__559f0c180d__mermaid_1.svg +++ b/docs/diagrams/generated/docs_diagrams_data_flow__559f0c180d__mermaid_1.svg @@ -1,2 +1,2 @@ -Workergatos-echogatos-mindgatos-ledgergatosdClientWorkergatos-echogatos-mindgatos-ledgergatosdClientLater, a worker consumes the job...A fold process runs...1. Enqueue Job (Event)2. Append `jobs.enqueue` event3. Success4. Publish `gmb.msg` to topic5. Success6. Job Enqueued7. Subscribe to topic8. Delivers `gmb.msg`9. Report Result (Event)10. Append `jobs.result` event11. Success12. Write `gmb.ack`13. Result Recorded14. Read events from journal15. Compute new state (e.g., update queue view)16. Checkpoint new state +Workergatos-echogatos-message-planegatos-ledgergatosdClientWorkergatos-echogatos-message-planegatos-ledgergatosdClientLater, a worker consumes the job...A fold process runs...1. Enqueue Job (Event)2. Append `jobs.enqueue` event3. Success4. Publish `gmb.msg` to topic5. Success6. Job Enqueued7. Subscribe to topic8. Delivers `gmb.msg`9. Report Result (Event)10. Append `jobs.result` event11. Success12. Write `gmb.ack`13. Result Recorded14. Read events from journal15. Compute new state (e.g., update queue view)16. Checkpoint new state diff --git a/docs/diagrams/generated/docs_guide_CHAPTER-001__d51d557e71__mermaid_1.svg b/docs/diagrams/generated/docs_guide_CHAPTER-001__d51d557e71__mermaid_1.svg index b59799f0..95957afe 100644 --- a/docs/diagrams/generated/docs_guide_CHAPTER-001__d51d557e71__mermaid_1.svg +++ b/docs/diagrams/generated/docs_guide_CHAPTER-001__d51d557e71__mermaid_1.svg @@ -1,2 +1,2 @@ -
[SVG text labels: GATOS System, User / Client, Ledger Plane, Policy/Trust Plane, State Plane, Message Plane, Job Plane, gatosd (Daemon), gatos-compute, gatos-mind, gatos-echo, gatos-kv, gatos-policy, gatos-ledger, git gatos (CLI), Client SDK]
+[SVG text labels: GATOS System, User / Client, Ledger Plane, Policy/Trust Plane, State Plane, Message Plane, Job Plane, gatosd (Daemon), gatos-compute, gatos-message-plane, gatos-echo, gatos-kv, gatos-policy, gatos-ledger, git gatos (CLI), Client SDK]
diff --git a/docs/diagrams/generated/docs_guide_README__5e9e6b7c1c__mermaid_1.svg b/docs/diagrams/generated/docs_guide_README__5e9e6b7c1c__mermaid_1.svg index b40c605f..00516708 100644 --- a/docs/diagrams/generated/docs_guide_README__5e9e6b7c1c__mermaid_1.svg +++ b/docs/diagrams/generated/docs_guide_README__5e9e6b7c1c__mermaid_1.svg @@ -1,2 +1,2 @@ -
[SVG text labels: GATOS System, User / Client, Ledger Plane, Policy/Trust Plane, State Plane, Message Plane, Job Plane, gatosd (Daemon), gatos-compute, gatos-mind, gatos-echo, gatos-kv, gatos-policy, gatos-ledger, git gatos (CLI), Client SDK]
+[SVG text labels: GATOS System, User / Client, Ledger Plane, Policy/Trust Plane, State Plane, Message Plane, Job Plane, gatosd (Daemon), gatos-compute, gatos-message-plane, gatos-echo, gatos-kv, gatos-policy, gatos-ledger, git gatos (CLI), Client SDK]
diff --git a/docs/guide/CHAPTER-001.md b/docs/guide/CHAPTER-001.md index d9add54e..9190e353 100644 --- a/docs/guide/CHAPTER-001.md +++ b/docs/guide/CHAPTER-001.md @@ -47,7 +47,7 @@ graph TD end subgraph "Message Plane" - Mind("gatos-mind"); + Mind("gatos-message-plane"); end subgraph "Job Plane" @@ -101,7 +101,7 @@ graph TD See \[\[\[\[\[\[[SPEC §5.4](/SPEC#5.4)]\(/SPEC#5.4)]\(/SPEC#5.4)]\(/SPEC#5.4)]\(/SPEC#5.4)]\(/SPEC#5.4) — Proof-of-Fold (PoF)]\(../SPEC.md#5.4-proof-of-fold) for the formal verification link between ledger windows, policy roots, and state checkpoints. -4. **The Message Plane (`gatos-mind`)** +4. **The Message Plane (`gatos-message-plane`)** This plane provides a commit-backed, asynchronous publish/subscribe message bus. It allows different parts of the GATOS system, as well as external agents, to communicate reliably. For example, when a new job is scheduled in the Job Plane, a message is published to a topic on the message bus, allowing available workers to discover and claim the job. @@ -136,7 +136,7 @@ The following chapters will explore each of these planes in greater detail, show - `refs/gatos/journal//` — append-only event journals (FF-only). - `refs/gatos/state/` — deterministic state checkpoints. -- `refs/gatos/mbus//` — message topics (pub/sub). +- `refs/gatos/messages//head` — Message Plane topics (pub/sub). - `refs/gatos/jobs/` — job artifacts (claim/result). - `refs/gatos/proposals|approvals|grants|revocations` — governance. - `refs/gatos/audit/**` — audit decisions and proofs. diff --git a/docs/guide/CHAPTER-006.md b/docs/guide/CHAPTER-006.md index fc1eb8e9..76d2e131 100644 --- a/docs/guide/CHAPTER-006.md +++ b/docs/guide/CHAPTER-006.md @@ -32,7 +32,7 @@ While the Ledger, State, and Policy planes provide the core foundation for a ver -The Message Plane, managed by the **`gatos-mind`** crate, provides a reliable, asynchronous publish/subscribe message bus built directly on Git. It serves as the central nervous system for GATOS. 
+The Message Plane, managed by the **`gatos-message-plane`** crate, provides a reliable, asynchronous publish/subscribe message bus built directly on Git. It serves as the central nervous system for GATOS. ### How It Works @@ -42,10 +42,10 @@ The Message Plane, managed by the **`gatos-mind`** crate, provides a reliable, a -1. **Topics as Refs:** Each message topic is a Git ref under the `refs/gatos/mbus//` namespace. -2. **Messages as Commits:** When a publisher sends a message to a topic, `gatos-mind` creates a new commit on the topic ref. The message payload is stored in the commit. -3. **Consumption:** Subscribers `git fetch` the topic refs to discover new messages. -4. **Acknowledgements:** Delivery is **at-least-once**. Use idempotency keys (the message `ULID`) and dedupe on read. Consumers write an `ack` commit to a corresponding `refs/gatos/mbus-ack/` ref; the system can then observe that a message has been processed by a quorum before considering it “done.” +1. **Topics as Refs:** Each message topic lives at `refs/gatos/messages//head`, optionally fanned out into dated segments (`refs/gatos/messages/////`). +2. **Messages as Commits:** Publishing writes a commit containing `message/envelope.json` plus optional attachments; the canonical envelope follows [`schemas/v1/message-plane/event_envelope.schema.json`](../../schemas/v1/message-plane/event_envelope.schema.json). +3. **Consumption:** Subscribers either `git fetch` or call the `messages.read` RPC to stream canonical envelopes (ULID + commit id + `content_id`). +4. **Checkpoints:** Delivery is **at-least-once**. Consumers dedupe by ULID and persist checkpoints under `refs/gatos/consumers//`, so crashed workers can resume deterministically. This Git-native approach provides a message bus that is: @@ -92,7 +92,7 @@ sequenceDiagram 1. **Scheduling:** A job is scheduled by writing a `jobs.enqueue` event to the Ledger Plane. This event contains a manifest describing the work to be done (e.g., a command to run, input data). 2. **Discovery:** A message is published to a topic on the Message Plane (e.g., `gatos.jobs.pending`), announcing the new job. -3. **Claiming:** A `gatos-compute` worker, subscribed to the topic, discovers the job. It then performs an atomic compare-and-swap on a single lock ref `refs/gatos/jobs//claim` using `git update-ref `. The winner (who observed the zero OID) writes its `worker_id` into the claim object, preventing other workers from executing it. +3. **Claiming:** A `gatos-compute` worker, subscribed to the topic, discovers the job. It performs an atomic compare-and-swap on `refs/gatos/jobs//claims/` (expected old = `000..0`). Policy enforces that only the winning worker’s claim is recognized; losers receive a deterministic deny and must retry/back off. 4. **Execution:** The worker executes the job's `command` in a sandboxed environment. 5. **Result:** Upon completion, the worker creates a `jobs.result` event and commits it to the ledger. This event includes the job's output, exit status, and, crucially, a **Proof-of-Execution (PoE)**.
@@ -132,11 +132,11 @@ Storage: `refs/gatos/jobs//result` (commit whose tree contains the resul -- Segment topics: `refs/gatos/mbus/////` (or numeric `0001`, `0002`, …) to bound ref sizes. +- Segment topics: `refs/gatos/messages/////` (or numeric `0001`, `0002`, …) to bound ref sizes. - Rotation thresholds (defaults): rotate at 100k messages or \~192 MB per segment (whichever comes first). - TTL: retain segments for 30 days, then prune; write a summary commit (counts, Merkle root, last offsets) when pruning to preserve verifiability. - Offsets: snapshot consumer offsets; prune only segments older than the minimum acknowledged offset across active consumers. -- Git optimization: enable `fetch.writeCommitGraph=true`, `repack.writeBitmaps=true`; consider partial clone/promisor remotes for `refs/gatos/mbus/*` on busy installations. +- Git optimization: enable `fetch.writeCommitGraph=true`, `repack.writeBitmaps=true`; consider partial clone/promisor remotes for `refs/gatos/messages/*` on busy installations. ## Summary @@ -146,6 +146,6 @@ Storage: `refs/gatos/jobs//result` (commit whose tree contains the resul -The Message and Job planes are what make GATOS a dynamic, living system. `gatos-mind` provides the nervous system, allowing for reliable, auditable communication. `gatos-compute` provides the motor function, enabling the system to perform work in a distributed and verifiable way. +The Message and Job planes are what make GATOS a dynamic, living system. `gatos-message-plane` provides the nervous system, allowing for reliable, auditable communication. `gatos-compute` provides the motor function, enabling the system to perform work in a distributed and verifiable way. Together, they transform the GATOS repository from a passive record of history into an active, programmable "Operating Surface" that can orchestrate complex, distributed workflows with an unprecedented level of trust and transparency.
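The consumer checkpoint and segment-retention rules in the CHAPTER-006 hunks above can be sketched in Rust as follows. The sketch assumes canonical (uppercase) ULIDs that strictly increase along each topic's parent chain, so plain string comparison matches delivery order. `Consumer`, `prune_watermark`, `segment_prunable`, and `should_rotate` are hypothetical helpers; writing the checkpoint ref and the 30-day TTL check are omitted, and reading "MB" as MiB is an assumption.

```rust
/// Consumer-side state: the last processed ULID, i.e. what a checkpoint ref
/// would record (hypothetical helper, not the gatos-message-plane API).
#[derive(Debug, Default)]
pub struct Consumer {
    pub checkpoint: Option<String>,
    pub handled: Vec<String>,
}

impl Consumer {
    /// At-least-once handling: anything at or below the checkpoint is a
    /// redelivery and is skipped (dedupe by ULID). Assumes canonical ULIDs
    /// that increase along the topic, so string order equals delivery order.
    pub fn process_page(&mut self, page: &[&str]) {
        for &ulid in page {
            let already_seen = self
                .checkpoint
                .as_deref()
                .map_or(false, |cp| ulid <= cp);
            if already_seen {
                continue;
            }
            self.handled.push(ulid.to_string()); // side effects would run here
            self.checkpoint = Some(ulid.to_string()); // then persist the checkpoint
        }
    }
}

/// Pruning watermark: the minimum checkpoint across active consumers.
/// A consumer without a checkpoint, or an empty consumer list, blocks
/// pruning entirely (a conservative assumption).
pub fn prune_watermark(checkpoints: &[Option<String>]) -> Option<String> {
    let mut min: Option<&String> = None;
    for cp in checkpoints {
        let cp = cp.as_ref()?;
        if min.map_or(true, |m| cp < m) {
            min = Some(cp);
        }
    }
    min.cloned()
}

/// A segment may be pruned only if its newest message is strictly older
/// than the watermark; the TTL condition would be checked separately.
pub fn segment_prunable(segment_last_ulid: &str, watermark: Option<&str>) -> bool {
    watermark.map_or(false, |w| segment_last_ulid < w)
}

/// Default rotation: 100k messages or ~192 MB per segment, whichever comes first.
pub fn should_rotate(messages: u64, bytes: u64) -> bool {
    const MAX_MESSAGES: u64 = 100_000;
    const MAX_BYTES: u64 = 192 * 1024 * 1024; // "MB" read as MiB (assumption)
    messages >= MAX_MESSAGES || bytes >= MAX_BYTES
}
```

Replaying the same page twice leaves `handled` unchanged on the second pass, which is the dedupe behavior the at-least-once contract relies on.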
diff --git a/docs/guide/CHAPTER-007.md b/docs/guide/CHAPTER-007.md index 61a2b1c3..20347ca7 100644 --- a/docs/guide/CHAPTER-007.md +++ b/docs/guide/CHAPTER-007.md @@ -35,7 +35,7 @@ GATOS is designed as a distributed system where each Git repository is a self-co A GATOS **federation** is a network of independent GATOS repositories that have agreed to share some portion of their state or policy. For example, a central "governance" repository could define policies that are consumed by dozens of "project" repositories. -This is achieved through the Message Plane (`gatos-mind`) and the State Plane's ability to read from multiple sources. A project repository can subscribe to the `gatos.policy.updated` topic on the governance repository. When a new policy is published, the project node can fetch it, validate it, and incorporate it into its own local policy engine. +This is achieved through the Message Plane (`gatos-message-plane`) and the State Plane's ability to read from multiple sources. A project repository can subscribe to the `gatos.policy.updated` topic on the governance repository. When a new policy is published, the project node can fetch it, validate it, and incorporate it into its own local policy engine. 
## The Challenge: Merging Divergent Realities diff --git a/docs/guide/CHAPTER-010.md b/docs/guide/CHAPTER-010.md index 90eb136e..08af0f4e 100644 --- a/docs/guide/CHAPTER-010.md +++ b/docs/guide/CHAPTER-010.md @@ -92,8 +92,8 @@ Instead of storing the data directly in a Git blob, the system stores a small po graph TD subgraph "GATOS Repository (In-Repo History)" A[Opaque Pointer in Git] - A -- Contains --> C(Ciphertext Hash); - A -- Contains --> D(Encrypted Meta); + A -- Contains --> C(Plaintext Digest); + A -- Contains --> D(Location & Capability URIs); end subgraph "External Blob Store" @@ -104,8 +104,9 @@ graph TD ``` - The actual data is encrypted and stored in a separate, **content-addressed blob store** (which could be anything from a local directory to a cloud storage bucket). -- The **`ciphertext_hash`** is the hash of the encrypted data, allowing for integrity checks. -- The **encrypted meta** contains the information an authorized user needs to decrypt the data (e.g., a reference to a key stored in a KMS, the encryption algorithm used, and the plaintext hash/commitment). The public pointer MUST NOT reveal a raw plaintext hash to avoid dictionary attacks. Use a hiding commitment in the public pointer if needed. +- The pointer stores the **plaintext digest** (`digest = blake3(plaintext)`), proving integrity without revealing the bytes. +- The pointer also records **where to fetch** (`location`, e.g., `gatos-node://ed25519:…`, `https://…`, `s3://…`, `ipfs://…`) and **how to authorize/decrypt** (`capability`, e.g., `gatos-key://`, `kms://`, `age://`). +- Because only the digest and URIs live in Git, private data never enters the public repo. Policy decides which fields (if any) are low-entropy enough to remain plain `blobptr`s; everything sensitive becomes an `opaque_pointer` envelope. 
### Verifiable Folds on Private Data @@ -115,7 +116,7 @@ graph TD -Authorized workers can fetch the encrypted blob, decrypt it, verify that the recovered plaintext hash (from encrypted meta) matches expectations, perform a computation, and then produce a new encrypted blob and a new Opaque Pointer. +Authorized workers fetch the encrypted blob via `location`, use the declared `capability` to obtain or derive a key, decrypt, and verify that `blake3(plaintext)` equals the pointer’s `digest`. They can then compute, persist a new blob, and commit a new `opaque_pointer` envelope. If the computation is deterministic, the new plaintext hash will be the same for any authorized worker who performs the same operation. This allows the `state_root` of the system to be updated deterministically, even though the actual data remains private and outside the repository. diff --git a/docs/guide/CHAPTER-011.md b/docs/guide/CHAPTER-011.md index a024e50a..f4225a63 100644 --- a/docs/guide/CHAPTER-011.md +++ b/docs/guide/CHAPTER-011.md @@ -148,7 +148,7 @@ git config --global repack.writeBitmaps true git config --global repack.packKeptObjects false ``` -Consider partial clone and promisor remotes for `refs/gatos/mbus/*`. +Consider partial clone and promisor remotes for `refs/gatos/messages/*`. #### Cache Invalidation Strategy diff --git a/docs/guide/HELLO-PRIVACY.md b/docs/guide/HELLO-PRIVACY.md index cad0afed..b5e1da4a 100644 --- a/docs/guide/HELLO-PRIVACY.md +++ b/docs/guide/HELLO-PRIVACY.md @@ -26,7 +26,7 @@ This walkthrough demonstrates the hybrid privacy model: creating an opaque point > > - Public state: pushable materialized state. > - Private overlay: encrypted blobs addressed via opaque pointers. -> - Public pointers MUST NOT reveal a plaintext hash; store it inside encrypted meta (or use a hiding commitment). +> - Pointers are canonical JSON envelopes with `kind: "opaque_pointer"`, plaintext `digest`, and URIs for `location` + `capability`. ## 0. 
Prepare a Private Blob @@ -71,10 +71,12 @@ A pointer contains at least: ```json { - "kind": "opaque", + "kind": "opaque_pointer", "algo": "blake3", - "ciphertext_hash": "blake3:", - "encrypted_meta": "base64:..." // contains plaintext commitment, KMS refs, cipher params + "digest": "blake3:", + "size": 4096, + "location": "gatos-node://ed25519:", + "capability": "gatos-key://v1/aes-256-gcm/" } ``` @@ -85,7 +87,7 @@ git gatos event add --ns privacy --type demo.pointer --payload @pointers/secret. git gatos fold --ns privacy ``` -`State-Root` is computed deterministically from the public shape. Authorized workers can decrypt and verify the plaintext commitment from `encrypted_meta` outside the repository as needed. +`State-Root` is computed deterministically from the public shape. Authorized workers fetch via `location`, redeem the `capability`, decrypt, and verify that `blake3(plaintext) == digest` outside the repository as needed. ## 2. Rekey the Blob diff --git a/docs/guide/README.md b/docs/guide/README.md index f595567c..61922c2e 100644 --- a/docs/guide/README.md +++ b/docs/guide/README.md @@ -85,7 +85,7 @@ graph TD end subgraph "Message Plane" - Mind("gatos-mind"); + Mind("gatos-message-plane"); end subgraph "Job Plane" @@ -248,7 +248,7 @@ See the full step-by-step guides: - **Objective:** - To cover the components that enable GATOS to orchestrate communication and asynchronous tasks in a distributed environment. - **Key Concepts:** - - Commit-Backed Message Bus (`gatos-mind`) + - Commit-Backed Message Bus (`gatos-message-plane`) - Pub/Sub on Git Refs - Job Lifecycle (`gatos-compute`) - Proof-of-Execution (PoE) diff --git a/docs/opaque-pointers.md b/docs/opaque-pointers.md index e3f37686..160b477c 100644 --- a/docs/opaque-pointers.md +++ b/docs/opaque-pointers.md @@ -33,12 +33,13 @@ Opaque Pointers allow public verification with private bytes. -See \[\[\[\[\[[SPEC §7](/SPEC#7)]\(/SPEC#7)]\(/SPEC#7)]\(/SPEC#7)]\(/SPEC#7)]\(/SPEC#7) and Research Profile §12.1. 
+See [SPEC §7](/SPEC#7) and Research Profile §12.1. -- Low-entropy classes: public pointers **MUST NOT** expose plaintext digests. -- Public pointers **MUST** include a `ciphertext_digest`. -- Pointer `size` **SHOULD** be bucketed (e.g., 1 KB, 4 KB, 16 KB, 64 KB). -- Plaintext commitments, if needed, MUST be hidden (stored in `encrypted_meta`). +- Pointer envelopes **MUST** use Canonical JSON with `kind: "opaque_pointer"`, `algo: "blake3"`, `digest: blake3:`, optional bucketed `size`, and URIs for `location` + `capability`. +- `digest` is the BLAKE3 hash of the raw plaintext blob. Plaintext bytes never enter Git; digest collisions or dictionary attacks are mitigated by policy (use opaque pointers for any low-entropy data). +- `location` declares where to fetch the blob (`gatos-node://`, `https://`, `s3://`, `ipfs://`, `file:///` dev/test). `capability` declares how to authorize/decrypt (`gatos-key://`, `kms://`, `age://`, etc.). +- Pointer `size` metadata **SHOULD** use coarse buckets (e.g., 1 KB, 4 KB, 16 KB, 64 KB) to avoid leaking exact lengths. +- The envelope's `content_id` is `blake3(canonical_bytes)` and MUST be stable across projections. ## Availability & Resolver @@ -51,23 +52,11 @@ See \[\[\[\[\[[SPEC §7](/SPEC#7)]\(/SPEC#7)]\(/SPEC#7)]\(/SPEC#7)]\(/SPEC#7)]\( -Resolvers serve private bytes to authorized clients. +Resolvers serve private bytes to authorized clients. They MUST: -Headers: - -- `Digest: blake3=` -- `X-BLAKE3-Digest: ` (duplicate for intermediaries) - -Auth (normative default): Bearer JWT; log decisions under `refs/gatos/audit/`. - -Required JWT claims: - -- `sub` — subject (requesting principal) -- `aud` — audience (resolver/repo id) -- `exp` — expiry (short-lived) -- Optional `scope` — dataset/namespace scope - -Optional extensions: HTTP Message Signatures and/or mTLS may be supported; they are not required for the Research profile. +1. Parse the `location` URI to determine how to fetch the encrypted blob.
For `gatos-node://ed25519:`, resolve the node via the trust graph and fetch `GET /.well-known/gatos/private/{digest}`. For `https://`, `s3://`, `ipfs://`, and `file:///` URIs, use the corresponding protocol client. +2. Require authorization via the declared `capability` (default profile: Bearer JWT). Log every decision under `refs/gatos/audit/` and enforce JWT claims (`sub`, `aud`, `exp`, optional `scope`). Other mechanisms such as KMS, `age`, or HTTP Message Signatures MAY be layered on. +3. After decrypting, compute `blake3(plaintext)` and compare against the pointer’s `digest`. Respond with `Digest: blake3=`/`X-BLAKE3-Digest: ` headers so clients can double-check integrity. Resolution MUST fail on any mismatch. ## Projection Determinism diff --git a/docs/opaque-resolver.md b/docs/opaque-resolver.md index 9a58f53b..65468515 100644 --- a/docs/opaque-resolver.md +++ b/docs/opaque-resolver.md @@ -29,10 +29,13 @@ Normative default: Bearer JWT authentication; requests and decisions are audited +Resolvers MUST expose a `.well-known` endpoint for the `gatos-node://` scheme defined in ADR-0004: + ```http -GET /resolve// +GET /.well-known/gatos/private/ Authorization: Bearer Accept: application/octet-stream +Capability: gatos-key://v1/aes-256-gcm/ # OPTIONAL; REQUIRED when policy demands it ``` JWT claims: @@ -41,6 +44,7 @@ JWT claims: - `aud` — audience (resolver/repo id) - `exp` — expiry (short-lived) - Optional `scope` — dataset/namespace scope +- Optional `cap` — capability URI assertion (mirrors the pointer’s `capability`). ## Response @@ -57,6 +61,8 @@ X-BLAKE3-Digest: Content-Type: application/octet-stream
``` + +Servers MUST recompute `blake3(plaintext)` and return success only when the digest matches the pointer envelope. Clients double-check using the `Digest` header. ## Audit diff --git a/docs/research-profile.md b/docs/research-profile.md index 6724d257..1628dcdf 100644 --- a/docs/research-profile.md +++ b/docs/research-profile.md @@ -25,9 +25,9 @@ These are sensible, proof-first defaults for scientific and high-assurance setup - `refs/gatos/policies/**`, `refs/gatos/state/**`, `refs/gatos/audit/**` are FF-only. - Enforced by repo policy; violations are denied and logged to audit. -- Message bus QoS and retention - - Topics under `refs/gatos/mbus//` rotate at \~100k messages or \~192 MB per shard. - - TTL ≈ 30 days (configurable per topic). Delivery is at-least-once; consumers dedupe by idempotency key (ULID). +- Message Plane QoS and retention + - Topics under `refs/gatos/messages//head` (and optional shard refs) rotate at \~100k messages or \~192 MB per shard. + - TTL ≈ 30 days (configurable per topic). Delivery is at-least-once; consumers dedupe by ULID and persist checkpoints under `refs/gatos/consumers//`. - Audit anchors and GC - Audit proofs and the latest state checkpoints act as retention anchors for Git GC. @@ -35,6 +35,7 @@ These are sensible, proof-first defaults for scientific and high-assurance setup - Opaque pointers and privacy - Public commitments are recorded in history; private bytes live behind a policy-gated resolver. + - Pointers follow the ADR-0004 schema (`kind/algo/digest/size/location/capability`) with bucketed sizes and digest = `blake3(plaintext)`. - Policies may require zero-knowledge proofs, redacted attestations, or external KMS checks. Adjust these with your IRB/compliance requirements; treat them as a starting point for a verifiable research workflow. @@ -53,4 +54,4 @@ Create `gatos/config/profile.yaml` with: profile: research ``` -Then restart `gatosd` (or re-run your commands).
Gates will enforce the stricter invariants described above (PoF required on state pushes; FF-only refs; Message Plane rotation/TTL; pointer privacy buckets; audit anchors). diff --git a/schemas/README.md b/schemas/README.md index cb398c8a..9b28cc84 100644 --- a/schemas/README.md +++ b/schemas/README.md @@ -32,3 +32,9 @@ Time values - Integer `ttl` in governance policy is specified in seconds. - String `ttl` and `timeout` values use ISO 8601 duration syntax (e.g., `PT30S`, `PT5M`, `P1DT2H`). + +Message Plane envelopes + +- Envelopes live in `schemas/v1/message-plane/` and describe commits written under `refs/gatos/messages//head`. +- Every message commit MUST contain a `message/envelope.json` blob that validates against `event_envelope.schema.json` and is serialized as Canonical JSON (UTF-8, sorted keys, no insignificant whitespace). +- Optional attachments are stored under `message/attachments/` and referenced via logical names in the envelope `refs` map; attachments never influence the canonical `content_id`. 
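The schemas README hunk requires `message/envelope.json` to be Canonical JSON (UTF-8, sorted keys, no insignificant whitespace), and the patch defines `content_id` as BLAKE3 over those canonical bytes. A rough Python approximation is below. It is a sketch under stated assumptions: `json.dumps(sort_keys=True)` orders keys by code point rather than the UTF-16 order of RFC 8785 (JCS) and does not apply JCS number formatting, so envelopes with floats or non-BMP keys may serialize differently. BLAKE3 is not in Python's standard library, so the hash function is passed in; `canonical_bytes` and `content_id` are hypothetical helper names.

```python
import json
from typing import Callable


def canonical_bytes(envelope: dict) -> bytes:
    """Approximate Canonical JSON: UTF-8, sorted keys, no insignificant whitespace."""
    text = json.dumps(
        envelope,
        sort_keys=True,         # keys ordered by code point (JCS uses UTF-16 order)
        separators=(",", ":"),  # no spaces after separators
        ensure_ascii=False,     # emit raw UTF-8 rather than \u escapes
        allow_nan=False,        # NaN/Infinity are not valid JSON; raise instead
    )
    return text.encode("utf-8")


def content_id(envelope: dict, hexdigest: Callable[[bytes], str]) -> str:
    """Return 'blake3:' + hex digest of the canonical envelope bytes.

    `hexdigest` should compute BLAKE3; it is injected because BLAKE3 is not in
    the standard library. Attachments are not hashed: only the envelope (which
    names them in its `refs` map) contributes to the id.
    """
    return "blake3:" + hexdigest(canonical_bytes(envelope))
```

With the third-party `blake3` package, `from blake3 import blake3` and `content_id(env, lambda b: blake3(b).hexdigest())` should produce the `blake3:` form. Since attachments are referenced only by logical name in `refs`, changing their bytes leaves `content_id` unchanged, matching the README's rule.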
diff --git a/schemas/v1/shiplog/consumer_checkpoint.schema.json b/schemas/v1/message-plane/consumer_checkpoint.schema.json similarity index 69% rename from schemas/v1/shiplog/consumer_checkpoint.schema.json rename to schemas/v1/message-plane/consumer_checkpoint.schema.json index 79bb3e14..cb212793 100644 --- a/schemas/v1/shiplog/consumer_checkpoint.schema.json +++ b/schemas/v1/message-plane/consumer_checkpoint.schema.json @@ -1,8 +1,8 @@ { "$schema": "http://json-schema.org/draft-07/schema#", - "$id": "https://gatos.io/schemas/v1/shiplog/consumer_checkpoint.schema.json", - "title": "Shiplog Consumer Checkpoint", - "description": "Schema for a shiplog consumer checkpoint.", + "$id": "https://gatos.io/schemas/v1/message-plane/consumer_checkpoint.schema.json", + "title": "Message Plane Consumer Checkpoint", + "description": "Schema for a Message Plane consumer checkpoint.", "type": "object", "required": [ "ulid" diff --git a/schemas/v1/shiplog/event_envelope.schema.json b/schemas/v1/message-plane/event_envelope.schema.json similarity index 86% rename from schemas/v1/shiplog/event_envelope.schema.json rename to schemas/v1/message-plane/event_envelope.schema.json index 6ef1c649..7b585a4b 100644 --- a/schemas/v1/shiplog/event_envelope.schema.json +++ b/schemas/v1/message-plane/event_envelope.schema.json @@ -1,8 +1,8 @@ { "$schema": "http://json-schema.org/draft-07/schema#", - "$id": "https://gatos.io/schemas/v1/shiplog/event_envelope.schema.json", - "title": "Shiplog Event Envelope", - "description": "Schema for a shiplog event envelope.", + "$id": "https://gatos.io/schemas/v1/message-plane/event_envelope.schema.json", + "title": "Message Plane Envelope", + "description": "Schema for a Message Plane event envelope.", "type": "object", "required": [ "ulid", From 13b55cf02fb9038246ba2eba6aad0f2cdc8c58b3 Mon Sep 17 00:00:00 2001 From: "J. 
Kirby Ross" Date: Sun, 16 Nov 2025 23:03:48 -0800 Subject: [PATCH 06/25] Add ADR-0006, sweep ADR-0005 --- crates/gatos-message-plane/README.md | 12 +-- docs/FAQ.md | 8 +- docs/FEATURES.md | 30 +++--- docs/SPEC.md | 35 ++++--- docs/TECH-SPEC.md | 2 +- docs/decisions/ADR-0006/DECISION.md | 98 +++++++++++++++++++ docs/decisions/README.md | 1 + docs/diagrams/data_flow.md | 12 +-- .../docs_SPEC__0679d036ea__mermaid_19.svg | 2 +- .../docs_SPEC__0679d036ea__mermaid_9.svg | 2 +- .../generated/docs__SPEC__mermaid_19.svg | 2 +- .../generated/docs__SPEC__mermaid_9.svg | 2 +- .../docs__diagrams__data_flow__mermaid_1.svg | 2 +- ...grams_data_flow__559f0c180d__mermaid_1.svg | 2 +- docs/guide/CHAPTER-006.md | 2 +- 15 files changed, 160 insertions(+), 52 deletions(-) create mode 100644 docs/decisions/ADR-0006/DECISION.md diff --git a/crates/gatos-message-plane/README.md b/crates/gatos-message-plane/README.md index f263fb98..c4f75c91 100644 --- a/crates/gatos-message-plane/README.md +++ b/crates/gatos-message-plane/README.md @@ -9,16 +9,14 @@ distributed communication between GATOS components. > details, see [TECH-SPEC.md](../../docs/TECH-SPEC.md). Commit-backed means messages are persisted as Git commits to provide durability, auditability, and -exactly-once semantics when combined with acknowledgements/commitments. See the architecture notes -in [ADR-0001](../../docs/decisions/ADR-0001/DECISION.md) and protocol details in +at-least-once delivery with deterministic replay via ULID checkpoints. See the architecture notes in +[ADR-0005](../../docs/decisions/ADR-0005/DECISION.md) and protocol details in [TECH-SPEC.md](../../docs/TECH-SPEC.md). ## Features - Asynchronous messaging: non-blocking publish/subscribe operations. -- Commit-backed durability: persisted messages with auditability and exactly-once when combined - - with acks/commitments. +- Commit-backed durability: persisted messages with canonical envelopes (`message/envelope.json`). 
- Topic-based routing: logical message organization and filtering. - Sharding: horizontal scalability via topic partitioning. @@ -55,8 +53,8 @@ GMP is the Message Plane in the GATOS hexagonal architecture. It coordinates mes ### Usage (API Sketch) - Depend on `gatos-message-plane` in your crate. -- Use a `Publisher` to publish messages to a topic; use a `Subscriber` to consume. -- Messages are persisted as Git commits to provide auditability and coordinate exactly-once when combined with acknowledgements/commitments. +- Use a `Publisher` to append canonical envelopes under `refs/gatos/messages//head`; use a `Subscriber` (or the `messages.read` RPC) to stream them oldest→newest. +- Messages are persisted as Git commits, and consumers store checkpoints in `refs/gatos/consumers//` so they can resume after a crash and skip already-processed ULIDs. > Note: This section reflects the intended usage; concrete APIs will be added as implementation proceeds. diff --git a/docs/FAQ.md b/docs/FAQ.md index 896bccc6..bae0dd40 100644 --- a/docs/FAQ.md +++ b/docs/FAQ.md @@ -318,9 +318,9 @@ Consistent hashing keeps most keys stable when shards changes. Dual-write window: - Publishers write to both old and new shard maps for a configurable epoch. -- Consumers subscribe to both maps; dedupe by (topic, ulid). +- Consumers read from both shard maps via `messages.read` and persist checkpoints per topic/segment; dedupe by `(topic, ulid)`. -When ack lag on the old map is zero for N minutes, flip the active version and retire the old. +When checkpoint lag on the old map is zero for N minutes, flip the active version and retire the old. -This gives smooth resharding with exactly-once semantics intact. +This gives smooth resharding while preserving at-least-once delivery and `(topic, ulid)` dedupe. @@ -690,8 +690,8 @@ Grant chain fields (prev, revokes) and a rotation checklist in spec. -- Needs: pub/sub, exactly-once, backpressure, capability tokens. -- We meet: bus QoS + caps + acks/commit. +- Needs: pub/sub, at-least-once with deterministic replay, capability tokens.
+- We meet: commit-backed Message Plane + capability-scoped topics + checkpoints (refs/gatos/consumers/*) for resume/dedupe. - Add: shard-map/versioning + subscription windows → ✅ at scale. ### 5. Cross-app data sharing (RLS-gated state) diff --git a/docs/FEATURES.md b/docs/FEATURES.md index da7b2783..c62c5c84 100644 --- a/docs/FEATURES.md +++ b/docs/FEATURES.md @@ -36,7 +36,7 @@ anchors/TOC stable and idempotent. - [F4-US-PENG](#f4-us-peng) - [F4 Acceptance Criteria](#f4-acceptance-criteria) - [F4 Test Plan](#f4-test-plan) -- [F5 — Message Bus (QoS with Acks/Commits)](#f5--message-bus-qos-with-ackscommits) +- [F5 — Message Plane (Topics + Checkpoints)](#f5--message-plane-topics-checkpoints) - [F5-US-SRE](#f5-us-sre) - [F5 Acceptance Criteria](#f5-acceptance-criteria) - [F5 Test Plan](#f5-test-plan) @@ -221,36 +221,38 @@ Each feature includes user stories per relevant stakeholders (format requested), --- -## F5 — Message Bus (QoS with Acks/Commits) +## F5 — Message Plane (Topics + Checkpoints) - + - + ### F5-US-SRE -| | | -| -------------- | ----------------------------------------- | -| **As a...** | SRE | -| **I want..** | exactly-once delivery for job dispatch | -| **So that...** | batch jobs don’t double-run under retries | +| | | +| -------------- | --------------------------------------------------------------- | +| **As a...** | SRE | +| **I want..** | at-least-once delivery with deterministic replay | +| **So that...** | workers can crash/restart without losing or duplicating events | #### F5 Acceptance Criteria -- [ ] `gmb.msg` + `gmb.ack` + `gmb.commit` protocol -- [ ] De-dup by (topic, ulid) +- [ ] Topics live under `refs/gatos/messages//head`; commits contain `message/envelope.json` plus optional attachments referenced via `message/attachments/`. +- [ ] Envelopes obey `schemas/v1/message-plane/event_envelope.schema.json`; `content_id = blake3(canonical_bytes)` and commit trailers include `Event-Id: ulid:` + `Content-Id: blake3:`. 
+- [ ] Consumers persist checkpoints at `refs/gatos/consumers//` (ULID + optional commit) and `messages.read` returns `ulid`, `commit`, `content_id`, `canonical_json`. +- [ ] ULIDs are strictly monotonic per topic per publisher; duplicate ULIDs must be rejected. #### F5 Test Plan -- [ ] Golden: dup publishes + consumer crash → single effect -- [ ] Edge: ack lag metrics emitted -- [ ] Failure: commitment without acks → reject +- [ ] Golden: publish N events, crash consumer mid-stream, restart → `messages.read` resumes from checkpoint and every `ulid` is processed exactly once. +- [ ] Edge: request `messages.read` with stale `since_ulid` → server begins at oldest segment and returns a `checkpoint_hint` for automatic advancement. +- [ ] Failure: out-of-order ULID attempt → commit rejected with deterministic error. --- diff --git a/docs/SPEC.md b/docs/SPEC.md index 7f4208a9..6109f7a5 100644 --- a/docs/SPEC.md +++ b/docs/SPEC.md @@ -759,18 +759,27 @@ The message bus provides a pub/sub system built on Git commits. ```mermaid sequenceDiagram participant Publisher - participant GATOS + participant Topic as refs/gatos/messages//head participant Consumer - Publisher->>GATOS: Publish Message (QoS: at_least_once) - GATOS-->>Consumer: Deliver Message - Consumer->>Consumer: Process Message - Consumer->>GATOS: Send Ack - GATOS->>GATOS: Observe Ack Quorum - GATOS->>GATOS: Create gmb.commit Event + Publisher->>Topic: Write commit (message/envelope.json) + Topic-->>Publisher: Event-Id + Content-Id trailers + Consumer->>Topic: messages.read(topic, since_ulid) + Topic-->>Consumer: { ulid, commit, content_id, canonical_json } + Consumer->>Consumer: Process + dedupe by ULID + Consumer->>Topic: Update refs/gatos/consumers// ``` -Messages are appended to `refs/gatos/messages//head` (optionally sharded by date/size). Delivery is **at-least-once**; consumers **MUST** dedupe using the message `ULID` and persist checkpoints under `refs/gatos/consumers//`. 
Producers **SHOULD** set `ULID`s deterministically when mirroring external buses. `messages.read` (ADR-0005) exposes canonical envelopes plus commit ids for RPC/CLI clients. +Each topic is a Git ref `refs/gatos/messages//head` whose commit history is the ordered event stream. Commits MUST contain a `message` directory with: + +- `message/envelope.json` — Canonical JSON conforming to `schemas/v1/message-plane/event_envelope.schema.json`. +- optional `message/attachments/` blobs referenced via the envelope `refs` map (these do not influence the canonical identifier). + +Commits MUST include trailers `Event-Id: ulid:<26-char ULID>` (strictly monotonic per topic per publisher) and `Content-Id: blake3:` computed from the canonical envelope bytes. Additional files MUST NOT appear at the tree root; implementations store extra artifacts under `message/attachments/`. + +Consumers fetch topics directly or page via the `messages.read` RPC. Responses include `ulid`, `commit`, `content_id`, `envelope_path` (defaults to `message/envelope.json`), `canonical_json` (base64), and optionally a `checkpoint_hint` advising which `refs/gatos/consumers//` entry to advance. Clients MUST persist checkpoints (ULID + optional commit) so they can resume idempotently after a crash. + +Delivery is **at-least-once**. Dedupe is performed using ULIDs + checkpoints; duplicates MUST be ignored, and out-of-order ULIDs for the same topic MUST be rejected. Producers SHOULD deterministically derive ULIDs when mirroring external brokers so replays remain stable. Retention and compaction: @@ -1215,19 +1224,19 @@ sequenceDiagram Client->>Daemon: 1. Enqueue Job (Event) Daemon->>Ledger: 2. Append `jobs.enqueue` event Ledger-->>Daemon: 3. Success - Daemon->>Bus: 4. Publish `gmb.msg` to topic - Bus-->>Daemon: 5. Success + Daemon->>Bus: 4. Write Message Plane commit (`refs/gatos/messages/jobs.pending/head`) + Bus-->>Daemon: 5. Event-Id/Content-Id trailers recorded Daemon-->>Client: 6.
Job Enqueued Note over Bus,State: Later, a worker consumes the job... participant Worker - Worker->>Bus: 7. Subscribe to topic - Bus->>Worker: 8. Delivers `gmb.msg` + Worker->>Bus: 7. Call messages.read(topic="jobs.pending", since_ulid) + Bus->>Worker: 8. Deliver envelope {ulid, commit, content_id, canonical_json} Worker->>Daemon: 9. Report Result (Event) Daemon->>Ledger: 10. Append `jobs.result` event Ledger-->>Daemon: 11. Success - Daemon->>Bus: 12. Write `gmb.ack` + Worker->>Bus: 12. Update refs/gatos/consumers// Daemon-->>Worker: 13. Result Recorded Note over Ledger,State: A fold process runs... diff --git a/docs/TECH-SPEC.md b/docs/TECH-SPEC.md index 3449aa74..7c189134 100644 --- a/docs/TECH-SPEC.md +++ b/docs/TECH-SPEC.md @@ -433,7 +433,7 @@ graph TD subgraph "Metrics" M1(gatos_journal_append_latency_ms) M2(gatos_fold_latency_ms) - M3(gatos_bus_ack_lag) + M3(gatos_message_checkpoint_lag) end A --> M1 B --> M2 diff --git a/docs/decisions/ADR-0006/DECISION.md b/docs/decisions/ADR-0006/DECISION.md new file mode 100644 index 00000000..123763fd --- /dev/null +++ b/docs/decisions/ADR-0006/DECISION.md @@ -0,0 +1,98 @@ +--- +Status: Proposed +Date: 2025-11-09 +ADR: ADR-0006 +Authors: [flyingrobots] +Requires: [ADR-0003] +Related: [ADR-0002, ADR-0004] +Tags: [Watcher, Hooks, Locks] +Schemas: [] +Supersedes: [] +Superseded-By: [] +--- + +# ADR-0006: Local Enforcement — Watcher Daemon & Git Hooks + +## Scope + +Provide **local enforcement** of governance policy via (a) a cross-platform Watcher daemon (`gatos watch`) and (b) Git hooks installed by the CLI. The goal is to mirror the guarantees of ADR-0003 (Consensus Governance) on every workstation: Perforce-style read-only locks until a Grant exists, pre-commit/pre-push policy gates, and reactive automation tied to the Job Plane. 
+ +## Rationale + +- **Problem:** Without local enforcement, contributors can edit locked assets, forget to acquire Grants, or push non-compliant history—only to be rejected later by server-side policy. +- **Context:** Artists and developers expect “read-only until lock” workflows, automatic tests on save/fold change, and immediate feedback instead of slow CI rebukes. +- **Outcome:** A consistent, deterministic local experience that reflects policy reality before changes ever leave the workstation. + +## Decision + +### 1. Watcher Daemon (`gatos watch`) + +- Monitors the working tree plus `.git/refs/gatos/**` using cross-platform file notifications (inotify/FSEvents/ReadDirectoryChangesW). When unavailable, the daemon MUST fall back to polling. +- Enforces **read-only masks** for paths matched in `.gatos/policy.yaml` `locks` section until a valid **Grant** exists (ADR-0003). Default enforcement: + - POSIX: `chmod -w`, escalating to `chflags uchg`/`chattr +i` when the user opts in. + - Windows: set `FILE_ATTRIBUTE_READONLY`. +- Emits structured events (JSONL on stdout + desktop notification hooks) whenever policy denies a mutation attempt. Events MUST include the policy rule id, actor, path, and remediation hint. +- Watches `refs/gatos/grants/**` for updates so that newly approved locks are released immediately without requiring a restart. +- The daemon MUST persist state (e.g., last applied masks, pending lock requests) under `~/.config/gatos/watch/` to survive restarts. + +### 2. Git Hooks (managed surface) + +`gatos install-hooks` installs managed hook scripts (POSIX shell + PowerShell). Hooks MUST be idempotent and re-runnable. + +- `pre-commit`: rejects staged changes touching locked paths, consults the Watcher cache, and logs violations under `refs/gatos/audit/locks/`. +- `pre-push`: verifies that every outbound reference has the required Grants (ADR-0003) and that Proof-of-Fold/Proof-of-Execution metadata (when mandated) is present. 
Failure MUST block the push. +- `post-merge` / `post-checkout`: re-apply read-only masks based on current grants. +- Hooks MUST fail closed if the policy engine cannot evaluate (missing cache, corrupt policy, etc.). Users can bypass only via the documented escape hatch (`GATOS_NO_HOOKS=1`), which emits a warning banner. + +### 3. Lock Acquisition UX + +- `gatos lock acquire `: + 1. Computes the canonical lock id (path glob + repository root). + 2. Creates a **Proposal** under `refs/gatos/proposals/locks/` referencing the governance rule declared in `.gatos/policy.yaml`. + 3. Waits (with progress feedback) for a **Grant** to materialize; once quorum is met, the Watcher daemon removes the read-only bit for the granted files. +- `gatos lock release `: revokes or supersedes the Grant via ADR-0003’s revocation flow. +- CLI helpers MUST support batching (multiple paths) and provide human-friendly status (Pending, Granted, Revoked, Expired). + +### 4. Reactive Automation + +- Policies MAY declare `watcher.tasks[]` entries that run deterministic commands locally when a file is saved or when a fold finishes (e.g., run formatters, lint, or spawn a local Job Plane task in “loopback” mode per ADR-0002). +- Tasks run in a sandbox (`git worktree` or temp dir) and MUST publish their outputs as Job commits if the policy requires proof (e.g., `Proof-Of-Execution: local`). + +### 5. Configuration (`.gatos/policy.yaml`) + +```yaml +locks: + - match: "assets/**" + rule: "governance.locks.assets" # ADR-0003 rule id + read_only: true + - match: "codegen/**" + rule: "governance.locks.codegen" +watcher: + poll_fallback_ms: 5000 + tasks: + - when: "on_save" + match: "**/*.proto" + run_job: "format.proto" +``` + +- The `locks` array declares glob patterns and the governance rule controlling each. +- `watcher.tasks` describes optional automation hooks. Task definitions MUST reference existing Job Plane manifests when `run_job` is used. 
+- Users MAY opt out by setting `GATOS_NO_HOOKS=1` or `GATOS_NO_WATCH=1`, but the CLI MUST warn that doing so removes local guardrails. + +## Consequences + +**Pros** +- Prevents “foot-gun” edits to locked or policy-controlled files. +- Provides Perforce-style artist workflow inside Git + ADR governance. +- Gives immediate feedback (notifications + hook failures) instead of delayed CI surprises. + +**Cons** +- Platform differences in file permissions; must handle FAT/NTFS quirks and network filesystems. +- Misconfiguration could temporarily lock users out; hooks must degrade gracefully when profiles change. +- Local enforcement can be bypassed; server-side policy remains the source of truth. + +## Security Considerations + +- Hooks and watcher run with user privileges and cannot be trusted for adversarial enforcement; the remote push-gate remains authoritative. +- The watcher MUST respect ADR-0004 privacy rules: never emit private overlay paths in logs/notifications unless the actor already has access, and avoid leaking pointer metadata. +- Daemon communication channels (e.g., JSONL socket) MUST authenticate local clients or restrict to loopback to prevent untrusted processes from spoofing policy events. 
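ADR-0006's read-only masking (POSIX write bits cleared, Windows `FILE_ATTRIBUTE_READONLY`) can be sketched in Python. This is a hypothetical illustration, not the `gatos watch` implementation: `locked_paths`, `apply_masks`, and the `granted` collection are invented names, and Grant lookup against `refs/gatos/grants/` is replaced by a plain list of repo-relative paths. Glob matching uses `fnmatch`, whose `*` also matches `/`, so `assets/**` covers nested files but is looser than gitignore-style globbing.

```python
import fnmatch
import os
import stat
from pathlib import Path
from typing import Iterable, List

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def locked_paths(root: Path, lock_globs: Iterable[str], granted: Iterable[str]) -> List[Path]:
    """Files under `root` matched by a lock glob that have no active Grant (hypothetical)."""
    globs = list(lock_globs)
    granted_set = set(granted)
    matches: List[Path] = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root).as_posix()
        # Never touch Git's own metadata, and skip directories.
        if rel == ".git" or rel.startswith(".git/") or not path.is_file():
            continue
        if rel in granted_set:
            continue
        # fnmatch's "*" also matches "/", so "assets/**" covers nested files.
        if any(fnmatch.fnmatchcase(rel, g) for g in globs):
            matches.append(path)
    return matches


def apply_masks(root: Path, lock_globs: Iterable[str], granted: Iterable[str]) -> List[Path]:
    """Make locked files read-only and restore owner write access on granted files."""
    granted = list(granted)  # iterated twice below
    locked = locked_paths(root, lock_globs, granted)
    for path in locked:
        # On Windows, clearing S_IWUSR sets FILE_ATTRIBUTE_READONLY.
        mode = stat.S_IMODE(path.stat().st_mode)
        os.chmod(path, mode & ~WRITE_BITS)
    for rel in granted:
        path = root / rel
        if path.is_file():
            os.chmod(path, stat.S_IMODE(path.stat().st_mode) | stat.S_IWUSR)
    return locked
```

The sketch deliberately omits the ADR's opt-in escalation to `chflags uchg`/`chattr +i`, which depends on platform tools and, for `chattr +i`, elevated privileges.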
diff --git a/docs/decisions/README.md b/docs/decisions/README.md index 675ce33f..ce733b37 100644 --- a/docs/decisions/README.md +++ b/docs/decisions/README.md @@ -33,3 +33,4 @@ Each ADR will have a status, typically one of the following: | [ADR-0003](./ADR-0003/DECISION.md) | Consensus Governance for Gated Actions | Accepted | 2025-11-08 | | [ADR-0004](./ADR-0004/DECISION.md) | Hybrid Privacy Model (Public Projection + Private Overlay) | Accepted | 2025-11-09 | | [ADR-0005](./ADR-0005/DECISION.md) | Message Plane — A Git-Native, Commit-Backed Message Bus | Proposed | 2025-11-09 | +| [ADR-0006](./ADR-0006/DECISION.md) | Local Enforcement — Watcher Daemon & Git Hooks | Proposed | 2025-11-09 | diff --git a/docs/diagrams/data_flow.md b/docs/diagrams/data_flow.md index 76fcf7f8..c1bf0f7e 100644 --- a/docs/diagrams/data_flow.md +++ b/docs/diagrams/data_flow.md @@ -13,25 +13,25 @@ sequenceDiagram participant Client participant Daemon as gatosd participant Ledger as gatos-ledger - participant Bus as gatos-message-plane + participant Bus as Message Plane participant State as gatos-echo Client->>Daemon: 1. Enqueue Job (Event) Daemon->>Ledger: 2. Append `jobs.enqueue` event Ledger-->>Daemon: 3. Success - Daemon->>Bus: 4. Publish `gmb.msg` to topic - Bus-->>Daemon: 5. Success + Daemon->>Bus: 4. Write message commit (topic `jobs.pending`) + Bus-->>Daemon: 5. Event-Id / Content-Id recorded Daemon-->>Client: 6. Job Enqueued Note over Bus,State: Later, a worker consumes the job... participant Worker - Worker->>Bus: 7. Subscribe to topic - Bus->>Worker: 8. Delivers `gmb.msg` + Worker->>Bus: 7. `messages.read(jobs.pending, since_ulid)` + Bus->>Worker: 8. Deliver envelope {ulid, commit, content_id} Worker->>Daemon: 9. Report Result (Event) Daemon->>Ledger: 10. Append `jobs.result` event Ledger-->>Daemon: 11. Success - Daemon->>Bus: 12. Write `gmb.ack` + Worker->>Bus: 12. Update `refs/gatos/consumers//` checkpoint Daemon-->>Worker: 13. 
Result Recorded Note over Ledger,State: A fold process runs... diff --git a/docs/diagrams/generated/docs_SPEC__0679d036ea__mermaid_19.svg b/docs/diagrams/generated/docs_SPEC__0679d036ea__mermaid_19.svg index c50cf3fc..9240473e 100644 --- a/docs/diagrams/generated/docs_SPEC__0679d036ea__mermaid_19.svg +++ b/docs/diagrams/generated/docs_SPEC__0679d036ea__mermaid_19.svg @@ -1,2 +1,2 @@ -Workergatos-echogatos-message-planegatos-ledgergatosdClientWorkergatos-echogatos-message-planegatos-ledgergatosdClientLater, a worker consumes the job...A fold process runs...1. Enqueue Job (Event)2. Append `jobs.enqueue` event3. Success4. Publish `gmb.msg` to topic5. Success6. Job Enqueued7. Subscribe to topic8. Delivers `gmb.msg`9. Report Result (Event)10. Append `jobs.result` event11. Success12. Write `gmb.ack`13. Result Recorded14. Read events from journal15. Compute new state (e.g., update queue view)16. Checkpoint new state +Workergatos-echogatos-message-planegatos-ledgergatosdClientWorkergatos-echogatos-message-planegatos-ledgergatosdClientLater, a worker consumes the job...A fold process runs...1. Enqueue Job (Event)2. Append `jobs.enqueue` event3. Success4. Publish `jobs.pending message` to topic5. Success6. Job Enqueued7. Subscribe to topic8. Delivers `jobs.pending message`9. Report Result (Event)10. Append `jobs.result` event11. Success12. Write `checkpoint update`13. Result Recorded14. Read events from journal15. Compute new state (e.g., update queue view)16. 
Checkpoint new state diff --git a/docs/diagrams/generated/docs_SPEC__0679d036ea__mermaid_9.svg b/docs/diagrams/generated/docs_SPEC__0679d036ea__mermaid_9.svg index 09605ae5..4558711c 100644 --- a/docs/diagrams/generated/docs_SPEC__0679d036ea__mermaid_9.svg +++ b/docs/diagrams/generated/docs_SPEC__0679d036ea__mermaid_9.svg @@ -1,2 +1,2 @@ -ConsumerGATOSPublisherConsumerGATOSPublisherPublish Message (QoS: at_least_once)Deliver MessageProcess MessageSend AckObserve Ack QuorumCreate gmb.commit Event +ConsumerGATOSPublisherConsumerGATOSPublisherPublish Message (QoS: at_least_once)Deliver MessageProcess MessageSend AckObserve Ack QuorumCreate message commit Event diff --git a/docs/diagrams/generated/docs__SPEC__mermaid_19.svg b/docs/diagrams/generated/docs__SPEC__mermaid_19.svg index ff9f9ca5..008f1996 100644 --- a/docs/diagrams/generated/docs__SPEC__mermaid_19.svg +++ b/docs/diagrams/generated/docs__SPEC__mermaid_19.svg @@ -1,2 +1,2 @@ -Workergatos-echogatos-message-planegatos-ledgergatosdClientWorkergatos-echogatos-message-planegatos-ledgergatosdClientLater, a worker consumes the job...A fold process runs...1. Enqueue Job (Event)2. Append `jobs.enqueue` event3. Success4. Publish `gmb.msg` to topic5. Success6. Job Enqueued7. Subscribe to topic8. Delivers `gmb.msg`9. Report Result (Event)10. Append `jobs.result` event11. Success12. Write `gmb.ack`13. Result Recorded14. Read events from journal15. Compute new state (e.g., update queue view)16. Checkpoint new state +Workergatos-echogatos-message-planegatos-ledgergatosdClientWorkergatos-echogatos-message-planegatos-ledgergatosdClientLater, a worker consumes the job...A fold process runs...1. Enqueue Job (Event)2. Append `jobs.enqueue` event3. Success4. Publish `jobs.pending message` to topic5. Success6. Job Enqueued7. Subscribe to topic8. Delivers `jobs.pending message`9. Report Result (Event)10. Append `jobs.result` event11. Success12. Write `checkpoint update`13. Result Recorded14. Read events from journal15. 
Compute new state (e.g., update queue view)16. Checkpoint new state diff --git a/docs/diagrams/generated/docs__SPEC__mermaid_9.svg b/docs/diagrams/generated/docs__SPEC__mermaid_9.svg index f7d8db96..6f9f546b 100644 --- a/docs/diagrams/generated/docs__SPEC__mermaid_9.svg +++ b/docs/diagrams/generated/docs__SPEC__mermaid_9.svg @@ -1,2 +1,2 @@ -ConsumerGATOSPublisherConsumerGATOSPublisherPublish Message (QoS: exactly_once)Deliver MessageProcess MessageSend AckObserve Ack QuorumCreate gmb.commit Event +ConsumerGATOSPublisherConsumerGATOSPublisherPublish Message (QoS: exactly_once)Deliver MessageProcess MessageSend AckObserve Ack QuorumCreate message commit Event diff --git a/docs/diagrams/generated/docs__diagrams__data_flow__mermaid_1.svg b/docs/diagrams/generated/docs__diagrams__data_flow__mermaid_1.svg index 3dfea291..e4473e1d 100644 --- a/docs/diagrams/generated/docs__diagrams__data_flow__mermaid_1.svg +++ b/docs/diagrams/generated/docs__diagrams__data_flow__mermaid_1.svg @@ -1,2 +1,2 @@ -Workergatos-echogatos-message-planegatos-ledgergatosdClientWorkergatos-echogatos-message-planegatos-ledgergatosdClientLater, a worker consumes the job...A fold process runs...1. Enqueue Job (Event)2. Append `jobs.enqueue` event3. Success4. Publish `gmb.msg` to topic5. Success6. Job Enqueued7. Subscribe to topic8. Delivers `gmb.msg`9. Report Result (Event)10. Append `jobs.result` event11. Success12. Write `gmb.ack`13. Result Recorded14. Read events from journal15. Compute new state (e.g., update queue view)16. Checkpoint new state +Workergatos-echogatos-message-planegatos-ledgergatosdClientWorkergatos-echogatos-message-planegatos-ledgergatosdClientLater, a worker consumes the job...A fold process runs...1. Enqueue Job (Event)2. Append `jobs.enqueue` event3. Success4. Publish `jobs.pending message` to topic5. Success6. Job Enqueued7. Subscribe to topic8. Delivers `jobs.pending message`9. Report Result (Event)10. Append `jobs.result` event11. Success12. Write `checkpoint update`13. 
Result Recorded14. Read events from journal15. Compute new state (e.g., update queue view)16. Checkpoint new state diff --git a/docs/diagrams/generated/docs_diagrams_data_flow__559f0c180d__mermaid_1.svg b/docs/diagrams/generated/docs_diagrams_data_flow__559f0c180d__mermaid_1.svg index c7c9005c..18698407 100644 --- a/docs/diagrams/generated/docs_diagrams_data_flow__559f0c180d__mermaid_1.svg +++ b/docs/diagrams/generated/docs_diagrams_data_flow__559f0c180d__mermaid_1.svg @@ -1,2 +1,2 @@ -Workergatos-echogatos-message-planegatos-ledgergatosdClientWorkergatos-echogatos-message-planegatos-ledgergatosdClientLater, a worker consumes the job...A fold process runs...1. Enqueue Job (Event)2. Append `jobs.enqueue` event3. Success4. Publish `gmb.msg` to topic5. Success6. Job Enqueued7. Subscribe to topic8. Delivers `gmb.msg`9. Report Result (Event)10. Append `jobs.result` event11. Success12. Write `gmb.ack`13. Result Recorded14. Read events from journal15. Compute new state (e.g., update queue view)16. Checkpoint new state +Workergatos-echogatos-message-planegatos-ledgergatosdClientWorkergatos-echogatos-message-planegatos-ledgergatosdClientLater, a worker consumes the job...A fold process runs...1. Enqueue Job (Event)2. Append `jobs.enqueue` event3. Success4. Publish `jobs.pending message` to topic5. Success6. Job Enqueued7. Subscribe to topic8. Delivers `jobs.pending message`9. Report Result (Event)10. Append `jobs.result` event11. Success12. Write `checkpoint update`13. Result Recorded14. Read events from journal15. Compute new state (e.g., update queue view)16. Checkpoint new state diff --git a/docs/guide/CHAPTER-006.md b/docs/guide/CHAPTER-006.md index 76d2e131..a762a4ca 100644 --- a/docs/guide/CHAPTER-006.md +++ b/docs/guide/CHAPTER-006.md @@ -135,7 +135,7 @@ Storage: `refs/gatos/jobs//result` (commit whose tree contains the resul - Segment topics: `refs/gatos/messages////
/` (or numeric `0001`, `0002`, …) to bound ref sizes. - Rotation thresholds (defaults): rotate at 100k messages or \~192 MB per segment (whichever comes first). - TTL: retain segments for 30 days, then prune; write a summary commit (counts, Merkle root, last offsets) when pruning to preserve verifiability. -- Offsets: snapshot consumer offsets; prune only segments older than the minimum acknowledged offset across active consumers. +- Offsets: snapshot consumer checkpoints (refs/gatos/consumers//); prune only segments older than the minimum checkpointed ULID across active consumers. - Git optimization: enable `fetch.writeCommitGraph=true`, `repack.writeBitmaps=true`; consider partial clone/promisor remotes for `refs/gatos/messages/*` on busy installations. ## Summary From 0b44935a178dace01b8f72779ef55ac346e2e45b Mon Sep 17 00:00:00 2001 From: "J. Kirby Ross" Date: Sun, 16 Nov 2025 23:38:26 -0800 Subject: [PATCH 07/25] docs ADR-0006 example schemas --- docs/decisions/ADR-0006/DECISION.md | 20 ++++--- schemas/v1/policy/locks.schema.json | 87 +++++++++++++++++++++++++++++ schemas/v1/watch/events.schema.json | 49 ++++++++++++++++ 3 files changed, 149 insertions(+), 7 deletions(-) create mode 100644 schemas/v1/policy/locks.schema.json create mode 100644 schemas/v1/watch/events.schema.json diff --git a/docs/decisions/ADR-0006/DECISION.md b/docs/decisions/ADR-0006/DECISION.md index 123763fd..b8ccd75d 100644 --- a/docs/decisions/ADR-0006/DECISION.md +++ b/docs/decisions/ADR-0006/DECISION.md @@ -6,7 +6,9 @@ Authors: [flyingrobots] Requires: [ADR-0003] Related: [ADR-0002, ADR-0004] Tags: [Watcher, Hooks, Locks] -Schemas: [] +Schemas: + - schemas/v1/policy/locks.schema.json # Schema for the `.gatos/policy.yaml` locks/watcher config + - schemas/v1/watch/events.schema.json # Structured event payloads emitted by the watcher Supersedes: [] Superseded-By: [] --- @@ -31,9 +33,13 @@ Provide **local enforcement** of
governance policy via (a) a cross-platform Watc - Enforces **read-only masks** for paths matched in `.gatos/policy.yaml` `locks` section until a valid **Grant** exists (ADR-0003). Default enforcement: - POSIX: `chmod -w`, escalating to `chflags uchg`/`chattr +i` when the user opts in. - Windows: set `FILE_ATTRIBUTE_READONLY`. -- Emits structured events (JSONL on stdout + desktop notification hooks) whenever policy denies a mutation attempt. Events MUST include the policy rule id, actor, path, and remediation hint. +- Emits structured events (JSONL on stdout + desktop notification hooks) whenever policy denies a mutation attempt. Event schema (see `schemas/v1/watch/events.schema.json`): + +```json +{ "ts": "2025-11-09T12:00:00Z", "rule": "governance.locks.assets", "actor": "user:alice", "path": "assets/hero.obj", "action": "deny.write", "remediation": "Acquire lock" } +``` - Watches `refs/gatos/grants/**` for updates so that newly approved locks are released immediately without requiring a restart. -- The daemon MUST persist state (e.g., last applied masks, pending lock requests) under `~/.config/gatos/watch/` to survive restarts. +- The daemon MUST persist state (e.g., last applied masks, pending lock requests) under `~/.config/gatos/watch/` to survive restarts. State files are advisory; corruption or tampering MUST trigger a full resync from Git policy data before enforcement resumes. ### 2. Git Hooks (managed surface) @@ -42,7 +48,7 @@ Provide **local enforcement** of governance policy via (a) a cross-platform Watc - `pre-commit`: rejects staged changes touching locked paths, consults the Watcher cache, and logs violations under `refs/gatos/audit/locks/`. - `pre-push`: verifies that every outbound reference has the required Grants (ADR-0003) and that Proof-of-Fold/Proof-of-Execution metadata (when mandated) is present. Failure MUST block the push. - `post-merge` / `post-checkout`: re-apply read-only masks based on current grants. 
+- Hooks MUST fail closed if the policy engine cannot evaluate (missing cache, corrupt policy, etc.). Users can bypass only via the documented escape hatch (`GATOS_NO_HOOKS=1`), which emits a warning banner *and* records an audit trailer (`Bypass-Hooks: user:alice reason=env override`) on the next push so server-side policy can flag the session. ### 3. Lock Acquisition UX @@ -51,12 +57,12 @@ Provide **local enforcement** of governance policy via (a) a cross-platform Watc 2. Creates a **Proposal** under `refs/gatos/proposals/locks/` referencing the governance rule declared in `.gatos/policy.yaml`. 3. Waits (with progress feedback) for a **Grant** to materialize; once quorum is met, the Watcher daemon removes the read-only bit for the granted files. - `gatos lock release `: revokes or supersedes the Grant via ADR-0003’s revocation flow. -- CLI helpers MUST support batching (multiple paths) and provide human-friendly status (Pending, Granted, Revoked, Expired). +- CLI helpers MUST support batching (multiple paths) with per-path, best-effort semantics: the CLI issues one Proposal per path and keeps processing the remaining entries even if some fail. Failures MUST be reported individually, and the command MUST exit non-zero if any path failed. The CLI prints a summary (e.g., “2/3 locks granted”) and records detailed status under `~/.config/gatos/locks/`. ### 4. Reactive Automation - Policies MAY declare `watcher.tasks[]` entries that run deterministic commands locally when a file is saved or when a fold finishes (e.g., run formatters, lint, or spawn a local Job Plane task in “loopback” mode per ADR-0002). -- Tasks run in a sandbox (`git worktree` or temp dir) and MUST publish their outputs as Job commits if the policy requires proof (e.g., `Proof-Of-Execution: local`).
+- Tasks run in a sandbox (`git worktree` or temp dir) and MUST publish their outputs as Job commits if the policy requires proof (e.g., `Proof-Of-Execution: local`). Implementations MUST enforce sane defaults: max concurrent tasks = 2, default timeout = 120s, configurable via `.gatos/policy.yaml`. Exceeding limits terminates the task and logs a warning. ### 5. Configuration (`.gatos/policy.yaml`) @@ -77,7 +83,7 @@ watcher: - The `locks` array declares glob patterns and the governance rule controlling each. - `watcher.tasks` describes optional automation hooks. Task definitions MUST reference existing Job Plane manifests when `run_job` is used. -- Users MAY opt out by setting `GATOS_NO_HOOKS=1` or `GATOS_NO_WATCH=1`, but the CLI MUST warn that doing so removes local guardrails. +- Users MAY opt out by setting `GATOS_NO_HOOKS=1` or `GATOS_NO_WATCH=1`, but the CLI MUST warn that doing so removes local guardrails. Opt-outs SHOULD be persisted to `refs/gatos/audit/locks/` so reviewers know the session bypassed local enforcement. ## Consequences diff --git a/schemas/v1/policy/locks.schema.json b/schemas/v1/policy/locks.schema.json new file mode 100644 index 00000000..94a859af --- /dev/null +++ b/schemas/v1/policy/locks.schema.json @@ -0,0 +1,87 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema#", + "$id": "https://gatos.io/schemas/v1/policy/locks.schema.json", + "title": "GATOS Policy Locks & Watcher Configuration", + "type": "object", + "additionalProperties": false, + "properties": { + "locks": { + "type": "array", + "description": "Glob-based lock declarations bound to governance rules.", + "items": { + "type": "object", + "required": ["match", "rule"], + "additionalProperties": false, + "properties": { + "match": { + "type": "string", + "description": "Glob pattern (gitignore syntax) for files subject to the lock." + }, + "rule": { + "type": "string", + "description": "ADR-0003 governance rule id controlling the lock lifecycle." 
+ }, + "read_only": { + "type": "boolean", + "description": "Whether the watcher should apply filesystem read-only bits.", + "default": true + } + } + } + }, + "watcher": { + "type": "object", + "additionalProperties": false, + "properties": { + "poll_fallback_ms": { + "type": "integer", + "minimum": 0, + "description": "Polling interval used when native filesystem notifications are unavailable." + }, + "tasks": { + "type": "array", + "description": "Local automation tasks triggered by watcher events.", + "items": { + "type": "object", + "required": ["when", "match"], + "additionalProperties": false, + "properties": { + "when": { + "type": "string", + "enum": ["on_save", "on_fold", "on_change"], + "description": "Trigger condition." + }, + "match": { + "type": "string", + "description": "Glob selecting files that activate this task." + }, + "run_command": { + "type": "array", + "items": { "type": "string" }, + "description": "Command (argv) executed locally." + }, + "run_job": { + "type": "string", + "description": "Job Plane manifest id to execute in loopback mode." + }, + "timeout_seconds": { + "type": "integer", + "minimum": 1, + "description": "Per-task timeout override (defaults to 120 seconds)." + }, + "concurrency": { + "type": "integer", + "minimum": 1, + "description": "Max concurrent instances for this task (defaults to 2)." 
+ } + }, + "anyOf": [ + { "required": ["run_command"] }, + { "required": ["run_job"] } + ] + } + } + } + } + } +} diff --git a/schemas/v1/watch/events.schema.json b/schemas/v1/watch/events.schema.json new file mode 100644 index 00000000..d7eab8ec --- /dev/null +++ b/schemas/v1/watch/events.schema.json @@ -0,0 +1,49 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema#", + "$id": "https://gatos.io/schemas/v1/watch/events.schema.json", + "title": "GATOS Watcher Event", + "type": "object", + "additionalProperties": false, + "required": ["ts", "rule", "actor", "path", "action"], + "properties": { + "ts": { + "type": "string", + "format": "date-time", + "description": "Timestamp in UTC." + }, + "rule": { + "type": "string", + "description": "Governance rule id (ADR-0003)." + }, + "actor": { + "type": "string", + "description": "Local actor identity (user:alice, agent:renderbot, etc.)." + }, + "path": { + "type": "string", + "description": "Relative path within the repository." + }, + "action": { + "type": "string", + "enum": [ + "deny.write", + "deny.delete", + "lock.applied", + "lock.released", + "task.started", + "task.finished", + "task.failed" + ], + "description": "Event type emitted by the watcher." + }, + "remediation": { + "type": "string", + "description": "User-facing remediation hint (optional)." + }, + "details": { + "type": "object", + "description": "Additional context such as task id, error messages, or grant ids.", + "additionalProperties": true + } + } +} From 8004abc6237f93affab555e3ed2519a928c1aef7 Mon Sep 17 00:00:00 2001 From: "J. 
Kirby Ross" Date: Mon, 17 Nov 2025 00:28:23 -0800 Subject: [PATCH 08/25] ADR 0006-0017 --- CONTRIBUTING.md | 8 ++++ README.md | 4 ++ ROADMAP.md | 5 +++ docs/ROADMAP.md | 5 +++ docs/SPEC.md | 66 +++++++++++++++++++++++++++ docs/TECH-SPEC.md | 57 +++++++++++++++++++++++- docs/cli/watch.md | 43 ++++++++++++++++++ docs/decisions/ADR-0006/DECISION.md | 2 +- docs/decisions/ADR-0007/DECISION.md | 41 +++++++++++++++++ docs/decisions/ADR-0008/DECISION.md | 44 ++++++++++++++++++ docs/decisions/ADR-0009/DECISION.md | 43 ++++++++++++++++++ docs/decisions/ADR-0010/DECISION.md | 39 ++++++++++++++++ docs/decisions/ADR-0011/DECISION.md | 36 +++++++++++++++ docs/decisions/ADR-0012/DECISION.md | 45 +++++++++++++++++++ docs/decisions/ADR-0013/DECISION.md | 35 +++++++++++++++ docs/decisions/ADR-0014/DECISION.md | 45 +++++++++++++++++++ docs/decisions/README.md | 10 ++++- docs/guide/CHAPTER-005.md | 7 +++ docs/guide/CHAPTER-006.md | 2 + schemas/README.md | 3 ++ schemas/v1/policy/locks.schema.json | 69 +++++++---------------------- schemas/v1/watch/events.schema.json | 52 +++++++--------------- 22 files changed, 567 insertions(+), 94 deletions(-) create mode 100644 docs/cli/watch.md create mode 100644 docs/decisions/ADR-0007/DECISION.md create mode 100644 docs/decisions/ADR-0008/DECISION.md create mode 100644 docs/decisions/ADR-0009/DECISION.md create mode 100644 docs/decisions/ADR-0010/DECISION.md create mode 100644 docs/decisions/ADR-0011/DECISION.md create mode 100644 docs/decisions/ADR-0012/DECISION.md create mode 100644 docs/decisions/ADR-0013/DECISION.md create mode 100644 docs/decisions/ADR-0014/DECISION.md diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index bca76ee4..897dbb62 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -80,6 +80,14 @@ scripts/setup-hooks.sh If the hook fails, fix the reported issues and retry the commit. +> **ADR-0006 preview:** Once `git gatos install-hooks` lands, use it instead of `scripts/setup-hooks.sh`. 
It installs the managed `pre-commit`, `pre-push`, and `post-checkout/merge` hooks referenced in the spec, plus records bypasses under `refs/gatos/audit/locks/*`. + +### Watcher / Lock Testing + +- `git gatos watch --once` — run a single enforcement pass to verify `.gatos/policy.yaml` `locks[]` before committing docs. +- `git gatos lock acquire path1 path2` — exercise the governance workflow locally; grants appear under `refs/gatos/grants/...` and the watcher should clear read-only bits automatically. +- Event logs live in `~/.config/gatos/watch/events.log`; attach them to issues when debugging local enforcement. + ### Docs Normalization (AST pipeline) We run a deterministic Markdown normalizer (unified/remark) as a prebuild check. It parses Markdown to an AST, applies project transforms (link fixes, SPEC/TECH-SPEC linkification), and stringifies back. This keeps formatting/linting idempotent without touching anchors. diff --git a/README.md b/README.md index 991dba4c..e51094b7 100644 --- a/README.md +++ b/README.md @@ -105,6 +105,10 @@ git push Store sensitive data (PII, large datasets) in private stores, but commit their **cryptographic commitments** to the public graph — public commitments; private bytes behind a policy-gated resolver. ***Verify the integrity of the computation without revealing the raw bytes***. +### 4. Local Guardrails (Watcher + Hooks) + +Artists and infra engineers get Perforce-style safety without leaving Git. The `gatos watch` daemon keeps locked files read-only until a governance Grant exists, `gatos lock acquire/release` walks you through the approval flow, and managed Git hooks (`gatos install-hooks`) block bad pushes before they ever hit the remote—while logging any bypass under `refs/gatos/audit/locks/*`. 
+ ----- ## How it Works: The 5 Planes diff --git a/ROADMAP.md b/ROADMAP.md index 5891f819..d04c8005 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -130,12 +130,17 @@ These are explicit non-goals until after the core truth machine is working: - Governance: - Proposals → approvals → grants mapped to signed events. - Grants bound to `policy_root`. +- Local enforcement: + - Ship `gatos watch` daemon enforcing read-only locks from `.gatos/policy.yaml` until grants land. + - Managed Git hooks (`pre-commit`, `pre-push`, `post-merge`) installed via `gatos install-hooks` and logged under audit refs. + - Lock UX: `gatos lock acquire/release` wired to ADR-0003 so artists get Perforce-style flows. **Done when:** - Rewriting policy history via rebase is impossible. - Violating commits produce DENY entries with links back to the responsible ADR/policy. - Policy rules can enforce e.g. “no API changes without 2-of-3 quorum”. +- Locked assets stay read-only locally until a Grant is available and hooks reject bypass attempts. --- diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md index 907bbe15..c5a7d561 100644 --- a/docs/ROADMAP.md +++ b/docs/ROADMAP.md @@ -289,6 +289,10 @@ These are explicit non-goals until after the core truth machine is working: - DENY-audit under `refs/gatos/audit/policy/**`. - Governance MVP: - proposals → approvals → grants (N-of-M). +- Local enforcement: + - `gatos watch` daemon enforces `.gatos/policy.yaml` locks and mirrors grants locally. + - Managed Git hooks (`pre-commit`, `pre-push`, `post-checkout`) installed via CLI and logged under `refs/gatos/audit/locks/*`. + - `gatos lock acquire/release` CLI bridges users to ADR-0003 grants. ### Done When @@ -301,6 +305,7 @@ These are explicit non-goals until after the core truth machine is working: - Rebasing policy refs is impossible. - Violating commits produce DENY events. - Policy ADR-as-code works end-to-end. +- Locked assets remain read-only locally until the matching Grant lands; hook bypasses are audited. 
--- diff --git a/docs/SPEC.md b/docs/SPEC.md index 6109f7a5..fb221b32 100644 --- a/docs/SPEC.md +++ b/docs/SPEC.md @@ -56,6 +56,11 @@ - [20.4 Lifecycle States](#204-lifecycle-states) - [20.5 Revocation](#205-revocation) - [20.6 Bus Topics (recommended)](#206-bus-topics-recommended) +- [21. Local Enforcement (Watcher + Hooks)](#21-local-enforcement) + - [21.1 Watcher Daemon (`gatos watch`)](#211-watcher-daemon-gatos-watch) + - [21.2 Git Hooks (managed)](#212-git-hooks-managed) + - [21.3 Lock UX & Automation](#213-lock-ux--automation) + - [21.4 Security Notes](#214-security-notes) - [Glossary](#glossary) @@ -1511,6 +1516,67 @@ Revoked-By: --- +## 21. Local Enforcement (Watcher + Hooks) + + + +ADR-0006 extends governance guarantees to every workstation. Local enforcement is advisory (server-side push gates remain authoritative) but MUST provide clear guardrails. + +### 21.1 Watcher Daemon (`gatos watch`) + + + +- Monitors the working tree + `.git/refs/gatos/**` via native OS notifications; falls back to polling when unavailable (`watcher.poll_fallback_ms`). +- Applies read-only masks for every `locks[]` entry in `.gatos/policy.yaml` until the corresponding governance Grant exists. POSIX: `chmod -w` (optionally `chattr +i`/`chflags uchg`). Windows: `FILE_ATTRIBUTE_READONLY`. +- Emits structured JSONL events conforming to `schemas/v1/watch/events.schema.json`; events include `ts`, `rule`, `actor`, `path`, `action`, and remediation hints. Events are logged under `refs/gatos/audit/locks/` and surfaced via desktop hooks. +- Persists state (`~/.config/gatos/watch/`) describing applied masks and pending lock requests; corruption MUST trigger a resync from Git before enforcement resumes. +- Watches `refs/gatos/grants/**` to immediately lift locks when a Grant lands. + +### 21.2 Git Hooks (managed) + + + +- Installed/updated via `gatos install-hooks`; implemented for POSIX shell + PowerShell. Hooks are idempotent and recorded in audit logs when bypassed. 
+- `pre-commit`: rejects staged changes that violate lock rules; suggests `gatos lock acquire` as remediation. +- `pre-push`: verifies outbound refs have required Grants and mandated proofs (PoF/PoE) before pushes leave the workstation. +- `post-merge` / `post-checkout`: reapply read-only masks so new worktrees inherit current lock state. +- Hooks MUST fail closed if policy evaluation fails. Users MAY set `GATOS_NO_HOOKS=1`, but the next push MUST record a `Bypass-Hooks` trailer. + +### 21.3 Lock UX & Automation + + + +- CLI commands: + - `gatos lock acquire ` — batches lock requests, creates governance proposals, waits for Grants, and instructs the watcher to clear masks when approved. + - `gatos lock release ` — revokes grants per ADR-0003 or supersedes them with new grants. +- `.gatos/policy.yaml` additions: + +```yaml +locks: + - match: "assets/**" + rule: "governance.locks.assets" + read_only: true +watcher: + poll_fallback_ms: 5000 + tasks: + - when: "on_save" + match: "**/*.proto" + run_job: "format.proto" +``` + +- `locks[]` defines glob patterns bound to governance rule ids. `watcher.tasks[]` optionally triggers deterministic automation or Job Plane runs when files change. +- Schemas live at `schemas/v1/policy/locks.schema.json` (policy extension) and `schemas/v1/watch/events.schema.json` (watcher events). + +### 21.4 Security Notes + + + +- Watcher + hooks run with user privileges; they cannot replace server enforcement. Their goal is fast feedback and local safety rails. +- Event payloads MUST respect ADR-0004 privacy guidance (no plaintext from private overlays without authorization). +- All bypass mechanisms (env vars, CLI flags) MUST emit audit entries so reviewers can correlate pushes with potential local policy violations. + +--- + ## Glossary diff --git a/docs/TECH-SPEC.md b/docs/TECH-SPEC.md index 7c189134..3438bf22 100644 --- a/docs/TECH-SPEC.md +++ b/docs/TECH-SPEC.md @@ -24,8 +24,9 @@ - [5. Epochs & Compaction](#5-epochs--compaction) - [6. 
Opaque Pointers](#6-opaque-pointers) - [7. JSONL Protocol](#7-jsonl-protocol) -- [8. Observability](#8-observability) -- [9. CI & Cross-Platform Determinism](#9-ci--cross-platform-determinism) +- [8. Local Enforcement (Watcher + Hooks)](#tech-watcher) +- [9. Observability](#8-observability) +- [10. CI & Cross-Platform Determinism](#9-ci--cross-platform-determinism) - [10. Security](#10-security) - [11. Performance Guidance](#11--performance-guidance) - [12. Client SDKs](#12-client-sdks) @@ -40,6 +41,7 @@ - [Group Resolution](#group-resolution) - [Revocation Propagation](#revocation-propagation) - [End-to-End Flow](#end-to-end-flow) +- [17. Local Enforcement (Watcher + Hooks)](#tech-watcher) @@ -783,6 +785,57 @@ sequenceDiagram Ledger-->>Client: Pending (partial) end ``` + +## 17. Local Enforcement (Watcher + Hooks) + + + +### 8.1 Watcher Daemon (`gatos watch`) + +- Process model: long-running daemon started via `gatos watch` or Auto-Start. Watches the workspace root and `.git/refs/gatos/**`. +- Platform integrations: + - macOS: FSEvents + `chflags uchg` for hardened locks. + - Linux: inotify + `chattr +i` optional; default `chmod -w`. + - Windows: ReadDirectoryChangesW + Win32 file attributes. +- State machine: + 1. Load `.gatos/policy.yaml` (`locks[]`, `watcher.tasks[]`). + 2. Compute lock table keyed by canonical path glob + rule id. + 3. For each locked path, determine Grant status by reading `refs/gatos/grants/**` (compare `Proposal-Id` + `rule`). + 4. Apply/clear read-only bit accordingly and persist snapshot to `~/.config/gatos/watch/state.json`. + 5. Emit JSONL events (schema: `schemas/v1/watch/events.schema.json`). Events are appended to `~/.config/gatos/watch/events.log` and optionally forwarded to notification centers. +- Automation hooks (`watcher.tasks[]`): when a save/fold event hits a matching glob, the watcher either runs a local command (`run`) or enqueues a Job Plane manifest (`run_job`). 
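Step 5 of the state machine above emits one JSON object per line. The following is a minimal sketch of rendering such a line. It assumes a hand-rolled serializer, rather than a JSON crate, so the example stays std-only. `WatchEvent`, `json_str`, and `to_jsonl` are hypothetical names. The fields are the required properties of `schemas/v1/watch/events.schema.json` (`ts`, `rule`, `actor`, `path`, `action`) plus the optional `remediation`. Writing keys in sorted order, so that lines are byte-stable, is an assumption borrowed from the ADR-0004 canonical-JSON convention; the events schema does not require it.

```rust
/// One watcher event (hypothetical type mirroring events.schema.json).
pub struct WatchEvent<'a> {
    pub ts: &'a str,     // RFC 3339 UTC timestamp, supplied by the caller
    pub rule: &'a str,   // ADR-0003 governance rule id
    pub actor: &'a str,  // e.g. "user:alice"
    pub path: &'a str,   // repository-relative path
    pub action: &'a str, // e.g. "deny.write"
    pub remediation: Option<&'a str>,
}

/// Escape a string as a JSON string literal (surrounding quotes included).
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use \uXXXX escapes.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl WatchEvent<'_> {
    /// Render as a single JSONL line (no trailing newline) with keys sorted.
    pub fn to_jsonl(&self) -> String {
        let mut fields = vec![("action", self.action), ("actor", self.actor), ("path", self.path)];
        if let Some(r) = self.remediation {
            fields.push(("remediation", r));
        }
        fields.push(("rule", self.rule));
        fields.push(("ts", self.ts));
        let body: Vec<String> = fields
            .iter()
            .map(|(k, v)| format!("{}:{}", json_str(k), json_str(v)))
            .collect();
        format!("{{{}}}", body.join(","))
    }
}
```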
Concurrency cap defaults to 2 with per-task `timeout_s` config. + +### 8.2 Managed Hooks & CLI + +- `gatos install-hooks` + - Detects OS/shell, writes shim scripts into `.git/hooks/`, and records metadata under `.git/gatos/hooks.json`. + - Hooks call back into `gatos hook run ` so logic lives in Rust. +- Hook behavior: + - `pre-commit`: ask watcher cache for lock status; reject staged diffs touching locked files without Grants. + - `pre-push`: recompute outbound refs, ensure Grants exist, verify PoF/PoE trailers per profile. Records success/failure under `refs/gatos/audit/policy/locks/`. + - `post-merge`/`post-checkout`: reapply masks and trigger watcher rescan. +- CLI additions: + - `gatos lock acquire [--wait/--no-wait] [--reason ]` — builds proposals (via governance API), polls Grants, prints table of results. + - `gatos lock release ` — revokes or supersedes grants. + - `gatos watch events --tail` — streams watcher JSONL output for debugging. +- Env overrides: `GATOS_NO_HOOKS=1` and `GATOS_NO_WATCH=1` are honored but print warnings and emit audit entries. + +### 8.3 Data Structures + +- `schemas/v1/policy/locks.schema.json` + - `locks[]`: `{ match: string, rule: string, read_only?: bool }` + - `watcher`: `{ poll_fallback_ms?: integer, tasks?: [] }` + - `watcher.tasks[]`: `{ when: "on_save"|"on_fold", match: string, run?: string[], run_job?: string, timeout_s?: integer }` +- `schemas/v1/watch/events.schema.json` + - Base fields: `ts`, `actor`, `path`, `rule`, `action`, `grant_id?`, `proposal_id?`, `remediation?`. + - `action` enum includes `deny.write`, `lock.applied`, `lock.released`, `task.started`, `task.succeeded`, `task.failed`. +- Hooks log structure: `refs/gatos/audit/locks/` commit tree contains `event.json` (canonical JSON) plus log attachments (stdout/ stderr) when tasks fire. + +### 8.4 Open Work + +- Privilege boundaries: future ADR will define optional elevated helper for immutable flags on corporate-managed machines. 
+- Sandboxed automation: evaluate WASM job runners to avoid arbitrary shell execution for `watcher.tasks[].run`. +- UX: integrate with `gatos ui` for desktop notifications and lock browser. #### Message Plane RPC: `messages.read` The daemon exposes `messages.read` over the JSONL RPC channel so workers and bridges can page through commit-backed topics without parsing Git directly. diff --git a/docs/cli/watch.md b/docs/cli/watch.md new file mode 100644 index 00000000..ad94a22f --- /dev/null +++ b/docs/cli/watch.md @@ -0,0 +1,43 @@ +# gatos watch / lock CLI + +This is a planning stub for ADR-0006. The implementation is in progress; this doc captures the intent so downstream teams can plan integrations. + +## `gatos watch` + +``` +git gatos watch [--once] [--state ] +``` + +- Starts the local enforcement daemon. Observes the working tree plus `.git/refs/gatos/**`. +- By default runs continuously; `--once` performs a single scan (useful in CI or troubleshooting). +- Emits structured JSONL events to stdout (see `schemas/v1/watch/events.schema.json`). Use `--state` to override the default state directory (`~/.config/gatos/watch`). + +## `gatos lock acquire` + +``` +git gatos lock acquire [--reason ] [--no-wait] +``` + +- Computes canonical lock ids for each path/glob. +- Creates governance proposals referencing the rule declared in `.gatos/policy.yaml`. +- Waits for Grants unless `--no-wait` is provided. The output lists `path`, `proposal`, `grant`, and `status` columns. + +## `gatos lock release` + +``` +git gatos lock release [--reason ] +``` + +- Revokes or supersedes existing grants, making the files writable again once the watcher processes the change. + +## `gatos install-hooks` + +``` +git gatos install-hooks [--force] +``` + +- Writes managed `pre-commit`, `pre-push`, and `post-checkout`/`post-merge` hooks. +- Hooks call back into `gatos hook run ` so logic stays centralized. +- `--force` reinstalls even if hooks already exist. 
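The lock evaluation described for the watcher and for `gatos lock acquire` (match a path against `locks[]`, then check whether a Grant covers it) can be sketched in Rust. This is a minimal illustration under stated assumptions, not the planned implementation: the `Lock` struct mirrors a `locks[]` entry from `schemas/v1/policy/locks.schema.json` (with `match` renamed to `pattern`, since `match` is a Rust keyword); grants are modeled as a hypothetical set of `(rule, path)` pairs already resolved from `refs/gatos/grants/**`; and the glob matcher handles only `*` (within one path segment) and `**` (zero or more segments), not full gitignore syntax.

```rust
use std::collections::HashSet;

/// Mirrors one `locks[]` entry from `.gatos/policy.yaml` (illustrative only).
pub struct Lock {
    pub pattern: String, // `match` in the schema
    pub rule: String,    // ADR-0003 governance rule id
    pub read_only: bool, // schema default: true
}

/// Simplified glob (hypothetical helper): `**` spans whole path segments,
/// `*` matches within a single segment. Real gitignore semantics are richer.
pub fn glob_match(pattern: &str, path: &str) -> bool {
    let pat: Vec<&str> = pattern.split('/').collect();
    let segs: Vec<&str> = path.split('/').collect();
    match_segments(&pat, &segs)
}

fn match_segments(pat: &[&str], segs: &[&str]) -> bool {
    match pat.split_first() {
        None => segs.is_empty(),
        Some((first, rest)) if *first == "**" => {
            // `**` consumes zero or more whole segments.
            (0..=segs.len()).any(|i| match_segments(rest, &segs[i..]))
        }
        Some((first, rest)) => match segs.split_first() {
            Some((seg, seg_rest)) => match_segment(first, seg) && match_segments(rest, seg_rest),
            None => false,
        },
    }
}

fn match_segment(pat: &str, seg: &str) -> bool {
    fn go(p: &[char], s: &[char]) -> bool {
        match p.split_first() {
            None => s.is_empty(),
            // `*` matches any run of characters inside one segment.
            Some((&'*', rest)) => (0..=s.len()).any(|i| go(rest, &s[i..])),
            Some((c, rest)) => s.first() == Some(c) && go(rest, &s[1..]),
        }
    }
    let p: Vec<char> = pat.chars().collect();
    let s: Vec<char> = seg.chars().collect();
    go(&p, &s)
}

/// A path stays read-only if any matching `read_only` lock has no grant for it.
pub fn should_mask(path: &str, locks: &[Lock], grants: &HashSet<(String, String)>) -> bool {
    locks.iter().any(|lock| {
        lock.read_only
            && glob_match(&lock.pattern, path)
            && !grants.contains(&(lock.rule.clone(), path.to_string()))
    })
}
```

The recursive matcher keeps the sketch short; a production watcher would more plausibly reuse an existing gitignore-compatible glob library.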
+ +> **Status:** CLI surface is being implemented. This document is intentionally aspirational so docs/spec stay aligned with ADR-0006. diff --git a/docs/decisions/ADR-0006/DECISION.md b/docs/decisions/ADR-0006/DECISION.md index b8ccd75d..cd7c65cf 100644 --- a/docs/decisions/ADR-0006/DECISION.md +++ b/docs/decisions/ADR-0006/DECISION.md @@ -1,5 +1,5 @@ --- -Status: Proposed +Status: Accepted Date: 2025-11-09 ADR: ADR-0006 Authors: [flyingrobots] diff --git a/docs/decisions/ADR-0007/DECISION.md b/docs/decisions/ADR-0007/DECISION.md new file mode 100644 index 00000000..988085d1 --- /dev/null +++ b/docs/decisions/ADR-0007/DECISION.md @@ -0,0 +1,41 @@ +--- +Status: Draft +Date: 2025-11-09 +ADR: ADR-0007 +Authors: [flyingrobots] +Requires: [ADR-0001, ADR-0004, ADR-0005] +Related: [ADR-0008, ADR-0009] +Tags: [API, GraphQL, State] +Schemas: + - schemas/v1/api/graphql_state_mapping.schema.json +--- + +# ADR-0007: GraphQL State API (Read-Only) + +## Scope +Expose a **read-only GraphQL API** for querying GATOS state snapshots (“shape”) with precise, single-roundtrip selection. + +## Rationale +State is hierarchical and interlinked; GraphQL matches the access pattern and avoids REST under/over-fetching. + +## Decision +1. **Endpoint**: `POST /api/v1/graphql`. +2. **Versioning**: HTTP header `x-gatos-api: v1`. Introspection **MAY** be disabled in prod. +3. **State Targeting**: Every query **MUST** include one of: + - `stateRef: ""` (recommended), or + - `refPath: "refs/gatos/state/public//"` (server resolves to head). +4. **Object Identity**: `id` fields are stable content IDs: `::`. +5. **Pagination**: Relay connections for lists; cursors are opaque, signed. +6. **Pointer Handling**: Opaque pointers (ADR-0004) are exposed as typed nodes; server **MUST NOT** auto-resolve private blobs. +7. **AuthZ**: Queries are filtered by policy view; unauthorised paths are elided or pointerized per privacy policy. +8. 
**Caching**: `ETag` = `Shape-Root` of the resolved state; `Cache-Control: immutable` for historical `stateRef`. +9. **Errors**: Deterministic, typed error codes (e.g., `POLICY_DENIED`, `STATE_NOT_FOUND`). +10. **Rate Limits**: Default window limits; per-actor overrides via policy. + +## Consequences +- Clients can build efficient UIs without bespoke endpoints. +- Server complexity moves into resolvers and policy filters. + +## Open Questions +- Schema publishing: static SDL vs generated at build from spec? +- Field deprecation cadence. diff --git a/docs/decisions/ADR-0008/DECISION.md b/docs/decisions/ADR-0008/DECISION.md new file mode 100644 index 00000000..9ffb589f --- /dev/null +++ b/docs/decisions/ADR-0008/DECISION.md @@ -0,0 +1,44 @@ +--- +Status: Draft +Date: 2025-11-09 +ADR: ADR-0008 +Authors: [flyingrobots] +Requires: [ADR-0001, ADR-0002, ADR-0003] +Related: [ADR-0007, ADR-0009] +Tags: [API, REST, Commands, Webhooks] +Schemas: + - schemas/v1/api/command_envelope.schema.json + - schemas/v1/api/webhook_delivery.schema.json +--- + +# ADR-0008: REST Commands & Webhooks + +## Scope +Define a minimal **REST mutation surface** for commands and a **webhook** mechanism for outbound events. + +## Rationale +Commands are side-effecting; REST is adequate and tool-friendly. Integrations need push-based notifications. + +## Decision +1. **Command Endpoint**: `POST /api/v1/commands` + - Body: + ```json + { "type": "", "args": {...}, "expect_state": "", "request_id": "" } + ``` + - Semantics: Return quickly with `{ "ack": true, "job_id": "" }` (async) or `{ "ok": true, ... }` (sync). + - **Idempotency**: The server **MUST** deduplicate requests by `request_id` within a 24h window. +2. **Result Plumbing**: Long work **SHOULD** create a Job (ADR-0002) and stream progress on the Message Plane. +3. **Webhooks**: + - Subscription admin endpoint: `POST /api/v1/webhooks`.
+ - Events (normative names): `proposal.created`, `approval.created`, `grant.created`, `grant.revoked`, `job.created`, `job.claimed`, `job.succeeded`, `job.failed`. + - Delivery: JSON body, `X-GATOS-Event`, `X-GATOS-Delivery`, **HMAC-SHA256** signature header. + - Retries with exponential backoff; dead-letter queue optional. +4. **AuthN/Z**: OAuth2/JWT bearer; scopes per command prefix; webhook secrets per subscription. +5. **HTTP Codes**: `202` for async ack, `200` for sync success, `409` for `EXPECT_STATE_MISMATCH`. + +## Consequences +- Clean separation of **mutations** (REST) vs **reads** (GraphQL). +- Webhooks unlock automation without polling. + +## Open Questions +- Do we surface a lightweight sync mode with server-side time budget? diff --git a/docs/decisions/ADR-0009/DECISION.md b/docs/decisions/ADR-0009/DECISION.md new file mode 100644 index 00000000..6c825f36 --- /dev/null +++ b/docs/decisions/ADR-0009/DECISION.md @@ -0,0 +1,43 @@ +--- +Status: Draft +Date: 2025-11-09 +ADR: ADR-0009 +Authors: [flyingrobots] +Requires: [ADR-0001, ADR-0005] +Related: [ADR-0007, ADR-0008] +Tags: [API, WebSocket, Streaming, Refs] +Schemas: + - schemas/v1/api/stream_frame.schema.json +--- + +# ADR-0009: Real-Time Streams & Ref Subscriptions + +## Scope +Provide **WebSocket** streams for ref updates and bus topics to enable reactive UIs and workers. + +## Rationale +UIs and workers need near-real-time updates without wasteful polling. + +## Decision +1. **Endpoint**: `GET /api/v1/stream` → WebSocket (JSON frames). +2. **Subscribe/Unsubscribe** frames: + ```json + {"op":"sub","refs":["refs/gatos/state/public/**","refs/mind/sessions/main"],"topics":["gatos.jobs.*"]} + {"op":"unsub","refs":[...],"topics":[...]} + ``` +3. **Server → client frames**: + ```json + { "kind":"ref.update","ref":"refs/...","old":"","new":"","seq":123,"ts":"" } + { "kind":"bus.event","topic":"gatos.jobs.pending","payload":{...},"seq":124,"ts":"" } + ``` +4. 
**Delivery**: At-least-once with monotonic `seq` per connection; clients MUST dedupe. +5. **Replay**: Optional `sinceSeq` on connect to catch recent history (bounded window). +6. **AuthZ**: Same policy filters as GraphQL; forbidden refs are not emitted. +7. **Heartbeat**: `{"kind":"ping"}` / `{"kind":"pong"}` every 30s. + +## Consequences +- Reactive UX and workers with minimal glue. +- Requires sequence indexing on the server side. + +## Open Questions +- Cross-node streaming for federation (see ADR-0012) — do we bridge or require local subscription? diff --git a/docs/decisions/ADR-0010/DECISION.md b/docs/decisions/ADR-0010/DECISION.md new file mode 100644 index 00000000..8516a7ff --- /dev/null +++ b/docs/decisions/ADR-0010/DECISION.md @@ -0,0 +1,39 @@ +--- +Status: Draft +Date: 2025-11-09 +ADR: ADR-0010 +Authors: [flyingrobots] +Requires: [ADR-0002, ADR-0003, ADR-0007, ADR-0008] +Related: [] +Tags: [Integration, GitHub App, CI/CD, Governance] +--- + +# ADR-0010: First-Class GitHub App Integration + +## Scope +Define a **GitHub App** that enforces policy, brokers commands, and mirrors necessary context into the ledger. + +## Rationale +Most teams live on GitHub; native enforcement and UX reduce friction. + +## Decision +1. **Capabilities**: + - Read PRs, reviews, comments; write status checks; limited content permissions. + - Webhook ingestion → map to ledger events (`pr.opened`, `review.submitted`, etc.). +2. **Command Triggers**: + - PR comment prefix `git mind ...` or `/gatos ...` → hits `POST /api/v1/commands`. + - Responses posted as PR comments (summarised) + status checks. +3. **Policy Enforcement**: + - `gatos-policy` exposes `merge_gate` check; the app **MUST** block merges until grants exist for gated actions (ADR-0003). + - Status checks: `gatos/policy`, `gatos/fold`, `gatos/jobs`. +4. **Attestations**: + - Job and fold proofs surfaced as artifacts/annotations with digest links. +5. 
**Security**: + - Rotate app secrets, least-privilege scopes, per-repo installation. + +## Consequences +- One consistent control point for GitHub-centric teams. +- Another moving part to maintain (secrets, webhooks, scale). + +## Open Questions +- Optional mapping of GH approvals → governance approvals (off by default)? diff --git a/docs/decisions/ADR-0011/DECISION.md b/docs/decisions/ADR-0011/DECISION.md new file mode 100644 index 00000000..cece7301 --- /dev/null +++ b/docs/decisions/ADR-0011/DECISION.md @@ -0,0 +1,36 @@ +--- +Status: Draft +Date: 2025-11-09 +ADR: ADR-0011 +Authors: [flyingrobots] +Requires: [ADR-0001, ADR-0005] +Related: [] +Tags: [Analytics, Export, SQL, Parquet] +--- + +# ADR-0011: GATOS-to-SQL/Parquet Exporter + +## Scope +Provide an **exporter** that materializes the ledger and state into **SQLite** and/or **Parquet** for analytics. + +## Rationale +Teams want ad-hoc analytics without learning GATOS internals; SQL and columnar files cover most analytics needs. + +## Decision +1. **CLI**: `gatos export --format sqlite|parquet --out [--since ]` +2. **Schema (SQLite)**: + - `commits(id, parent_id, author, ts, msg, trailers JSON)` + - `events(id, ns, kind, payload JSON, commit_id)` + - `state_nodes(id, path, digest, shape JSON, state_ref)` + - `pointers(digest, location, capability, state_ref)` # ADR-0004 + - `jobs(id, status, started_at, finished_at, proof_digest)` # ADR-0002 + - `governance(proposals, approvals, grants, revocations)` # ADR-0003 +3. **Incremental**: `--since` resumes from the last exported commit. +4. **Determinism**: Stable ordering; integrity table with exported root SHA. + +## Consequences +- Easy dashboards, BI, notebooks. +- The exporter must not leak private overlay data; only pointer metadata is exported. + +## Open Questions +- Do we support query pushdown (pre-filtered exports) in v1?
diff --git a/docs/decisions/ADR-0012/DECISION.md b/docs/decisions/ADR-0012/DECISION.md new file mode 100644 index 00000000..2052d7af --- /dev/null +++ b/docs/decisions/ADR-0012/DECISION.md @@ -0,0 +1,45 @@ +--- +Status: Draft +Date: 2025-11-09 +ADR: ADR-0012 +Authors: [flyingrobots] +Requires: [ADR-0001, ADR-0004, ADR-0005, ADR-0007, ADR-0009] +Related: [] +Tags: [Federation, Mounts, Cross-Repo] +Schemas: + - schemas/v1/federation/mounts.schema.json +--- + +# ADR-0012: Federated Repositories & Mounts + +## Scope +Allow a repository to **subscribe to** and **mount** public state from other repos/nodes. + +## Rationale +Enables decentralized composition (e.g., central governance repo consumed by many project repos) without monorepos. + +## Decision +1. **Config**: `.gatos/federation.yaml` declares mounts: + ```yaml + mounts: + - name: governance + source: "git+https://example.com/org/gov-repo.git#refs/gatos/state/public/policy/main" + verify: "ed25519:" + refresh: "PT5M" + ``` +2. **On-Disk Refs**: + - `refs/gatos/remotes//state//` (read-only mirror). +3. **Resolution**: + - Fetch at refresh cadence or on demand. + - Verify signed commit against `verify` key in trust graph; reject otherwise. +4. **Usage**: + - State folds can read mounted refs as input; mounts MUST NOT be mutated locally. +5. **UI/API**: + - GraphQL exposes mounts under a separate namespace; streams emit updates when mount advances. + +## Consequences +- Clean, verifiable cross-repo composition. +- Requires remote availability and verification logic. + +## Open Questions +- Cycles between mounts (A mounts B, B mounts A) — forbid or handle with depth limits? 
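If the answer to the open question about mount cycles is to forbid them, the check reduces to cycle detection over the mount graph built from each repository's `.gatos/federation.yaml`. The sketch below is a hypothetical illustration of that option only: repositories are identified by plain strings, the graph is an in-memory map rather than anything fetched from remotes, and `find_mount_cycle` is not an existing GATOS API.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical mount graph: repo id -> repo ids it mounts.
pub type MountGraph = BTreeMap<String, BTreeSet<String>>;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Mark {
    Visiting, // on the current DFS path
    Done,     // fully explored, no cycle through it
}

/// Returns one cycle (as a path of repo ids) if the mount graph has any.
/// BTreeMap/BTreeSet iteration keeps the reported cycle deterministic.
pub fn find_mount_cycle(graph: &MountGraph) -> Option<Vec<String>> {
    let mut marks: BTreeMap<&str, Mark> = BTreeMap::new();
    let mut stack: Vec<&str> = Vec::new();
    for start in graph.keys() {
        if let Some(cycle) = visit(start, graph, &mut marks, &mut stack) {
            return Some(cycle);
        }
    }
    None
}

fn visit<'a>(
    node: &'a str,
    graph: &'a MountGraph,
    marks: &mut BTreeMap<&'a str, Mark>,
    stack: &mut Vec<&'a str>,
) -> Option<Vec<String>> {
    match marks.get(node) {
        Some(Mark::Done) => return None,
        Some(Mark::Visiting) => {
            // `node` is already on the DFS stack: slice out the cycle.
            let pos = stack.iter().position(|n| *n == node)?;
            let mut cycle: Vec<String> = stack[pos..].iter().map(|s| s.to_string()).collect();
            cycle.push(node.to_string());
            return Some(cycle);
        }
        None => {}
    }
    marks.insert(node, Mark::Visiting);
    stack.push(node);
    if let Some(children) = graph.get(node) {
        for child in children {
            if let Some(cycle) = visit(child, graph, marks, stack) {
                return Some(cycle);
            }
        }
    }
    stack.pop();
    marks.insert(node, Mark::Done);
    None
}
```

The "depth limit" alternative would instead carry a hop counter through the traversal and stop resolving mounts beyond it, accepting cycles but bounding their cost.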
diff --git a/docs/decisions/ADR-0013/DECISION.md b/docs/decisions/ADR-0013/DECISION.md new file mode 100644 index 00000000..3198ebc7 --- /dev/null +++ b/docs/decisions/ADR-0013/DECISION.md @@ -0,0 +1,35 @@ +--- +Status: Draft +Date: 2025-11-09 +ADR: ADR-0013 +Authors: [flyingrobots] +Requires: [ADR-0001, ADR-0005] +Related: [] +Tags: [Performance, State Engine, Caching] +--- + +# ADR-0013: Partial & Lazy Folds + +## Scope +Specify **incremental** and **lazy** state folding to avoid recomputing the entire shape on small changes. + +## Rationale +Large repos need sub-linear recomputation to stay responsive. + +## Decision +1. **Fold Units**: Define shard boundaries (by namespace/path). Each unit has a cache key: + ```text + key = blake3(code_hash || policy_root || input_event_ids || upstream_unit_digests) + ``` +2. **Cache**: `gatos-kv` stores fold outputs per unit with `key -> digest`. +3. **Invalidation**: On new events, compute the affected unit set (directly touched units plus their downstream dependents, whose `upstream_unit_digests` change) via the dependency graph; recompute only those. +4. **Lazy Materialization**: Units not requested by a client **MAY** remain cold; materialize on demand. +5. **Concurrency**: Units may fold in parallel if dependencies permit; global join computes `Shape-Root`. +6. **Telemetry**: Commit trailers: `Fold-Cache-Hit: `, `Fold-Units: `, `Fold-Duration: `. + +## Consequences +- Much faster folds on small changes, since work scales with the affected units rather than the whole shape; more predictable latencies. +- Requires dependency modeling and careful cache invalidation. + +## Open Questions +- Do we allow background prewarming or keep strictly on-demand for MVP?
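The Invalidation step is a reachability query: a new event dirties the units that consume it directly, and every unit downstream of a dirty unit must be refolded too, because its cache key includes `upstream_unit_digests`. A minimal Rust sketch, assuming a hypothetical in-memory map from each unit to the units that read its output (this ADR leaves the real dependency model open, and `affected_units` is an illustrative name, not an existing `gatos-kv` API):

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Hypothetical dependency graph: unit -> units that consume its output.
pub type Dependents = BTreeMap<String, BTreeSet<String>>;

/// Returns every unit that must be refolded, given the units touched directly
/// by new events. BTreeSet keeps the result in a deterministic order.
pub fn affected_units(directly_touched: &[&str], dependents: &Dependents) -> BTreeSet<String> {
    let mut affected: BTreeSet<String> = BTreeSet::new();
    let mut queue: VecDeque<String> = directly_touched.iter().map(|u| u.to_string()).collect();
    while let Some(unit) = queue.pop_front() {
        // Skip units already scheduled; this also terminates on cyclic graphs.
        if !affected.insert(unit.clone()) {
            continue;
        }
        if let Some(downstream) = dependents.get(&unit) {
            for next in downstream {
                if !affected.contains(next) {
                    queue.push_back(next.clone());
                }
            }
        }
    }
    affected
}
```

Units outside the returned set keep their cached `key -> digest` entries; under Lazy Materialization, even affected units could stay cold until a client requests them.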
diff --git a/docs/decisions/ADR-0014/DECISION.md b/docs/decisions/ADR-0014/DECISION.md new file mode 100644 index 00000000..fa0643a3 --- /dev/null +++ b/docs/decisions/ADR-0014/DECISION.md @@ -0,0 +1,45 @@ +--- +Status: Draft +Date: 2025-11-09 +ADR: ADR-0014 +Authors: [flyingrobots] +Requires: [ADR-0001, ADR-0005] +Related: [ADR-0002] +Tags: [Attestation, Proofs, State Engine] +Schemas: + - schemas/v1/state/proof_of_fold_envelope.schema.json +--- + +# ADR-0014: Proof-Of-Fold (Attestation of State) + +## Scope +Define a **cryptographic attestation** for state folds that proves which code and inputs produced a given `Shape-Root`. + +## Rationale +Jobs already attest execution (ADR-0002 PoE). Folds need equivalent integrity guarantees. + +## Decision +1. **Envelope** (canonical JSON): + ```json + { + "fold_id": "blake3:<...>", + "engine": { "program": "sha256:<...>", "version": "x.y.z", "platform": "..." }, + "policy_root": "sha256:<...>", + "inputs": { "events": ["..."], "upstreams": ["blake3:...", "..."] }, + "output_shape_root": "blake3:<...>", + "metrics": { "units": "N", "duration_ms": "M" }, + "ts": "" + } + ``` +2. **Signature**: Engine signs `blake3(envelope)` with its key; trailers: + - `Proof-Of-Fold: blake3:` + - `Fold-Sig: ed25519:` +3. **Storage**: Persist envelope under `refs/gatos/audit/proofs/folds/`. +4. **Verification**: `gatos fold verify ` checks engine key in trust graph, envelope hash, and output match. + +## Consequences +- Auditable state derivations; reproducibility at the protocol layer. +- Requires key management for fold engines. + +## Open Questions +- Do we include WASM module hash for portable fold engines in v1? 
diff --git a/docs/decisions/README.md b/docs/decisions/README.md index ce733b37..edfe83d3 100644 --- a/docs/decisions/README.md +++ b/docs/decisions/README.md @@ -33,4 +33,12 @@ Each ADR will have a status, typically one of the following: | [ADR-0003](./ADR-0003/DECISION.md) | Consensus Governance for Gated Actions | Accepted | 2025-11-08 | | [ADR-0004](./ADR-0004/DECISION.md) | Hybrid Privacy Model (Public Projection + Private Overlay) | Accepted | 2025-11-09 | | [ADR-0005](./ADR-0005/DECISION.md) | Message Plane — A Git-Native, Commit-Backed Message Bus | Proposed | 2025-11-09 | -| [ADR-0006](./ADR-0006/DECISION.md) | Local Enforcement — Watcher Daemon & Git Hooks | Proposed | 2025-11-09 | +| [ADR-0006](./ADR-0006/DECISION.md) | Local Enforcement — Watcher Daemon & Git Hooks | Accepted | 2025-11-09 | +| [ADR-0007](./ADR-0007/DECISION.md) | GraphQL State API (Read-Only) | Draft | 2025-11-09 | +| [ADR-0008](./ADR-0008/DECISION.md) | REST Commands & Webhooks | Draft | 2025-11-09 | +| [ADR-0009](./ADR-0009/DECISION.md) | Real-Time Streams & Ref Subscriptions | Draft | 2025-11-09 | +| [ADR-0010](./ADR-0010/DECISION.md) | First-Class GitHub App Integration | Draft | 2025-11-09 | +| [ADR-0011](./ADR-0011/DECISION.md) | GATOS-to-SQL/Parquet Exporter | Draft | 2025-11-09 | +| [ADR-0012](./ADR-0012/DECISION.md) | Federated Repositories & Mounts | Draft | 2025-11-09 | +| [ADR-0013](./ADR-0013/DECISION.md) | Partial & Lazy Folds | Draft | 2025-11-09 | +| [ADR-0014](./ADR-0014/DECISION.md) | Proof-Of-Fold (Attestation of State) | Draft | 2025-11-09 | diff --git a/docs/guide/CHAPTER-005.md b/docs/guide/CHAPTER-005.md index 295a04b5..888dffd5 100644 --- a/docs/guide/CHAPTER-005.md +++ b/docs/guide/CHAPTER-005.md @@ -82,6 +82,13 @@ By intercepting all writes, the Stargate can run powerful server-side **`pre-rec 2. **Attestation & Validation:** For performance, GATOS clients can attach **attestation trailers** to their commits. 
These trailers contain pre-computed hashes of the proposed changes. The Stargate's `pre-receive` hook validates these trailers in constant time per push with respect to file count, given attestation trailers and object pinning—verifying integrity without walking every file. 3. **Linear History:** The hook enforces that all journals are fast-forward only, preventing history rewrites and preserving the immutability of the ledger. +### Local Guardrails for Humans + +- **Watcher (`git gatos watch`)** — runs on every workstation, consulting `.gatos/policy.yaml` `locks[]`. Files remain read-only until the required Grants land, giving artists a Perforce-style “read-only until lock” experience. +- **Locks CLI** — `git gatos lock acquire assets/hero.obj` opens a governance proposal, waits for quorum, and tells the watcher to unmask the path once approved. `git gatos lock release` revokes the Grant when the work is done. +- **Managed hooks** — `git gatos install-hooks` writes `pre-commit`, `pre-push`, and `post-checkout/merge` hooks. They reject accidental edits/pushes that would be denied later and record bypasses under `refs/gatos/audit/locks/*` if a user disables them. +- **Automation** — `.gatos/policy.yaml` can declare `watcher.tasks[]` that run deterministic tools on save (formatters, lint) or enqueue Job Plane runs (`run_job: format.proto`). Each task inherits the lock + privacy policies defined elsewhere in the repo. + This local-first enforcement provides low-latency, high-security writes that would be impossible on a public SaaS platform. ## The Magic Mirror diff --git a/docs/guide/CHAPTER-006.md b/docs/guide/CHAPTER-006.md index a762a4ca..ca836dde 100644 --- a/docs/guide/CHAPTER-006.md +++ b/docs/guide/CHAPTER-006.md @@ -148,4 +148,6 @@ Storage: `refs/gatos/jobs//result` (commit whose tree contains the resul The Message and Job planes are what make GATOS a dynamic, living system. 
`gatos-message-plane` provides the nervous system, allowing for reliable, auditable communication. `gatos-compute` provides the motor function, enabling the system to perform work in a distributed and verifiable way. +> **Local tie-in (ADR-0006):** When a policy declares `watcher.tasks[] run_job: `, the watcher daemon publishes a Job as soon as a matching file changes. That job advertises itself on the Message Plane just like any other workload, so infrastructure tasks kicked off by local edits still flow through the normal Job/PoE lifecycle. + Together, they transform the GATOS repository from a passive record of history into an active, programmable "Operating Surface" that can orchestrate complex, distributed workflows with an unprecedented level of trust and transparency. diff --git a/schemas/README.md b/schemas/README.md index 9b28cc84..55d3f0b2 100644 --- a/schemas/README.md +++ b/schemas/README.md @@ -38,3 +38,6 @@ Message Plane envelopes - Envelopes live in `schemas/v1/message-plane/` and describe commits written under `refs/gatos/messages//head`. - Every message commit MUST contain a `message/envelope.json` blob that validates against `event_envelope.schema.json` and is serialized as Canonical JSON (UTF-8, sorted keys, no insignificant whitespace). - Optional attachments are stored under `message/attachments/` and referenced via logical names in the envelope `refs` map; attachments never influence the canonical `content_id`. +- Local enforcement (ADR-0006): + - `schemas/v1/policy/locks.schema.json` extends `.gatos/policy.yaml` with `locks[]` and `watcher` blocks. + - `schemas/v1/watch/events.schema.json` defines the JSONL payload emitted by `gatos watch`. 
diff --git a/schemas/v1/policy/locks.schema.json b/schemas/v1/policy/locks.schema.json index 94a859af..7b7bf50f 100644 --- a/schemas/v1/policy/locks.schema.json +++ b/schemas/v1/policy/locks.schema.json @@ -1,87 +1,48 @@ { "$schema": "http://json-schema.org/draft-07/schema#", - "$id": "https://gatos.io/schemas/v1/policy/locks.schema.json", - "title": "GATOS Policy Locks & Watcher Configuration", + "title": "GATOS Policy Locks + Watcher Config (v1)", "type": "object", - "additionalProperties": false, "properties": { "locks": { "type": "array", - "description": "Glob-based lock declarations bound to governance rules.", "items": { "type": "object", "required": ["match", "rule"], - "additionalProperties": false, "properties": { - "match": { - "type": "string", - "description": "Glob pattern (gitignore syntax) for files subject to the lock." - }, - "rule": { - "type": "string", - "description": "ADR-0003 governance rule id controlling the lock lifecycle." - }, - "read_only": { - "type": "boolean", - "description": "Whether the watcher should apply filesystem read-only bits.", - "default": true - } + "match": { "type": "string", "description": "Glob relative to repo root" }, + "rule": { "type": "string", "description": "Governance rule id (ADR-0003)" }, + "read_only": { "type": "boolean", "default": true } } } }, "watcher": { "type": "object", - "additionalProperties": false, "properties": { - "poll_fallback_ms": { - "type": "integer", - "minimum": 0, - "description": "Polling interval used when native filesystem notifications are unavailable." - }, + "poll_fallback_ms": { "type": "integer", "minimum": 0, "default": 5000 }, "tasks": { "type": "array", - "description": "Local automation tasks triggered by watcher events.", "items": { "type": "object", "required": ["when", "match"], - "additionalProperties": false, "properties": { - "when": { - "type": "string", - "enum": ["on_save", "on_fold", "on_change"], - "description": "Trigger condition." 
- }, - "match": { - "type": "string", - "description": "Glob selecting files that activate this task." - }, - "run_command": { + "when": { "enum": ["on_save", "on_fold" ] }, + "match": { "type": "string" }, + "run": { "type": "array", "items": { "type": "string" }, - "description": "Command (argv) executed locally." - }, - "run_job": { - "type": "string", - "description": "Job Plane manifest id to execute in loopback mode." - }, - "timeout_seconds": { - "type": "integer", - "minimum": 1, - "description": "Per-task timeout override (defaults to 120 seconds)." + "description": "Local command to run with deterministic args" }, - "concurrency": { - "type": "integer", - "minimum": 1, - "description": "Max concurrent instances for this task (defaults to 2)." - } + "run_job": { "type": "string", "description": "Job manifest id to enqueue" }, + "timeout_s": { "type": "integer", "minimum": 1, "default": 120 } }, - "anyOf": [ - { "required": ["run_command"] }, + "oneOf": [ + { "required": ["run"] }, { "required": ["run_job"] } ] } } } } - } + }, + "additionalProperties": false } diff --git a/schemas/v1/watch/events.schema.json b/schemas/v1/watch/events.schema.json index d7eab8ec..a75b16e5 100644 --- a/schemas/v1/watch/events.schema.json +++ b/schemas/v1/watch/events.schema.json @@ -1,49 +1,29 @@ { "$schema": "http://json-schema.org/draft-07/schema#", - "$id": "https://gatos.io/schemas/v1/watch/events.schema.json", - "title": "GATOS Watcher Event", + "title": "GATOS Watcher Event (v1)", "type": "object", - "additionalProperties": false, - "required": ["ts", "rule", "actor", "path", "action"], + "required": ["ts", "actor", "path", "rule", "action"], "properties": { - "ts": { - "type": "string", - "format": "date-time", - "description": "Timestamp in UTC." - }, - "rule": { - "type": "string", - "description": "Governance rule id (ADR-0003)." - }, - "actor": { - "type": "string", - "description": "Local actor identity (user:alice, agent:renderbot, etc.)." 
- }, - "path": { - "type": "string", - "description": "Relative path within the repository." - }, + "ts": { "type": "string", "format": "date-time" }, + "actor": { "type": "string" }, + "path": { "type": "string" }, + "rule": { "type": "string" }, "action": { "type": "string", "enum": [ - "deny.write", - "deny.delete", "lock.applied", "lock.released", + "deny.write", "task.started", - "task.finished", + "task.succeeded", "task.failed" - ], - "description": "Event type emitted by the watcher." - }, - "remediation": { - "type": "string", - "description": "User-facing remediation hint (optional)." + ] }, - "details": { - "type": "object", - "description": "Additional context such as task id, error messages, or grant ids.", - "additionalProperties": true - } - } + "grant_id": { "type": "string" }, + "proposal_id": { "type": "string" }, + "task": { "type": "string" }, + "remediation": { "type": "string" }, + "details": { "type": "object", "additionalProperties": true } + }, + "additionalProperties": false } From 6cb9a2009eab19aa29382bb35ff4472d719254ed Mon Sep 17 00:00:00 2001 From: "J. Kirby Ross" Date: Mon, 17 Nov 2025 00:51:37 -0800 Subject: [PATCH 09/25] ADR-0007 fix some details that were missing --- docs/decisions/ADR-0007/DECISION.md | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/docs/decisions/ADR-0007/DECISION.md b/docs/decisions/ADR-0007/DECISION.md index 988085d1..e9dfa5db 100644 --- a/docs/decisions/ADR-0007/DECISION.md +++ b/docs/decisions/ADR-0007/DECISION.md @@ -20,22 +20,29 @@ State is hierarchical and interlinked; GraphQL matches the access pattern and av ## Decision 1. **Endpoint**: `POST /api/v1/graphql`. -2. **Versioning**: HTTP header `x-gatos-api: v1`. Introspection **MAY** be disabled in prod. +2. **Versioning & Schema**: HTTP header `x-gatos-api: v1`. The server publishes the canonical SDL at `GET /api/v1/graphql/schema` (checked into the repo under `api/graphql/schema.graphql`). 
Introspection stays enabled in non-production environments; in prod it is disabled and the SDL endpoint is authoritative. 3. **State Targeting**: Every query **MUST** include one of: - `stateRef: ""` (recommended), or - `refPath: "refs/gatos/state/public//"` (server resolves to head). 4. **Object Identity**: `id` fields are stable content IDs: `::`. -5. **Pagination**: Relay connections for lists; cursors are opaque, signed. -6. **Pointer Handling**: Opaque pointers (ADR-0004) are exposed as typed nodes; server **MUST NOT** auto-resolve private blobs. -7. **AuthZ**: Queries are filtered by policy view; unauthorised paths are elided or pointerized per privacy policy. -8. **Caching**: `ETag` = `Shape-Root` of the resolved state; `Cache-Control: immutable` for historical `stateRef`. -9. **Errors**: Deterministic, typed error codes (e.g., `POLICY_DENIED`, `STATE_NOT_FOUND`). -10. **Rate Limits**: Default window limits; per-actor overrides via policy. +5. **Pagination**: Relay connections for lists; cursors are opaque, HMAC-signed. `first`/`last` accept values in `[1, 500]` (default 100). Ordering is deterministic (lexicographic by path unless a field specifies a different order). Requests outside that range fail with the `PAGE_LIMIT_EXCEEDED` error code (422). +6. **Pointer Handling**: Opaque pointers (ADR-0004) resolve to a dedicated `OpaquePointerNode { kind, algo, digest, location, capability }`. The API **MUST NOT** download private blobs automatically. +7. **AuthZ Behaviour**: Policy filters (ADR-0003/0004) apply per field. If the actor lacks read access: + - When a pointerized projection exists, return the `OpaquePointerNode` and append a GraphQL error with `code: "POLICY_DENIED"`. + - When no projection exists, return `null` and append the same error. Clients **MUST** inspect `errors[]` to detect truncation. +8.
**Caching**: `ETag` = `Shape-Root` of the resolved state and the response body **MUST** include `shapeRoot` and `stateRefResolved` top-level fields. `Cache-Control: immutable` applies to historical `stateRef`; `refPath` responses are `Cache-Control: no-cache` so clients revalidate. +9. **Errors**: Deterministic JSON error extensions: + - `POLICY_DENIED` (403) + - `STATE_NOT_FOUND` (404) + - `PAGE_LIMIT_EXCEEDED` (422) + - `INVALID_STATE_REF` (400) + - `INTERNAL_ERROR` (500, last resort) + Each error entry includes `extensions.code` and `extensions.ref` (support ULID). +10. **Rate Limits**: Default 600 requests / 60s window per actor, enforced via shared limiter. Policy rules may override per namespace/project; responses include `X-RateLimit-Remaining` headers. ## Consequences - Clients can build efficient UIs without bespoke endpoints. - Server complexity moves into resolvers and policy filters. ## Open Questions -- Schema publishing: static SDL vs generated at build from spec? - Field deprecation cadence. From 7ca80b035f0ac1989dddba46515f41342d94c951 Mon Sep 17 00:00:00 2001 From: "J. 
Kirby Ross" Date: Mon, 17 Nov 2025 01:20:36 -0800 Subject: [PATCH 10/25] docs: more ADR refinement --- Cargo.toml | 1 + README.md | 4 + ROADMAP.md | 21 ++++ api/graphql/schema.graphql | 8 ++ crates/gatos-graphql/Cargo.toml | 17 ++++ crates/gatos-graphql/src/lib.rs | 50 ++++++++++ docs/ROADMAP.md | 27 +++++ docs/SPEC.md | 55 ++++++++++- docs/decisions/ADR-0003/DECISION.md | 42 ++++---- docs/decisions/ADR-0006/DECISION.md | 25 +++++ docs/decisions/ADR-0007/DECISION.md | 23 ++++- docs/decisions/ADR-0008/DECISION.md | 48 ++++++--- docs/decisions/ADR-0009/DECISION.md | 38 +++++-- docs/decisions/ADR-0010/DECISION.md | 20 ++++ docs/decisions/ADR-0011/DECISION.md | 11 +++ docs/decisions/ADR-0012/DECISION.md | 8 ++ docs/decisions/ADR-0013/DECISION.md | 10 ++ docs/decisions/ADR-0014/DECISION.md | 14 +++ docs/decisions/README.md | 2 +- docs/guide/CHAPTER-008.md | 24 +++++ schemas/README.md | 2 + schemas/v1/api/command_envelope.schema.json | 33 +++++++ .../v1/api/graphql_state_mapping.schema.json | 21 ++++ schemas/v1/api/stream_frame.schema.json | 99 +++++++++++++++++++ schemas/v1/api/webhook_delivery.schema.json | 35 +++++++ 25 files changed, 589 insertions(+), 49 deletions(-) create mode 100644 api/graphql/schema.graphql create mode 100644 crates/gatos-graphql/Cargo.toml create mode 100644 crates/gatos-graphql/src/lib.rs create mode 100644 schemas/v1/api/command_envelope.schema.json create mode 100644 schemas/v1/api/graphql_state_mapping.schema.json create mode 100644 schemas/v1/api/stream_frame.schema.json create mode 100644 schemas/v1/api/webhook_delivery.schema.json diff --git a/Cargo.toml b/Cargo.toml index 25afcc98..fecd5542 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -6,6 +6,7 @@ members = [ "crates/gatos-ledger-git", "crates/gatos-ledger", "crates/gatos-message-plane", + "crates/gatos-graphql", "crates/gatos-echo", "crates/gatos-policy", "crates/gatos-kv", diff --git a/README.md b/README.md index e51094b7..43b864fc 100644 --- a/README.md +++ b/README.md @@ -109,6 
+109,10 @@ Store sensitive data (PII, large datasets) in private stores, but commit their * Artists and infra engineers get Perforce-style safety without leaving Git. The `gatos watch` daemon keeps locked files read-only until a governance Grant exists, `gatos lock acquire/release` walks you through the approval flow, and managed Git hooks (`gatos install-hooks`) block bad pushes before they ever hit the remote—while logging any bypass under `refs/gatos/audit/locks/*`. +### 5. GraphQL Truth Service + +Need a typed API for dashboards or custom UIs? The GraphQL endpoint (`POST /api/v1/graphql`) lets you query any state snapshot by commit (`stateRef`) or ref (`refPath`), with Relay pagination, rate limiting, and automatic policy filtering. Opaque pointers surface private blobs without leaking bytes, so you can build richly typed clients on top of verified state. + ----- ## How it Works: The 5 Planes diff --git a/ROADMAP.md b/ROADMAP.md index d04c8005..feb3c700 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -47,6 +47,7 @@ These are explicit non-goals until after the core truth machine is working: | **M4** | Job Plane + Proof-of-Execution (PoE) | | **M5** | Opaque Pointers + privacy-preserving projection | | **M6** | Explorer off-ramp + Explorer-Root verification | +| **M6.5** | GraphQL State API (read-only) | | **M7** | Proof-of-Experiment (PoX) + reproduce/verify CLI | | **M8** | Demos & examples (Bisect, ADR-as-policy, PoX) | | **M9** | Conformance suite + `gatos doctor` | @@ -243,6 +244,26 @@ These are explicit non-goals until after the core truth machine is working: --- +## M6.5 — GraphQL State API + +**Goal:** Provide a typed, cache-friendly read surface for state snapshots. + +**Deliverables:** + +- API service (crate or module) exposing `POST /api/v1/graphql` with the schema defined in `api/graphql/schema.graphql`. +- SDL publishing endpoint + CI check to keep schema + resolvers in sync. 
+- Resolver contract honoring `stateRef` / `refPath`, Relay pagination (`first/last`, opaque cursors, max 500), opaque pointer nodes, and deterministic ordering. +- Policy + privacy integration mirroring ADR-0003/0004 (return `POLICY_DENIED` errors; never auto-fetch private blobs). +- Rate-limiting (600 req / 60s default) and caching semantics (`shapeRoot`, `stateRefResolved`, `Cache-Control`/`ETag`). + +**Done when:** + +- Clients can issue GraphQL queries against historical or live state and receive deterministic results tied to a specific `stateRef`. +- SDL + schema live in-repo and the service passes conformance tests covering pagination, pointer handling, and error codes. +- Docs (README, SPEC, Guide) describe how to target states, interpret errors, and respect policy filters. + +--- + ## M7 — Proof-of-Experiment (PoX) & Reproduce/Verify **Goal:** Make experiments machine-checkable. diff --git a/api/graphql/schema.graphql b/api/graphql/schema.graphql new file mode 100644 index 00000000..c9b9fc25 --- /dev/null +++ b/api/graphql/schema.graphql @@ -0,0 +1,8 @@ +""" +Placeholder SDL for the GraphQL state API (ADR-0007). +Actual schema will be generated once resolvers land. +""" + +type Query { + _placeholder: Boolean! +} diff --git a/crates/gatos-graphql/Cargo.toml b/crates/gatos-graphql/Cargo.toml new file mode 100644 index 00000000..956048b7 --- /dev/null +++ b/crates/gatos-graphql/Cargo.toml @@ -0,0 +1,17 @@ +[package] +name = "gatos-graphql" +version = "0.1.0" +edition = "2021" +license = "Apache-2.0" +description = "GraphQL gateway scaffolding for the GATOS project (ADR-0007)." 
+ +[dependencies] +async-trait = { version = "0.1", optional = true } +serde = { version = "1", features = ["derive"], optional = true } +thiserror = "1" + +[features] +server = ["async-trait", "serde"] + +[lib] +path = "src/lib.rs" diff --git a/crates/gatos-graphql/src/lib.rs b/crates/gatos-graphql/src/lib.rs new file mode 100644 index 00000000..d9a44db4 --- /dev/null +++ b/crates/gatos-graphql/src/lib.rs @@ -0,0 +1,50 @@ +//! Placeholder GraphQL gateway crate. +//! +//! This crate will eventually expose the `POST /api/v1/graphql` endpoint +//! described in ADR-0007. Right now it only defines struct stubs so other +//! crates can start wiring dependencies without the server existing yet. + +#[cfg(feature = "server")] +pub mod api { + use serde::{Deserialize, Serialize}; + + /// Parameters accepted by the GraphQL endpoint. + #[derive(Debug, Clone, Serialize, Deserialize)] + pub struct GraphQlRequest { + pub query: String, + #[serde(default)] + pub variables: serde_json::Value, + #[serde(default)] + pub operation_name: Option<String>, + #[serde(default)] + pub state_ref: Option<String>, + #[serde(default)] + pub ref_path: Option<String>, + } + + /// Standard GraphQL response envelope. + #[derive(Debug, Clone, Serialize, Deserialize)] + pub struct GraphQlResponse { + pub data: Option<serde_json::Value>, + #[serde(default)] + pub errors: Vec<serde_json::Value>, + #[serde(default)] + pub state_ref_resolved: Option<String>, + #[serde(default)] + pub shape_root: Option<String>, + } +} + +/// Minimal service trait so downstream crates can depend on this crate before +/// the real server lands. +pub trait GraphQlService { + /// Executes a GraphQL request and returns the JSON response body. + fn execute(&self, request: &str) -> Result<String, GraphQlError>; +} + +/// Placeholder error type.
+#[derive(Debug, thiserror::Error)] +pub enum GraphQlError { + #[error("not yet implemented")] + NotImplemented, +} diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md index c5a7d561..69384002 100644 --- a/docs/ROADMAP.md +++ b/docs/ROADMAP.md @@ -120,6 +120,7 @@ These are explicit non-goals until after the core truth machine is working: | **M4** | Job Plane + Proof-of-Execution (PoE) | | **M5** | Opaque Pointers + privacy-preserving projection | | **M6** | Explorer off-ramp + Explorer-Root verification | +| **M6.5** | GraphQL State API (read-only) | | **M7** | Proof-of-Experiment (PoX) + reproduce/verify CLI | | **M8** | Demos & examples (Bisect, ADR-as-policy, PoX) | | **M9** | Conformance suite + `gatos doctor` | @@ -518,6 +519,32 @@ These are explicit non-goals until after the core truth machine is working: --- +## **M6.5 — GraphQL State API (Read-Only)** + + + +**3–4 weeks** + +### Goals + +- Typed, single-roundtrip read access to any committed state snapshot. + +### Deliverables + +- Gateway crate/binary exposing `POST /api/v1/graphql` plus `GET /api/v1/graphql/schema` (SDL published from `api/graphql/schema.graphql`). +- Resolver layer honoring `stateRef` / `refPath`, enforcing Relay pagination (opaque cursors, `[1,500]` bounds, deterministic ordering), and returning `OpaquePointerNode` objects when policy hides data. +- Integration with policy/privacy planes so denied paths surface as GraphQL errors with `POLICY_DENIED` codes and never fetch private blobs automatically. +- Rate limiting (default 600 requests / 60s) with `X-RateLimit-*` headers, plus caching semantics (`shapeRoot`, `stateRefResolved`, `ETag`). +- Schema + conformance tests checked into the repo (SDL diff, pagination/authorization fixtures). + +### Done When + +- Clients can query historical (`stateRef`) or head (`refPath`) state and get stable JSON tied to a `shapeRoot`. +- SDL + resolver contract pass automated tests covering pagination limits, pointer handling, and error codes.
+- Docs (README, SPEC, Guide) teach developers how to call the API and interpret `POLICY_DENIED` responses. + +--- + ## **M7 — Proof-of-Experiment (PoX) & Reproduce/Verify** diff --git a/docs/SPEC.md b/docs/SPEC.md index fb221b32..759ac387 100644 --- a/docs/SPEC.md +++ b/docs/SPEC.md @@ -59,8 +59,13 @@ - [21. Local Enforcement (Watcher + Hooks)](#21-local-enforcement) - [21.1 Watcher Daemon (`gatos watch`)](#211-watcher-daemon-gatos-watch) - [21.2 Git Hooks (managed)](#212-git-hooks-managed) - - [21.3 Lock UX & Automation](#213-lock-ux--automation) - - [21.4 Security Notes](#214-security-notes) +- [21.3 Lock UX & Automation](#213-lock-ux--automation) +- [21.4 Security Notes](#214-security-notes) +- [22. GraphQL State API](#22-graphql-state-api) + - [22.1 Request Model](#221-request-model) + - [22.2 Schema & Types](#222-schema--types) + - [22.3 Policy & Errors](#223-policy--errors) + - [22.4 Caching & Rate Limits](#224-caching--rate-limits) - [Glossary](#glossary) @@ -1577,6 +1582,52 @@ watcher: --- +## 22. GraphQL State API + + + +Provides a typed, read-only HTTP endpoint for querying materialized state snapshots. + +### 22.1 Request Model + + + +- Endpoint: `POST /api/v1/graphql` with `Content-Type: application/json` and `x-gatos-api: v1` header. +- Every query MUST specify either `stateRef: ""` or `refPath: "refs/gatos/state/public//"`. When both are provided, `stateRef` wins. Servers resolve `refPath` to a concrete commit and expose it as `stateRefResolved` in the response. +- Lists use Relay connections with `first`/`last` arguments. Servers clamp to `[1, 500]`, default `100`. Cursors are opaque, HMAC-signed strings that callers MUST treat as black boxes. +- Requests exceeding the limit MUST return `USER_INPUT_ERROR` with `extensions.code = "PAGE_LIMIT_EXCEEDED"`. + +### 22.2 Schema & Types + + + +- Canonical SDL lives at `api/graphql/schema.graphql`; CI diffing is normative. 
+- Pointerized fields resolve to `OpaquePointerNode { kind, algo, digest, location, capability }` per ADR-0004. The API MUST NOT fetch private blobs automatically. +- Object IDs follow `::` (BLAKE3) for stability. +- Schema mapping metadata is captured in `schemas/v1/api/graphql_state_mapping.schema.json` so tooling can validate which state refs back which GraphQL types. + +### 22.3 Policy & Errors + + + +- Authz inherits ADR-0003/0004: when a user lacks read access but a pointer projection exists, resolvers return the pointer plus a GraphQL error entry with `extensions.code = "POLICY_DENIED"`. When no projection exists, the field is `null` with the same error. Clients MUST inspect `errors[]`. +- Standard errors: + - `POLICY_DENIED` → HTTP 403 + - `STATE_NOT_FOUND` → 404 + - `PAGE_LIMIT_EXCEEDED` → 422 + - `INVALID_STATE_REF` → 400 + - `INTERNAL_ERROR` → 500 (last resort) +- Each error includes `extensions.ref` (ULID) for support correlation. + +### 22.4 Caching & Rate Limits + + + +- Responses include `shapeRoot` (commit-derived hash) and `stateRefResolved`. Historical `stateRef` responses MUST set `Cache-Control: immutable` + `ETag: `. `refPath` queries are `Cache-Control: no-cache` and clients should revalidate to see new commits. +- Default throttling: 600 requests / 60 seconds per actor. Servers MUST emit `X-RateLimit-Limit`, `X-RateLimit-Remaining`, and `Retry-After` headers when throttled. Policy rules may override limits per namespace/project. 
+ +--- + ## Glossary diff --git a/docs/decisions/ADR-0003/DECISION.md b/docs/decisions/ADR-0003/DECISION.md index 55e151f2..1e52b226 100644 --- a/docs/decisions/ADR-0003/DECISION.md +++ b/docs/decisions/ADR-0003/DECISION.md @@ -1,9 +1,25 @@ +--- +Status: Accepted +Date: 2025-11-08 +ADR: ADR-0003 +Authors: [flyingrobots] +Requires: [ADR-0001] +Related: [ADR-0002] +Tags: [Governance, Consensus, Policy] +Schemas: +- schemas/v1/governance/proposal.schema.json +- schemas/v1/governance/approval.schema.json +- schemas/v1/governance/grant.schema.json +- schemas/v1/governance/revocation.schema.json +- schemas/v1/governance/proof_of_consensus_envelope.schema.json +- schemas/v1/policy/governance_policy.schema.json +Supersedes: [] +Superseded-By: [] +--- # ADR-0003: Consensus Governance for Gated Actions - - @@ -25,28 +41,6 @@ ---- - -Status: Accepted -Date: 2025-11-08 -ADR: ADR-0003 -Authors: \[flyingrobots] -Requires: \[ADR-0001] -Related: \[ADR-0002] -Tags: \[Governance, Consensus, Policy] -Schemas: - -- `schemas/v1/governance/proposal.schema.json` -- `schemas/v1/governance/approval.schema.json` -- `schemas/v1/governance/grant.schema.json` -- `schemas/v1/governance/revocation.schema.json` -- `schemas/v1/governance/proof_of_consensus_envelope.schema.json` -- `schemas/v1/policy/governance_policy.schema.json` - -Supersedes: \[] -Superseded-By: \[] - ---- ## Scope diff --git a/docs/decisions/ADR-0006/DECISION.md b/docs/decisions/ADR-0006/DECISION.md index cd7c65cf..a0b87112 100644 --- a/docs/decisions/ADR-0006/DECISION.md +++ b/docs/decisions/ADR-0006/DECISION.md @@ -40,6 +40,31 @@ Provide **local enforcement** of governance policy via (a) a cross-platform Watc ``` - Watches `refs/gatos/grants/**` for updates so that newly approved locks are released immediately without requiring a restart. - The daemon MUST persist state (e.g., last applied masks, pending lock requests) under `~/.config/gatos/watch/` to survive restarts. 
State files are advisory; corruption or tampering MUST trigger a full resync from Git policy data before enforcement resumes. +- The daemon MUST persist state (e.g., last applied masks, pending lock requests) under `~/.config/gatos/watch/` to survive restarts. State files are advisory; corruption or tampering MUST trigger a full resync from Git policy data before enforcement resumes. + +```mermaid +sequenceDiagram + participant Dev as Developer + participant Hook as Git Hooks + participant Watch as gatos watch + participant Repo as Repo/Policy + + Dev->>Hook: git commit/push + Hook->>Watch: query lock cache + Watch->>Repo: read refs/gatos/grants + Repo-->>Watch: grant status + alt grant missing + Watch-->>Hook: deny (POLICY) + Hook-->>Dev: fail commit/push + else grant exists + Hook-->>Dev: allow + end + Dev->>Watch: edit file + Watch->>Repo: verify grant + alt violation + Watch-->>Dev: notification + reapply readonly + end +``` ### 2. Git Hooks (managed surface) diff --git a/docs/decisions/ADR-0007/DECISION.md b/docs/decisions/ADR-0007/DECISION.md index e9dfa5db..d6f8da43 100644 --- a/docs/decisions/ADR-0007/DECISION.md +++ b/docs/decisions/ADR-0007/DECISION.md @@ -1,5 +1,5 @@ --- -Status: Draft +Status: Accepted Date: 2025-11-09 ADR: ADR-0007 Authors: [flyingrobots] @@ -40,6 +40,27 @@ State is hierarchical and interlinked; GraphQL matches the access pattern and av Each error entry includes `extensions.code` and `extensions.ref` (support ULID). 10. **Rate Limits**: Default 600 requests / 60s window per actor, enforced via shared limiter. Policy rules may override per namespace/project; responses include `X-RateLimit-Remaining` headers. 
+```mermaid +sequenceDiagram + participant Client + participant API as GraphQL API + participant Resolver + participant Policy as Policy Filter + participant State as State Store + + Client->>API: POST /api/v1/graphql (stateRef, query) + API->>Resolver: resolve fields + Resolver->>State: load shape nodes + Resolver->>Policy: check access + alt allowed + Policy-->>Resolver: allow / pointerize + else denied + Policy-->>Resolver: deny (POLICY_DENIED) + end + Resolver-->>API: data + shapeRoot + API-->>Client: JSON (data, errors, shapeRoot) +``` + ## Consequences - Clients can build efficient UIs without bespoke endpoints. - Server complexity moves into resolvers and policy filters. diff --git a/docs/decisions/ADR-0008/DECISION.md b/docs/decisions/ADR-0008/DECISION.md index 9ffb589f..9b328c72 100644 --- a/docs/decisions/ADR-0008/DECISION.md +++ b/docs/decisions/ADR-0008/DECISION.md @@ -21,24 +21,44 @@ Commands are side-effecting; REST is adequate and tool-friendly. Integrations ne ## Decision 1. **Command Endpoint**: `POST /api/v1/commands` - - Body: - ```json - { "type": "", "args": {...}, "expect_state": "", "request_id": "" } - ``` - - Semantics: Return quickly with `{ "ack": true, "job_id": "" }` (async) or `{ "ok": true, ... }` (sync). - - **Idempotency**: `request_id` **MUST** dedupe within a 24h window. -2. **Result Plumbing**: Long work **SHOULD** create a Job (ADR-0002) and stream progress on the Message Plane. + - Body conforms to `schemas/v1/api/command_envelope.schema.json`. + - Semantics: + - Default async: return 202 + `{ "ack": true, "job_id": "", "state_ref": "" }` within 200 ms. + - Optional synchronous mode: when `sync=true` and the command’s budget <= 3s, return 200 + `{ "ok": true, "result": {...}, "state_ref": "" }`. + - `expect_state` enforces optimistic concurrency; if the current state head differs, respond 409 `EXPECT_STATE_MISMATCH` and include `current_state` in the body. 
+ - **Idempotency**: `request_id` **MUST** dedupe within a rolling 24h window. The server stores the last response keyed by `request_id`/actor; duplicates return the cached payload. +2. **Result Plumbing**: Long work **SHALL** create a Job (ADR-0002) and stream progress on the Message Plane (ADR-0005) with trailers linking back to `request_id`. 3. **Webhooks**: - - Subscription admin endpoint: `POST /api/v1/webhooks`. - - Events (normative names): `proposal.created`, `approval.created`, `grant.created`, `grant.revoked`, `job.created`, `job.claimed`, `job.succeeded`, `job.failed`. - - Delivery: JSON body, `X-GATOS-Event`, `X-GATOS-Delivery`, **HMAC-SHA256** signature header. - - Retries with exponential backoff; dead-letter queue optional. -4. **AuthN/Z**: OAuth2/JWT bearer; scopes per command prefix; webhook secrets per subscription. -5. **HTTP Codes**: `202` for async ack, `200` for sync success, `409` for `EXPECT_STATE_MISMATCH`. + - CRUD endpoints: `POST /api/v1/webhooks` (create), `GET /api/v1/webhooks` (list), `DELETE /api/v1/webhooks/{id}` (revoke), `POST /api/v1/webhooks/{id}/rotate` (secret rotation). + - Event payloads obey `schemas/v1/api/webhook_delivery.schema.json`. + - Normative event names: `proposal.created`, `approval.created`, `grant.created`, `grant.revoked`, `job.created`, `job.claimed`, `job.succeeded`, `job.failed`, `state.folded`, `state.failed`. + - Delivery: HTTP POST with headers `X-GATOS-Event`, `X-GATOS-Delivery`, `X-GATOS-Signature: sha256=` (HMAC). Consumer must respond within 10s. Retries: exponential backoff (1s, 5s, 30s, 5m) up to 5 attempts; afterwards the delivery is parked in a dead-letter queue visible via `GET /api/v1/webhooks/{id}/dlq`. +4. **AuthN/Z**: OAuth2/JWT bearer tokens, validated against ADR-0003 policy. Scopes map to command prefixes (e.g., `locks:*`, `jobs:*`). Webhook secrets are per subscription; rotation takes effect immediately and old secrets expire after 5 minutes. +5. 
**HTTP Codes**: `202` (async ack), `200` (sync success), `400` (schema validation error), `401/403` (auth failures), `409` (`EXPECT_STATE_MISMATCH`), `422` (`COMMAND_UNSUPPORTED`), `500` (unhandled). + +```mermaid +sequenceDiagram + participant Client + participant API as REST API + participant Jobs as Job Plane + participant Message as Message Plane + participant Hook as Webhook Subscriber + + Client->>API: POST /api/v1/commands {request_id} + alt async command + API-->>Client: 202 {ack, job_id} + API->>Jobs: enqueue job (request_id) + Jobs->>Message: emit progress events + else sync command + API-->>Client: 200 {ok, result} + end + Jobs-->>API: job succeeded/failed + API->>Hook: POST webhook (job.succeeded) +``` ## Consequences - Clean separation of **mutations** (REST) vs **reads** (GraphQL). - Webhooks unlock automation without polling. ## Open Questions -- Do we surface a lightweight sync mode with server-side time budget? +- None at this stage. diff --git a/docs/decisions/ADR-0009/DECISION.md b/docs/decisions/ADR-0009/DECISION.md index 6c825f36..788f3328 100644 --- a/docs/decisions/ADR-0009/DECISION.md +++ b/docs/decisions/ADR-0009/DECISION.md @@ -19,21 +19,45 @@ Provide **WebSocket** streams for ref updates and bus topics to enable reactive UIs and workers need near-real-time updates without wasteful polling. ## Decision -1. **Endpoint**: `GET /api/v1/stream` → WebSocket (JSON frames). +1. **Endpoint**: `GET /api/v1/stream` → WebSocket (JSON frames defined in `schemas/v1/api/stream_frame.schema.json`). 2. **Subscribe/Unsubscribe** frames: ```json - {"op":"sub","refs":["refs/gatos/state/public/**","refs/mind/sessions/main"],"topics":["gatos.jobs.*"]} + {"op":"sub","refs":["refs/gatos/state/public/**"],"topics":["gatos.jobs.*"],"sinceSeq":1000} {"op":"unsub","refs":[...],"topics":[...]} ``` -3. **Server → client frames**: + - `sinceSeq` replays missed frames within the last 10 minutes (server clamps). 
+ - Subscriptions are additive and capped at 20 refs + 20 topics per connection. +3. **Server → client frames** (examples): ```json - { "kind":"ref.update","ref":"refs/...","old":"","new":"","seq":123,"ts":"" } + { "kind":"ref.update","ref":"refs/gatos/state/public/ui/main","old":"","new":"","seq":123,"ts":"" } { "kind":"bus.event","topic":"gatos.jobs.pending","payload":{...},"seq":124,"ts":"" } + { "kind":"error","code":"POLICY_DENIED","message":"..." } ``` 4. **Delivery**: At-least-once with monotonic `seq` per connection; clients MUST dedupe. -5. **Replay**: Optional `sinceSeq` on connect to catch recent history (bounded window). -6. **AuthZ**: Same policy filters as GraphQL; forbidden refs are not emitted. -7. **Heartbeat**: `{"kind":"ping"}` / `{"kind":"pong"}` every 30s. +5. **Replay**: `sinceSeq` replays buffered frames up to `STREAM_REPLAY_LIMIT` (default 1,000 frames or 10 minutes, whichever comes first). Requests beyond the window respond with `error` frame `code="REPLAY_EXPIRED"` and start streaming live. +6. **AuthZ**: Same policy filters as GraphQL. Forbidden refs are silently dropped and an `error` frame with `code="POLICY_DENIED"` is emitted once per ref/topic to inform the client. +7. **Heartbeat & Backpressure**: Server sends `ping` every 30s; clients MUST reply within 10s or the connection is closed. Frames include a `credit` field when the server asks clients to apply backpressure (default window 1,000 outstanding frames). + +8. **Errors & Close Codes**: Protocol errors result in immediate close with WebSocket code `1008`. The final frame MAY include `{kind:"error", code:"INVALID_SUB", message:"..."}`. 
+ +```mermaid +sequenceDiagram + participant Client + participant Stream as /api/v1/stream + participant Refs as Ref Watcher + participant Bus as Message Plane + + Client->>Stream: GET + WebSocket upgrade + Stream-->>Client: {kind:"ack"} + Client->>Stream: {op:"sub", refs:[...], topics:[...]} + Stream->>Refs: register ref filters + Stream->>Bus: register topic filters + Refs-->>Stream: ref.update events + Bus-->>Stream: bus.event frames + Stream-->>Client: frames with seq IDs + Stream-->>Client: {kind:"ping"} + Client-->>Stream: {kind:"pong"} +``` ## Consequences - Reactive UX and workers with minimal glue. diff --git a/docs/decisions/ADR-0010/DECISION.md b/docs/decisions/ADR-0010/DECISION.md index 8516a7ff..7126f385 100644 --- a/docs/decisions/ADR-0010/DECISION.md +++ b/docs/decisions/ADR-0010/DECISION.md @@ -31,6 +31,26 @@ Most teams live on GitHub; native enforcement and UX reduce friction. 5. **Security**: - Rotate app secrets, least-privilege scopes, per-repo installation. +```mermaid +sequenceDiagram + participant Dev as GitHub Developer + participant GH as GitHub + participant App as GATOS App + participant API as GATOS API + participant Ledger + + Dev->>GH: Open/Update PR + GH->>App: Webhook (pr.opened) + App->>Ledger: record event + Dev->>GH: /gatos lock acquire + GH->>App: Command comment + App->>API: POST /api/v1/commands {request_id} + API-->>App: ack/job + App-->>GH: PR comment + status check + Ledger-->>API: merge gate satisfied + App-->>GH: mark check success +``` + ## Consequences - One consistent control point for GitHub-centric teams. - Another moving part to maintain (secrets, webhooks, scale). diff --git a/docs/decisions/ADR-0011/DECISION.md b/docs/decisions/ADR-0011/DECISION.md index cece7301..5dfb1f3c 100644 --- a/docs/decisions/ADR-0011/DECISION.md +++ b/docs/decisions/ADR-0011/DECISION.md @@ -28,6 +28,17 @@ Teams want ad-hoc analytics without learning internals; SQL + columnar files cov 3. 
**Incremental**: `--since` resumes from last exported commit. 4. **Determinism**: Stable ordering; integrity table with exported root SHA. +```mermaid +flowchart LR + A[Ledger] -->|events| B[Exporter] + B --> C((SQLite)) + B --> D((Parquet)) + C --> E[BI/SQL] + D --> F[Analytics/Notebook] + B --> G{Integrity Record} + G --> H[Shape Root] +``` + ## Consequences - Easy dashboards, BI, notebooks. - Must be careful not to leak private overlay data (only pointer metadata exported). diff --git a/docs/decisions/ADR-0012/DECISION.md b/docs/decisions/ADR-0012/DECISION.md index 2052d7af..1e406b4f 100644 --- a/docs/decisions/ADR-0012/DECISION.md +++ b/docs/decisions/ADR-0012/DECISION.md @@ -37,6 +37,14 @@ Enables decentralized composition (e.g., central governance repo consumed by man 5. **UI/API**: - GraphQL exposes mounts under a separate namespace; streams emit updates when mount advances. +```mermaid +graph TD + A[Local Repo] -->|mounts| B[Remote Governance Repo] + B -->|state ref| C[refs/gatos/remotes/governance] + C --> D[State Fold] + D --> E[Local Policy] +``` + ## Consequences - Clean, verifiable cross-repo composition. - Requires remote availability and verification logic. diff --git a/docs/decisions/ADR-0013/DECISION.md b/docs/decisions/ADR-0013/DECISION.md index 3198ebc7..bc1a435f 100644 --- a/docs/decisions/ADR-0013/DECISION.md +++ b/docs/decisions/ADR-0013/DECISION.md @@ -27,6 +27,16 @@ Large repos need sub-linear recomputation to stay responsive. 5. **Concurrency**: Units may fold in parallel if dependencies permit; global join computes `Shape-Root`. 6. **Telemetry**: Commit trailers: `Fold-Cache-Hit: `, `Fold-Units: `, `Fold-Duration: `. +```mermaid +graph TD + E1[Event Stream] --> P1[Plan Affected Units] + P1 --> U1[Unit Fold Cache] + U1 -->|hit: reuse digest| C1[Join] + U1 -->|miss| F1[Fold Unit] + F1 --> C1 + C1 --> SR[Compute Shape-Root] +``` + ## Consequences - Orders-of-magnitude faster folds; predictable latencies.
- Requires dependency modeling and careful cache invalidation. diff --git a/docs/decisions/ADR-0014/DECISION.md b/docs/decisions/ADR-0014/DECISION.md index fa0643a3..c4d2bd09 100644 --- a/docs/decisions/ADR-0014/DECISION.md +++ b/docs/decisions/ADR-0014/DECISION.md @@ -37,6 +37,20 @@ Jobs already attest execution (ADR-0002 PoE). Folds need equivalent integrity gu 3. **Storage**: Persist envelope under `refs/gatos/audit/proofs/folds/`. 4. **Verification**: `gatos fold verify ` checks engine key in trust graph, envelope hash, and output match. +```mermaid +sequenceDiagram + participant Fold as Fold Engine + participant Policy + participant Ledger + participant Audit + Fold->>Policy: resolve policy_root + Fold->>Ledger: read events/upstreams + Fold->>Fold: compute Shape-Root + Fold->>Fold: build envelope + Fold->>Fold: sign blake3(envelope) + Fold->>Audit: write refs/gatos/audit/proofs/folds/ +``` + ## Consequences - Auditable state derivations; reproducibility at the protocol layer. - Requires key management for fold engines. 
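ADR-0014's verification step has `gatos fold verify` check the engine key against the trust graph, the envelope hash, and that the output matches. The Rust sketch below shows only that check order. The `FoldProof` shape is hypothetical and is not the ADR's envelope schema. The `digest` closure stands in for BLAKE3 and `refold` stands in for re-running the fold, which keeps the sketch free of dependencies. Signature verification is left out.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified view of a fold proof envelope (not the ADR's schema).
struct FoldProof<'a> {
    engine_key: &'a str,      // key the fold engine signed with
    envelope_bytes: &'a [u8], // canonical envelope bytes
    envelope_digest: &'a str, // digest recorded for the envelope
    shape_root: &'a str,      // Shape-Root the envelope claims
}

#[derive(Debug, PartialEq)]
enum VerifyError {
    UntrustedEngine,
    DigestMismatch,
    OutputMismatch,
}

/// Runs the three checks in the order the ADR lists them and fails fast.
/// `digest` stands in for BLAKE3 and `refold` for re-running the fold;
/// signature verification is out of scope for this sketch.
fn verify_fold(
    proof: &FoldProof,
    trusted_engines: &HashSet<&str>,
    digest: impl Fn(&[u8]) -> String,
    refold: impl Fn() -> String,
) -> Result<(), VerifyError> {
    // 1. The engine key must be known to the trust graph.
    if !trusted_engines.contains(proof.engine_key) {
        return Err(VerifyError::UntrustedEngine);
    }
    // 2. The recorded envelope digest must match the envelope bytes.
    if digest(proof.envelope_bytes) != proof.envelope_digest {
        return Err(VerifyError::DigestMismatch);
    }
    // 3. Re-folding must reproduce the claimed Shape-Root.
    if refold() != proof.shape_root {
        return Err(VerifyError::OutputMismatch);
    }
    Ok(())
}
```

The trust check runs first, so an unknown engine is rejected before any hashing or re-folding work is done.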
diff --git a/docs/decisions/README.md b/docs/decisions/README.md index edfe83d3..ec52271c 100644 --- a/docs/decisions/README.md +++ b/docs/decisions/README.md @@ -34,7 +34,7 @@ Each ADR will have a status, typically one of the following: | [ADR-0004](./ADR-0004/DECISION.md) | Hybrid Privacy Model (Public Projection + Private Overlay) | Accepted | 2025-11-09 | | [ADR-0005](./ADR-0005/DECISION.md) | Message Plane — A Git-Native, Commit-Backed Message Bus | Proposed | 2025-11-09 | | [ADR-0006](./ADR-0006/DECISION.md) | Local Enforcement — Watcher Daemon & Git Hooks | Accepted | 2025-11-09 | -| [ADR-0007](./ADR-0007/DECISION.md) | GraphQL State API (Read-Only) | Draft | 2025-11-09 | +| [ADR-0007](./ADR-0007/DECISION.md) | GraphQL State API (Read-Only) | Accepted | 2025-11-09 | | [ADR-0008](./ADR-0008/DECISION.md) | REST Commands & Webhooks | Draft | 2025-11-09 | | [ADR-0009](./ADR-0009/DECISION.md) | Real-Time Streams & Ref Subscriptions | Draft | 2025-11-09 | | [ADR-0010](./ADR-0010/DECISION.md) | First-Class GitHub App Integration | Draft | 2025-11-09 | diff --git a/docs/guide/CHAPTER-008.md b/docs/guide/CHAPTER-008.md index 66cd769c..dcce79ba 100644 --- a/docs/guide/CHAPTER-008.md +++ b/docs/guide/CHAPTER-008.md @@ -188,6 +188,30 @@ The CLI is the primary tool for human operators and for scripting simple GATOS i While not part of the core GATOS system, the JSONL RPC protocol makes it straightforward to build gateway services that expose GATOS functionality over more traditional web protocols like **REST** or **GraphQL**. +### GraphQL State API (ADR-0007) + +``` +POST /api/v1/graphql +Headers: { "x-gatos-api": "v1" } +Body: { "query": "query($stateRef: ID!,$first:Int){ + state(stateRef:$stateRef){ + shapeRoot + inventory(first:$first){ + edges{ node { id sku quantity } cursor } + pageInfo{ hasNextPage endCursor } + } + } +}", + "variables": {"stateRef": "7bc8...", "first": 50 } +} +``` + +- **Targeting:** Supply either `stateRef` (commit) or `refPath`. 
The server returns `stateRefResolved` so clients know which commit was read. +- **Pagination:** Relay-style connections (`edges`, `pageInfo`); cursors are opaque. Servers clamp page sizes to `[1,500]`. +- **Privacy:** Fields projected as opaque pointers appear as `OpaquePointerNode { kind, algo, digest, location, capability }`; the client can decide whether to resolve them. +- **Errors:** Denied fields append GraphQL errors with `POLICY_DENIED`. Always read the `errors[]` array even if `data` is non-null. +- **Caching:** Responses include `shapeRoot` and `ETag` so you can safely memoize historical snapshots; moving refs respond with `Cache-Control: no-cache` so you revalidate. + A simple Node.js or Go service could listen for HTTP requests, translate them into JSONL RPC commands to `gatosd`, and then format the JSONL responses back into HTTP responses. This allows web frontends and other standard web clients to interact with a GATOS node without needing a dedicated GATOS SDK. ### End-to-End Example (JSONL) diff --git a/schemas/README.md b/schemas/README.md index 55d3f0b2..3ff3bc55 100644 --- a/schemas/README.md +++ b/schemas/README.md @@ -41,3 +41,5 @@ Message Plane envelopes - Local enforcement (ADR-0006): - `schemas/v1/policy/locks.schema.json` extends `.gatos/policy.yaml` with `locks[]` and `watcher` blocks. - `schemas/v1/watch/events.schema.json` defines the JSONL payload emitted by `gatos watch`. +- GraphQL API (ADR-0007): + - `schemas/v1/api/graphql_state_mapping.schema.json` documents how GraphQL types map back to on-disk state paths. 
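The targeting, pagination, and caching rules in the GraphQL guide section and in section 22 of the SPEC above can be sketched in Rust. The `Target` type and the function names are hypothetical. Reporting the case where neither `stateRef` nor `refPath` is supplied as `INVALID_STATE_REF` is an assumption. This sketch clamps page sizes, as the guide describes. A server that rejects oversize requests instead would return `PAGE_LIMIT_EXCEEDED`.

```rust
/// Which snapshot a request reads (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum Target<'a> {
    /// A commit SHA: the snapshot never changes, so responses are cacheable.
    Commit(&'a str),
    /// A ref path: the server resolves it to a commit (`stateRefResolved`).
    Ref(&'a str),
}

const DEFAULT_PAGE: i32 = 100; // default when `first`/`last` is omitted
const MAX_PAGE: i32 = 500; // upper bound on `first`/`last`

/// `stateRef` wins when both are supplied; supplying neither is an error.
/// Reporting that case as INVALID_STATE_REF is an assumption.
fn resolve_target<'a>(
    state_ref: Option<&'a str>,
    ref_path: Option<&'a str>,
) -> Result<Target<'a>, &'static str> {
    match (state_ref, ref_path) {
        (Some(sha), _) => Ok(Target::Commit(sha)),
        (None, Some(path)) => Ok(Target::Ref(path)),
        (None, None) => Err("INVALID_STATE_REF"),
    }
}

/// Clamps a requested page size (GraphQL `Int`, i.e. i32) into [1, MAX_PAGE].
fn page_size(requested: Option<i32>) -> i32 {
    requested.unwrap_or(DEFAULT_PAGE).clamp(1, MAX_PAGE)
}

/// Historical commits are immutable; moving refs must be revalidated.
fn cache_control(target: &Target<'_>) -> &'static str {
    match target {
        Target::Commit(_) => "immutable",
        Target::Ref(_) => "no-cache",
    }
}
```

In this sketch, `resolve_target(Some(sha), Some(path))` returns `Target::Commit(sha)`, which matches the rule that `stateRef` wins. `page_size(Some(10_000))` is clamped to 500.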
diff --git a/schemas/v1/api/command_envelope.schema.json b/schemas/v1/api/command_envelope.schema.json new file mode 100644 index 00000000..99761112 --- /dev/null +++ b/schemas/v1/api/command_envelope.schema.json @@ -0,0 +1,33 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema#", + "$id": "https://gatos.io/schemas/v1/api/command_envelope.schema.json", + "title": "GATOS Command Envelope", + "type": "object", + "additionalProperties": false, + "required": ["type", "args", "request_id"], + "properties": { + "type": { + "type": "string", + "pattern": "^[a-z][a-z0-9_.-]+$", + "description": "Verb.noun style identifier (e.g., locks.acquire)." + }, + "args": { + "type": "object", + "description": "Command-specific arguments." + }, + "expect_state": { + "type": "string", + "pattern": "^[0-9a-fA-F]{40}$", + "description": "Optional expected state SHA." + }, + "request_id": { + "type": "string", + "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$", + "description": "ULID used for idempotency." + }, + "sync": { + "type": "boolean", + "description": "Optional hint requesting synchronous execution when supported." 
+ } + } +} diff --git a/schemas/v1/api/graphql_state_mapping.schema.json b/schemas/v1/api/graphql_state_mapping.schema.json new file mode 100644 index 00000000..8fa53078 --- /dev/null +++ b/schemas/v1/api/graphql_state_mapping.schema.json @@ -0,0 +1,21 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema#", + "title": "GraphQL State Mapping (v1)", + "type": "object", + "required": ["types"], + "properties": { + "types": { + "type": "array", + "items": { + "type": "object", + "required": ["graphqlType", "statePath"], + "properties": { + "graphqlType": { "type": "string" }, + "statePath": { "type": "string", "description": "Dot-path into state JSON" }, + "description": { "type": "string" } + } + } + } + }, + "additionalProperties": false +} diff --git a/schemas/v1/api/stream_frame.schema.json b/schemas/v1/api/stream_frame.schema.json new file mode 100644 index 00000000..f98edc7e --- /dev/null +++ b/schemas/v1/api/stream_frame.schema.json @@ -0,0 +1,99 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema#", + "$id": "https://gatos.io/schemas/v1/api/stream_frame.schema.json", + "title": "GATOS Stream Frame", + "type": "object", + "oneOf": [ + { "$ref": "#/definitions/sub" }, + { "$ref": "#/definitions/unsub" }, + { "$ref": "#/definitions/refUpdate" }, + { "$ref": "#/definitions/busEvent" }, + { "$ref": "#/definitions/ping" }, + { "$ref": "#/definitions/pong" }, + { "$ref": "#/definitions/error" } + ], + "definitions": { + "sub": { + "type": "object", + "required": ["op"], + "properties": { + "op": { "const": "sub" }, + "refs": { + "type": "array", + "items": { "type": "string" } + }, + "topics": { + "type": "array", + "items": { "type": "string" } + }, + "sinceSeq": { + "type": "integer", + "minimum": 0 + } + }, + "additionalProperties": false + }, + "unsub": { + "type": "object", + "required": ["op"], + "properties": { + "op": { "const": "unsub" }, + "refs": { + "type": "array", + "items": { "type": "string" } + }, + "topics": { + "type": "array", + 
"items": { "type": "string" } + } + }, + "additionalProperties": false + }, + "refUpdate": { + "type": "object", + "required": ["kind", "ref", "old", "new", "seq", "ts"], + "properties": { + "kind": { "const": "ref.update" }, + "ref": { "type": "string" }, + "old": { "type": ["string", "null"] }, + "new": { "type": ["string", "null"] }, + "seq": { "type": "integer", "minimum": 0 }, + "ts": { "type": "string", "format": "date-time" } + }, + "additionalProperties": false + }, + "busEvent": { + "type": "object", + "required": ["kind", "topic", "payload", "seq", "ts"], + "properties": { + "kind": { "const": "bus.event" }, + "topic": { "type": "string" }, + "payload": { "type": "object" }, + "seq": { "type": "integer", "minimum": 0 }, + "ts": { "type": "string", "format": "date-time" } + }, + "additionalProperties": true + }, + "ping": { + "type": "object", + "required": ["kind"], + "properties": { "kind": { "const": "ping" } }, + "additionalProperties": false + }, + "pong": { + "type": "object", + "required": ["kind"], + "properties": { "kind": { "const": "pong" } }, + "additionalProperties": false + }, + "error": { + "type": "object", + "required": ["kind", "code", "message"], + "properties": { + "kind": { "const": "error" }, + "code": { "type": "string" }, + "message": { "type": "string" } + } + } + } +} diff --git a/schemas/v1/api/webhook_delivery.schema.json b/schemas/v1/api/webhook_delivery.schema.json new file mode 100644 index 00000000..5a3ddd47 --- /dev/null +++ b/schemas/v1/api/webhook_delivery.schema.json @@ -0,0 +1,35 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema#", + "$id": "https://gatos.io/schemas/v1/api/webhook_delivery.schema.json", + "title": "GATOS Webhook Delivery", + "type": "object", + "additionalProperties": false, + "required": ["id", "event", "ts", "payload"], + "properties": { + "id": { + "type": "string", + "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$", + "description": "Delivery ULID" + }, + "event": { + "type": "string", + 
"description": "Event name (proposal.created, job.succeeded, etc.)." + }, + "ts": { + "type": "string", + "format": "date-time" + }, + "payload": { + "type": "object", + "description": "Event-specific body." }, + "attempt": { + "type": "integer", + "minimum": 1, + "description": "Delivery attempt number." + }, + "signature": { + "type": "string", + "description": "HMAC-SHA256 signature of body (hex)." + } + } +} From cd3da093eee34d71eb9887d890b7026cef5fde69 Mon Sep 17 00:00:00 2001 From: "J. Kirby Ross" Date: Mon, 17 Nov 2025 04:42:23 -0800 Subject: [PATCH 11/25] docs: ADR-0010 updates --- docs/decisions/ADR-0010/DECISION.md | 68 +++++++++++++++++++++++------ 1 file changed, 55 insertions(+), 13 deletions(-) diff --git a/docs/decisions/ADR-0010/DECISION.md b/docs/decisions/ADR-0010/DECISION.md index 7126f385..58b68c4f 100644 --- a/docs/decisions/ADR-0010/DECISION.md +++ b/docs/decisions/ADR-0010/DECISION.md @@ -17,19 +17,60 @@ Define a **GitHub App** that enforces policy, brokers commands, and mirrors nece Most teams live on GitHub; native enforcement and UX reduce friction. ## Decision -1. **Capabilities**: - - Read PRs, reviews, comments; write status checks; limited content permissions. - - Webhook ingestion → map to ledger events (`pr.opened`, `review.submitted`, etc.). -2. **Command Triggers**: - - PR comment prefix `git mind ...` or `/gatos ...` → hits `POST /api/v1/commands`. - - Responses posted as PR comments (summarised) + status checks. -3. **Policy Enforcement**: - - `gatos-policy` exposes `merge_gate` check; the app **MUST** block merges until grants exist for gated actions (ADR-0003). - - Status checks: `gatos/policy`, `gatos/fold`, `gatos/jobs`. -4. **Attestations**: - - Job and fold proofs surfaced as artifacts/annotations with digest links. -5. **Security**: - - Rotate app secrets, least-privilege scopes, per-repo installation. +1. 
**Permissions & Webhooks** + - GitHub App permissions: + - Read: Pull requests, Commit statuses, Checks, Issues, Metadata. + - Write: Commit statuses, Checks, Issue comments (for bot replies). + - **MUST NOT** request Contents:write; repository content changes still flow through Git. + - Webhooks subscribed: `pull_request`, `pull_request_review`, `status`, `check_suite`, `issue_comment`, `push`. + - Each incoming webhook is mapped internally to `POST /api/v1/commands` or `/api/v1/webhooks`, which logs an event under `refs/gatos/events/github/`. + +2. **Command Triggers & UX** + - Slash commands: `/gatos [args]` or fenced code blocks starting with `git mind`. + - The App validates actor permissions via ADR-0003 policy before forwarding to `POST /api/v1/commands`. + - Responses: a short PR comment summarizing the result, plus a link to Message Plane topic `gatos.github//` for full logs. + - Failed commands MUST set a temporary GitHub commit status `gatos/command` to `failure` with details. + +3. **Checks & Merge Gates** + - Required checks per PR: `gatos/policy`, `gatos/fold`, `gatos/jobs`, `gatos/locks`. + - `gatos/policy`: passes when all governance rules referenced by the PR diff have quorum Grants. + - `gatos/fold`: passes when a Proof-of-Fold (ADR-0014) exists for the target state. + - `gatos/jobs`: passes when all required Jobs (ADR-0002) have succeeded. + - `gatos/locks`: passes when no `locks.acquire` commands are pending on files touched by the PR. + - The merge button stays disabled until all required checks succeed; the App **MUST** set `required_status_checks.strict=true` via the GitHub API. + +4. **Ledger Mirroring** + - Webhook payloads are normalized and committed to `refs/gatos/events/github/` so the ledger contains a full audit trail of PR history. + - Mapping examples: + - `pull_request.opened` → `events/ns=github, kind=pr.opened` with PR metadata. + - `pull_request_review.submitted` → `events/ns=github, kind=review.submitted`.
+ - These events feed downstream automation (Message Plane topics `gatos.github.events`). + +5. **Attestations & Artifacts** + - When a Job completes with Proof-of-Execution, the App uploads an artifact to the PR (`gatos-poe-.txt`) referencing the digest. + - Fold proofs post an annotation linking to `refs/gatos/audit/proofs/folds/`. + +6. **Configuration (`.gatos/github.yaml`)** + ```yaml + repo: + org: gatos + name: demo + checks: + required: ["gatos/policy", "gatos/fold", "gatos/jobs", "gatos/locks"] + commands: + aliases: + lock: "locks.acquire" + release: "locks.release" + permissions: + allow_commenters: ["team/leads", "user:alice"] + ``` + - This file is versioned and interpreted by the App to scope commands, aliases, and allowed actors per repo. + +7. **Security & Operations** + - The App private key is stored in HashiCorp Vault (or equivalent) and rotated every 90 days; automation triggers `POST /api/v1/github/rotate` to reload credentials. + - Installation tokens are cached for ≤1 hour and refreshed proactively at 45 minutes. + - All outbound calls to GitHub use conditional requests (`If-None-Match`) to stay within rate limits. + - The App MUST verify GitHub webhook signatures (HMAC-SHA256, `X-Hub-Signature-256` header) before processing. ```mermaid sequenceDiagram @@ -57,3 +98,4 @@ sequenceDiagram ## Open Questions - Optional mapping of GH approvals → governance approvals (off by default)? +- Federation: how do installations spanning many repos share a single watcher queue? From f08c95cf1fe2544a4e226b3f264817e50111e2cb Mon Sep 17 00:00:00 2001 From: "J.
Kirby Ross" Date: Mon, 17 Nov 2025 06:10:30 -0800 Subject: [PATCH 12/25] docs: Updates based on ADRs --- docs/decisions/ADR-0011/DECISION.md | 30 ++++++---- docs/decisions/ADR-0012/DECISION.md | 43 +++++++++----- docs/decisions/ADR-0013/DECISION.md | 40 ++++++++++--- docs/decisions/ADR-0014/DECISION.md | 14 +---- schemas/v1/export/export_manifest.schema.json | 32 +++++++++++ schemas/v1/federation/mounts.schema.json | 56 +++++++++++++++++++ .../state/proof_of_fold_envelope.schema.json | 38 +++++++++++++ 7 files changed, 210 insertions(+), 43 deletions(-) create mode 100644 schemas/v1/export/export_manifest.schema.json create mode 100644 schemas/v1/federation/mounts.schema.json create mode 100644 schemas/v1/state/proof_of_fold_envelope.schema.json diff --git a/docs/decisions/ADR-0011/DECISION.md b/docs/decisions/ADR-0011/DECISION.md index 5dfb1f3c..d3352a0e 100644 --- a/docs/decisions/ADR-0011/DECISION.md +++ b/docs/decisions/ADR-0011/DECISION.md @@ -17,16 +17,26 @@ Provide an **exporter** that materializes the ledger and state into **SQLite** a Teams want ad-hoc analytics without learning internals; SQL + columnar files cover 95% of needs. ## Decision -1. **CLI**: `gatos export --format sqlite|parquet --out [--since ]` -2. **Schema (SQLite)**: - - `commits(id, parent_id, author, ts, msg, trailers JSON)` - - `events(id, ns, kind, payload JSON, commit_id)` - - `state_nodes(id, path, digest, shape JSON, state_ref)` - - `pointers(digest, location, capability, state_ref)` # ADR-0004 - - `jobs(id, status, started_at, finished_at, proof_digest)`# ADR-0002 - - `governance(proposals, approvals, grants, revocations)` # ADR-0003 -3. **Incremental**: `--since` resumes from last exported commit. -4. **Determinism**: Stable ordering; integrity table with exported root SHA. +1. 
**CLI**: `gatos export --format sqlite|parquet --out [--since ] [--tables commits,events,...]` + - Writes an export manifest alongside the export (`.manifest.json`) conforming to `schemas/v1/export/export_manifest.schema.json`. + - `--since` defaults to the last export’s `state_ref` recorded in the manifest when present. +2. **SQLite Schema** (columns are strictly typed, `NOT NULL` unless marked optional): + - `commits(id TEXT PRIMARY KEY, parent_id TEXT, author TEXT, ts INTEGER, message TEXT, trailers JSON)` + - `events(id TEXT PRIMARY KEY, ns TEXT, kind TEXT, payload JSON, commit_id TEXT REFERENCES commits(id))` + - `state_nodes(id TEXT PRIMARY KEY, path TEXT, digest TEXT, shape JSON, state_ref TEXT)` + - `pointers(digest TEXT PRIMARY KEY, location TEXT, capability TEXT, state_ref TEXT)` + - `jobs(id TEXT PRIMARY KEY, status TEXT, started_at INTEGER, finished_at INTEGER, proof_digest TEXT)` + - `governance_proposals(id TEXT PRIMARY KEY, payload JSON, commit_id TEXT)` (similar tables for approvals/grants/revocations) + - All tables include `created_ts` and `updated_ts` for audit. +3. **Parquet Layout**: + - Mirror table structure; partition by `state_ref` for `state_nodes` and by day for `events` to optimise Spark/Trino queries. + - Compression: ZSTD level 3; dictionary encoding enabled. +4. **Incremental Semantics**: + - `--since ` includes events with `commit_id` strictly greater than `` (topo order). + - If a job re-runs with the same `id`, the exporter updates/inserts rows idempotently. +5. **Determinism & Integrity**: + - Rows sorted by primary key; SQLite `PRAGMA user_version` stores the exporter version. + - Integrity table `export_info(state_ref TEXT, commit_start TEXT, commit_end TEXT, shape_root TEXT, exported_at TEXT)`. 
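The idempotent re-run and integrity rules in items 4 and 5 can be sketched with Python's standard `sqlite3` module. This is a minimal illustration under stated assumptions, not the exporter's implementation. It models only the `jobs` and `export_info` tables and uses plain integer timestamps. `EXPORTER_VERSION` is a hypothetical constant. The upsert relies on SQLite's `ON CONFLICT ... DO UPDATE` clause (SQLite 3.24 or newer), so a re-exported job keeps its original `created_ts` while `updated_ts` advances.

```python
import sqlite3
from typing import Iterable, Mapping

EXPORTER_VERSION = 1  # hypothetical; recorded in PRAGMA user_version

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id TEXT PRIMARY KEY,
    status TEXT NOT NULL,
    started_at INTEGER,
    finished_at INTEGER,
    proof_digest TEXT,
    created_ts INTEGER NOT NULL,
    updated_ts INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS export_info (
    state_ref TEXT NOT NULL,
    commit_start TEXT NOT NULL,
    commit_end TEXT NOT NULL,
    shape_root TEXT NOT NULL,
    exported_at TEXT NOT NULL
);
"""

# Re-running a job with the same id updates the row instead of duplicating it;
# created_ts is deliberately absent from the SET list so it is preserved.
UPSERT_JOB = """
INSERT INTO jobs (id, status, started_at, finished_at, proof_digest, created_ts, updated_ts)
VALUES (:id, :status, :started_at, :finished_at, :proof_digest, :now, :now)
ON CONFLICT(id) DO UPDATE SET
    status = excluded.status,
    started_at = excluded.started_at,
    finished_at = excluded.finished_at,
    proof_digest = excluded.proof_digest,
    updated_ts = excluded.updated_ts
"""


def open_export(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    # PRAGMA values cannot be bound as parameters; int() keeps the f-string safe.
    conn.execute(f"PRAGMA user_version = {int(EXPORTER_VERSION)}")
    return conn


def upsert_jobs(conn: sqlite3.Connection, jobs: Iterable[Mapping], now: int) -> None:
    # Insert in primary-key order so repeated exports issue the same write sequence.
    rows = [dict(job, now=now) for job in sorted(jobs, key=lambda j: j["id"])]
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT_JOB, rows)


def record_export(conn: sqlite3.Connection, state_ref: str, commit_start: str,
                  commit_end: str, shape_root: str, exported_at: str) -> None:
    with conn:
        conn.execute(
            "INSERT INTO export_info VALUES (?, ?, ?, ?, ?)",
            (state_ref, commit_start, commit_end, shape_root, exported_at),
        )
```

Sorted insertion keeps the write sequence reproducible. Readers should still query with `ORDER BY`, because SQLite guarantees no row order without it.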
```mermaid flowchart LR diff --git a/docs/decisions/ADR-0012/DECISION.md b/docs/decisions/ADR-0012/DECISION.md index 1e406b4f..c33c4553 100644 --- a/docs/decisions/ADR-0012/DECISION.md +++ b/docs/decisions/ADR-0012/DECISION.md @@ -19,23 +19,40 @@ Allow a repository to **subscribe to** and **mount** public state from other rep Enables decentralized composition (e.g., central governance repo consumed by many project repos) without monorepos. ## Decision -1. **Config**: `.gatos/federation.yaml` declares mounts: +1. **Configuration** (`.gatos/federation.yaml`) + - Validates against `schemas/v1/federation/mounts.schema.json`. ```yaml mounts: - name: governance - source: "git+https://example.com/org/gov-repo.git#refs/gatos/state/public/policy/main" - verify: "ed25519:" + source: "git+https://example.com/org/gov.git#refs/gatos/state/public/policy/main" + verify: "ed25519:ABC123..." refresh: "PT5M" + auth: + kind: token + token_env: GATOS_FED_TOKEN + policy: + trusted_refs: + - "refs/gatos/state/public/policy/main" + max_depth: 2 ``` -2. **On-Disk Refs**: - - `refs/gatos/remotes//state//` (read-only mirror). -3. **Resolution**: - - Fetch at refresh cadence or on demand. - - Verify signed commit against `verify` key in trust graph; reject otherwise. -4. **Usage**: - - State folds can read mounted refs as input; mounts MUST NOT be mutated locally. -5. **UI/API**: - - GraphQL exposes mounts under a separate namespace; streams emit updates when mount advances. + - Cycles are prevented by `max_depth` (default 3); mounts referencing each other deeper than the limit fail validation. + +2. **On-Disk Layout** + - Mirror remote refs under `refs/gatos/remotes//state//`. + - Metadata stored at `refs/gatos/remotes//meta` including last fetch time and verified commit. + +3. **Fetch & Verification Pipeline** + - Mount daemon (`gatos mountd`) fetches `source` on startup and every `refresh` interval; manual `gatos mount sync ` triggers eager fetch. 
+ - Each fetched commit MUST include a signed trailer (per the ADR-0003 trust graph); the `verify` key (ed25519) is looked up in the trust graph. If signature verification fails, the mount is marked `stale` and not exposed to folds. + - Only refs listed in `policy.trusted_refs` are tracked; others are ignored to prevent repo sprawl. + +4. **Usage in Folds & APIs** + - State folds reference mount refs via `refs/gatos/remotes/...` but treat them as read-only. Any attempt to commit to a mount ref MUST be rejected client-side and server-side. + - GraphQL exposes mounts under `federation { mounts { name, state(ns, channel) } }`. WebSocket streams include `ref.update` frames when a mount advances. + +5. **Failure Modes** + - If a mount fetch fails (auth/network), mark the mount `degraded` and emit an event to `refs/gatos/audit/federation//`. + - Policy MUST support forcing a mount to `offline` when `max_depth` is exceeded to avoid cycles. ```mermaid graph TD @@ -50,4 +67,4 @@ graph TD - Requires remote availability and verification logic. ## Open Questions -- Cycles between mounts (A mounts B, B mounts A) — forbid or handle with depth limits? +- Federation gossip: do we allow automatic mount discovery, or keep `.gatos/federation.yaml` manual only? diff --git a/docs/decisions/ADR-0013/DECISION.md b/docs/decisions/ADR-0013/DECISION.md index bc1a435f..b0d68ec6 100644 --- a/docs/decisions/ADR-0013/DECISION.md +++ b/docs/decisions/ADR-0013/DECISION.md @@ -17,15 +17,36 @@ Specify **incremental** and **lazy** state folding to avoid recomputing the enti Large repos need sub-linear recomputation to stay responsive. ## Decision -1. **Fold Units**: Define shard boundaries (by namespace/path). Each unit has a cache key: - ```text - key = blake3(code_hash || policy_root || input_event_ids || upstream_unit_digests) - ``` -2. **Cache**: `gatos-kv` stores fold outputs per unit with `key -> digest`. -3. **Invalidation**: On new events, compute affected unit set via dependency graph; only recompute those. -4.
**Lazy Materialization**: Units not requested by a client **MAY** remain cold; materialize on demand. -5. **Concurrency**: Units may fold in parallel if dependencies permit; global join computes `Shape-Root`. -6. **Telemetry**: Commit trailers: `Fold-Cache-Hit: `, `Fold-Units: `, `Fold-Duration: `. +1. **Unit Partitioning** + - State namespaces declare unit boundaries in `.gatos/fold_units.yaml` (list of glob patterns). Default: `state///**` per channel. + - Each unit’s cache key: + ```text + key = blake3(fold_code_hash || policy_root || event_ids_hash || upstream_unit_digests) + ``` + - `fold_code_hash`: hash of the Echo fold code (compiled). + - `event_ids_hash`: blake3 over the ordered list of events fed into the unit since last checkpoint. + - `upstream_unit_digests`: sorted list of dependent unit digests (for DAG composition). + +2. **Cache Store** + - `gatos-kv` stores `key -> {digest, payload_pointer}`. Payload is a pointer to the unit’s serialized shape (ADR-0004 pointer envelope) to avoid duplicating blobs. + - Cache entries expire when `policy_root` changes or when explicitly invalidated. + +3. **Dependency Graph & Invalidation** + - Fold authors declare dependencies between units (YAML adjacency). The engine builds a DAG and, upon new events, computes affected units via reverse dependencies. + - When an upstream unit changes digest, all downstream units are marked dirty. + - Cache evictions recorded under `refs/gatos/audit/fold-cache/` for observability. + +4. **Lazy Materialization** + - Units outside the request (e.g., API query doesn’t touch a namespace) remain cold. The first access triggers `fold_unit` computation asynchronously; clients receive `loading=true` until ready. + - CLI flag `--materialize all` forces full materialization for scenarios like exports. + +5. **Concurrency & Scheduling** + - Fold executor spawns up to `num_cpus` workers; DAG ensures topological order. Units with no unmet dependencies run in parallel. 
+ - Locks per unit prevent double computation; duplicate requests wait on the same future. + +6. **Telemetry & Reporting** + - Commit trailers include `Fold-Cache-Hit`, `Fold-Cache-Miss`, `Fold-Units`, `Fold-Duration`, `Fold-Parallelism`. + - Metrics exported via Prometheus: `gatos_fold_unit_duration_ms`, `gatos_fold_cache_utilization`. ```mermaid graph TD @@ -43,3 +64,4 @@ graph TD ## Open Questions - Do we allow background prewarming or keep strictly on-demand for MVP? +- How do we cache across git worktrees (shared cache vs per-worktree)? diff --git a/docs/decisions/ADR-0014/DECISION.md b/docs/decisions/ADR-0014/DECISION.md index c4d2bd09..3d5eea12 100644 --- a/docs/decisions/ADR-0014/DECISION.md +++ b/docs/decisions/ADR-0014/DECISION.md @@ -20,17 +20,8 @@ Jobs already attest execution (ADR-0002 PoE). Folds need equivalent integrity gu ## Decision 1. **Envelope** (canonical JSON): - ```json - { - "fold_id": "blake3:<...>", - "engine": { "program": "sha256:<...>", "version": "x.y.z", "platform": "..." }, - "policy_root": "sha256:<...>", - "inputs": { "events": ["..."], "upstreams": ["blake3:...", "..."] }, - "output_shape_root": "blake3:<...>", - "metrics": { "units": "N", "duration_ms": "M" }, - "ts": "" - } - ``` + - Serialized according to `schemas/v1/state/proof_of_fold_envelope.schema.json`. + - Includes `content_id = blake3(envelope_bytes)` so downstream verification doesn’t re-hash. 2. **Signature**: Engine signs `blake3(envelope)` with its key; trailers: - `Proof-Of-Fold: blake3:` - `Fold-Sig: ed25519:` @@ -57,3 +48,4 @@ sequenceDiagram ## Open Questions - Do we include WASM module hash for portable fold engines in v1? +- Should Proof-of-Fold signatures be batched (multi-unit proofs) or per-state only? 
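The ADR-0013 fold-unit cache key and reverse-dependency invalidation can be sketched as follows. This is a hedged illustration, not GATOS code, and it rests on several assumptions:

- The ADR writes `||` without fixing an encoding, so the length-prefixed framing and UTF-8 encoding of identifiers are assumptions.
- Stdlib BLAKE2b-256 stands in for BLAKE3, which is the ADR's hash and is available in Python through the third-party `blake3` package.
- `deps` assumes the YAML adjacency has already been parsed into a dict that maps each unit to the upstream units it reads.

```python
import hashlib
from collections import deque
from typing import Callable, Dict, Iterable, List, Set


def _blake2b_256(data: bytes) -> bytes:
    # Stand-in for BLAKE3; swap in blake3.blake3(data).digest() if that package is installed.
    return hashlib.blake2b(data, digest_size=32).digest()


def _frame(part: bytes) -> bytes:
    # Length-prefix every field so ("ab", "c") and ("a", "bc") cannot yield the same bytes.
    return len(part).to_bytes(8, "big") + part


def unit_cache_key(
    fold_code_hash: str,
    policy_root: str,
    event_ids: List[str],
    upstream_unit_digests: Iterable[str],
    h: Callable[[bytes], bytes] = _blake2b_256,
) -> str:
    # event_ids_hash covers the *ordered* events fed into the unit since the last checkpoint.
    event_ids_hash = h(b"".join(_frame(e.encode("utf-8")) for e in event_ids))
    # Upstream digests are sorted so dependency declaration order does not change the key.
    upstream = b"".join(_frame(d.encode("utf-8")) for d in sorted(upstream_unit_digests))
    material = (
        _frame(fold_code_hash.encode("utf-8"))
        + _frame(policy_root.encode("utf-8"))
        + _frame(event_ids_hash)
        + _frame(upstream)
    )
    return h(material).hex()


def dirty_units(deps: Dict[str, Set[str]], changed: Iterable[str]) -> Set[str]:
    """Return the changed units plus every unit downstream of them.

    `deps` maps each unit to the set of upstream units it reads from.
    """
    # Invert the adjacency: upstream unit -> units that consume it.
    consumers: Dict[str, Set[str]] = {}
    for unit, upstreams in deps.items():
        for up in upstreams:
            consumers.setdefault(up, set()).add(unit)
    dirty = set(changed)
    queue = deque(dirty)
    while queue:  # breadth-first walk over reverse edges
        for down in consumers.get(queue.popleft(), set()):
            if down not in dirty:
                dirty.add(down)
                queue.append(down)
    return dirty
```

Sorting `upstream_unit_digests` makes the key independent of dependency declaration order. `event_ids` keeps its order because the ADR hashes the ordered event list.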
diff --git a/schemas/v1/export/export_manifest.schema.json b/schemas/v1/export/export_manifest.schema.json new file mode 100644 index 00000000..30aa10bf --- /dev/null +++ b/schemas/v1/export/export_manifest.schema.json @@ -0,0 +1,32 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema#", + "$id": "https://gatos.io/schemas/v1/export/export_manifest.schema.json", + "title": "GATOS Export Manifest", + "type": "object", + "required": ["format", "state_ref", "commit_range", "tables"], + "properties": { + "format": { "enum": ["sqlite", "parquet"] }, + "state_ref": { "type": "string", "pattern": "^[0-9a-fA-F]{40}$" }, + "commit_range": { + "type": "object", + "required": ["start", "end"], + "properties": { + "start": { "type": "string", "pattern": "^[0-9a-fA-F]{40}$" }, + "end": { "type": "string", "pattern": "^[0-9a-fA-F]{40}$" } + } + }, + "tables": { + "type": "object", + "additionalProperties": false, + "properties": { + "commits": { "type": "boolean", "default": true }, + "events": { "type": "boolean", "default": true }, + "state_nodes": { "type": "boolean", "default": true }, + "pointers": { "type": "boolean", "default": true }, + "jobs": { "type": "boolean", "default": true }, + "governance": { "type": "boolean", "default": true } + } + }, + "created_at": { "type": "string", "format": "date-time" } + } +} diff --git a/schemas/v1/federation/mounts.schema.json b/schemas/v1/federation/mounts.schema.json new file mode 100644 index 00000000..e78fd4ee --- /dev/null +++ b/schemas/v1/federation/mounts.schema.json @@ -0,0 +1,56 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema#", + "$id": "https://gatos.io/schemas/v1/federation/mounts.schema.json", + "title": "GATOS Federation Mounts", + "type": "object", + "required": ["mounts"], + "properties": { + "mounts": { + "type": "array", + "items": { + "type": "object", + "required": ["name", "source", "verify"], + "properties": { + "name": { + "type": "string", + "pattern": "^[a-z0-9_-]+$" + }, + "source": { + 
"type": "string", + "description": "git+https://...#refs/..." + }, + "verify": { + "type": "string", + "description": "ed25519:" + }, + "refresh": { + "type": "string", + "description": "ISO-8601 duration (e.g., PT5M)", + "default": "PT5M" + }, + "auth": { + "type": "object", + "additionalProperties": false, + "properties": { + "kind": { "enum": ["none", "token", "ssh"] }, + "token_env": { "type": "string" }, + "ssh_key_path": { "type": "string" } + } + }, + "policy": { + "type": "object", + "additionalProperties": false, + "properties": { + "trusted_refs": { + "type": "array", + "items": { "type": "string" } + }, + "max_depth": { "type": "integer", "minimum": 1 } + } + } + }, + "additionalProperties": false + } + } + } +} diff --git a/schemas/v1/state/proof_of_fold_envelope.schema.json b/schemas/v1/state/proof_of_fold_envelope.schema.json new file mode 100644 index 00000000..0fb1e060 --- /dev/null +++ b/schemas/v1/state/proof_of_fold_envelope.schema.json @@ -0,0 +1,38 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema#", + "$id": "https://gatos.io/schemas/v1/state/proof_of_fold_envelope.schema.json", + "title": "Proof of Fold Envelope", + "type": "object", + "required": ["fold_id", "engine", "policy_root", "inputs", "output_shape_root", "metrics", "ts"], + "properties": { + "fold_id": { "type": "string", "pattern": "^blake3:[0-9a-f]{64}$" }, + "engine": { + "type": "object", + "required": ["program", "version", "platform"], + "properties": { + "program": { "type": "string" }, + "version": { "type": "string" }, + "platform": { "type": "string" } + } + }, + "policy_root": { "type": "string" }, + "inputs": { + "type": "object", + "required": ["events", "upstreams"], + "properties": { + "events": { "type": "array", "items": { "type": "string" } }, + "upstreams": { "type": "array", "items": { "type": "string" } } + } + }, + "output_shape_root": { "type": "string", "pattern": "^blake3:[0-9a-f]{64}$" }, + "metrics": { + "type": "object", + "properties": { + 
"units": { "type": "integer" }, + "duration_ms": { "type": "integer" } + } + }, + "ts": { "type": "string", "format": "date-time" }, + "content_id": { "type": "string", "pattern": "^blake3:[0-9a-f]{64}$" } + } +} From 48f096b7a569cf2609925958bef438137e68ab2f Mon Sep 17 00:00:00 2001 From: "J. Kirby Ross" Date: Mon, 17 Nov 2025 16:01:21 -0800 Subject: [PATCH 13/25] docs: expand proof + schema coverage --- docs/decisions/ADR-0007/DECISION.md | 10 ++ docs/decisions/ADR-0014/DECISION.md | 48 ++++-- docs/decisions/ARD-FEEDBACK.md | 149 ++++++++++++++++++ examples/v1/api/command_envelope_min.json | 10 ++ .../v1/api/graphql_state_mapping_min.json | 8 + examples/v1/api/stream_frame_sub.json | 6 + examples/v1/api/webhook_delivery_min.json | 11 ++ examples/v1/export/export_manifest_min.json | 17 ++ examples/v1/federation/mounts_min.json | 18 +++ .../consumer_checkpoint_min.json | 4 + .../v1/message-plane/event_envelope_min.json | 12 ++ examples/v1/privacy/opaque_pointer_min.json | 8 + .../v1/state/proof_of_fold_envelope_min.json | 25 +++ examples/v1/watch/event_min.json | 11 ++ schemas/v1/api/command_envelope.schema.json | 2 +- .../v1/api/graphql_state_mapping.schema.json | 2 +- schemas/v1/api/stream_frame.schema.json | 2 +- schemas/v1/api/webhook_delivery.schema.json | 2 +- schemas/v1/export/export_manifest.schema.json | 2 +- schemas/v1/federation/mounts.schema.json | 2 +- .../consumer_checkpoint.schema.json | 2 +- .../message-plane/event_envelope.schema.json | 2 +- schemas/v1/privacy/opaque_pointer.schema.json | 2 +- .../state/proof_of_fold_envelope.schema.json | 2 +- schemas/v1/watch/events.schema.json | 2 +- scripts/validate_schemas.sh | 30 +++- 26 files changed, 363 insertions(+), 26 deletions(-) create mode 100644 docs/decisions/ARD-FEEDBACK.md create mode 100644 examples/v1/api/command_envelope_min.json create mode 100644 examples/v1/api/graphql_state_mapping_min.json create mode 100644 examples/v1/api/stream_frame_sub.json create mode 100644 
examples/v1/api/webhook_delivery_min.json create mode 100644 examples/v1/export/export_manifest_min.json create mode 100644 examples/v1/federation/mounts_min.json create mode 100644 examples/v1/message-plane/consumer_checkpoint_min.json create mode 100644 examples/v1/message-plane/event_envelope_min.json create mode 100644 examples/v1/privacy/opaque_pointer_min.json create mode 100644 examples/v1/state/proof_of_fold_envelope_min.json create mode 100644 examples/v1/watch/event_min.json diff --git a/docs/decisions/ADR-0007/DECISION.md b/docs/decisions/ADR-0007/DECISION.md index d6f8da43..645ff615 100644 --- a/docs/decisions/ADR-0007/DECISION.md +++ b/docs/decisions/ADR-0007/DECISION.md @@ -40,6 +40,16 @@ State is hierarchical and interlinked; GraphQL matches the access pattern and av Each error entry includes `extensions.code` and `extensions.ref` (support ULID). 10. **Rate Limits**: Default 600 requests / 60s window per actor, enforced via shared limiter. Policy rules may override per namespace/project; responses include `X-RateLimit-Remaining` headers. +## Schema Evolution & Error Surfacing +1. **Deprecation Cadence** + - *Announce (0–4 weeks)*: SDL publishes `@deprecated(reason: "removal in 4w")`, release notes summarize the change, and responses add `X-GATOS-Deprecations`. No behavior change beyond warnings. + - *Dual-Serve (4–12 weeks)*: Legacy + successor fields resolve in parallel. Requests for the old field append `errors[]` entries with `extensions.code="FIELD_DEPRECATED"`; dashboards track usage daily so teams can confirm adoption. + - *Removal (>12 weeks)*: Field disappears from SDL/introspection. Queries referencing it return `USER_INPUT_ERROR` with an `extensions.ref` ULID pointing to the removal notice. A 1-week emergency rollback window exists; after that, reintroducing the field requires a fresh ADR. +2. 
**Error Propagation** + - *Policy Denied*: Resolver emits partial data with an error `{code:"POLICY_DENIED", path:[...], ref:}` while still returning HTTP 200 and a `shapeRoot`. The auxiliary diagram highlights the `Policy` participant sending `deny/pointerize` before the response. + - *Invalid Ref / Missing State*: When `stateRef` or `refPath` cannot be resolved, Resolver emits `{code:"STATE_NOT_FOUND"}` (404 when the top-level ref fails, 200 otherwise) or `{code:"INVALID_STATE_REF"}` (400). The auxiliary diagram’s second branch shows `State Store` returning a miss and Resolver surfacing the error bubble before writing the response. + - *Diagram Note*: Add a companion Mermaid sequence titled “Error Paths” with two `alt` blocks—`Policy DENY` (shows pointerized/null field plus error) and `State MISS` (shows Resolver handling a missing ref). This diagram supplements the happy-path diagram above so operators see both flows. + ```mermaid sequenceDiagram participant Client diff --git a/docs/decisions/ADR-0014/DECISION.md b/docs/decisions/ADR-0014/DECISION.md index 3d5eea12..4a7ff658 100644 --- a/docs/decisions/ADR-0014/DECISION.md +++ b/docs/decisions/ADR-0014/DECISION.md @@ -19,14 +19,38 @@ Define a **cryptographic attestation** for state folds that proves which code an Jobs already attest execution (ADR-0002 PoE). Folds need equivalent integrity guarantees. ## Decision -1. **Envelope** (canonical JSON): - - Serialized according to `schemas/v1/state/proof_of_fold_envelope.schema.json`. - - Includes `content_id = blake3(envelope_bytes)` so downstream verification doesn’t re-hash. -2. **Signature**: Engine signs `blake3(envelope)` with its key; trailers: - - `Proof-Of-Fold: blake3:` - - `Fold-Sig: ed25519:` -3. **Storage**: Persist envelope under `refs/gatos/audit/proofs/folds/`. -4. **Verification**: `gatos fold verify ` checks engine key in trust graph, envelope hash, and output match. +1. 
**Envelope Contract** — canonical JSON defined in `schemas/v1/state/proof_of_fold_envelope.schema.json`: + - `fold_id`: `blake3` hash of `(state_ref || policy_root || shape_root)`; matches `Shape-Root` when no downstream transforms exist. + - `engine`: `{ program, version, platform }` describing the code + runtime that produced the fold. `program` **must** match the fold artifact ID the GitHub App advertises via ADR-0007/0008 APIs. + - `policy_root`: commit that supplied policy + trust inputs. + - `inputs.events[]`: ordered Git commits (`refs/gatos/journal/*`) included in this fold window. + - `inputs.upstreams[]`: upstream state refs, including federation mounts (ADR-0012) and lazily folded units (ADR-0013). Values are Git refs so verifiers can fetch deterministically. + - `output_shape_root`: deterministic digest of the resulting checkpoint. + - `metrics`: lightweight execution stats (`units`, `duration_ms`). + - `ts`: RFC3339 timestamp when the envelope is sealed. + - `content_id`: canonical `blake3` of the encoded envelope bytes, computed with the `content_id` field itself omitted (a digest cannot cover itself), so later hops don’t recompute it. +2. **Engine Identity & Key Management** + - Each fold executor advertises an `engine_id = ed25519:` that lives under `refs/gatos/trust/engines/.json` alongside owner metadata, capabilities, and rotation history. Entries obey the same trust-graph validation as other actors (ADR-0003/0005). + - Minting an engine with `gatos trust issue-engine --name fold-runner-a` writes a grant that limits the engine to `fold.sign` and `fold.publish` capabilities. + - Rotations (`gatos trust rotate-engine`) append `prev`→`next` links; revoked keys move to `refs/gatos/trust/revocations/`. Proof verifiers MUST treat revoked keys as invalid even if older proofs exist. + - Private keys stay inside an HSM or KMS-backed signer; the CLI receives detached signatures only. +3. **Signing & Publication Flow** + - Engines sign `blake3(envelope_bytes)` using their active key.
Git trailers on the state commit carry: + - `Proof-Of-Fold: blake3:` (the `content_id`). + - `Fold-Sig: ed25519:#engine:` to bind signature + issuer. + - The full envelope and signature are committed under `refs/gatos/audit/proofs/folds//proof.json`. + - A summary message is published to the message plane (`refs/gatos/messages/proofs/fold/`) so downstream systems (GitHub App webhooks, streaming API) can react without scanning refs. +4. **Verification Workflow** (`gatos fold verify `) + 1. Resolve `refs/gatos/audit/proofs/folds/` and load `proof.json`. + 2. Validate JSON against `schemas/v1/state/proof_of_fold_envelope.schema.json` (AJV inside the CLI / CI job). + 3. Recompute `content_id` and ensure it matches the trailer + stored digest. + 4. Resolve `engine_id` via the trust graph (`refs/gatos/trust/graph.json` + `refs/gatos/trust/engines/*`), confirm the key is active, and verify the ed25519 signature. + 5. Replay the fold deterministically (reusing ADR-0013 partial caches when available) and ensure the recomputed `Shape-Root` equals `output_shape_root`. + 6. Emit a verification report event to `refs/gatos/messages/proofs/fold/` and expose the status through ADR-0007 GraphQL fields and ADR-0008 webhooks so the GitHub App integration can gate merges. +5. **Failure Modes & Observability** + - Verification failures write structured commits under `refs/gatos/audit/proofs/folds//failures/` explaining whether the schema, signature, trust, or execution step failed. + - CLI/daemon metrics mirror execution stats (`fold.verify.duration_ms`, `fold.verify.replays`). + - Policy can require “verified” proofs before advancing `refs/gatos/state/*`; if this gate fires, `gatos watch` surfaces a `deny.write` event (per `schemas/v1/watch/events.schema.json`). ```mermaid sequenceDiagram @@ -44,8 +68,10 @@ sequenceDiagram ## Consequences - Auditable state derivations; reproducibility at the protocol layer. -- Requires key management for fold engines. 
+- Fold engine keys become first-class trust graph actors (issuance, rotation, revocation flows). +- Message plane consumers and the GitHub App can react to proof publication/verification without scraping refs. ## Open Questions -- Do we include WASM module hash for portable fold engines in v1? -- Should Proof-of-Fold signatures be batched (multi-unit proofs) or per-state only? +- Should we embed the WASM module hash + OCI digest inside `engine.program` for portable fold engines, or add a dedicated field? +- Do we need aggregate proofs that cover multiple state refs (e.g., nightly rollups), or is per-state sufficient for GA? +- How should we persist verification artifacts (logs, re-run traces) so remote verifiers can audit without replaying everything locally? diff --git a/docs/decisions/ARD-FEEDBACK.md b/docs/decisions/ARD-FEEDBACK.md new file mode 100644 index 00000000..84d9980b --- /dev/null +++ b/docs/decisions/ARD-FEEDBACK.md @@ -0,0 +1,149 @@ +# ADR Feedback Checklist + +## ADR-0007 – Expand evolution story & error visuals + +- [x] Resolved + +> [!WARNING]- **Recommendation** +> +> Document a concrete GraphQL field deprecation schedule (e.g., warning periods, sunset dates) and add an error-handling sequence/flow diagram so operators can anticipate failure paths. +> +> ### LLM Prompt +> +> ```text +> You are a GraphQL API architect. Draft a “Schema Evolution & Error Surfacing” section for ADR-0007 that: +> +> 1. Defines deprecation phases (announce, dual-serve, removal) with timelines in weeks. +> 2. Explains how errors propagate (policy denied vs. invalid ref) with an auxiliary diagram description. +> +> Keep the tone consistent with existing ADRs. +> ``` +> + +--- + +## ADR-0008 – Flesh out auth scopes & webhook DLQ operations + +- [ ] Resolved + +> [!WARNING]- **Recommendation** +> +> +> Add a table mapping OAuth scopes → command prefixes and specify observability/cleanup for the webhook dead-letter queue (visibility APIs, retention, alerting). 
+> +> ### LLM Prompt +> +> ```text +> You are documenting a REST command surface. Produce a subsection for ADR-0008 covering: +> +> - A table mapping OAuth scopes to command/resource prefixes. +> - DLQ management: how operators list, replay, or purge failed webhooks, including retention guarantees. +> +> +> Output markdown suitable for an ADR. +> ``` + +--- + +## ADR-0009 – Clarify replay/backpressure across federated nodes + +- [ ] Resolved + +> [!WARNING]- **Recommendation** +> +> Describe how sequence IDs and buffering work when streams traverse multiple nodes, and augment diagrams with replay/error/credit flows. +> +> ### LLM Prompt +> +> ```text +> Pretend you run a multi-node ref streaming service. Explain for ADR-0009: +> +> 1. How seq/credit propagation works when a client connects through a federation proxy. +> 2. Failure handling when replay windows expire mid-hop. +> +> Include guidance for an updated diagram (text description sufficient). +> ``` + +--- + +## ADR-0010 – Resolve PR approvals & multi-repo watcher strategy + +- [ ] Resolved + +> [!WARNING]- **Recommendation** +> +> +> Decide whether GitHub approvals can satisfy governance approvals and document how installations spanning many repos share watcher queues/check workloads. +> +> ### LLM Prompt +> +> ```text +> As the GitHub App owner, write a section for ADR-0010 that: +> +> - States the policy for mapping PR approvals to governance approvals (allowed? constraints?). +> - Details queue partitioning when one installation manages N repos (e.g., sharded workers, priority order). +> +> Provide actionable guidance. +> ``` + +--- + +## ADR-0011 – Add security/scale envelope for exports + +- [ ] Resolved + +> [!WARNING]- **Recommendation** +> +> Describe how sensitive data is redacted, expected dataset sizes / resource requirements, and include a sample manifest/table appendix to remove ambiguity. +> +> ### LLM Prompt +> +> ```text +> Channel your inner data engineer.
Extend ADR-0011 with: +> +> - A “Security & Resource Envelope” section (storage limits, IAM expectations, pointer redaction guarantees). +> - An example export manifest snippet plus one CREATE TABLE statement. +> +> Keep everything deterministic. +> ``` + +--- + +## ADR-0012 – Decide on federation discovery/gossip + +- [ ] Resolved + +> [!WARNING]- **Recommendation** +> +> Either commit to manual-only `.gatos/federation.yaml` or outline the gossip/discovery protocol (trust anchors, rate limits) so operators know what to expect. +> +> ### LLM Prompt +> +> ```text +> Act as the federation architect. Draft text for ADR-0012 that answers: +> +> - Do we support automatic mount discovery? If yes, describe the gossip protocol, trust requirements, and operator controls. If no, justify manual-only. +> +> Provide clear, testable language. +> ``` + +--- + +## ADR-0013 – Decide on prewarming & shared caches + +- [ ] Resolved + +> [!WARNING]- **Recommendation** +> +> Specify whether background prewarming is supported and define how cache stores behave across multiple worktrees/nodes (shared path, eviction policy). +> +> ### LLM Prompt +> +> ```text +> From the perspective of the fold-engine owner, write guidance for ADR-0013 covering: +> +> - Policy for background prewarming (allowed? triggers? safeguards?). +> - Strategy for sharing caches between worktrees/nodes, including locking and eviction rules. +> +> Output concise markdown bullets. 
+>``` diff --git a/examples/v1/api/command_envelope_min.json b/examples/v1/api/command_envelope_min.json new file mode 100644 index 00000000..333378da --- /dev/null +++ b/examples/v1/api/command_envelope_min.json @@ -0,0 +1,10 @@ +{ + "type": "locks.acquire", + "args": { + "path": "docs/plan.md", + "grant": "gto_grant_ulid" + }, + "expect_state": "0123456789abcdef0123456789abcdef01234567", + "request_id": "01HZY6QF7PC2VF9R8Y9YH6F4Q2", + "sync": true +} diff --git a/examples/v1/api/graphql_state_mapping_min.json b/examples/v1/api/graphql_state_mapping_min.json new file mode 100644 index 00000000..868a1791 --- /dev/null +++ b/examples/v1/api/graphql_state_mapping_min.json @@ -0,0 +1,8 @@ +{ + "types": [ + { + "graphqlType": "Query.state", + "statePath": "state.public.main" + } + ] +} diff --git a/examples/v1/api/stream_frame_sub.json b/examples/v1/api/stream_frame_sub.json new file mode 100644 index 00000000..3e0fbb08 --- /dev/null +++ b/examples/v1/api/stream_frame_sub.json @@ -0,0 +1,6 @@ +{ + "op": "sub", + "refs": ["refs/gatos/state/public/main"], + "topics": ["governance"], + "sinceSeq": 0 +} diff --git a/examples/v1/api/webhook_delivery_min.json b/examples/v1/api/webhook_delivery_min.json new file mode 100644 index 00000000..5aea60e9 --- /dev/null +++ b/examples/v1/api/webhook_delivery_min.json @@ -0,0 +1,11 @@ +{ + "id": "01HZY6R8DTN97Z6AV1ZK1X6QGQ", + "event": "proposal.created", + "ts": "2025-11-17T12:00:00Z", + "payload": { + "proposal_id": "blake3:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", + "title": "Increase fold memory" + }, + "attempt": 1, + "signature": "4f8b0c1d2e3f4a5b6c7d8e9f0a1b2c3d" +} diff --git a/examples/v1/export/export_manifest_min.json b/examples/v1/export/export_manifest_min.json new file mode 100644 index 00000000..44310a91 --- /dev/null +++ b/examples/v1/export/export_manifest_min.json @@ -0,0 +1,17 @@ +{ + "format": "sqlite", + "state_ref": "89abcdef0123456789abcdef0123456789abcdef", + "commit_range": { + 
"start": "0123456789abcdef0123456789abcdef01234567", + "end": "fedcba9876543210fedcba9876543210fedcba98" + }, + "tables": { + "commits": true, + "events": true, + "state_nodes": true, + "pointers": false, + "jobs": false, + "governance": true + }, + "created_at": "2025-11-17T11:59:00Z" +} diff --git a/examples/v1/federation/mounts_min.json b/examples/v1/federation/mounts_min.json new file mode 100644 index 00000000..8e62ec1b --- /dev/null +++ b/examples/v1/federation/mounts_min.json @@ -0,0 +1,18 @@ +{ + "mounts": [ + { + "name": "policies-upstream", + "source": "git+https://example.com/upstream.git#refs/heads/main", + "verify": "ed25519:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", + "refresh": "PT5M", + "auth": { + "kind": "token", + "token_env": "UPSTREAM_TOKEN" + }, + "policy": { + "trusted_refs": ["refs/heads/main"], + "max_depth": 128 + } + } + ] +} diff --git a/examples/v1/message-plane/consumer_checkpoint_min.json b/examples/v1/message-plane/consumer_checkpoint_min.json new file mode 100644 index 00000000..f2bb1ba2 --- /dev/null +++ b/examples/v1/message-plane/consumer_checkpoint_min.json @@ -0,0 +1,4 @@ +{ + "ulid": "01HZY75D9EKA7Q3B8VYF4CQ3PG", + "commit": "abcdefabcdefabcdefabcdefabcdefabcdefabcd" +} diff --git a/examples/v1/message-plane/event_envelope_min.json b/examples/v1/message-plane/event_envelope_min.json new file mode 100644 index 00000000..6f77610f --- /dev/null +++ b/examples/v1/message-plane/event_envelope_min.json @@ -0,0 +1,12 @@ +{ + "ulid": "01HZY74CE0R9X5ZG9S7B5MQG9S", + "ns": "governance", + "type": "proposal.created", + "payload": { + "proposal_id": "blake3:bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb", + "title": "Add proof-of-fold checks" + }, + "refs": { + "state": "blake3:cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc" + } +} diff --git a/examples/v1/privacy/opaque_pointer_min.json b/examples/v1/privacy/opaque_pointer_min.json new file mode 100644 index 00000000..7eab69fd 
--- /dev/null +++ b/examples/v1/privacy/opaque_pointer_min.json @@ -0,0 +1,8 @@ +{ + "kind": "opaque_pointer", + "algo": "blake3", + "digest": "blake3:dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd", + "size": 4096, + "location": "https://vault.example.com/blob/123", + "capability": "kms://aws/alias/gatos/fold" +} diff --git a/examples/v1/state/proof_of_fold_envelope_min.json b/examples/v1/state/proof_of_fold_envelope_min.json new file mode 100644 index 00000000..bc88eaf0 --- /dev/null +++ b/examples/v1/state/proof_of_fold_envelope_min.json @@ -0,0 +1,25 @@ +{ + "fold_id": "blake3:1111111111111111111111111111111111111111111111111111111111111111", + "engine": { + "program": "gatos-fold@sha256:a1b2c3d4", + "version": "1.5.0", + "platform": "linux/amd64" + }, + "policy_root": "abcdefabcdefabcdefabcdefabcdefabcdefabcd", + "inputs": { + "events": [ + "refs/gatos/journal/main/0001", + "refs/gatos/journal/main/0002" + ], + "upstreams": [ + "refs/gatos/state/federated/policy@0123456789abcdef0123456789abcdef01234567" + ] + }, + "output_shape_root": "blake3:2222222222222222222222222222222222222222222222222222222222222222", + "metrics": { + "units": 12, + "duration_ms": 845 + }, + "ts": "2025-11-17T11:58:30Z", + "content_id": "blake3:3333333333333333333333333333333333333333333333333333333333333333" +} diff --git a/examples/v1/watch/event_min.json b/examples/v1/watch/event_min.json new file mode 100644 index 00000000..8813afd7 --- /dev/null +++ b/examples/v1/watch/event_min.json @@ -0,0 +1,11 @@ +{ + "ts": "2025-11-17T12:01:00Z", + "actor": "git:alice", + "path": "refs/gatos/state/public/main", + "rule": "state.requires-proof", + "action": "deny.write", + "remediation": "Run gatos fold verify before pushing", + "details": { + "state_ref": "89abcdef0123456789abcdef0123456789abcdef" + } +} diff --git a/schemas/v1/api/command_envelope.schema.json b/schemas/v1/api/command_envelope.schema.json index 99761112..5c6a02b1 100644 --- 
a/schemas/v1/api/command_envelope.schema.json +++ b/schemas/v1/api/command_envelope.schema.json @@ -1,5 +1,5 @@ { - "$schema": "http://json-schema.org/draft-07/schema#", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://gatos.io/schemas/v1/api/command_envelope.schema.json", "title": "GATOS Command Envelope", "type": "object", diff --git a/schemas/v1/api/graphql_state_mapping.schema.json b/schemas/v1/api/graphql_state_mapping.schema.json index 8fa53078..f686fe24 100644 --- a/schemas/v1/api/graphql_state_mapping.schema.json +++ b/schemas/v1/api/graphql_state_mapping.schema.json @@ -1,5 +1,5 @@ { - "$schema": "http://json-schema.org/draft-07/schema#", + "$schema": "https://json-schema.org/draft/2020-12/schema", "title": "GraphQL State Mapping (v1)", "type": "object", "required": ["types"], diff --git a/schemas/v1/api/stream_frame.schema.json b/schemas/v1/api/stream_frame.schema.json index f98edc7e..0f6fe9f6 100644 --- a/schemas/v1/api/stream_frame.schema.json +++ b/schemas/v1/api/stream_frame.schema.json @@ -1,5 +1,5 @@ { - "$schema": "http://json-schema.org/draft-07/schema#", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://gatos.io/schemas/v1/api/stream_frame.schema.json", "title": "GATOS Stream Frame", "type": "object", diff --git a/schemas/v1/api/webhook_delivery.schema.json b/schemas/v1/api/webhook_delivery.schema.json index 5a3ddd47..7425b74b 100644 --- a/schemas/v1/api/webhook_delivery.schema.json +++ b/schemas/v1/api/webhook_delivery.schema.json @@ -1,5 +1,5 @@ { - "$schema": "http://json-schema.org/draft-07/schema#", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://gatos.io/schemas/v1/api/webhook_delivery.schema.json", "title": "GATOS Webhook Delivery", "type": "object", diff --git a/schemas/v1/export/export_manifest.schema.json b/schemas/v1/export/export_manifest.schema.json index 30aa10bf..84100f01 100644 --- a/schemas/v1/export/export_manifest.schema.json +++ 
b/schemas/v1/export/export_manifest.schema.json @@ -1,5 +1,5 @@ { - "$schema": "http://json-schema.org/draft-07/schema#", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://gatos.io/schemas/v1/export/export_manifest.schema.json", "title": "GATOS Export Manifest", "type": "object", diff --git a/schemas/v1/federation/mounts.schema.json b/schemas/v1/federation/mounts.schema.json index e78fd4ee..1a33875e 100644 --- a/schemas/v1/federation/mounts.schema.json +++ b/schemas/v1/federation/mounts.schema.json @@ -1,5 +1,5 @@ { - "$schema": "http://json-schema.org/draft-07/schema#", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://gatos.io/schemas/v1/federation/mounts.schema.json", "title": "GATOS Federation Mounts", "type": "object", diff --git a/schemas/v1/message-plane/consumer_checkpoint.schema.json b/schemas/v1/message-plane/consumer_checkpoint.schema.json index cb212793..bc08ec9e 100644 --- a/schemas/v1/message-plane/consumer_checkpoint.schema.json +++ b/schemas/v1/message-plane/consumer_checkpoint.schema.json @@ -1,5 +1,5 @@ { - "$schema": "http://json-schema.org/draft-07/schema#", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://gatos.io/schemas/v1/message-plane/consumer_checkpoint.schema.json", "title": "Message Plane Consumer Checkpoint", "description": "Schema for a Message Plane consumer checkpoint.", diff --git a/schemas/v1/message-plane/event_envelope.schema.json b/schemas/v1/message-plane/event_envelope.schema.json index 7b585a4b..c311bc7c 100644 --- a/schemas/v1/message-plane/event_envelope.schema.json +++ b/schemas/v1/message-plane/event_envelope.schema.json @@ -1,5 +1,5 @@ { - "$schema": "http://json-schema.org/draft-07/schema#", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://gatos.io/schemas/v1/message-plane/event_envelope.schema.json", "title": "Message Plane Envelope", "description": "Schema for a Message Plane event envelope.", diff --git 
a/schemas/v1/privacy/opaque_pointer.schema.json b/schemas/v1/privacy/opaque_pointer.schema.json index fae379fb..fc6aebcd 100644 --- a/schemas/v1/privacy/opaque_pointer.schema.json +++ b/schemas/v1/privacy/opaque_pointer.schema.json @@ -1,5 +1,5 @@ { - "$schema": "http://json-schema.org/draft-07/schema#", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://gatos.io/schemas/v1/privacy/opaque_pointer.schema.json", "title": "Opaque Pointer", "description": "Schema for an opaque pointer envelope, referencing private data.", diff --git a/schemas/v1/state/proof_of_fold_envelope.schema.json b/schemas/v1/state/proof_of_fold_envelope.schema.json index 0fb1e060..ab3f95c2 100644 --- a/schemas/v1/state/proof_of_fold_envelope.schema.json +++ b/schemas/v1/state/proof_of_fold_envelope.schema.json @@ -1,5 +1,5 @@ { - "$schema": "http://json-schema.org/draft-07/schema#", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://gatos.io/schemas/v1/state/proof_of_fold_envelope.schema.json", "title": "Proof of Fold Envelope", "type": "object", diff --git a/schemas/v1/watch/events.schema.json b/schemas/v1/watch/events.schema.json index a75b16e5..376e9f79 100644 --- a/schemas/v1/watch/events.schema.json +++ b/schemas/v1/watch/events.schema.json @@ -1,5 +1,5 @@ { - "$schema": "http://json-schema.org/draft-07/schema#", + "$schema": "https://json-schema.org/draft/2020-12/schema", "title": "GATOS Watcher Event (v1)", "type": "object", "required": ["ts", "actor", "path", "rule", "action"], diff --git a/scripts/validate_schemas.sh b/scripts/validate_schemas.sh index da72b254..ec228299 100755 --- a/scripts/validate_schemas.sh +++ b/scripts/validate_schemas.sh @@ -36,14 +36,25 @@ if [ "$DO_COMPILE" -eq 1 ]; then echo "[schemas] Compiling JSON Schemas (v1)…" SCHEMAS=( "schemas/v1/common/ids.schema.json" - "schemas/v1/job/job_manifest.schema.json" - "schemas/v1/job/proof_of_execution_envelope.schema.json" + "schemas/v1/api/command_envelope.schema.json" 
+ "schemas/v1/api/graphql_state_mapping.schema.json" + "schemas/v1/api/stream_frame.schema.json" + "schemas/v1/api/webhook_delivery.schema.json" + "schemas/v1/export/export_manifest.schema.json" + "schemas/v1/federation/mounts.schema.json" "schemas/v1/governance/proposal.schema.json" "schemas/v1/governance/approval.schema.json" "schemas/v1/governance/grant.schema.json" "schemas/v1/governance/revocation.schema.json" "schemas/v1/governance/proof_of_consensus_envelope.schema.json" + "schemas/v1/job/job_manifest.schema.json" + "schemas/v1/job/proof_of_execution_envelope.schema.json" + "schemas/v1/message-plane/event_envelope.schema.json" + "schemas/v1/message-plane/consumer_checkpoint.schema.json" "schemas/v1/policy/governance_policy.schema.json" + "schemas/v1/privacy/opaque_pointer.schema.json" + "schemas/v1/state/proof_of_fold_envelope.schema.json" + "schemas/v1/watch/events.schema.json" ) for schema in "${SCHEMAS[@]}"; do @@ -59,13 +70,24 @@ fi if [ "$DO_VALIDATE" -eq 1 ]; then echo "[schemas] Validating example documents (v1)…" declare -A EXAMPLES=( - ["schemas/v1/job/job_manifest.schema.json"]="examples/v1/job/manifest_min.json" - ["schemas/v1/job/proof_of_execution_envelope.schema.json"]="examples/v1/job/poe_min.json" + ["schemas/v1/api/command_envelope.schema.json"]="examples/v1/api/command_envelope_min.json" + ["schemas/v1/api/graphql_state_mapping.schema.json"]="examples/v1/api/graphql_state_mapping_min.json" + ["schemas/v1/api/stream_frame.schema.json"]="examples/v1/api/stream_frame_sub.json" + ["schemas/v1/api/webhook_delivery.schema.json"]="examples/v1/api/webhook_delivery_min.json" + ["schemas/v1/export/export_manifest.schema.json"]="examples/v1/export/export_manifest_min.json" + ["schemas/v1/federation/mounts.schema.json"]="examples/v1/federation/mounts_min.json" ["schemas/v1/governance/proposal.schema.json"]="examples/v1/governance/proposal_min.json" ["schemas/v1/governance/approval.schema.json"]="examples/v1/governance/approval_min.json" 
["schemas/v1/governance/grant.schema.json"]="examples/v1/governance/grant_min.json" ["schemas/v1/governance/revocation.schema.json"]="examples/v1/governance/revocation_min.json" ["schemas/v1/governance/proof_of_consensus_envelope.schema.json"]="examples/v1/governance/poc_envelope_min.json" + ["schemas/v1/job/job_manifest.schema.json"]="examples/v1/job/manifest_min.json" + ["schemas/v1/job/proof_of_execution_envelope.schema.json"]="examples/v1/job/poe_min.json" + ["schemas/v1/message-plane/event_envelope.schema.json"]="examples/v1/message-plane/event_envelope_min.json" + ["schemas/v1/message-plane/consumer_checkpoint.schema.json"]="examples/v1/message-plane/consumer_checkpoint_min.json" + ["schemas/v1/privacy/opaque_pointer.schema.json"]="examples/v1/privacy/opaque_pointer_min.json" + ["schemas/v1/state/proof_of_fold_envelope.schema.json"]="examples/v1/state/proof_of_fold_envelope_min.json" + ["schemas/v1/watch/events.schema.json"]="examples/v1/watch/event_min.json" ) for schema in "${!EXAMPLES[@]}"; do From 9d5370806b949a4d605e717a047470daeaa3a38c Mon Sep 17 00:00:00 2001 From: "J. Kirby Ross" Date: Mon, 17 Nov 2025 16:02:13 -0800 Subject: [PATCH 14/25] docs: clarify REST scopes and dlq ops --- docs/decisions/ADR-0008/DECISION.md | 20 ++++++++++++++++++++ docs/decisions/ARD-FEEDBACK.md | 2 +- 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/docs/decisions/ADR-0008/DECISION.md b/docs/decisions/ADR-0008/DECISION.md index 9b328c72..7337f503 100644 --- a/docs/decisions/ADR-0008/DECISION.md +++ b/docs/decisions/ADR-0008/DECISION.md @@ -36,6 +36,26 @@ Commands are side-effecting; REST is adequate and tool-friendly. Integrations ne 4. **AuthN/Z**: OAuth2/JWT bearer tokens, validated against ADR-0003 policy. Scopes map to command prefixes (e.g., `locks:*`, `jobs:*`). Webhook secrets are per subscription; rotation takes effect immediately and old secrets expire after 5 minutes. 5. 
**HTTP Codes**: `202` (async ack), `200` (sync success), `400` (schema validation error), `401/403` (auth failures), `409` (`EXPECT_STATE_MISMATCH`), `422` (`COMMAND_UNSUPPORTED`), `500` (unhandled). +## Auth Scopes & Webhook DLQ Operations +1. **OAuth2 Scope Matrix** + +| Scope | Command Prefixes / Resources | Notes | +| :--- | :--- | :--- | +| `locks:*` | `locks.acquire`, `locks.release`, `locks.status` | Required for any mutation touching refs guarded by the watcher plane. | +| `jobs:*` | `jobs.claim`, `jobs.complete`, `jobs.retry` | Grants access to job lifecycle commands plus Message Plane job topics. | +| `policy:*` | `policy.evaluate`, `policy.override` | High-privilege scope gated by ADR-0003 trust rules; only issued to governance automation. | +| `state.export` | `state.export`, `state.fold.force` | Limited-use scope for export/fold operators; cannot touch locks/jobs. | +| `webhooks:*` | `GET/POST/DELETE /api/v1/webhooks*` | Needed to create, read, update, and delete webhook subscriptions and to manage DLQ entries. | + +Servers MUST reject commands whose prefix is not covered by the caller’s scope set with `403 Forbidden` and `WWW-Authenticate: Bearer error="insufficient_scope", scope="missing-scope"` (RFC 6750 §3.1); `401` is reserved for missing or invalid tokens. + +2. **Dead-Letter Queue Management** + - **Visibility**: `GET /api/v1/webhooks/{id}/dlq` lists parked deliveries newest-first, capped at 500 entries. Each entry includes `delivery_id`, `event`, `attempts`, `last_error`, and `expires_at`. + - **Replay**: `POST /api/v1/webhooks/{id}/dlq/{delivery_id}/replay` moves the entry back to the active queue immediately and increments an `operator_replay` counter recorded under `refs/gatos/audit/webhooks//`. + - **Purge**: `DELETE /api/v1/webhooks/{id}/dlq/{delivery_id}` permanently removes the entry (audited with the same trail as replay). Full purge requires `webhooks:*` scope and emits a governance event when more than 50 entries are deleted at once. + - **Retention**: Entries auto-expire after 30 days; the DLQ sweeper emits a `webhook.dlq.expired` event for observability.
+ - **Alerting**: When a DLQ exceeds 100 entries or the oldest entry is >24h, the system emits `state.failed` + `webhook.dlq.threshold` events so operators can wire alerts without polling. + ```mermaid sequenceDiagram participant Client diff --git a/docs/decisions/ARD-FEEDBACK.md b/docs/decisions/ARD-FEEDBACK.md index 84d9980b..4b83f01d 100644 --- a/docs/decisions/ARD-FEEDBACK.md +++ b/docs/decisions/ARD-FEEDBACK.md @@ -24,7 +24,7 @@ ## ADR-0008 – Flesh out auth scopes & webhook DLQ operations -- [ ] Resolved +- [x] Resolved > [!WARNING]- **Recommendation** > From f74d9db7322bfd563188acfa6bd5c778d10f8805 Mon Sep 17 00:00:00 2001 From: "J. Kirby Ross" Date: Mon, 17 Nov 2025 16:03:01 -0800 Subject: [PATCH 15/25] docs: document federated stream semantics --- docs/decisions/ADR-0009/DECISION.md | 11 +++++++++++ docs/decisions/ARD-FEEDBACK.md | 2 +- 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/docs/decisions/ADR-0009/DECISION.md b/docs/decisions/ADR-0009/DECISION.md index 788f3328..4d67e3b3 100644 --- a/docs/decisions/ADR-0009/DECISION.md +++ b/docs/decisions/ADR-0009/DECISION.md @@ -40,6 +40,17 @@ UIs and workers need near-real-time updates without wasteful polling. 8. **Errors & Close Codes**: Protocol errors result in immediate close with WebSocket code `1008`. The final frame MAY include `{kind:"error", code:"INVALID_SUB", message:"..."}`. +## Federated Streaming Semantics +1. **Seq & Credit Propagation** + - Federation proxies act as a virtual client of their upstream node: they forward `sub` frames and maintain a local `seq_proxy` counter while the origin node keeps `seq_origin`. Frames forwarded downstream include both values (`seq_origin` inside the payload, `seq_proxy` in the envelope) so clients can dedupe after failover. + - Backpressure travels hop-by-hop. When a client sends `credit: N`, the proxy immediately decrements its local window and only propagates a refreshed credit upstream once it drains the buffered frames.
This prevents head-of-line blocking across tenants. + - If a proxy exhausts credit while waiting on the downstream client, it pauses reads from the upstream socket and emits `kind:"ping", credit:0` every 5s to signal stalling. Upstream nodes close idle links after 30s with `1001`. +2. **Replay Windows Across Hops** + - On reconnect, clients include `sinceSeq=`. The proxy first replays from its buffer; if the requested range predates its retention, it escalates to the origin via `sinceSeq=`. Should the origin also lack history, the proxy emits `{kind:"error", code:"REPLAY_EXPIRED", missingSeq:}` to the client and instructs it to fall back to a full ref sync before resubscribing. + - Proxies cache up to 10k frames or 15 minutes of frames, whichever limit is reached first. When upstream expiry occurs mid-hop, the proxy logs `refs/gatos/audit/stream/replay_miss/` and publishes a `stream.replay.miss` message so ops teams can tune buffers. +3. **Diagram Update** + - Add an auxiliary Mermaid diagram titled "Federated Stream" introducing a `Proxy` participant between Client and Stream. Show credit propagation (`Client -> Proxy -> Stream`) and two `alt` blocks: `Replay hit` (Proxy responds locally) and `Replay expired` (Proxy requests upstream, receives miss, emits error). This supplements the base diagram without duplicating it. + ```mermaid sequenceDiagram participant Client diff --git a/docs/decisions/ARD-FEEDBACK.md b/docs/decisions/ARD-FEEDBACK.md index 4b83f01d..3fe5bb54 100644 --- a/docs/decisions/ARD-FEEDBACK.md +++ b/docs/decisions/ARD-FEEDBACK.md @@ -47,7 +47,7 @@ You are documenting a REST command surface. Produce a subsection for ADR-0008 co ## ADR-0009 – Clarify replay/backpressure across federated nodes -- [ ] Resolved +- [x] Resolved > [!WARNING]- **Recommendation** > From cb05c9dba28fcfe6857227bec1dce0eacbd9cf26 Mon Sep 17 00:00:00 2001 From: "J.
Kirby Ross" Date: Mon, 17 Nov 2025 16:06:10 -0800 Subject: [PATCH 16/25] docs: capture GH approval bridge + queue ops --- docs/decisions/ADR-0010/DECISION.md | 21 +++++++++++++++++++++ docs/decisions/ARD-FEEDBACK.md | 2 +- 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/docs/decisions/ADR-0010/DECISION.md b/docs/decisions/ADR-0010/DECISION.md index 58b68c4f..c5f99c4d 100644 --- a/docs/decisions/ADR-0010/DECISION.md +++ b/docs/decisions/ADR-0010/DECISION.md @@ -72,6 +72,27 @@ Most teams live on GitHub; native enforcement and UX reduce friction. - All outbound calls to GitHub use conditional requests (`If-None-Match`) to stay within rate limits. - The App MUST verify GitHub webhook signatures (SHA256) before processing. +## Approvals Mapping & Multi-Repo Operations +1. **PR Approvals ↔ Governance Approvals** + - Disabled by default. `.gatos/github.yaml` may enable bridging via: + ```yaml + approvals: + map_pull_request_reviews: true + reviewers: + - github:team/leads -> trust:@leads + - github:user:alice -> trust:actor:alice + ttl: PT72H + ``` + - When enabled, each `APPROVED` review counts as a `grant.approval` envelope for the mapped trust actor **only if** the reviewer’s GitHub identity resolves in the trust graph. Revoked/rotated keys invalidate outstanding approvals immediately. + - Bridged approvals inherit quorum rules from ADR-0003; if the governance policy requires 2-of-3, the App must see two distinct mapped actors before marking `gatos/policy` passing. Non-mapped reviewers contribute comments only. + - Removal of a mapped approval (e.g., the reviewer withdraws it) creates a compensating `grant.revocation` event and forces re-evaluation. + +2. **Multi-Repo Queue Partitioning** + - Each installation maintains a shard per repository consisting of: `watcher-queue`, `command-queue`, `check-queue`. Shards are weighted by outstanding PR count (≥1 queue per 50 open PRs).
+ - Worker pods pull from shards using weighted fair scheduling: latency-sensitive queues (`command-queue`) get 50% of slots, `check-queue` 30%, `watcher-queue` 20% by default; operators can override these defaults via the `queues.weights` key in `.gatos/github.yaml`. + - Cross-repo storms are mitigated by backpressure: when a shard exceeds 5k pending items or 2 minutes of lag, the App emits `gatos.github.queue.backpressure` events and temporarily downgrades lower-priority shards until lag drops below 30s. + - Audit commits under `refs/gatos/audit/github/queues//` capture rebalancing operations so SREs can trace why work moved across shards. + ```mermaid sequenceDiagram participant Dev as GitHub Developer diff --git a/docs/decisions/ARD-FEEDBACK.md b/docs/decisions/ARD-FEEDBACK.md index 3fe5bb54..990f2178 100644 --- a/docs/decisions/ARD-FEEDBACK.md +++ b/docs/decisions/ARD-FEEDBACK.md @@ -68,7 +68,7 @@ Pretend you run a multi-node ref streaming service. Explain for ADR-0009: ## ADR-0010 – Resolve PR approvals & multi-repo watcher strategy -- [ ] Resolved +- [x] Resolved > [!WARNING]- **Recommendation** > From d011f292c3a87212d81fc1d0377c3c5a69af38fa Mon Sep 17 00:00:00 2001 From: "J. Kirby Ross" Date: Mon, 17 Nov 2025 16:06:48 -0800 Subject: [PATCH 17/25] docs: define export envelope + samples --- docs/decisions/ADR-0011/DECISION.md | 36 +++++++++++++++++++++++++++++ docs/decisions/ARD-FEEDBACK.md | 2 +- 2 files changed, 37 insertions(+), 1 deletion(-) diff --git a/docs/decisions/ADR-0011/DECISION.md b/docs/decisions/ADR-0011/DECISION.md index d3352a0e..5cb7bcec 100644 --- a/docs/decisions/ADR-0011/DECISION.md +++ b/docs/decisions/ADR-0011/DECISION.md @@ -38,6 +38,42 @@ Teams want ad-hoc analytics without learning internals; SQL + columnar files cov - Rows sorted by primary key; SQLite `PRAGMA user_version` stores the exporter version. - Integrity table `export_info(state_ref TEXT, commit_start TEXT, commit_end TEXT, shape_root TEXT, exported_at TEXT)`. +## Security & Resource Envelope +1.
**Data Boundaries** + - The exporter walks only canonical ledger/state; private overlays stay pointerized. Any blob referenced by an opaque pointer is emitted as metadata (`digest`, `location`, `capability`) and never dereferenced; attempts to pull bytes result in a hard failure logged under `refs/gatos/audit/export/`. + - IAM: the CLI/daemon requires a dedicated `state.export` capability grant scoped to the target namespace. S3/GCS destinations must enforce bucket policies preventing public reads; Parquet exports inherit those credentials via short-lived federated tokens (≤1h). + +2. **Resource Planning** + - Guidance: expect ~2.5× repo size for Parquet (due to columnar replication) and ~1.2× for SQLite. A repo with 1M events / 200k state nodes (roughly 6 GB) consumes ~15 GB as Parquet and ~7 GB as SQLite. The exporter caps memory use at 1 GB by chunking rows per table; operators can override via `--batch-size`. + - Latency budget: 1 minute per 100k events on SSD-backed runners. Long-running exports emit `export.progress` events every 30s so CI can enforce SLAs. + +3.
**Example Artifacts** + - Manifest snippet: + ```json + { + "$schema": "https://gatos.io/schemas/v1/export/export_manifest.schema.json", + "format": "sqlite", + "state_ref": "0123456789abcdef0123456789abcdef01234567", + "commit_range": { "start": "0000...0000", "end": "fedcba9876543210fedcba9876543210fedcba98" }, + "tables": { "commits": true, "events": true, "state_nodes": true, "pointers": true, "jobs": false, "governance": true }, + "created_at": "2025-11-17T12:00:00Z" + } + ``` + - Canonical table definition (SQLite): + ```sql + CREATE TABLE commits ( + id TEXT PRIMARY KEY, + parent_id TEXT, + author TEXT NOT NULL, + ts INTEGER NOT NULL, + message TEXT NOT NULL, + trailers JSON NOT NULL, + shape_root TEXT NOT NULL, + created_ts INTEGER NOT NULL, + updated_ts INTEGER NOT NULL + ); + ``` + ```mermaid flowchart LR A[Ledger] -->|events| B[Exporter] diff --git a/docs/decisions/ARD-FEEDBACK.md b/docs/decisions/ARD-FEEDBACK.md index 990f2178..1f596dca 100644 --- a/docs/decisions/ARD-FEEDBACK.md +++ b/docs/decisions/ARD-FEEDBACK.md @@ -90,7 +90,7 @@ Pretend you run a multi-node ref streaming service. Explain for ADR-0009: ## ADR-0011 – Add security/scale envelope for exports -- [ ] Resolved +- [x] Resolved > [!WARNING]- **Recommendation** > From 6bf73f821a5855c5d071c9a9ee842b53f86fafa7 Mon Sep 17 00:00:00 2001 From: "J. 
Kirby Ross" Date: Mon, 17 Nov 2025 16:07:27 -0800 Subject: [PATCH 18/25] docs: lock down federation discovery --- docs/decisions/ADR-0012/DECISION.md | 9 +++++++++ docs/decisions/ARD-FEEDBACK.md | 2 +- 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/docs/decisions/ADR-0012/DECISION.md b/docs/decisions/ADR-0012/DECISION.md index c33c4553..9b6b0615 100644 --- a/docs/decisions/ADR-0012/DECISION.md +++ b/docs/decisions/ADR-0012/DECISION.md @@ -54,6 +54,15 @@ Enables decentralized composition (e.g., central governance repo consumed by man - If mount fetch fails (auth/network), mark mount `degraded` and emit an event to `refs/gatos/audit/federation//`. - Policy MUST support forcing a mount to `offline` when `max_depth` is exceeded to avoid cycles. +## Discovery Policy +1. **Manual-Only Mount Registry** + - v1 explicitly rejects automatic discovery/gossip. All mounts MUST be declared in `.gatos/federation.yaml` and land via reviewed commits. This guarantees every cross-repo dependency is auditable and subject to policy review. + - Nodes ignore unsolicited mount advertisements; received gossip packets trigger `federation.discovery.ignored` audit events for visibility. +2. **Rationale & Future Hooks** + - Trust graph coupling: accepting federated state implies extending the verification surface. Manual declaration ensures the `verify` key is pinned to a known trust anchor and that `policy.trusted_refs` is curated. + - Rate limiting & abuse: gossip traffic can be abused for DoS; the manual approach keeps mount changes at git-speed, with existing review/approval flows. + - Operators who need semi-automatic behavior can script `gatos mount inspect ` → YAML patch, but the resulting diff still requires review. A future ADR may revisit gossip once we define signed advertisements + quota enforcement. 
+ ```mermaid graph TD A[Local Repo] -->|mounts| B[Remote Governance Repo] diff --git a/docs/decisions/ARD-FEEDBACK.md b/docs/decisions/ARD-FEEDBACK.md index 1f596dca..724d9937 100644 --- a/docs/decisions/ARD-FEEDBACK.md +++ b/docs/decisions/ARD-FEEDBACK.md @@ -111,7 +111,7 @@ Pretend you run a multi-node ref streaming service. Explain for ADR-0009: ## ADR-0012 – Decide on federation discovery/gossip -- [ ] Resolved +- [x] Resolved > [!WARNING]- **Recommendation** > From 888c43e66e91c4f90558b13f7e9ea3899ab03dee Mon Sep 17 00:00:00 2001 From: "J. Kirby Ross" Date: Mon, 17 Nov 2025 16:08:02 -0800 Subject: [PATCH 19/25] docs: define fold prewarm + cache policy --- docs/decisions/ADR-0013/DECISION.md | 10 ++++++++++ docs/decisions/ARD-FEEDBACK.md | 2 +- 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/docs/decisions/ADR-0013/DECISION.md b/docs/decisions/ADR-0013/DECISION.md index b0d68ec6..13b270e1 100644 --- a/docs/decisions/ADR-0013/DECISION.md +++ b/docs/decisions/ADR-0013/DECISION.md @@ -48,6 +48,16 @@ Large repos need sub-linear recomputation to stay responsive. - Commit trailers include `Fold-Cache-Hit`, `Fold-Cache-Miss`, `Fold-Units`, `Fold-Duration`, `Fold-Parallelism`. - Metrics exported via Prometheus: `gatos_fold_unit_duration_ms`, `gatos_fold_cache_utilization`. +## Prewarming & Shared Cache Policy +1. **Background Prewarming** + - Allowed only when `--prewarm` flag is explicitly set or `fold.prewarm=true` in `.gatos/fold_units.yaml`. Prewarm jobs enqueue idle-time recomputation for units touched in the last 24h and MUST honor global concurrency limits to avoid starving foreground folds. + - Policy gate: if governance rules forbid speculative compute (e.g., sensitive namespaces), the fold daemon skips prewarming automatically and logs `fold.prewarm.skipped`. + - Prewarm runs emit `Fold-Prewarm` trailers with the list of units warmed so auditors can correlate CPU usage. +2. 
**Shared Cache Stores** + - Default path: `${GATOS_CACHE_ROOT:-.gatos/cache}/fold-units`. Multi-worktree setups point `GATOS_CACHE_ROOT` to a shared volume; locks are implemented via `flock` on `cache.lock` plus per-unit `.lck` files to prevent double writes. + - Eviction policy: LRU capped at 50k units or 200 GB, whichever comes first. Operators may override via env vars `GATOS_CACHE_MAX_UNITS` / `GATOS_CACHE_MAX_BYTES`. + - When multiple nodes share the cache (e.g., NFS), cache metadata includes the producing host ID; stale entries (produced under a superseded `policy_root`) or entries with missing blobs are purged during the startup sweep, and the purge is recorded under `refs/gatos/audit/fold-cache/`. + ```mermaid graph TD E1[Event Stream] --> P1[Plan Affected Units] diff --git a/docs/decisions/ARD-FEEDBACK.md b/docs/decisions/ARD-FEEDBACK.md index 724d9937..a3463247 100644 --- a/docs/decisions/ARD-FEEDBACK.md +++ b/docs/decisions/ARD-FEEDBACK.md @@ -131,7 +131,7 @@ Pretend you run a multi-node ref streaming service. Explain for ADR-0009: ## ADR-0013 – Decide on prewarming & shared caches -- [ ] Resolved +- [x] Resolved > [!WARNING]- **Recommendation** > From 1dd22a188a299943d7ec934a2ebac2df3e2849a8 Mon Sep 17 00:00:00 2001 From: "J.
Kirby Ross" Date: Mon, 17 Nov 2025 17:05:51 -0800 Subject: [PATCH 20/25] chore: Cargo.lock update --- Cargo.lock | 40 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 40 insertions(+) diff --git a/Cargo.lock b/Cargo.lock index 4fba9fab..2b9bb217 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -93,6 +93,17 @@ version = "0.7.6" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "7c02d123df017efcdfbd739ef81735b36c5ba83ec3c59c80a9d7ecc718f92e50" +[[package]] +name = "async-trait" +version = "0.1.89" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9035ad2d096bed7955a320ee7e2230574d28fd3c3a0f186cbea1ff3c7eed5dbb" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + [[package]] name = "autocfg" version = "1.5.0" @@ -463,6 +474,15 @@ dependencies = [ "libc", ] +[[package]] +name = "gatos-graphql" +version = "0.1.0" +dependencies = [ + "async-trait", + "serde", + "thiserror", +] + [[package]] name = "gatos-kv" version = "0.1.0" @@ -1516,6 +1536,26 @@ dependencies = [ "libc", ] +[[package]] +name = "thiserror" +version = "1.0.69" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b6aaf5339b578ea85b50e080feb250a3e8ae8cfcdff9a461c9ec2904bc923f52" +dependencies = [ + "thiserror-impl", +] + +[[package]] +name = "thiserror-impl" +version = "1.0.69" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "4fee6c4efc90059e10f81e6d42c60a18f76588c3d74cb83a0b242a2b6c7504c1" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + [[package]] name = "thread_local" version = "1.1.9" From 4d285eb0e67d4323eda806aafa6df96a6f14eff6 Mon Sep 17 00:00:00 2001 From: "J. 
Kirby Ross" Date: Tue, 18 Nov 2025 08:15:08 -0800 Subject: [PATCH 21/25] docs: normalize ADR metadata and roadmap --- README.md | 4 +- ROADMAP.md | 484 ---------------------------- docs/ROADMAP.md | 22 ++ docs/decisions/ADR-0001/DECISION.md | 8 + docs/decisions/ADR-0004/DECISION.md | 24 +- docs/decisions/ADR-0007/DECISION.md | 4 +- docs/decisions/ADR-0008/DECISION.md | 2 + docs/decisions/ADR-0009/DECISION.md | 2 + docs/decisions/ADR-0010/DECISION.md | 3 + docs/decisions/ADR-0011/DECISION.md | 17 +- docs/decisions/ADR-0012/DECISION.md | 4 +- docs/decisions/ADR-0013/DECISION.md | 3 + docs/decisions/ADR-0014/DECISION.md | 2 + docs/exporter.md | 2 +- 14 files changed, 78 insertions(+), 503 deletions(-) delete mode 100644 ROADMAP.md diff --git a/README.md b/README.md index 43b864fc..800c0dc7 100644 --- a/README.md +++ b/README.md @@ -220,7 +220,7 @@ See also: Deterministic Lua profile for policies/folds: [docs/deterministic-lua. ## Contributing -🚧 GATOS is currently under construction, but you can check out the [ROADMAP](./ROADMAP.md). 🗺️ +🚧 GATOS is currently under construction, but you can check out the [ROADMAP](./docs/ROADMAP.md). 🗺️ **Currently Working On:** Conceptualization & Planning Phase @@ -246,7 +246,7 @@ See also: Deterministic Lua profile for policies/folds: [docs/deterministic-lua. > We are looking for design partners in **scientific research**, **regulated fintech**, and **AI alignment**. If you're interested in GATOS, please get in touch. 
[james@flyingrobots.dev](mailto:james@flyingrobots.dev) * [Read the Specification](./docs/SPEC.md) -* [View the Roadmap](./ROADMAP.md) +* [View the Roadmap](./docs/ROADMAP.md) * [Join the Discussion](https://github.com/flyingrobots/gatos/discussions) --- diff --git a/ROADMAP.md b/ROADMAP.md deleted file mode 100644 index feb3c700..00000000 --- a/ROADMAP.md +++ /dev/null @@ -1,484 +0,0 @@ -# 🐈‍⬛ GATOS ROADMAP - -**Git As The Operating Surface — A Truth Machine for Distributed Systems & Science** - -This roadmap outlines the path from **0 lines of code** to the **first reproducible scientific experiment** verified end-to-end with GATOS. - -It follows a strict **proof-first** philosophy: - -- **Proof-of-Fold (PoF)** — state is verifiably derived from history. -- **Proof-of-Execution (PoE)** — jobs are verifiably executed. -- **Proof-of-Experiment (PoX)** — experiments are verifiably reproducible. - ---- - -## Guiding Principles - -- **Proof-first Design** — Every claim is verifiable from first principles. -- **Deterministic by Construction** — Same history + same policy = same state, bit-for-bit. -- **Git as History, not Database** — Git stores Message Plane events, checkpoints, and proofs; bulk data lives behind Opaque Pointers; heavy analytics via Explorer off-ramp. -- **Research Profile Defaults** — A conservative profile for scientific reproducibility (PoF required, policy FF-only, anchored audit refs). -- **At-Least-Once + Idempotency** — Delivery is at-least-once; consumers dedupe idempotently. No “exactly-once” fairy tales. - ---- - -## Global Non-Goals (for the initial phases) - -These are explicit non-goals until after the core truth machine is working: - -- A fully featured **multi-peer networking layer** (start single-node). -- A **cluster scheduler** or full-blown job orchestration system. -- A replacement for **Kafka** or high-throughput brokers (Message Plane stays Git-native, not a hosted queue). -- A hosted “GATOS Cloud” product. 
-- Strong isolation / capability-based sandboxing beyond basic VM guarantees - - (initial focus is determinism and correctness, not perfect sandbox security). - ---- - -## Milestones Overview - -| Milestone | Goal | -|----------|------| -| **M0** | Repo, scaffolding, canonicalization, ADR process | -| **M1** | EchoLua fold engine + Proof-of-Fold (PoF) | -| **M2** | Push-gate, .rgs policy, DENY-audit, grants | -| **M3** | Message Plane (Git-native append-only stream + queries) | -| **M4** | Job Plane + Proof-of-Execution (PoE) | -| **M5** | Opaque Pointers + privacy-preserving projection | -| **M6** | Explorer off-ramp + Explorer-Root verification | -| **M6.5** | GraphQL State API (read-only) | -| **M7** | Proof-of-Experiment (PoX) + reproduce/verify CLI | -| **M8** | Demos & examples (Bisect, ADR-as-policy, PoX) | -| **M9** | Conformance suite + `gatos doctor` | -| **M10** | Security & hardening | -| **M11** | Community & Launch (docs, blog, outreach) | -| **M12** | Wesley integration & schema tooling (optional Phase 2) | - ---- - -## M0 — Repository Skeleton & Governance - -**Goal:** A clean project structure and decision process with no implementation yet. - -**Deliverables:** - -- Rust workspace layout: - - `crates/gatos-core` — deterministic engine & types - - `crates/gatosd` — daemon - - `crates/git-gatos` — CLI shim (`git gatos ...`) -- ADR/RFC process (`/spec/adr`) + templates. -- Canonical encoding decision: **DAG-CBOR + CID** for signed artifacts. -- Profiles config file: `profile.default`, `profile.research`. -- Docs scaffolding: - - `docs/SPEC.md` - - `docs/TECH-SPEC.md` - - `docs/research-profile.md` - - `docs/opaque-pointers.md` - - `docs/exporter.md` - - `docs/proof-of-experiment.md` -- CI: format, lint, build, basic tests. - -**Non-goal:** Any networking, multi-peer sync, or job scheduling. M0 is wiring the skeleton. 
- ---- - -## M1 — EchoLua Fold Engine & Proof-of-Fold (PoF) - -**Goal:** Deterministic folds from events → state, with verifiable proofs. - -**Deliverables:** - -- `gatos-core`: - - EchoLua interpreter (deterministic subset). - - `dpairs()` / sorted iteration, forbidden patterns, numeric model. - - Fold runner: `fold(state, event) -> new_state`. -- EventEnvelope: - - DAG-CBOR encoding. - - Typed event structure. -- StateRoot computation: - - canonical serialization of shape → hash. -- PoF envelope: - - Proof metadata + signature over `(history_root, policy_root, state_root)`. -- Daemon: - - Run fold over `refs/gatos/journal/*`. - - Commit checkpoints to `refs/gatos/state/`. -- CLI: - - `git gatos state show` - - `git gatos fold verify ` - -**Done when:** - -- Same journal + same policy → identical `state_root` on two machines. -- PoF verification succeeds across platforms. - ---- - -## M2 — Push-Gate & Policy Plane (.rgs + rgc) - -**Goal:** Governance at the boundary of history; policies as executable law. - -**Deliverables:** - -- Push-gate (Stargate): - - FF-only enforcement for `refs/gatos/policies/**`, `refs/gatos/state/**`, `refs/gatos/audit/**`. - - PoF-required checks on state refs. -- Policy system: - - `.rgs` authoring DSL (Rego/Datalog-inspired). - - `.rgs -> .rgc` compiler (structured IR/bytecode). - - Policy VM built on EchoLua runtime (or parallel deterministic VM). - - DENY-audit: policy rejections logged to `refs/gatos/audit/policy/deny/`. -- Governance: - - Proposals → approvals → grants mapped to signed events. - - Grants bound to `policy_root`. -- Local enforcement: - - Ship `gatos watch` daemon enforcing read-only locks from `.gatos/policy.yaml` until grants land. - - Managed Git hooks (`pre-commit`, `pre-push`, `post-merge`) installed via `gatos install-hooks` and logged under audit refs. - - Lock UX: `gatos lock acquire/release` wired to ADR-0003 so artists get Perforce-style flows. 
- -**Done when:** - -- Rewriting policy history via rebase is impossible. -- Violating commits produce DENY entries with links back to the responsible ADR/policy. -- Policy rules can enforce e.g. “no API changes without 2-of-3 quorum”. -- Locked assets stay read-only locally until a Grant is available and hooks reject bypass attempts. - ---- - -## M3 — Message Plane (ADR-0005) - -**Goal:** Land the Git-native Message Plane so integrations can consume ordered events without parsing the entire ledger. - -**Deliverables:** - -- Refs & checkpoints: - - `refs/gatos/messages//head` per-topic parent chains. - - `refs/gatos/consumers//` storing last processed `ulid` (+ optional commit) for each consumer group. -- Event envelope: - - Canonical JSON payload with `ulid`, `ns`, `type`, `payload`, `refs`, and `content_id` (BLAKE3 of the canonical envelope). - - Enforce `Event-Id` and `Content-Id` headers in Message Plane commit messages. -- APIs & tooling: - - `gatos-message-plane messages.read(topic, since_ulid, limit)` returning canonical envelopes + commit ids, oldest → newest. - - Consumer checkpoint helpers (list, advance, reset) plus tests for ULID monotonicity. -- Integration: - - Automatically emit Message Plane events for ledger folds and governance transitions (e.g., `governance` topic). - - Optional bridge mirroring Message Plane topics to external brokers (Kafka/NATS) without breaking Git-native ownership. - -**Done when:** - -- Consumers can resume from checkpoints and replay Message Plane topics deterministically on fresh clones. -- Governance transitions and ledger mirrors emit Message Plane events discoverable via `messages.read`. - ---- - -## M4 — Job Plane & Proof-of-Execution (PoE) - -**Goal:** Off-repo compute with verifiable provenance. - -**Deliverables:** - -- Job claims: - - Exclusive CAS lock ref `refs/gatos/jobs//claim`. -- Worker: - - Subscribe to the Message Plane `jobs` topic (`messages.read` helper). - - Claim jobs. 
- - Run configured program/container. - - Commit results. -- PoE envelope: - - `inputs_root`, `program_id` (container/WASM/Nix hash), `outputs_root`, status, signature. -- Audit: - - PoE recorded under `refs/gatos/audit/jobs/`. - -**Done when:** - -- Race between multiple workers → exactly one claim wins. -- PoE verification reproducibly ties inputs, program, and outputs together. - ---- - -## M5 — Opaque Pointers & Privacy-Preserving Projection - -**Goal:** Publicly verifiable state with private data. - -**Deliverables:** - -- Public pointer schema: - - Canonical JSON envelope with `kind: "opaque_pointer"`, `algo`, `digest`, and optional bucketed `size` (e.g., 1k/4k/16k/64k). - - `location` URI for retrieval (e.g., `gatos-node://`, `https://`, `s3://`, `ipfs://`). - - `capability` URI describing how to authorize/decrypt (e.g., `gatos-key://`, `kms://`, `age://`). - - `digest` is the BLAKE3 hash of the raw plaintext blob; no ciphertext hash is tracked in Git. -- Resolver service: - - Auth (Bearer JWT; optional HTTP signatures/mTLS). - - Returns bytes + `Digest` headers. - - Logs fetches to audit refs. -- Projection: - - Folds never decrypt sensitive data. - - Public “shape” contains pointers instead of raw values. - - Projection is deterministic across platforms. - -**Done when:** - -- Public state cannot leak PII or sensitive details via pointer metadata. -- Pointer resolution is policy-controlled and auditable. - ---- - -## M6 — Explorer Off-Ramp & Explorer-Root - -**Goal:** Heavy analytics off-chain but still provable. - -**Deliverables:** - -- Export: - - CLI: `git gatos export parquet|sqlite --state `. - - Writes Parquet/SQLite plus metadata. -- Explorer-Root: - - Checksum tying export back to `(ledger_head, policy_root, state_root, extractor_version)`. -- Verification: - - CLI: `git gatos export verify ` checks Explorer-Root. - -**Done when:** - -- Exports verify on clean machines. -- Tampering with an export causes verification to fail. 
- ---- - -## M6.5 — GraphQL State API - -**Goal:** Provide a typed, cache-friendly read surface for state snapshots. - -**Deliverables:** - -- API service (crate or module) exposing `POST /api/v1/graphql` with the schema defined in `api/graphql/schema.graphql`. -- SDL publishing endpoint + CI check to keep schema + resolvers in sync. -- Resolver contract honoring `stateRef` / `refPath`, Relay pagination (`first/last`, opaque cursors, max 500), opaque pointer nodes, and deterministic ordering. -- Policy + privacy integration mirroring ADR-0003/0004 (return `POLICY_DENIED` errors; never auto-fetch private blobs). -- Rate-limiting (600 req / 60s default) and caching semantics (`shapeRoot`, `stateRefResolved`, `Cache-Control`/`ETag`). - -**Done when:** - -- Clients can issue GraphQL queries against historical or live state and receive deterministic results tied to a specific `stateRef`. -- SDL + schema live in-repo and the service passes conformance tests covering pagination, pointer handling, and error codes. -- Docs (README, SPEC, Guide) describe how to target states, interpret errors, and respect policy filters. - ---- - -## M7 — Proof-of-Experiment (PoX) & Reproduce/Verify - -**Goal:** Make experiments machine-checkable. - -**Deliverables:** - -- PoX envelope: - - Ties together `inputs_root`, `program_id`, `policy_root`, `policy_code_root`, `outputs_root`, PoF, and PoE. - - Stored under `refs/gatos/audit/proofs/experiments/`. -- CLI: - - `git gatos verify ` - - `git gatos reproduce ` -- Reproduction pipeline: - - Fetch Opaque Pointers. - - Re-run analysis in attested environment. - - Compare outputs + PoF. - -**Done when:** - -- Reproduce yields bit-for-bit identical results in a “clean-room” setting. -- If not, verify explains exactly where/why it diverged. - ---- - -## M8 — Demos & Examples - -**Goal:** Show, don’t tell. - -**Deliverables:** - -- `examples/adr-as-policy/` — ADR → policy → DENY/ALLOW behavior. 
-- `examples/bisect-for-state/` — state regression + git gatos bisect. -- `examples/pox-research/` — synthetic experiment → PoX → reproduce. -- GIFs of: - - ADR-as-policy, - - Bisect-for-state, - - PoX verification. - -**Done when:** - -- Each example runs with a single scripted command. -- GIFs are README-ready. - ---- - -## M9 — Conformance Suite & `gatos doctor` - -**Goal:** Turn correctness into automation. - -**Deliverables:** - -- Conformance tests: - - QoS (at-least-once + dedupe). - - Exclusive job claim. - - Pointer privacy rules. - - Projection determinism. - - PoF enforcement on state pushes. - - Explorer-Root export verification. -- `git gatos doctor`: - - Checks for misconfigurations in: - - profiles, - - Message Plane head continuity & retention, - - consumer checkpoint drift, - - anchors, - - PoF presence, - - export consistency. - -**Done when:** - -- CI runs conformance suite on every change. -- `doctor` reliably flags misconfigurations. - ---- - -## M10 — Security & Hardening - -**Goal:** Move from “works” to “safe to trust.” - -**Deliverables:** - -- Threat models for all planes. -- Fuzzing harnesses for: - - DAG-CBOR parsing, - - EchoLua interpreter, - - .rgs compiler, - - Message Plane consumer dedupe/resume logic, - - pointer resolver. -- External cryptography review: - - PoF and PoE signing, - - pointer encryption & AEAD usage, - - hash choices and domain separation. -- Replay/forgery resilience testing. -- Hardened Research Profile defaults. - ---- - -## M11 — Community & Launch - -**Goal:** Turn GATOS into a living project. - -**Deliverables:** - -- Documentation site (mdBook or similar). -- “For Scientists” documentation section. -- Launch blog post: “GATOS: Git As The Operating Surface — A Reproducibility OS”. -- Early adopter outreach: - - 2–3 design partner labs. -- Conference submissions, talks, and demos. - ---- - -## M12 — Wesley Integration & Schema Tooling (Phase 2) - -**Goal:** Make GATOS pleasant to program against. 
- -**Deliverables:** - -- `wesley build --target gatos`: - - generates fold specs, - - schemas, - - RLS/policy scaffolding. -- Examples: - - schema-first experiment spec flowing into GATOS. - ---- - -*End of ROADMAP.* - ---- - -# 📌 GITHUB ISSUE LIST (Milestones Board) - -Paste these titles to create issues grouped by milestone. - -M0 – Repo & Scaffolding - -- Create Rust workspace structure (gatos-core, gatosd, git-gatos) -- Add ADR/RFC process and templates -- Choose canonical encoding: DAG-CBOR + CID -- Add initial docs: research-profile.md, proof-of-experiment.md -- Add CLI shim + smoke-test (git gatos --help) -- Add CI pipelines (fmt, lint, build) -- Add SECURITY, CONTRIBUTING, CODEOWNERS - -M1 – Fold Engine + PoF - -- Implement EventEnvelope (DAG-CBOR) -- Implement pure fold engine (gatos-core) -- Integrate Lua or WASM reducer -- Add state checkpoint format + PoF -- Implement state show -- Implement fold verify -- Add cross-platform determinism tests - -M2 – Push-Gate & Policy - -- Implement pre-receive FF-only enforcement -- Implement PoF-required validation for state refs -- Implement minimal policy VM (Lua/WASM) -- Add DENY audit logging -- Implement proposal/approval/grant flow -- Add policy verify - -M3 – Message Plane - -- Implement `refs/gatos/messages//head` parent chains -- Implement `refs/gatos/consumers//` checkpoints (ULID + commit) -- Define canonical Message Plane envelope + commit annotations (Event-Id/Content-Id) -- Add `gatos-message-plane messages.read` RPC + CLI helper -- Add consumer checkpoint management commands/tests (ULID monotonicity) -- Auto-emit ledger & governance events into appropriate Message Plane topics -- Add optional bridge to mirror Message Plane topics to external brokers - -M4 – Job Plane + PoE - -- Implement exclusive CAS lock ref for job claim -- Implement worker subscribe → claim → run → result -- Add PoE envelope -- Add CLI verbs for job lifecycle -- Add PoE verification CLI - -M5 – Opaque Pointers + Privacy - -- 
Implement pointer envelope (kind/algo/digest/size/location/capability) -- Implement private overlay store wired into capability URIs -- Implement pointer resolver (JWT + Digest headers) -- Integrate privacy projection into fold pipeline -- Add projection determinism tests - -M6 – Explorer Off-Ramp - -- Implement Parquet/SQLite export -- Implement Explorer-Root hash -- Add CLI: export, export verify -- Add export mismatch tests - -M7 – PoX + Reproduce - -- Add PoX envelope + CID storage -- Implement gatos verify -- Implement gatos reproduce -- Add clean-room reproduction tests - -M8 – Demos - -- Create ADR-as-policy demo -- Create Bisect-for-State demo -- Create PoX demo -- Record GIFs and embed them in README - -M9 – Conformance + Doctor - -- Add conformance suite (QoS, exclusivity, projection) -- Add pointer privacy test suite -- Add Explorer-Root verification tests -- Add PoF enforcement tests -- Implement gatos doctor -- Ensure CI runs all conformance tests diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md index 69384002..af479344 100644 --- a/docs/ROADMAP.md +++ b/docs/ROADMAP.md @@ -142,6 +142,8 @@ These are explicit non-goals until after the core truth machine is working: **1–2 weeks** +**Status:** ✅ Completed (2025-11-08). Repo scaffold, ADR log, and the SPEC/guide skeletons already live in this repository. + ### Goals @@ -206,6 +208,8 @@ These are explicit non-goals until after the core truth machine is working: **3–5 weeks** +**Status:** 🟡 In Progress — ADR-0014 (Draft) plus SPEC §5 define PoF; EchoLua runtime + CLI wiring are still underway. + ### Goals @@ -265,6 +269,8 @@ These are explicit non-goals until after the core truth machine is working: **3–4 weeks** +**Status:** ✅ Accepted — ADR-0003 (Policy Plane) and ADR-0006 (Local Enforcement) are merged; current work focuses on implementation hardening and tooling. 
+ ### Goals @@ -323,6 +329,8 @@ These are explicit non-goals until after the core truth machine is working: **3–5 weeks** +**Status:** 🟠 Proposed — ADR-0005 documents the Message Plane but remains Proposed; crates, daemons, and tests still need to land. + ### Goals @@ -382,6 +390,8 @@ These are explicit non-goals until after the core truth machine is working: **4–6 weeks** +**Status:** ✅ Accepted — ADR-0002 locks the Job Plane + PoE design; integration into the daemon/CLI is queued behind Message Plane delivery. + ### Goals @@ -432,6 +442,8 @@ These are explicit non-goals until after the core truth machine is working: **4–6 weeks** +**Status:** ✅ Accepted — ADR-0004 establishes the hybrid privacy model; `docs/opaque-pointers.md` and SPEC sections are ready for implementation polish. + ### Goals @@ -484,6 +496,8 @@ These are explicit non-goals until after the core truth machine is working: **3–4 weeks** +**Status:** 🟡 In Progress — ADR-0011 (Draft) and `docs/exporter.md` define Explorer-Root + export flows; CLI + verifier code remains to be written. + ### Goals @@ -525,6 +539,8 @@ These are explicit non-goals until after the core truth machine is working: **3–4 weeks** +**Status:** ✅ Accepted — ADR-0007 plus API docs describe the GraphQL gateway; next steps are wiring it into `gatosd` and hardening pagination/rate limits. + ### Goals - Typed, single-roundtrip read access to any committed state snapshot. @@ -558,6 +574,8 @@ These are explicit non-goals until after the core truth machine is working: **4–6 weeks** +**Status:** 🔜 Planned — PoX tooling is still conceptual (see `docs/proofs/proof-of-experiment.md`); no ADR has been authored yet. + ### Goals @@ -611,6 +629,8 @@ These are explicit non-goals until after the core truth machine is working: **1–2 weeks** +**Status:** 🔜 Planned — Demo content depends on Message Plane, PoE, and PoX shipping; nothing beyond outlines exists yet. 
+ ### Deliverables @@ -649,6 +669,8 @@ These are explicit non-goals until after the core truth machine is working: **3–4 weeks** +**Status:** 🔜 Planned — Conformance tooling hinges on exporter/policy maturity; no ADR currently covers `gatos doctor`. + ### Goals diff --git a/docs/decisions/ADR-0001/DECISION.md b/docs/decisions/ADR-0001/DECISION.md index dbea52c5..6aeccb35 100644 --- a/docs/decisions/ADR-0001/DECISION.md +++ b/docs/decisions/ADR-0001/DECISION.md @@ -1,6 +1,14 @@ --- Status: Accepted Date: 2025-11-08 +ADR: ADR-0001 +Authors: [flyingrobots] +Requires: [] +Related: [] +Tags: [Ledger, ObjectStore, Architecture] +Schemas: [] +Supersedes: [] +Superseded-By: [] --- # ADR-0001: Split gatos-ledger into `no_std` Core and `std` Backends diff --git a/docs/decisions/ADR-0004/DECISION.md b/docs/decisions/ADR-0004/DECISION.md index 0059f5e5..43afd6c7 100644 --- a/docs/decisions/ADR-0004/DECISION.md +++ b/docs/decisions/ADR-0004/DECISION.md @@ -12,7 +12,7 @@ Supersedes: [] Superseded-By: [] --- -# ADR‑0004: Hybrid Privacy Model (Public Projection + Private Overlay) +# ADR-0004: Hybrid Privacy Model (Public Projection + Private Overlay) ## Scope @@ -22,7 +22,7 @@ Define a **hybrid privacy** model in which the State Plane produces: ## Rationale -The original model envisioned using a local, out‑of‑repo directory for private data and committing only redacted or pointerized state to Git. This ADR makes that pattern **normative** and **deterministic**: +The original model envisioned using a local, out-of-repo directory for private data and committing only redacted or pointerized state to Git. This ADR makes that pattern **normative** and **deterministic**: - Public state remains globally verifiable. - Sensitive details live in a private overlay but are **addressable** and **auditable** via content hashes. @@ -52,22 +52,22 @@ Several alternative approaches to managing privacy and sensitive data were consi ## Decision -### 1. 
Actor‑Anchored Private Namespace (normative) +### 1. Actor-Anchored Private Namespace (normative) -Private overlays are rooted in an **actor identity**, not an ad‑hoc “session”. +Private overlays are rooted in an **actor identity**, not an ad-hoc “session”. - **Actor ID:** `ed25519:` that resolves in the trust graph. -- **On‑disk refs (private):** +- **On-disk refs (private):** ``` refs/gatos/private/// refs/gatos/private//sessions/// # OPTIONAL ephemeral overlays ``` -- **On‑disk refs (public):** +- **On-disk refs (public):** ``` refs/gatos/state/public// ``` -> The prior “`` at namespace root” concept is deprecated. If you need per‑process isolation, use `sessions/` under the owning ``. +> The prior “`` at namespace root” concept is deprecated. If you need per-process isolation, use `sessions/` under the owning ``. ### 2. Opaque Pointers (normative) @@ -88,13 +88,13 @@ Where private data is elided from PublicState, emit a canonical JSON **opaque po - `gatos-node://ed25519:` — resolve endpoint(s) via trust graph. - `file:///...` — local file path (dev/test only). - `https://...` — HTTPS object store. - - `s3://bucket/key` — S3‑style store. + - `s3://bucket/key` — S3-style store. - `ipfs://` — IPFS address. - `capability` MUST be a URI. Reserved schemes include: - `gatos-key://v1/aes-256-gcm/` - `kms://aws//keys/` - `age://` / `sops://` -- Canonical JSON (UTF‑8, sorted keys, no insignificant whitespace). The digest of the pointer envelope itself (its **content_id**) is `blake3(canonical_bytes)`. +- Canonical JSON (UTF-8, sorted keys, no insignificant whitespace). The digest of the pointer envelope itself (its **content_id**) is `blake3(canonical_bytes)`. **Schema:** `schemas/v1/privacy/opaque_pointer.schema.json` (see repo changes below). @@ -125,7 +125,7 @@ A resolver MUST: - Use scheme to select decryption/authorization mechanism. - Fetch and decrypt the content. Verify that the `blake3` hash of the resulting plaintext bytes matches the `digest` from the pointer. 
If it does not match, the resolution **MUST FAIL**. -> This ADR standardizes **envelopes and verification**. The `.well-known` fetch API shape is reserved for a future ADR; implementations may use compatible private APIs short‑term. +> This ADR standardizes **envelopes and verification**. The `.well-known` fetch API shape is reserved for a future ADR; implementations may use compatible private APIs short-term. ### 5. Policy Hooks (normative) @@ -155,13 +155,13 @@ privacy: ### 7. Security Considerations - Never embed plaintext secrets in PublicState. Pointer envelopes do **not** leak bytes. -- If `location` is remote and `capability` is non‑null, deny fetch if capability can’t be resolved or verified. +- If `location` is remote and `capability` is non-null, deny fetch if capability can’t be resolved or verified. - The trust graph entry for a node SHOULD declare endpoint URIs and allowed capability schemes. - The `capability` mechanism implies a dependency on a robust and secure key management system. This ADR does not specify the architecture of such a system, but implementations MUST ensure that key access is strictly controlled and auditable. ### 8. Compatibility -- Existing per‑process “session” overlays can migrate to `refs/gatos/private//sessions//...` with no behavioral change to the projection. +- Existing per-process “session” overlays can migrate to `refs/gatos/private//sessions//...` with no behavioral change to the projection. ### Diagrams diff --git a/docs/decisions/ADR-0007/DECISION.md b/docs/decisions/ADR-0007/DECISION.md index 645ff615..3fbd85fa 100644 --- a/docs/decisions/ADR-0007/DECISION.md +++ b/docs/decisions/ADR-0007/DECISION.md @@ -8,6 +8,8 @@ Related: [ADR-0008, ADR-0009] Tags: [API, GraphQL, State] Schemas: - schemas/v1/api/graphql_state_mapping.schema.json +Supersedes: [] +Superseded-By: [] --- # ADR-0007: GraphQL State API (Read-Only) @@ -76,4 +78,4 @@ sequenceDiagram - Server complexity moves into resolvers and policy filters. 
## Open Questions -- Field deprecation cadence. +- None (cadence + error surfacing policy defined above). diff --git a/docs/decisions/ADR-0008/DECISION.md b/docs/decisions/ADR-0008/DECISION.md index 7337f503..4f24046f 100644 --- a/docs/decisions/ADR-0008/DECISION.md +++ b/docs/decisions/ADR-0008/DECISION.md @@ -9,6 +9,8 @@ Tags: [API, REST, Commands, Webhooks] Schemas: - schemas/v1/api/command_envelope.schema.json - schemas/v1/api/webhook_delivery.schema.json +Supersedes: [] +Superseded-By: [] --- # ADR-0008: REST Commands & Webhooks diff --git a/docs/decisions/ADR-0009/DECISION.md b/docs/decisions/ADR-0009/DECISION.md index 4d67e3b3..53d7c77c 100644 --- a/docs/decisions/ADR-0009/DECISION.md +++ b/docs/decisions/ADR-0009/DECISION.md @@ -8,6 +8,8 @@ Related: [ADR-0007, ADR-0008] Tags: [API, WebSocket, Streaming, Refs] Schemas: - schemas/v1/api/stream_frame.schema.json +Supersedes: [] +Superseded-By: [] --- # ADR-0009: Real-Time Streams & Ref Subscriptions diff --git a/docs/decisions/ADR-0010/DECISION.md b/docs/decisions/ADR-0010/DECISION.md index c5f99c4d..3912a499 100644 --- a/docs/decisions/ADR-0010/DECISION.md +++ b/docs/decisions/ADR-0010/DECISION.md @@ -6,6 +6,9 @@ Authors: [flyingrobots] Requires: [ADR-0002, ADR-0003, ADR-0007, ADR-0008] Related: [] Tags: [Integration, GitHub App, CI/CD, Governance] +Schemas: [] +Supersedes: [] +Superseded-By: [] --- # ADR-0010: First-Class GitHub App Integration diff --git a/docs/decisions/ADR-0011/DECISION.md b/docs/decisions/ADR-0011/DECISION.md index 5cb7bcec..9a2e1eb9 100644 --- a/docs/decisions/ADR-0011/DECISION.md +++ b/docs/decisions/ADR-0011/DECISION.md @@ -6,6 +6,10 @@ Authors: [flyingrobots] Requires: [ADR-0001, ADR-0005] Related: [] Tags: [Analytics, Export, SQL, Parquet] +Schemas: + - schemas/v1/export/export_manifest.schema.json +Supersedes: [] +Superseded-By: [] --- # ADR-0011: GATOS-to-SQL/Parquet Exporter @@ -85,9 +89,18 @@ flowchart LR G --> H[Shape Root] ``` +## Query Pushdown (v1 Behaviour) +1. 
**Range-Scoped Runs** + - `--since ` + optional `--until ` bound the export window. The exporter walks topo order between the commits and skips events outside the window; this is the only supported history filter to keep proofs simple. +2. **Table/Column Filters** + - `--tables commits,events,...` narrows participating tables (already part of the CLI). Additionally, v1 adds `--columns :col1,col2` so operators can omit high-churn blobs (e.g., large JSON payloads) and shrink exports without losing keys needed for joins. +3. **Row Predicates** + - Deterministic WHERE-like filters expressed as canonical JSON DSL: `--where commits='{"ns":"gatos/jobs"}'`. Predicates only support conjunctions of equality/range comparisons and are hashed into the export manifest so downstream consumers can validate which subset they received. Unsupported operators cause the exporter to fail fast with `UNSUPPORTED_FILTER` and leave an audit record. +4. **Auditability** + - Every pushdown option is materialized into `export_info.filters` inside the manifest. Consumers recompute the digest of `(state_ref, commit_range, tables, columns, where)` to verify they read from the same logical slice. This closes the open question—yes, we support limited pushdown scoped to deterministic filters, and we document every filter for reproducibility. + ## Consequences - Easy dashboards, BI, notebooks. - Must be careful not to leak private overlay data (only pointer metadata exported). - ## Open Questions -- Do we support query pushdown (pre-filtered exports) in v1? +- None (pushdown semantics defined above). 
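The ADR-0011 auditability rule says consumers recompute a digest over `(state_ref, commit_range, tables, columns, where)`. The Python sketch below illustrates that rule; it is not the exporter's implementation. It assumes the tuple is hashed as canonical JSON (sorted keys, no insignificant whitespace). The names `filters_digest`, `check_predicate`, and `UnsupportedFilter` are hypothetical, and so are the `$gte`-style range operators and the use of SHA-256 as the digest. The ADR defines none of these.

```python
import hashlib
import json

# Hypothetical range-operator spelling; ADR-0011 only allows "conjunctions of
# equality/range comparisons" and does not define the range syntax.
RANGE_OPS = {"$lt", "$lte", "$gt", "$gte"}


class UnsupportedFilter(ValueError):
    """Stands in for the exporter's fail-fast UNSUPPORTED_FILTER error."""


def check_predicate(table: str, pred: dict) -> None:
    # Keys are columns; all comparisons in one predicate are ANDed together.
    for column, cond in pred.items():
        if isinstance(cond, (str, int, float)):
            continue  # equality comparison (bool is an int subclass)
        if isinstance(cond, dict) and cond and set(cond) <= RANGE_OPS:
            continue  # range comparison, e.g. {"$gte": 10, "$lt": 20}
        raise UnsupportedFilter(f"UNSUPPORTED_FILTER: {table}.{column}={cond!r}")


def canonical_bytes(obj) -> bytes:
    # Sorted keys at every level, no insignificant whitespace, UTF-8.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def filters_digest(state_ref, since, until, tables, columns, where) -> str:
    """Digest of (state_ref, commit_range, tables, columns, where)."""
    for table, pred in where.items():
        check_predicate(table, pred)
    filters = {
        "state_ref": state_ref,
        "commit_range": [since, until],  # until may be None (open-ended run)
        "tables": sorted(tables),
        "columns": {t: sorted(cols) for t, cols in columns.items()},
        "where": where,
    }
    # SHA-256 is a placeholder for whatever digest the manifest specifies.
    return "sha256:" + hashlib.sha256(canonical_bytes(filters)).hexdigest()


digest = filters_digest(
    "refs/gatos/state/public/demo",  # hypothetical state ref
    "a1b2c3", None,
    ["events", "commits"],
    {"commits": ["ns", "id"]},
    {"commits": {"ns": "gatos/jobs"}},
)
print(digest)
```

In this sketch, sorting `tables` and the column lists is a deliberate choice. It makes `--tables events,commits` and `--tables commits,events` produce the same digest, so the digest identifies a logical slice rather than a particular command line.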
diff --git a/docs/decisions/ADR-0012/DECISION.md b/docs/decisions/ADR-0012/DECISION.md index 9b6b0615..e1f27fc0 100644 --- a/docs/decisions/ADR-0012/DECISION.md +++ b/docs/decisions/ADR-0012/DECISION.md @@ -8,6 +8,8 @@ Related: [] Tags: [Federation, Mounts, Cross-Repo] Schemas: - schemas/v1/federation/mounts.schema.json +Supersedes: [] +Superseded-By: [] --- # ADR-0012: Federated Repositories & Mounts @@ -76,4 +78,4 @@ graph TD - Requires remote availability and verification logic. ## Open Questions -- Federation gossip: do we allow automatic mount discovery, or keep `.gatos/federation.yaml` manual only? +- Future ADR: Define the signed gossip/advertisement protocol (trust anchor format, TTL, quota) required before we can safely enable any automatic mount discovery. diff --git a/docs/decisions/ADR-0013/DECISION.md b/docs/decisions/ADR-0013/DECISION.md index 13b270e1..b9e6725e 100644 --- a/docs/decisions/ADR-0013/DECISION.md +++ b/docs/decisions/ADR-0013/DECISION.md @@ -6,6 +6,9 @@ Authors: [flyingrobots] Requires: [ADR-0001, ADR-0005] Related: [] Tags: [Performance, State Engine, Caching] +Schemas: [] +Supersedes: [] +Superseded-By: [] --- # ADR-0013: Partial & Lazy Folds diff --git a/docs/decisions/ADR-0014/DECISION.md b/docs/decisions/ADR-0014/DECISION.md index 4a7ff658..b3ed404a 100644 --- a/docs/decisions/ADR-0014/DECISION.md +++ b/docs/decisions/ADR-0014/DECISION.md @@ -8,6 +8,8 @@ Related: [ADR-0002] Tags: [Attestation, Proofs, State Engine] Schemas: - schemas/v1/state/proof_of_fold_envelope.schema.json +Supersedes: [] +Superseded-By: [] --- # ADR-0014: Proof-Of-Fold (Attestation of State) diff --git a/docs/exporter.md b/docs/exporter.md index 0421a06c..e8a6b8bd 100644 --- a/docs/exporter.md +++ b/docs/exporter.md @@ -35,7 +35,7 @@ Exports let you analyze GATOS state outside the repo (e.g., Parquet/SQLite) whil -See \[\[\[\[\[[SPEC §15.1](/SPEC#15.1)]\(/SPEC#15.1)]\(/SPEC#15.1)]\(/SPEC#15.1)]\(/SPEC#15.1)]\(/SPEC#15.1). 
Exporters **MUST** compute `Explorer-Root`. +See [SPEC §15.1](/SPEC#15.1). Exporters **MUST** compute `Explorer-Root`. Derived state exports (from folds) include `state_root`: From 2e0469bf0e9be40348b6fac1578f9f2ce050346a Mon Sep 17 00:00:00 2001 From: "J. Kirby Ross" Date: Tue, 18 Nov 2025 08:23:11 -0800 Subject: [PATCH 22/25] docs: add structured task backlog --- TASKLIST.md | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 65 insertions(+) create mode 100644 TASKLIST.md diff --git a/TASKLIST.md b/TASKLIST.md new file mode 100644 index 00000000..eb2d8411 --- /dev/null +++ b/TASKLIST.md @@ -0,0 +1,65 @@ +# Task Backlog + +- [ ] **ADR Coverage: Sessions & PoX** + - *Summary*: Author ADRs that define the Sessions feature set and the Proof-of-Experiment workflow so roadmap items have specs. + - *Problem Statement*: `docs/TASKS.md` references Sessions without any ADR, and PoX (M7) lacks a normative document; engineers cannot implement without an agreed contract. + - *Acceptance Criteria*: (1) Two ADRs merged (Sessions, PoX) with IDs in the canonical log; (2) Each ADR documents scope, decision, consequences, and diagrams; (3) Roadmap/TASKS link to the new ADR IDs. + - *Test Plan*: Lint ADR markdown (mdformat/dprint), ensure `docs/decisions/README.md` table updates; run `rg` to confirm no TODO placeholders remain. + - *LLM Prompt*: “You are drafting an Architecture Decision Record. Produce an ADR that specifies the GATOS Sessions feature (start/undo/fork/merge with lattice/DPO joins) aligned with existing policy/state planes, including decision, diagrams, and consequences.” + +- [ ] **Message Plane Implementation** + - *Summary*: Build the segmented Git-backed pub/sub system defined in ADR-0005 so M3 can move from Proposed to Accepted. + - *Problem Statement*: ADR-0005 is still Proposed; without actual crates and tests, Job Plane and downstream integrations are blocked. 
+ - *Acceptance Criteria*: (1) New crate/service exposes publish/subscribe APIs with `refs/gatos/messages/**` segment rotation; (2) Consumer checkpoints persist under `refs/gatos/consumers/**`; (3) Integration tests cover at-least-once delivery, rotation, and pruning; (4) ADR-0005 status flipped to Accepted. + - *Test Plan*: Rust unit tests for envelope serialization; end-to-end test spinning up two publishers/consumers verifying dedupe; regression test ensuring pruning respects checkpoints. + - *LLM Prompt*: “Implement a Git-backed message bus per ADR-0005: create a Rust module that writes message topics under `refs/gatos/messages///` with rotation at 100k messages or 192MB and checkpoints under `refs/gatos/consumers/...`. Include tests for publish, subscribe, rotation, and pruning.” + +- [ ] **Job Plane + PoE Integration** + - *Summary*: Wire ADR-0002’s Job Plane into `gatosd`, enabling CAS claims, worker loops, and Proof-of-Execution commits. + - *Problem Statement*: The Job Plane is specified but not implemented; no CLI/daemon support exists for enqueuing, claiming, or attesting jobs. + - *Acceptance Criteria*: (1) CLI commands `git gatos jobs enqueue|ls|watch` exist; (2) Daemon exposes job topics/messages; (3) Workers create `refs/gatos/jobs//claims/` and result commits with PoE envelopes; (4) Tests prove only one worker can claim a job. + - *Test Plan*: Integration test with two worker processes racing on the same job; verify PoE signature verification code path; ensure job topics emit events onto Message Plane. + - *LLM Prompt*: “Extend the GATOS daemon to support ADR-0002: implement job enqueue, claim (CAS ref updates), and result commits with Proof-of-Execution envelopes plus CLI commands to drive the flow.” + +- [ ] **Exporter CLI & Explorer-Root Verifier** + - *Summary*: Finish M6 by building the `gatos export` command and Explorer-Root verification flow spelled out in ADR-0011/docs/exporter.md. 
+ - *Problem Statement*: Specs describe Parquet/SQLite exports and Explorer-Root, but no code produces or verifies artifacts, leaving analytics teams blocked. + - *Acceptance Criteria*: (1) CLI supports `gatos export --format {sqlite,parquet}` plus `--since/--until`, column filters, and predicates; (2) Outputs include manifest + explorer-root digest; (3) `gatos export verify ` recomputes digests and fails on mismatches; (4) Tests cover pushdown filters and manifest hashing. + - *Test Plan*: Golden tests exporting a fixture repo; fuzz tests on filter DSL; verification test that tampering data triggers failure. + - *LLM Prompt*: “Implement the `gatos export` CLI per ADR-0011: emit SQLite/Parquet datasets with explorer-root metadata and add a verification subcommand that recomputes the digest.” + +- [ ] **GraphQL Gateway Service** + - *Summary*: Deliver M6.5 by creating the GraphQL API service described in ADR-0007 (SDL publish, Relay pagination, policy filtering). + - *Problem Statement*: API contracts exist but no service handles GraphQL queries; consumers can’t query state snapshots without custom tooling. + - *Acceptance Criteria*: (1) Gateway binary serves `POST /api/v1/graphql` + `GET /api/v1/graphql/schema`; (2) Resolver layer enforces `stateRef/refPath`, pagination caps, error codes; (3) Caching headers and rate limits match ADR-0007; (4) CI runs schema/regression tests. + - *Test Plan*: GraphQL integration tests for policy-denied fields, pagination bounds, shapeRoot caching; load test verifying rate-limit headers. + - *LLM Prompt*: “Build a Rust GraphQL gateway matching ADR-0007: expose POST /api/v1/graphql, enforce stateRef/refPath targeting, Relay pagination, OpaquePointerNode handling, caching headers, and rate limiting.” + +- [ ] **Federation Stream Proxy ADR & Implementation** + - *Summary*: Close ADR-0009’s open question by specifying and building the cross-node streaming proxy (fan-out next to `gatos mountd`). 
+ - *Problem Statement*: Federation currently lacks a story for streaming refs/topics across mounts, limiting multi-node deployments. + - *Acceptance Criteria*: (1) New ADR details the stream proxy approach, credit windows, and auditing; (2) Implementation ships alongside `gatos mountd`, forwarding streams with deterministic seq IDs; (3) Tests cover replay, credit exhaustion, and failure telemetry. + - *Test Plan*: Simulate upstream/downstream nodes with network hiccups; ensure audit refs record forwarded frames and that dedupe holds across hops. + - *LLM Prompt*: “Author an ADR and implementation plan for a federation stream proxy that subscribes locally and replays frames downstream with deterministic sequence IDs and audit logging.” + +- [ ] **Operations & Observability Guide Chapter** + - *Summary*: Add a “Chapter 13: Operations & Observability” to the GATOS book covering SLOs, `/healthz`, watcher logs, and troubleshooting. + - *Problem Statement*: Operators lack comprehensive guidance even though roadmap milestones (M8/M9) assume operational maturity. + - *Acceptance Criteria*: (1) New chapter published with sections on daemons, health checks, metrics, and playbooks; (2) Cross-links to ADR-0006, ADR-0009, and exporter specs; (3) README/map-of-contents updated. + - *Test Plan*: Run markdown lint; verify TOC autogen; spot-check links. + - *LLM Prompt*: “Write Chapter 13 of the GATOS guide detailing Ops & Observability: cover gatosd health endpoints, metrics, watcher/audit logs, SLOs, and troubleshooting playbooks referencing relevant ADRs.” + +- [ ] **Demo Suite (ADR-as-policy, Bisect State, PoX)** + - *Summary*: Produce runnable demos + media assets once core planes ship, to fulfill M8. + - *Problem Statement*: README promises demos, but none exist; marketing/onboarding lack concrete examples. + - *Acceptance Criteria*: (1) Scripts or Make targets run each demo end-to-end; (2) Capture GIFs/screens for README; (3) Document steps in `docs/demos/*.md`. 
+ - *Test Plan*: CI job that runs demo scripts against a fixture repo; manual QA of media assets. + - *LLM Prompt*: “Create demo scripts that showcase ADR-as-policy enforcement, state bisection, and PoX reproduction, including documentation and media assets for the README.” + +- [ ] **`gatos doctor` Conformance Tooling** + - *Summary*: Implement the conformance suite envisioned in M9 to automatically vet repos for policy/export/proof invariants. + - *Problem Statement*: Without `gatos doctor`, operators cannot quickly validate installations, undermining trust. + - *Acceptance Criteria*: (1) CLI command `gatos doctor` runs a battery of checks (policy FF-only refs, exporter manifests, proof coverage); (2) Reports actionable errors; (3) Tests cover healthy vs failing repos. + - *Test Plan*: Integration tests against synthetic repos with intentional corruption; verify output codes and messages. + - *LLM Prompt*: “Implement a `gatos doctor` CLI that validates repo invariants (policy FF-only branches, PoF/PoE coverage, exporter manifests) and reports actionable diagnostics.” + From 41262a65938a9c459e0276ed561c6824a3f56202 Mon Sep 17 00:00:00 2001 From: "J. 
Kirby Ross" Date: Tue, 18 Nov 2025 08:33:21 -0800 Subject: [PATCH 23/25] docs: add ADRs for sessions and PoX --- TASKLIST.md | 7 +- docs/ROADMAP.md | 2 +- docs/decisions/ADR-0015/DECISION.md | 77 ++++++++++++++++++++ docs/decisions/ADR-0016/DECISION.md | 85 ++++++++++++++++++++++ docs/decisions/README.md | 2 + schemas/v1/proofs/pox_envelope.schema.json | 45 ++++++++++++ 6 files changed, 213 insertions(+), 5 deletions(-) create mode 100644 docs/decisions/ADR-0015/DECISION.md create mode 100644 docs/decisions/ADR-0016/DECISION.md create mode 100644 schemas/v1/proofs/pox_envelope.schema.json diff --git a/TASKLIST.md b/TASKLIST.md index eb2d8411..c08e05b1 100644 --- a/TASKLIST.md +++ b/TASKLIST.md @@ -1,10 +1,10 @@ # Task Backlog -- [ ] **ADR Coverage: Sessions & PoX** +- [x] **ADR Coverage: Sessions & PoX** - *Summary*: Author ADRs that define the Sessions feature set and the Proof-of-Experiment workflow so roadmap items have specs. - *Problem Statement*: `docs/TASKS.md` references Sessions without any ADR, and PoX (M7) lacks a normative document; engineers cannot implement without an agreed contract. - - *Acceptance Criteria*: (1) Two ADRs merged (Sessions, PoX) with IDs in the canonical log; (2) Each ADR documents scope, decision, consequences, and diagrams; (3) Roadmap/TASKS link to the new ADR IDs. - - *Test Plan*: Lint ADR markdown (mdformat/dprint), ensure `docs/decisions/README.md` table updates; run `rg` to confirm no TODO placeholders remain. + - *Acceptance Criteria*: ✅ ADR-0015 (Sessions) and ADR-0016 (PoX) exist with diagrams + consequences; roadmap/task references updated. + - *Test Plan*: ✅ Markdown lint + `rg` show new ADR ids in README/ROADMAP. - *LLM Prompt*: “You are drafting an Architecture Decision Record. 
Produce an ADR that specifies the GATOS Sessions feature (start/undo/fork/merge with lattice/DPO joins) aligned with existing policy/state planes, including decision, diagrams, and consequences.” - [ ] **Message Plane Implementation** @@ -62,4 +62,3 @@ - *Acceptance Criteria*: (1) CLI command `gatos doctor` runs a battery of checks (policy FF-only refs, exporter manifests, proof coverage); (2) Reports actionable errors; (3) Tests cover healthy vs failing repos. - *Test Plan*: Integration tests against synthetic repos with intentional corruption; verify output codes and messages. - *LLM Prompt*: “Implement a `gatos doctor` CLI that validates repo invariants (policy FF-only branches, PoF/PoE coverage, exporter manifests) and reports actionable diagnostics.” - diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md index af479344..f56b742e 100644 --- a/docs/ROADMAP.md +++ b/docs/ROADMAP.md @@ -574,7 +574,7 @@ These are explicit non-goals until after the core truth machine is working: **4–6 weeks** -**Status:** 🔜 Planned — PoX tooling is still conceptual (see `docs/proofs/proof-of-experiment.md`); no ADR has been authored yet. +**Status:** 🟡 In Progress — ADR-0016 (Draft) defines PoX envelopes; CLI + verifier/reproducer tooling remains to be built. 
### Goals diff --git a/docs/decisions/ADR-0015/DECISION.md b/docs/decisions/ADR-0015/DECISION.md new file mode 100644 index 00000000..738d62a2 --- /dev/null +++ b/docs/decisions/ADR-0015/DECISION.md @@ -0,0 +1,77 @@ +--- +Status: Draft +Date: 2025-11-18 +ADR: ADR-0015 +Authors: [flyingrobots] +Requires: [ADR-0001, ADR-0003, ADR-0004] +Related: [ADR-0005, ADR-0013] +Tags: [Sessions, Branching, UX] +Schemas: [] +Supersedes: [] +Superseded-By: [] +--- + +# ADR-0015: Sessions (Ephemeral Working Branches) + +## Scope +Define the contract for **sessions** – ephemeral, actor-scoped working branches (`refs/gatos/sessions//`) that capture in-flight edits, enable deterministic undo/fork/merge flows, and integrate with governance and state folds without polluting canonical history. + +## Rationale +Event sourcing alone is awkward for exploratory change. Engineers and researchers need scratch space to iterate, roll back, or branch experiments before elevating them into policy-gated events. Git already has branches, but mixing ungoverned branches with governed refs risks bypassing policy and determinism. Sessions create a narrow, deterministic lane for local mutation while keeping governance and state integrity intact. + +## Decision +1. **Name & Layout** + - Session refs live under `refs/gatos/sessions//`, where `` resolves via the trust graph (ADR-0003) and `` is a monotonically increasing identifier per actor. + - Each ref points to a linear commit stack authored locally; session commits MAY contain working files, staged state snapshots, or experiment artifacts. +2. **Lifecycle Commands** + - `git gatos session start [--from ]` creates a new session ref at `ref` (defaults to `refs/heads/main`). The command records metadata in `.gatos/session/.json` with fields `{ base_ref, policy_root, fold_root }` pinned at creation time. + - `git gatos session checkpoint` commits current worktree changes into the session ref with trailers `Session-Id`, `Policy-Root`, and `Fold-Root`. 
Watcher hooks (ADR-0006) ensure locked files remain read-only unless an active grant exists. + - `git gatos session undo` performs a single-step revert inside the session ref, writing an `undo` trailer so tooling can distinguish deliberate rewinds from rebases. + - `git gatos session fork` clones the current session into a new ULID, copying metadata and parent pointers. Fork metadata lists `parent_session` for traceability. + - `git gatos session publish --to ` rebases the session commits onto the target governed namespace (e.g., `refs/gatos/journal/demo`). Publish uses the Policy Gate; denied publishes leave an audit event under `refs/gatos/audit/sessions//deny/`. +3. **State & Policy Coupling** + - Session metadata stores `policy_root` and `fold_root` captured at `start`. Tooling MUST warn when the live policy/fold root diverges; publishing requires revalidation or explicit `--accept-drift` that logs under `refs/gatos/audit/sessions//drift`. + - Folding inside a session uses the local Echo runtime (ADR-0013) but records `Session-Shape-Root` commits under `refs/gatos/sessions///state//` so reproducibility checks can compare against canonical folds later. +4. **Merge Semantics** + - Publishing a session creates a governance event referencing `Session-Id` so reviewers can trace its provenance. + - Concurrent sessions touching overlapping footprints (ADR-0013) must merge via deterministic lattices or DPO joins. Conflicts the engine cannot auto-resolve yield `session.conflict` events referencing the conflicting paths; policy rules decide whether to abort or escalate. +5. **Garbage Collection & Retention** + - Sessions auto-expire after 30 days of inactivity by default. Expiry writes a `session.expired` audit entry and deletes the ref after a configurable quarantine window (default 7 days). Operators can override per profile. 
+ +## Diagrams +```mermaid +graph LR + base((refs/heads/main)) -->|start| S1[session:alice/01H] + S1 -->|checkpoint| C1(commit a) + C1 -->|undo| C0 + C1 -->|fork| S2[session:alice/01J] + S2 -->|publish| Journal[refs/gatos/journal/demo] +``` + +```mermaid +sequenceDiagram + participant User + participant SessionRef as refs/gatos/sessions// + participant PolicyGate + participant Journal as refs/gatos/journal/ns + + User->>SessionRef: start/fork + User->>SessionRef: checkpoint commits + User->>SessionRef: undo/fork operations + User->>PolicyGate: publish request (session diff) + PolicyGate-->>Journal: allow → commit events + PolicyGate-->>User: deny → audit refs/gatos/audit/sessions//deny +``` + +## Consequences +- Provides deterministic scratch space with provenance, enabling local iteration without bypassing policy. +- Requires CLI + watcher support; publishing adds latency because policy revalidation runs on the combined session diff. +- Garbage collection policies must balance cleanliness with forensic needs; audit refs preserve history even after refs are deleted. + +## Implementation Notes +- Session metadata lives under `.gatos/session/.json`; deletion without publishing triggers a warning logged in `refs/gatos/audit/sessions//abandoned`. +- Hooks MUST prevent pushes of session refs to remotes other than the owner’s scratch remotes; canonical remotes reject `refs/gatos/sessions/**` to keep history clean. + +## Open Questions +- Should session publishes support partial cherry-picks, or must entire session histories publish atomically? +- Do we enforce a hard limit on concurrent sessions per actor, or leave it to policy modules to decide? 
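The ADR-0015 retention defaults and drift rule need only a small amount of logic, sketched in Python below. The sketch assumes the quarantine window starts at expiry, so under the defaults a ref becomes deletable 37 days after its last activity. It models only the drift gate on publish; the Policy Gate decision is out of scope. `SessionMeta`, `lifecycle_state`, `drifted_fields`, and `may_publish` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

IDLE_LIMIT = timedelta(days=30)  # ADR-0015 default inactivity window
QUARANTINE = timedelta(days=7)   # ADR-0015 default quarantine window


@dataclass(frozen=True)
class SessionMeta:
    """Fields ADR-0015 pins in the session metadata file at `start`."""
    base_ref: str
    policy_root: str
    fold_root: str


def lifecycle_state(last_activity: datetime, now: datetime,
                    idle: timedelta = IDLE_LIMIT,
                    quarantine: timedelta = QUARANTINE) -> str:
    expires_at = last_activity + idle
    if now < expires_at:
        return "active"
    if now < expires_at + quarantine:
        return "expired"    # session.expired audited; ref still present
    return "deletable"      # quarantine over (assumed to start at expiry)


def drifted_fields(meta: SessionMeta, live_policy_root: str,
                   live_fold_root: str) -> list:
    # Tooling should warn whenever this list is non-empty.
    drifted = []
    if meta.policy_root != live_policy_root:
        drifted.append("policy_root")
    if meta.fold_root != live_fold_root:
        drifted.append("fold_root")
    return drifted


def may_publish(meta: SessionMeta, live_policy_root: str, live_fold_root: str,
                revalidated: bool = False, accept_drift: bool = False) -> bool:
    if not drifted_fields(meta, live_policy_root, live_fold_root):
        return True
    # With drift, publishing needs revalidation or an explicit --accept-drift,
    # which ADR-0015 requires to be logged under the drift audit ref.
    return revalidated or accept_drift
```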
diff --git a/docs/decisions/ADR-0016/DECISION.md b/docs/decisions/ADR-0016/DECISION.md new file mode 100644 index 00000000..368915cf --- /dev/null +++ b/docs/decisions/ADR-0016/DECISION.md @@ -0,0 +1,85 @@ +--- +Status: Draft +Date: 2025-11-18 +ADR: ADR-0016 +Authors: [flyingrobots] +Requires: [ADR-0002, ADR-0004, ADR-0014] +Related: [ADR-0011] +Tags: [Proofs, Reproducibility, Science] +Schemas: + - schemas/v1/proofs/pox_envelope.schema.json +Supersedes: [] +Superseded-By: [] +--- + +# ADR-0016: Proof-of-Experiment (PoX) + +## Scope +Standardize the Proof-of-Experiment (PoX) envelope, CLI workflow, and storage conventions that tie inputs, policy, folds, jobs, and outputs into a reproducible scientific artifact. + +## Rationale +Researchers need a verifiable, portable object that answers: *Which inputs and code produced these results under which policy, and can I replay it?* Without a canonical PoX, viewers cannot trust published figures, and automation cannot enforce reproducibility gates. PoE (jobs) and PoF (state) exist, but there is no top-level commitment linking them; PoX fills that gap. + +## Decision +1. **Envelope Definition** + - Canonical JSON schema `schemas/v1/proofs/pox_envelope.schema.json` with fields: + - `type = "pox"`, `ulid`, `inputs_root`, `program_id`, `policy_root`, `policy_code_root`, `outputs_root`. + - `links.poe[]` = array of Proof-of-Execution digests; `links.pof[]` = array of Proof-of-Fold digests. + - Optional `metadata` map for lab notebook context (title, DOI, contact). + - Canonicalization uses RFC 8785 JCS; the signed digest is `blake3(canonical_bytes)`. +2. **Storage & Refs** + - PoX commits live under `refs/gatos/audit/proofs/experiments/`. The commit tree contains `pox/envelope.json` plus optional attachments (`inputs.json`, `outputs.json`, etc.). + - Commit trailers include `PoX-Id`, `Inputs-Root`, `Program-Id`, `Outputs-Root`, and `Policy-Root` for easy discovery. +3. 
**CLI Workflow** + - `git gatos pox create --inputs --program ` collects inputs (as pointer manifests), program fingerprints (OCI digest, WASM hash, etc.), outputs (state refs, opaque pointers), and referenced PoE/PoF ULIDs. Optionally attaches raw artifacts into `pox//`. + - `git gatos pox sign --id --key ` signs the canonical envelope with the actor’s key (ed25519) and records the signature in the envelope `sig` field. + - `git gatos pox verify --id ` checks signature validity, policy ancestry, and that all linked PoE/PoF digests exist and validate. + - `git gatos reproduce ` orchestrates: fetch inputs (resolving opaque pointers via policy), replay jobs described by linked PoE entries, fold state checkpoints, and diff outputs. +4. **Integration Rules** + - Policies may require PoX IDs on publish (e.g., research profile). Pre-receive hooks reject pushes missing corresponding PoX entries when `policy.require_pox` is true. + - Explorer exports (ADR-0011) record the PoX ULID alongside derived tables to create an analytics lineage chain. + - The GitHub App (ADR-0010) surfaces PoX verification status as a PR check when experiments touch governed namespaces. +5. **Reproduction Semantics** + - Reproduction logs are written to `refs/gatos/audit/pox//repro/` with metadata (timestamp, verifier, status, divergence summary). + - Determinism expectations: successful reproduction must match `outputs_root`. Divergence stores the diff summary and references the offending PoE/PoF for debugging. 
+ +## Diagrams +```mermaid +flowchart TD + Inputs[inputs_root] --> PoX + Program[program_id] --> PoX + Policy[policy_root + policy_code_root] --> PoX + Outputs[outputs_root] --> PoX + PoE[PoE digests] --> PoX + PoF[PoF digests] --> PoX + PoX --> Audit[refs/gatos/audit/proofs/experiments/] +``` + +```mermaid +sequenceDiagram + participant Author + participant CLI as git gatos + participant Audit as refs/gatos/audit/proofs/experiments + participant Verifier + + Author->>CLI: pox create + sign + CLI->>Audit: commit envelope + attachments + Verifier->>CLI: pox verify + CLI->>Audit: fetch envelope & linked PoE/PoF + CLI-->>Verifier: status + reproducer script + Verifier->>CLI: reproduce + CLI->>Audit: log reproduction result +``` + +## Consequences +- Establishes a reproducible artifact for publications and compliance. +- Requires deterministic hashing of inputs/programs; opaque pointers must supply plaintext digests or reproducibility fails. +- Additional storage under `refs/gatos/audit/proofs/experiments/**` increases repo size; pruning policies must preserve scientific records. + +## Implementation Notes +- `program_id` accepts multiple encodings via tagged union: `wasm:`, `oci:`, `containerd:`. CLI normalizes to lowercase hex digests. +- Inputs/outputs can reference Explorer-Root manifests; when present, `inputs_root` is the explorer-root digest rather than raw blob hashes. + +## Open Questions +- Should PoX enforce multi-signer signatures (e.g., PI + operator) by default or defer to policy requirements? +- Do we permit redacted attachments (e.g., private data) if the pointer manifests are public, or must all attachments be shareable? 
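Below is a dependency-free Python sketch of the structural checks in `schemas/v1/proofs/pox_envelope.schema.json`. It mirrors the schema's required fields, `additionalProperties: false`, enum, and patterns, using `re.fullmatch` in place of the schema's `^...$` anchors. It does not verify the signature, the canonicalization, or whether linked PoE/PoF digests resolve, all of which `git gatos pox verify` is described as doing. `pox_envelope_errors` is a hypothetical name. A real tool would more likely run a JSON Schema validator against the schema file.

```python
import re

ULID = re.compile(r"[0-9A-HJKMNP-TV-Z]{26}")  # Crockford base32, no I/L/O/U
DIGEST = re.compile(r"[a-z0-9]+:[a-f0-9]+")    # "<algo>:<lowercase hex>"

REQUIRED = {"type", "ulid", "inputs_root", "program_id", "policy_root",
            "policy_code_root", "outputs_root", "links", "sig_alg", "sig"}
OPTIONAL = {"metadata"}


def _is_digest(value) -> bool:
    return isinstance(value, str) and DIGEST.fullmatch(value) is not None


def pox_envelope_errors(env: dict) -> list:
    """Return schema violations; an empty list means structurally valid."""
    errors = []
    keys = set(env)
    if REQUIRED - keys:
        errors.append(f"missing: {sorted(REQUIRED - keys)}")
    if keys - REQUIRED - OPTIONAL:  # additionalProperties: false
        errors.append(f"unexpected: {sorted(keys - REQUIRED - OPTIONAL)}")
    if "type" in env and env["type"] != "pox":
        errors.append("type must be 'pox'")
    if "ulid" in env and not (isinstance(env["ulid"], str)
                              and ULID.fullmatch(env["ulid"])):
        errors.append("ulid is not a 26-character ULID")
    for key in ("inputs_root", "outputs_root"):
        if key in env and not _is_digest(env[key]):
            errors.append(f"{key} is not <algo>:<hex>")
    for key in ("program_id", "policy_root", "policy_code_root", "sig"):
        if key in env and not isinstance(env[key], str):
            errors.append(f"{key} must be a string")
    if "sig_alg" in env and env["sig_alg"] != "ed25519":
        errors.append("sig_alg must be 'ed25519'")
    if "links" in env:
        links = env["links"]
        if not isinstance(links, dict):
            errors.append("links must be an object")
        else:
            if set(links) != {"poe", "pof"}:
                errors.append("links must have exactly 'poe' and 'pof'")
            for kind in ("poe", "pof"):
                items = links.get(kind, [])
                if not (isinstance(items, list)
                        and all(_is_digest(d) for d in items)):
                    errors.append(f"links.{kind} must be a list of digests")
    meta = env.get("metadata", {})
    # bool is an int subclass, so this accepts string, number, and boolean.
    if not (isinstance(meta, dict)
            and all(isinstance(v, (str, int, float)) for v in meta.values())):
        errors.append("metadata values must be string, number, or boolean")
    return errors
```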
diff --git a/docs/decisions/README.md b/docs/decisions/README.md index ec52271c..8390d338 100644 --- a/docs/decisions/README.md +++ b/docs/decisions/README.md @@ -42,3 +42,5 @@ Each ADR will have a status, typically one of the following: | [ADR-0012](./ADR-0012/DECISION.md) | Federated Repositories & Mounts | Draft | 2025-11-09 | | [ADR-0013](./ADR-0013/DECISION.md) | Partial & Lazy Folds | Draft | 2025-11-09 | | [ADR-0014](./ADR-0014/DECISION.md) | Proof-Of-Fold (Attestation of State) | Draft | 2025-11-09 | +| [ADR-0015](./ADR-0015/DECISION.md) | Sessions (Ephemeral Working Branches) | Draft | 2025-11-18 | +| [ADR-0016](./ADR-0016/DECISION.md) | Proof-of-Experiment (PoX) | Draft | 2025-11-18 | diff --git a/schemas/v1/proofs/pox_envelope.schema.json b/schemas/v1/proofs/pox_envelope.schema.json new file mode 100644 index 00000000..f26804a2 --- /dev/null +++ b/schemas/v1/proofs/pox_envelope.schema.json @@ -0,0 +1,45 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema#", + "title": "GATOS Proof-of-Experiment Envelope", + "type": "object", + "required": [ + "type", + "ulid", + "inputs_root", + "program_id", + "policy_root", + "policy_code_root", + "outputs_root", + "links", + "sig_alg", + "sig" + ], + "properties": { + "type": { "const": "pox" }, + "ulid": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" }, + "inputs_root": { "type": "string", "pattern": "^[a-z0-9]+:[a-f0-9]+$" }, + "program_id": { "type": "string" }, + "policy_root": { "type": "string" }, + "policy_code_root": { "type": "string" }, + "outputs_root": { "type": "string", "pattern": "^[a-z0-9]+:[a-f0-9]+$" }, + "metadata": { "type": "object", "additionalProperties": { "type": ["string", "number", "boolean"] } }, + "links": { + "type": "object", + "required": ["poe", "pof"], + "properties": { + "poe": { + "type": "array", + "items": { "type": "string", "pattern": "^[a-z0-9]+:[a-f0-9]+$" } + }, + "pof": { + "type": "array", + "items": { "type": "string", "pattern": "^[a-z0-9]+:[a-f0-9]+$" } 
+ } + }, + "additionalProperties": false + }, + "sig_alg": { "type": "string", "enum": ["ed25519"] }, + "sig": { "type": "string" } + }, + "additionalProperties": false +} From 4970d129dd5c7a7b47d1c73f09a651d0353ec799 Mon Sep 17 00:00:00 2001 From: "J. Kirby Ross" Date: Tue, 18 Nov 2025 08:45:09 -0800 Subject: [PATCH 24/25] docs: align spec and guide with sessions & PoX --- docs/SPEC.md | 17 +++++++++++++++-- docs/TASKS.md | 6 +++--- docs/TECH-SPEC.md | 37 +++++++++++++++++++++++++++++++++++++ docs/guide/CHAPTER-005.md | 13 +++++++++++++ docs/guide/CHAPTER-010.md | 14 ++++++++++++++ 5 files changed, 82 insertions(+), 5 deletions(-) diff --git a/docs/SPEC.md b/docs/SPEC.md index 759ac387..9e032ca9 100644 --- a/docs/SPEC.md +++ b/docs/SPEC.md @@ -811,7 +811,13 @@ Retention and compaction: -`gatos/sessions//` represents an ephemeral branch for interactive mutation. +`gatos/sessions//` represents an ephemeral branch for interactive mutation (see ADR-0015). + +- Session refs are **actor-scoped**; `` resolves via the trust graph and inherits watcher locks + policy grants from ADR-0003/0006. +- `git gatos session start [--from ]` records metadata (`base_ref`, `policy_root`, `fold_root`) under `.gatos/session/.json` so later publishes can detect drift. +- Checkpoints are plain Git commits that MUST include trailers `Session-Id`, `Policy-Root`, and `Fold-Root`. +- Undo/fork operations stay within the session ref; publishes rebase onto governed namespaces and route through the Policy Gate. Denials are recorded under `refs/gatos/audit/sessions//deny/*`. +- Idle sessions expire after 30 days by default; pruning writes an audit event before deleting the ref. ```mermaid graph TD @@ -873,7 +879,7 @@ Proofs **MUST** be stored under `refs/gatos/audit/proofs/`. 
-A **PoX** envelope ties together a scientific artifact’s inputs, program, policy, and outputs: +A **PoX** envelope (ADR-0016) ties together a scientific artifact’s inputs, program, policy, and outputs: - `inputs_root` — commitment to input datasets/pointers - `program_id` — canonical hash of the analysis program/container @@ -883,6 +889,13 @@ A **PoX** envelope ties together a scientific artifact’s inputs, program, poli PoX envelopes **MUST** be stored under `refs/gatos/audit/proofs/experiments/`. +CLI flow: + +- `git gatos pox create` collects input/output manifests plus PoE/PoF digests. +- `git gatos pox sign` signs the canonical envelope and writes it to `refs/gatos/audit/proofs/experiments/`. +- `git gatos pox verify` checks signatures, policy ancestry, and linked proofs. +- `git gatos reproduce` pulls inputs via opaque pointers, replays PoE jobs, refolds state, and logs results under `refs/gatos/audit/pox//repro/`. + --- ## 11. Offline Authority Protocol (OAP) diff --git a/docs/TASKS.md b/docs/TASKS.md index 90d3b9ce..1086262c 100644 --- a/docs/TASKS.md +++ b/docs/TASKS.md @@ -57,15 +57,15 @@ - [ ] Tests: - [ ] exactly-once torture -## EPIC-5: Sessions +## EPIC-5: Sessions (ADR-0015) -- [ ] start +- [ ] start (CLI + RPC per ADR-0015) - [ ] undo - [ ] fork - [ ] merge -- [ ] lattice/DPO joins for conflicts +- [ ] lattice/DPO joins for conflicts (tie into ADR-0013 concurrency rules) ## EPIC-6: CAS & Opaque Pointers diff --git a/docs/TECH-SPEC.md b/docs/TECH-SPEC.md index 3438bf22..f20a4f04 100644 --- a/docs/TECH-SPEC.md +++ b/docs/TECH-SPEC.md @@ -856,3 +856,40 @@ The daemon exposes `messages.read` over the JSONL RPC channel so workers and bri - `limit_out_of_range` (409) — `limit < 1`. `gatos-message-plane` is responsible for translating RPC calls to actual Git ref walks and enforcing ULID monotonicity per ADR-0005. + +--- + +## 18. Sessions & PoX Tooling + +### Sessions (ADR-0015) + +- **RPC surface** + - `session.start { base_ref?, actor? 
}` → daemon validates actor key, resolves base ref (default `refs/heads/main`), generates ULID, creates `refs/gatos/sessions//` pointing to base, and writes metadata JSON. + - `session.checkpoint { session_id }` → daemon shells out to Git to create a commit with required trailers. + - `session.undo/fork/publish` map 1:1 to CLI subcommands; publish calls the Policy API using the aggregated diff. +- **Storage** + - Metadata file `.gatos/session/.json` structure: + ```json + { + "session_id": "01H...", + "actor": "ed25519:...", + "base_ref": "refs/heads/main", + "policy_root": "", + "fold_root": "sha256:", + "created_at": "2025-11-18T18:00:00Z" + } + ``` + - Session-local folds live in `refs/gatos/sessions///state//` so Echo can diff shapes. +- **GC job** + - `gatos gc sessions` scans metadata and deletes sessions idle for more than 30 days, logging each expiry to `refs/gatos/audit/sessions//expired` before deletion. + +### Proof-of-Experiment (ADR-0016) + +- **Schema**: `schemas/v1/proofs/pox_envelope.schema.json`; envelopes are serialized as canonical JSON per RFC 8785 (JCS). The CLI validates that `program_id` carries a `wasm:`, `oci:`, or `exec:` prefix. +- **Workflow** + 1. `pox create` collects pointer manifests (`inputs_root`), Explorer outputs, PoE/PoF links. + 2. `pox sign` signs the BLAKE3 digest of the canonical envelope with ed25519; supports hardware keys via `gatos key use`. + 3. `pox publish` writes the signed envelope as a commit under `refs/gatos/audit/proofs/experiments/`. + 4. `pox verify` checks signatures + linked proofs and returns structured errors if any are missing. + 5. `reproduce` drives workers + folds; writes audit logs under `refs/gatos/audit/pox//repro/`. +- **GitHub App**: exposes a `gatos/pox` status check by calling the `pox.status` RPC, ensuring experiments touching governed namespaces reference an existing PoX ULID.
diff --git a/docs/guide/CHAPTER-005.md b/docs/guide/CHAPTER-005.md index 888dffd5..2fc9e329 100644 --- a/docs/guide/CHAPTER-005.md +++ b/docs/guide/CHAPTER-005.md @@ -91,6 +91,19 @@ By intercepting all writes, the Stargate can run powerful server-side **`pre-rec This local-first enforcement provides low-latency, high-security writes that would be impossible on a public SaaS platform. +## Sessions: Scratch Space With Policy Guarantees + + + +Not every edit is ready for the ledger. ADR-0015 introduces **sessions** – ephemeral refs under `refs/gatos/sessions//` – so you can iterate locally without bypassing governance: + +- `git gatos session start` snapshots the base ref, `policy_root`, and `fold_root`, then spins up a dedicated branch for your experiments. +- `session checkpoint`, `undo`, and `fork` stay inside that sandbox while the watcher and hooks continue to enforce locks. +- When you’re ready, `session publish` feeds the entire diff through the Policy Gate. If policy denies the publish, the denial is logged under `refs/gatos/audit/sessions//deny/*` so nothing disappears silently. +- Idle sessions auto-expire after 30 days (configurable), but their audit trail is retained for forensics. + +Sessions feel like private Git branches, but they inherit all the determinism guarantees—folds you run inside a session record their `Session-Shape-Root`, so anyone can reproduce your intermediate states before you promote them. + ## The Magic Mirror diff --git a/docs/guide/CHAPTER-010.md b/docs/guide/CHAPTER-010.md index 08af0f4e..c90ccc11 100644 --- a/docs/guide/CHAPTER-010.md +++ b/docs/guide/CHAPTER-010.md @@ -14,6 +14,7 @@ - [Verifiable Folds on Private Data](#verifiable-folds-on-private-data) - [Blob Availability Attestation (BAA)](#blob-availability-attestation-baa) - [Rekeying](#rekeying) +- [Proof-of-Experiment (PoX)](#proof-of-experiment-pox) - [Summary](#summary) @@ -146,6 +147,19 @@ Policies can require a valid BAA before pointers are accepted into public state.
The Opaque Pointer model also supports **`rekey`** operations. An authorized user can decrypt a blob and re-encrypt it with a new key, creating a new Opaque Pointer. This allows for secure key rotation and sharing of private data with new parties without changing the underlying data itself. +## Proof-of-Experiment (PoX) + + + +PoE proves jobs, PoF proves folds—but scientists need a single artifact that says *“this figure came from these inputs under this policy and you can replay it.”* ADR-0016 formalizes the **PoX envelope**: + +- Fields: `inputs_root`, `program_id`, `policy_root`, `policy_code_root`, `outputs_root`, plus links to the PoE/PoF digests that backed the work. +- Storage: commits under `refs/gatos/audit/proofs/experiments/` containing `pox/envelope.json` and optional attachments (input manifests, analysis notebooks, PDFs). +- CLI: `git gatos pox create` → `pox sign` → `pox publish` to record the envelope; `pox verify` and `reproduce` let reviewers validate signatures and re-run the experiment. Reproduction logs live under `refs/gatos/audit/pox//repro/`. +- Policy hooks: research profiles can require PoX IDs before merges; the GitHub App surfaces PoX status as a required check. + +When you cite results, reference the PoX ULID and repo commit (or DOI). Anyone with access to the repo + referenced blobs can rebuild your outputs exactly—or see a divergence report if something drifted. + ## Summary From b70539bfa00ade8ac15e38bc11d842a69932a383 Mon Sep 17 00:00:00 2001 From: "J. 
Kirby Ross" Date: Tue, 18 Nov 2025 08:57:19 -0800 Subject: [PATCH 25/25] docs: add operations & observability chapter --- docs/guide/CHAPTER-013.md | 132 ++++++++++++++++++++++++++++++++++++++ docs/guide/README.md | 11 ++++ 2 files changed, 143 insertions(+) create mode 100644 docs/guide/CHAPTER-013.md diff --git a/docs/guide/CHAPTER-013.md b/docs/guide/CHAPTER-013.md new file mode 100644 index 00000000..55a0ba83 --- /dev/null +++ b/docs/guide/CHAPTER-013.md @@ -0,0 +1,132 @@ +# Chapter 13: Operations & Observability + + + + + +- [Operating Profiles & SLO Guardrails](#operating-profiles--slo-guardrails) +- [Health Checks & Probes](#health-checks--probes) +- [Metrics & Dashboards](#metrics--dashboards) +- [Audit Trails & Forensics](#audit-trails--forensics) +- [Troubleshooting Playbooks](#troubleshooting-playbooks) +- [Runbook Starter Checklist](#runbook-starter-checklist) + + + +GATOS is more than a set of crates—it is a living operating surface. Once the Ledger, Policy, Message, Job, and State planes are running, operators need clear guidance on how to keep them healthy. This chapter distills best practices from ADR-0006 (Watcher), ADR-0009 (Streams), ADR-0011 (Exporter), ADR-0014 (PoF), ADR-0015 (Sessions), and ADR-0016 (PoX) into concrete SLOs, probes, and playbooks. 
+ +## Operating Profiles & SLO Guardrails + + + +Profiles (SPEC §12) declare enforcement defaults; operators should extend them with service-level objectives: + +| Profile | Default Guardrails | Recommended SLOs | +| :-- | :-- | :-- | +| `local` | Hooks optional, watcher best-effort | Fold latency < 2s p95, watcher drift alerts < 5 min | +| `push-gate` | Stargate required, PoF enforced, locks FF-only | Policy gate latency < 1s p95, mirror lag < 30s | +| `research` | PoF + PoE mandatory, PoX encouraged | PoX backlog < 3 experiments, pointer drift incidents = 0 | + +Configure `gatosd` with per-profile budgets (e.g., in `gatos.toml`): + +```toml +[profile.research.slo] +policy_gate_ms_p95 = 750 +mirror_lag_seconds = 30 +pox_backlog_max = 3 +``` + +## Health Checks & Probes + + + +Expose the following HTTP endpoints on `gatosd` (or the Stargate service): + +| Endpoint | Purpose | Data | +| :-- | :-- | :-- | +| `/healthz` | Liveness | basic process + disk checks | +| `/readyz` | Readiness | Git remote reachability, policy cache warm, job runners available | +| `/hooksz` | Hook status | watcher daemon heartbeat, lock cache age | +| `/streamz` | Message/stream backlog | per-topic lag, credit utilization | + +```mermaid +sequenceDiagram + participant Probe + participant gatosd + participant Policy + participant Git + Probe->>gatosd: GET /readyz + gatosd->>Policy: ping policy VM + gatosd->>Git: fetch --dry-run mirror + gatosd-->>Probe: 200 { "policy_ms": 42, "mirror_lag": 1.2 } +``` + +Tag each check in the response JSON with a simple status bucket (`ok`, `degraded`, `failed`). Integrate with Kubernetes, Nomad, or systemd watchdogs as appropriate. + +## Metrics & Dashboards + + + +Prometheus-style metrics should cover: + +- `gatos_policy_gate_duration_seconds{profile}` — histogram per profile (target p95 < 1s). +- `gatos_message_plane_lag_messages{topic}` — difference between newest ULID and consumer checkpoints. +- `gatos_jobs_claim_conflicts_total` — CAS failures; alert when rising.
+- `gatos_pox_backlog_total` — open PoX envelopes without verified reproduction. +- `gatos_session_active_total{actor}` — active sessions per actor; watch for runaway forks. + +Dashboards typically include panels for: + +1. **Policy & Fold Latency** (stacked area showing gates vs folds). +2. **Bus / Stream Lag** (per topic + federation proxy lag when the ADR-0009 bridge is deployed). +3. **Proof Coverage** (counts of PoF, PoE, PoX produced per day). +4. **Export / Explorer Health** (export runtimes, explorer-root verification failures). + +## Audit Trails & Forensics + + + +Everything critical is already versioned, but operators should know where to look: + +- Policy denials → `refs/gatos/audit/policy/deny/`. +- Watcher/hook events → `refs/gatos/audit/locks/` and local JSONL logs. +- Session publishes/denials → `refs/gatos/audit/sessions//*`. +- PoX artifacts → `refs/gatos/audit/proofs/experiments/` and reproduction logs under `refs/gatos/audit/pox//repro/*`. +- Federation sync issues → `refs/gatos/audit/federation//`. + +Augment Git history with structured log shipping (e.g., forward watcher JSONL to Loki) for faster searches, but treat Git as the source of truth. + +## Troubleshooting Playbooks + + + +1. **Mirror lag > SLO** + - Check `/readyz` → `mirror_lag`. + - If degraded, run `git gatos stargate mirror --status` to identify stuck refs. + - Inspect `refs/gatos/audit/mirror/`; roll back if necessary. +2. **Policy gate latency spike** + - Review `gatos_policy_gate_duration_seconds`. + - Ensure the policy cache is warm; run `gatos policy warmup`. + - Look for large session publishes; consider splitting sessions per ADR-0015. +3. **Message Plane backlog** + - Query `/streamz`; if a topic exceeds its credit budget, scale consumers or enable the federation proxy. + - Verify that consumers advance `refs/gatos/consumers/**`; run `gatos messages checkpoint repair` if their checkpoints are stale. +4. **PoX verification failure** + - Run `git gatos pox verify --id ` to identify the failing component.
+ - Re-run `git gatos reproduce` with `--log` to capture the divergence diff. +5. **Opaque pointer drift** + - Compare the pointer's `digest` against the BLAKE3 hash of the fetched plaintext; log the incident under `refs/gatos/audit/privacy/`. + - Rotate capability credentials; update policy rules to require BAA signatures. + +## Runbook Starter Checklist + + + +- [ ] Define SLO dashboards per profile (`local`, `push-gate`, `research`). +- [ ] Wire `/healthz` + `/readyz` into your orchestration layer. +- [ ] Configure log shipping for watcher, policy gate, and federation events. +- [ ] Schedule `gatos export verify` nightly; alert on explorer-root drift. +- [ ] Make the PoX status check required for research namespaces. +- [ ] Document emergency procedures (mirror rollback, policy hotfix, pointer quarantine) in your own runbooks referencing the audit refs above. + +GATOS makes it easy to prove *what* happened; this chapter shows how to prove it is *still healthy*. diff --git a/docs/guide/README.md b/docs/guide/README.md index 61922c2e..2b7df5ca 100644 --- a/docs/guide/README.md +++ b/docs/guide/README.md @@ -338,6 +338,17 @@ See the full step-by-step guides: - **Read this if:** - You are interested in the future direction of GATOS and its potential to change how we build distributed and AI-integrated systems. +- [Chapter 13: Operations & Observability](./CHAPTER-013.md) + - **Objective:** + - To guide operators through health probes, SLOs, metrics, and incident response using the audit surfaces built into GATOS. + - **Key Concepts:** + - Profiles & Guardrails, `/healthz` vs `/readyz`, stream lag, PoX backlogs + - Metrics (policy gate latency, message lag, proof coverage) + - Audit references for sessions, locks, federation, PoX + - Troubleshooting playbooks (mirror lag, policy spikes, pointer drift) + - **Read this if:** + - You deploy or operate GATOS nodes and need concrete runbooks and observability patterns. + ## Glossary (Quick Reference)