GATOS Privacy: ADR‑0004 + SPEC/TECH‑SPEC + Pointer Schema + gatos‑privacy crate #47

flyingrobots · 2025-11-10T18:04:22Z

This PR formalizes the hybrid privacy model (ADR‑0004) and wires the supporting schema/tooling.

Highlights

POST + Bearer JWT resolver (optional HTTP Message Signatures profile).
Opaque Pointer schema (draft 2020‑12) with ciphertext_digest and extensions.
RFC 8785 JCS canonicalization (normative) + non‑JSON key ordering.
Policy schema extended with privacy.classes and privacy.rules (optional), with examples.
Makefile and scripts compile/validate schemas including privacy.
New gatos-privacy crate (typed OpaquePointer envelope).

Notable details

Low‑entropy privacy class gates public plaintext digests; requires ciphertext_digest in public pointers.
Resolver response digest headers reflect the response body (plaintext or ciphertext).
Error taxonomy harmonized: DigestMismatch is 422.
Namespacing clarified: private overlay indices in refs, blobs in external stores.
Pointer rotation flow documented with Privacy-Pointer-Rotations trailer.

Follow‑ups to be tracked as issues

Projection Determinism golden vectors (JCS across OS/arch)
Low‑entropy PII torture vector (ciphertext‑only exposure)
Pointer rotation example + CI trailer gate
capability_ref indirection example
Governance PoC canonical storage harmonization (ADR‑0003 and SPEC now aligned to MUST)

Receipts

Branch: decisions/ADR-0004
Commits: d363688, 4bbbccb

Ready for review.

This commit addresses 8 specific issues raised during review of the initial ADR-0004 documentation. The changes improve clarity, consistency, and technical precision across the specification. Fixes include: - **FEATURES.md**: - Rewrote acceptance criteria for F9-US-DEV to be specific and verifiable. - Removed the superseded F6 feature to avoid confusion with F9. - **SPEC.md**: - Corrected Mermaid diagram type casing ( -> ) to align with JSON standards. - Clarified that private blob decryption during pointer resolution is conditional, not assumed. - **TECH-SPEC.md**: - Removed redundant participant aliases in the projection sequence diagram. - Replaced the unconventional endpoint with a more RESTful URL. - Improved diagram precision by changing 'node' to 'field path'. - Specified JWS with a detached payload as the required authentication mechanism for pointer resolution.

Updated the acceptance criteria for F9-US-SEC in FEATURES.md to explicitly mention the and commit trailers, improving clarity for developers.

…esolution

…iagram

…SPEC.md

…dance

…nularity

…ional HTTP Message Signatures profile; tighten determinism (RFC 8785 JCS + key ordering); clarify namespaces and error taxonomy; add pointer rotation trailer\nschema(privacy): add ciphertext_digest + extensions; bump to draft 2020-12\nschema(policy): add optional privacy.classes + rules\nexamples: add privacy_min and updated opaque_pointer example\nscripts: validate privacy schema/examples in CI\nmake: fix tab indentation for lint/fix targets\nchore: add gatos-privacy crate with OpaquePointer type and notes

…onse digest headers reflect response body\nschema(privacy): allow ciphertext-only pointers via anyOf; keep kind/algo/location/capability required\ndocs(ADR-0004): fix colon in class diagram;\ndocs(ADR-0003): PoC storage level to MUST

coderabbitai · 2025-11-10T18:04:33Z

Summary by CodeRabbit

New Features
- Added privacy and opaque pointer support for securing sensitive data.
- Hybrid privacy model enables separation of public and private state with configurable policies.
- Policy-driven redaction and pointerization of protected content.
- Opaque pointer resolution protocol for accessing encrypted private data.
Documentation
- Added comprehensive privacy architecture and implementation guides.
- New use case documentation for verifiable, compliant PII management.

Walkthrough

Introduces a hybrid privacy model to GATOS comprising: a new gatos-privacy crate defining OpaquePointer types, schema definitions for privacy policies and opaque pointers, normative ADR-0004 formalizing privacy projection and pointer resolution mechanics, extended specification and implementation docs, build tooling updates for schema validation, and minimal example configurations.

Changes

Cohort / File(s)	Summary
Privacy Crate Foundation `Cargo.toml`, `crates/gatos-privacy/Cargo.toml`, `crates/gatos-privacy/src/lib.rs`	Workspace member registration and new Rust crate with `OpaquePointer` struct (kind, algo, digest, ciphertext_digest, size, location, capability, extensions) and supporting enums `Kind` (OpaquePointer), `Algo` (Blake3); derives serde annotations for JSON serialization with `deny_unknown_fields` and optional field handling.
Schema & Validation Infrastructure `schemas/v1/privacy/opaque_pointer.schema.json`, `schemas/v1/policy/governance_policy.schema.json`, `scripts/validate_schemas.sh`, `Makefile`	Adds JSON schema for opaque pointers (with anyOf conditional requirements for digest/ciphertext_digest), expands governance policy schema with privacy classes/rules object, extends schema compilation/validation targets, and updates validation script to include privacy schema example checks.
Example Artifacts `examples/v1/privacy/opaque_pointer_min.json`, `examples/v1/policy/privacy_min.json`	Minimal opaque pointer instance (blake3 hashes, location/capability URIs) and minimal privacy policy (pii_low_entropy class, pointerize rule with capability/location).
Specification & Narrative `docs/SPEC.md`, `docs/TECH-SPEC.md`, `docs/FEATURES.md`, `docs/USE-CASES.md`	Expanded SPEC with privacy/opaque pointer section, projection model, and resolution protocol; TECH-SPEC extended with privacy projection/resolution flows, component responsibilities, and determinism assertions; FEATURES replaced F6 with F9 (Hybrid Privacy Model); USE-CASES added PII management use case.
Decision & Architecture Records `docs/decisions/ADR-0004/DECISION.md`, `docs/decisions/ADR-0003/DECISION.md`, `docs/decisions/README.md`	New ADR-0004 formalizing hybrid privacy model (Projection Functor, opaque pointers, resolution protocol, policy hooks, trailers, namespacing); ADR-0003 refined PoC envelope approval ordering to lexicographic sort; README updated with ADR-0004 entry.

Sequence Diagram(s)

sequenceDiagram
    participant Echo as gatos-echo<br/>(State Engine)
    participant Policy as gatos-policy<br/>(Policy Service)
    participant PrivateStore as PrivateStore<br/>(Backend)
    participant Ledger as gatos-ledger<br/>(Refs)
    
    Note over Echo,Ledger: Privacy Projection Flow
    Echo->>Echo: Fold history → UnifiedState
    Echo->>Policy: Fetch privacy policy (policy.yaml)
    Policy-->>Echo: Privacy config (classes, rules)
    Echo->>Echo: Apply redact/pointerize<br/>Project UnifiedState → PublicState + PrivateBlobs
    Echo->>PrivateStore: Persist PrivateBlobs (encrypted)
    PrivateStore-->>Echo: OK (storage refs)
    Echo->>Ledger: Commit PublicState (with opaque pointers)<br/>+ Trailers (Privacy-Redactions, Privacy-Pointers)
    Ledger-->>Echo: Refs updated

sequenceDiagram
    participant Client
    participant gatosd as gatosd<br/>(Daemon)
    participant Ledger as gatos-ledger
    participant PrivateStore
    
    Note over Client,PrivateStore: Opaque Pointer Resolution Flow
    Client->>gatosd: POST /gatos/private/blobs/resolve<br/>(pointer, JWT auth)
    gatosd->>Ledger: Fetch pointer metadata<br/>+ access policy
    Ledger-->>gatosd: Pointer + policy
    gatosd->>gatosd: Verify JWT (auth)<br/>Evaluate policy
    alt Authorization Success
        gatosd->>PrivateStore: Retrieve encrypted blob<br/>(using location/capability)
        PrivateStore-->>gatosd: Encrypted blob
        gatosd->>gatosd: Decrypt blob (capability key)<br/>Verify digest & ciphertext_digest
        gatosd-->>Client: Decrypted blob (HTTP 200)
    else Authorization Denied
        gatosd-->>Client: Error (Unauthorized/Forbidden)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

ADR-0004 normative specifications — Dense formalization of projection functor, pointer resolution protocol, policy hooks, and error taxonomy; requires careful validation of correctness against spec intent and RFC references (RFC 8785 JCS, RFC 9421).
OpaquePointer struct design — Field semantics (digest vs ciphertext_digest conditional logic, size/extensions optionality, URI format for location/capability) must align with schema constraints and resolution protocol expectations.
Schema conditional logic — anyOf requirements for digest/ciphertext_digest in opaque_pointer.schema.json need verification; governance policy schema privacy section structure must be compatible with policy evaluation rules.
Projection determinism claims — TECH-SPEC asserts byte-for-byte determinism across platforms; this requires critical review of canonicalization rules, key ordering, and whether all sources of non-determinism are accounted for.
Integration points — Verify component responsibilities (Echo projection, Policy rule evaluation, PrivateStore persistence, gatosd resolution) are correctly sequenced and error handling is complete.

Possibly related PRs

ADR-0003: Consensus Governance + v1 Schemas + SPEC/TECH updates #8 — Added governance schema tooling and initial validation pipeline; this PR extends that foundation with privacy schemas, examples, and validation coverage in the same files (Makefile, scripts/validate_schemas.sh).

Poem

🔐 A private whisper wrapped in blake3's embrace,
Opaque pointers hide truth in a cryptic space,
Where policies govern what the world may see—
Public projection keeps secrets locked and free! 🗝️✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title precisely captures the core changes: ADR-0004 formalization, SPEC/TECH-SPEC updates, privacy schema introduction, and the new gatos-privacy crate. It is specific, directly related to the main changeset, and clearly summarizes the primary contribution.
Description check	✅ Passed	The description is comprehensive and directly related to all aspects of the changeset, detailing specific implementation choices (POST + JWT, RFC 8785 JCS, opaque pointer schema, policy extensions), architectural considerations, and follow-up items. It effectively conveys the purpose and scope of the changes.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch decisions/ADR-0004

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2025-11-10T18:06:25Z

crates/gatos-privacy/src/lib.rs

+pub struct OpaquePointer {
+    pub kind: Kind,
+    pub algo: Algo,
+    pub digest: String,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub ciphertext_digest: Option<String>,


Allow ciphertext-only pointers without digest

The new JSON schema explicitly permits opaque pointers that omit digest when ciphertext_digest is provided, but the Rust struct still models digest as a mandatory String. Any ciphertext-only pointer produced according to the schema will fail deserialization because serde expects the digest field. Either digest should be optional (Option<String>) or the schema should require it so that the type and schema remain aligned.

Useful? React with 👍 / 👎.

coderabbitai

Actionable comments posted: 16

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 38be8f1 and 4bbbccb.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (16)

Cargo.toml (1 hunks)
Makefile (3 hunks)
crates/gatos-privacy/Cargo.toml (1 hunks)
crates/gatos-privacy/src/lib.rs (1 hunks)
docs/FEATURES.md (1 hunks)
docs/SPEC.md (6 hunks)
docs/TECH-SPEC.md (4 hunks)
docs/USE-CASES.md (1 hunks)
docs/decisions/ADR-0003/DECISION.md (1 hunks)
docs/decisions/ADR-0004/DECISION.md (1 hunks)
docs/decisions/README.md (1 hunks)
examples/v1/policy/privacy_min.json (1 hunks)
examples/v1/privacy/opaque_pointer_min.json (1 hunks)
schemas/v1/policy/governance_policy.schema.json (1 hunks)
schemas/v1/privacy/opaque_pointer.schema.json (1 hunks)
scripts/validate_schemas.sh (3 hunks)

🧰 Additional context used

🪛 LanguageTool

docs/decisions/ADR-0004/DECISION.md

[style] ~123-~123: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...cy Policy** (.gatos/policy.yaml). 3. It traverses the UnifiedState tree, appl...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

[style] ~129-~129: ‘prior to’ might be wordy. Consider a shorter alternative.
Context: ...MUST be canonicalized with RFC 8785 JCS prior to hashing. - When non‑JSON maps are mater...

(EN_WORDINESS_PREMIUM_PRIOR_TO)

docs/TECH-SPEC.md

[grammar] ~222-~222: Ensure spelling is correct
Context: ...on that: (1) fetches; (2) decrypts; (3) re‑encrypts; (4) stores; (5) emits an audit event up...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: audit

🔇 Additional comments (19)

Cargo.toml (1)

10-10: LGTM! Clean addition of gatos-privacy to workspace members with proper ordering.

crates/gatos-privacy/Cargo.toml (1)

6-12: LGTM! Minimal, focused dependency set aligned with OpaquePointer serialization and crypto needs. Workspace coordination is clean.

docs/decisions/README.md (1)

23-23: LGTM! ADR-0004 entry follows established format and is correctly positioned in the decision log.

examples/v1/privacy/opaque_pointer_min.json (1)

1-9: LGTM! Minimal example correctly demonstrates all required OpaquePointer fields with appropriate placeholder digest values and URI formats.

docs/FEATURES.md (1)

198-213: F9-US-SEC user story is well-structured.

Acceptance criteria are specific and measurable (capability validation, digest verification, trailer accuracy). Test plan covers golden/edge/failure scenarios appropriately. ADR-0004 cross-reference adds context.

docs/USE-CASES.md (1)

89-95: LGTM! Use case 9 is well-articulated, cohesive with ADR-0004 concepts, and maintains format consistency with existing use cases 1-8. Clearly communicates the value of capability-gated, auditable PII handling.

docs/decisions/ADR-0003/DECISION.md (1)

89-92: Excellent normative clarifications on PoC envelope determinism.

Line 89's explicit "lexicographically sorted list of approvals by the lowercase ASCII of each approval's Signer value" pins down sorting order (critical for BLAKE3 reproducibility). Line 92's storage requirement (persisting canonical JSON blob under refs/gatos/audit/proofs/governance/<proposal-id>) provides auditability and enables offline verification. Both reinforce RFC 8785 JCS determinism goals and integrate well with ADR-0004's privacy/determinism narrative.

Makefile (2)

47-48: Schema-compile correctly extended for privacy schema.

Privacy schema compilation follows established pattern (draft2020, strict, ajv-formats, ids reference). Positioning before governance_policy compilation is logical. No missing dependencies or references.

61-62: Schema-validate correctly adds privacy example validation.

Validates opaque_pointer_min.json against opaque_pointer.schema.json with proper ids reference. Syntax mirrors existing validations (governance, job) and integrates cleanly into error-halting chain.

scripts/validate_schemas.sh (1)

57-58: LGTM: Privacy example validation correctly extends the validation chain.

The addition properly validates privacy_min.json against the governance policy schema, confirming that the governance schema's new privacy section is exercised. This creates a clean validation chain:

Standard governance example (line 56)

Privacy-focused governance example (lines 57-58)

Both validate against the same schema but exercise different schema surfaces.

docs/TECH-SPEC.md (3)

104-108: LGTM: Crate responsibilities updated to reflect privacy capabilities.

The crate summary accurately reflects the new privacy-related responsibilities:

gatos-echo: Privacy projection logic for folding UnifiedState into PublicState

gatos-policy: Privacy rule evaluation

gatosd: Opaque Pointer resolution endpoint

These align with the architectural flow described in ADR-0004.

284-288: LGTM: Projection Determinism test suite properly integrated.

The addition of a "Projection Determinism" test suite (line 284) with clear requirements (line 287-288) ensures that privacy projections produce byte-for-byte identical results across platforms. This is essential for verifiable privacy guarantees.

The test coverage aligns with the determinism requirements specified in ADR-0004 (lines 128-130).

173-192: Fix diagram participant label for naming consistency; verify PrivateStore trait implementation status.

The diagram lists "StorageBackend (Interface)" as a participant (line 178) but references PrivateStore in the sequence flow (line 187) and subsequent documentation (line 194). Either rename the participant to PrivateStore or rename all references to StorageBackend for consistency.

Additionally, PrivateStore is described as "a pluggable trait" (line 194) but no trait/interface definition exists in the codebase—verify whether this is aspirational design documentation or if the implementation is planned for a separate branch/crate. If planned, document the implementation status.

crates/gatos-privacy/src/lib.rs (1)

1-11: Excellent documentation and architectural notes.

The module-level documentation clearly explains the crate's purpose, references the normative ADR-0004, and provides critical guidance about JCS canonicalization requirements without introducing unnecessary dependencies.

The explicit note about delegating JCS implementation to higher layers (lines 7-10) is a good design choice for keeping the crate lean.

docs/decisions/ADR-0004/DECISION.md (2)

1-11: LGTM: ADR metadata complete and well-structured.

The front matter includes all required fields: Status, Date, ADR number, Authors, dependency relationships (Requires/Related), Tags, and Schema references. The Accepted status and date (2025-11-10) align with the PR timeline.

277-286: LGTM: Error taxonomy is complete and well-defined.

The normative error taxonomy (lines 277-286) covers all expected failure modes with appropriate HTTP status codes:

Authentication failures (401)

Authorization failures (403)

Missing resources (404)

Integrity failures (422 DigestMismatch)

Availability failures (503 CapabilityUnavailable)

The error codes align with HTTP semantics and provide clear guidance for implementers.

docs/SPEC.md (3)

74-74: Good: Explicitly marking compute crate as planned.

Adding "(planned)" to gatos-compute (line 74) sets clear expectations that this component is not yet implemented. This prevents confusion for early implementers or integrators.

682-683: Excellent clarification on approval expiration semantics.

The comment on line 682 clearly explains that an approval's Expires-At cannot extend the proposal's expiration—it can only further restrict validity. This prevents confusion about expiration precedence and potential security issues.

698-701: Good: PoC envelope storage made normative with MUST.

Line 701 correctly elevates PoC envelope storage from a recommendation to a requirement by changing "should" to "MUST". This ensures auditability and verifiability of consensus decisions.

The lexicographic ordering by signer (line 698) ensures deterministic PoC computation, which is critical for consensus verification.

coderabbitai · 2025-11-10T18:10:17Z

crates/gatos-privacy/src/lib.rs

+pub struct OpaquePointer {
+    pub kind: Kind,
+    pub algo: Algo,
+    pub digest: String,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub ciphertext_digest: Option<String>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub size: Option<u64>,
+    pub location: String,
+    pub capability: String,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub extensions: Option<Value>,
+}


⚠️ Potential issue | 🔴 Critical

CRITICAL: Rust struct violates schema's anyOf requirement.

Line 20 declares digest as a non-optional String, making it always required. However, schemas/v1/privacy/opaque_pointer.schema.json (lines 17-30) uses anyOf to require EITHER digest OR ciphertext_digest, allowing ciphertext-only pointers for low-entropy data per ADR-0004.

This mismatch means:

Deserialization will fail for valid ciphertext-only pointers

The Rust type cannot represent low-entropy privacy classes that MUST NOT expose plaintext digest

Both digest and ciphertext_digest must be Option<String>, with validation logic ensuring at least one is Some.

#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] #[serde(deny_unknown_fields)] pub struct OpaquePointer { pub kind: Kind, pub algo: Algo, - pub digest: String, + #[serde(skip_serializing_if = "Option::is_none")] + pub digest: Option<String>, #[serde(skip_serializing_if = "Option::is_none")] pub ciphertext_digest: Option<String>, #[serde(skip_serializing_if = "Option::is_none")] pub size: Option<u64>, pub location: String, pub capability: String, #[serde(skip_serializing_if = "Option::is_none")] pub extensions: Option<Value>, }

Then add a validation method:

impl OpaquePointer { pub fn validate(&self) -> Result<(), &'static str> { if self.digest.is_none() && self.ciphertext_digest.is_none() { return Err("At least one of digest or ciphertext_digest must be present"); } Ok(()) } }

coderabbitai · 2025-11-10T18:10:17Z

crates/gatos-privacy/src/lib.rs

+#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
+#[serde(rename_all = "snake_case")]
+pub enum Kind {
+    OpaquePointer,
+}


🧹 Nitpick | 🔵 Trivial

Consider: Single-variant enum may be premature.

The Kind enum has only one variant (OpaquePointer). Unless additional pointer kinds are planned in the near future, this could be simplified to a constant string or removed entirely, with the field being a simple &'static str or validated string.

However, if the enum is intentional for future extensibility, consider adding a doc comment explaining the planned variants.

If future kinds are planned:

/// Pointer envelope kinds. /// /// Future variants may include `TransparentPointer`, `NullPointer`, etc. #[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)] #[serde(rename_all = "snake_case")] pub enum Kind { OpaquePointer, // Future: TransparentPointer, NullPointer, ... }

If not:

// Replace enum with constant pub const KIND: &str = "opaque_pointer"; // In struct: pub struct OpaquePointer { #[serde(default = "default_kind")] pub kind: String, // ... } fn default_kind() -> String { KIND.to_string() }

🤖 Prompt for AI Agents

In crates/gatos-privacy/src/lib.rs around lines 31 to 35, the Kind enum only has a single variant (OpaquePointer) which is premature; if you intend to add more pointer kinds later, keep the enum and add a doc comment above it describing planned future variants (e.g., TransparentPointer, NullPointer) so intent is clear; otherwise remove the enum and replace usages with a constant string (e.g., pub const KIND: &str = "opaque_pointer") and change the struct field to a String or &'static str with a serde default helper (add a default_kind() that returns KIND.to_string() and annotate the field with #[serde(default = "default_kind")]) so serialization and defaults behave the same.

coderabbitai · 2025-11-10T18:10:17Z

crates/gatos-privacy/src/lib.rs

+#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
+#[serde(rename_all = "lowercase")]
+pub enum Algo {
+    Blake3,
+}


🧹 Nitpick | 🔵 Trivial

Same consideration for Algo enum: future-proof or simplify.

The Algo enum has a single variant (Blake3). Similar to Kind, if no other algorithms are planned, consider whether an enum is necessary. However, cryptographic agility is often important, so the enum may be justified for future hash function migration.

If the enum is intentional for crypto-agility, add documentation explaining the design choice.

/// Hash algorithm for pointer digests. /// /// Currently only BLAKE3 is supported. Future variants may include /// SHA3-256 or other cryptographic hash functions as the ecosystem evolves. #[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)] #[serde(rename_all = "lowercase")] pub enum Algo { Blake3, // Future: Sha3, ... }

🤖 Prompt for AI Agents

In crates/gatos-privacy/src/lib.rs around lines 37 to 41, the Algo enum currently has a single variant (Blake3) which is fine but needs explicit documentation for crypto-agility; add a doc comment above the enum explaining that this represents the hash algorithm for pointer digests, that BLAKE3 is the currently supported algorithm, and that additional algorithms (e.g., SHA3-256) may be added in the future to support migration, and optionally include a commented placeholder for future variants so intent is clear to readers and maintainers.

coderabbitai · 2025-11-10T18:10:17Z

docs/decisions/ADR-0004/DECISION.md

+        +string digest: "blake3:<hex>"            // plaintext digest
+        +string ciphertext_digest: "blake3:<hex>"  // MAY be present


⚠️ Potential issue | 🟠 Major

Ambiguity: "MAY be present" conflicts with conditional MUST requirement.

Line 96 states ciphertext_digest "MAY be present", but line 105 contradicts this with "the public pointer MUST include ciphertext_digest" for low-entropy privacy classes.

The correct phrasing should clarify that presence is conditional on privacy class:

-- **`ciphertext_digest`**: The content-address of the stored ciphertext (`blake3(ciphertext_bytes)`). For low‑entropy privacy classes (see Policy Hooks), the public pointer **MUST** include `ciphertext_digest` and policy **MUST NOT** expose the plaintext digest publicly. +- **`ciphertext_digest`**: The content-address of the stored ciphertext (`blake3(ciphertext_bytes)`). For low‑entropy privacy classes (see Policy Hooks), this field is **REQUIRED** and the public pointer **MUST NOT** expose the plaintext digest. For high-entropy data, this field is **OPTIONAL**.

Replace "MAY be present" with a conditional requirement tied to privacy class to eliminate ambiguity.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In docs/decisions/ADR-0004/DECISION.md around lines 95 to 96, the field description currently says `ciphertext_digest: "blake3:<hex>" // MAY be present`, which conflicts with the later statement that low-entropy privacy classes MUST include it; change this line to state the presence is conditional on privacy class (e.g., "MUST be present for low-entropy privacy classes; otherwise MAY be omitted") so the requirement is explicit and unambiguous.

coderabbitai · 2025-11-10T18:10:18Z

docs/decisions/ADR-0004/DECISION.md

+### 4. Pointer Resolution Protocol (Normative)
+
+Authentication semantics are aligned with HTTP. We adopt a simple, interoperable model (JWT default; HTTP Message Signatures optional):
+
+-  **Endpoint**: `POST /gatos/private/blobs/resolve`
+-  **Request Body (application/json; JCS canonical form)**:
+   `{ "digest": "blake3:<hex>", "want": "plaintext"|"ciphertext" }`
+-  **Authorization**: `Authorization: Bearer <JWT>`
+   - Claims MUST include: `sub` (ed25519:<pubkey>), `aud` (node id or URL), `method` ("POST"), `path` ("/gatos/private/blobs/resolve"), `exp`, and `nbf`.
+   - Clock skew tolerance: ±300 seconds.
+   - On missing/invalid token: `401 Unauthorized`. On policy denial: `403 Forbidden`.


⚠️ Potential issue | 🔴 Critical

Critical: Authentication mechanism underspecified for JWT validation.

Lines 154-157 specify JWT claims but don't clarify:

Signature algorithm: Which signing algorithm is required? (e.g., EdDSA with Ed25519 per the GATOS trust model?)

Public key discovery: How does the server obtain sub's public key to verify the JWT signature?

Token binding: How is the JWT cryptographically bound to the request body to prevent token substitution?

Without these details, implementations may be incompatible or insecure.

Add a normative subsection after line 157:

#### JWT Validation (Normative) Servers MUST validate JWTs as follows: 1. **Algorithm**: Only EdDSA (Ed25519) is supported. The `alg` header MUST be `"EdDSA"`. Reject tokens with other algorithms (including `none`). 2. **Public Key**: Resolve the signing public key from the `sub` claim via the trust graph. The `sub` MUST be of the form `ed25519:<pubkey>`. 3. **Signature**: Verify the JWT signature using the resolved public key. 4. **Claims**: - `digest` claim MUST match the `digest` field in the request body (cryptographic binding). - `method` and `path` MUST match the HTTP request method and path. - `exp` and `nbf` MUST be valid within ±300s clock skew. - `aud` MUST match the server's node identifier. 5. **Replay protection**: Servers SHOULD implement short-lived tokens (≤60s expiry) or maintain a `jti` nonce cache to prevent replay attacks.

🤖 Prompt for AI Agents

docs/decisions/ADR-0004/DECISION.md around lines 147 to 157: the JWT validation behavior is underspecified — add a normative "JWT Validation" subsection after line 157 that mandates algorithm, public-key discovery, signature verification, claim checks, cryptographic binding to the request body, clock skew rules, audience matching, and replay protection; specifically require EdDSA (Ed25519) only with `alg: "EdDSA"`, require `sub` be `ed25519:<pubkey>` and instruct servers to resolve the public key via the trust graph, verify the JWT signature with that key, require `digest` claim to match the request body `digest`, ensure `method` and `path` claims match the HTTP request, enforce `exp`/`nbf` with ±300s skew and `aud` matching the server node id, and recommend short-lived tokens (≤60s) or `jti` nonce caching for replay protection.

coderabbitai · 2025-11-10T18:10:19Z

schemas/v1/policy/governance_policy.schema.json

+            "type": "object",
+            "additionalProperties": false,
+            "properties": {
+              "min_entropy_bits": { "type": "integer", "minimum": 0 },


⚠️ Potential issue | 🟡 Minor

Consider: Minimum entropy of 0 bits allows degenerate case.

Line 60 allows min_entropy_bits: 0, which represents zero-entropy (completely deterministic/constant) data. Protecting zero-entropy data with pointer indirection provides no privacy benefit, as the plaintext digest alone uniquely identifies the value.

For meaningful privacy protection, consider "minimum": 1 to require at least 1 bit of entropy. Alternatively, if 0 is intentionally allowed for testing or edge cases, document this rationale in the schema description.

- "min_entropy_bits": { "type": "integer", "minimum": 0 }, + "min_entropy_bits": { "type": "integer", "minimum": 1 },

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

"min_entropy_bits": { "type": "integer", "minimum": 0 },

"min_entropy_bits": { "type": "integer", "minimum": 1 },

🤖 Prompt for AI Agents

In schemas/v1/policy/governance_policy.schema.json around line 60, the property "min_entropy_bits" currently permits 0 which allows a degenerate zero-entropy case; change the schema to require a minimum of 1 (i.e., set "minimum": 1) to prevent zero-entropy values, or if 0 is intentionally allowed add a descriptive "description" explaining why 0 is permitted (e.g., for testing/edge cases) so the intent is explicit.

coderabbitai · 2025-11-10T18:10:19Z

schemas/v1/policy/governance_policy.schema.json

+        "rules": {
+          "type": "array",
+          "items": {
+            "type": "object",
+            "additionalProperties": false,
+            "properties": {
+              "select": { "type": "string" },
+              "action": { "type": "string", "enum": ["redact", "pointerize"] },
+              "class": { "type": "string" },
+              "capability": { "type": "string", "format": "uri" },
+              "location": { "type": "string", "format": "uri" }
+            },
+            "required": ["select", "action"]
+          }
+        }
+      }


⚠️ Potential issue | 🔴 Critical

Critical: Missing conditional validation for pointerize action.

The rules schema allows action: "pointerize" without requiring capability or location fields. According to ADR-0004 (lines 214-218), when action is "pointerize", both capability and location are REQUIRED to specify where and how to store the private blob.

Currently, line 78 only requires ["select", "action"], making capability/location universally optional.

Add conditional validation using if/then to enforce that pointerize actions include required fields:

"rules": { "type": "array", "items": { "type": "object", "additionalProperties": false, "properties": { "select": { "type": "string" }, "action": { "type": "string", "enum": ["redact", "pointerize"] }, "class": { "type": "string" }, "capability": { "type": "string", "format": "uri" }, "location": { "type": "string", "format": "uri" } }, - "required": ["select", "action"] + "required": ["select", "action"], + "if": { + "properties": { "action": { "const": "pointerize" } } + }, + "then": { + "required": ["select", "action", "capability", "location"] + } } }

This ensures schema validation catches invalid configurations at policy load time rather than runtime.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

"rules": {

"type": "array",

"items": {

"type": "object",

"additionalProperties": false,

"properties": {

"select": { "type": "string" },

"action": { "type": "string", "enum": ["redact", "pointerize"] },

"class": { "type": "string" },

"capability": { "type": "string", "format": "uri" },

"location": { "type": "string", "format": "uri" }

},

"required": ["select", "action"]

}

}

}

"rules": {

"type": "array",

"items": {

"type": "object",

"additionalProperties": false,

"properties": {

"select": { "type": "string" },

"action": { "type": "string", "enum": ["redact", "pointerize"] },

"class": { "type": "string" },

"capability": { "type": "string", "format": "uri" },

"location": { "type": "string", "format": "uri" }

},

"required": ["select", "action"],

"if": {

"properties": { "action": { "const": "pointerize" } }

},

"then": {

"required": ["select", "action", "capability", "location"]

}

}

}

🤖 Prompt for AI Agents

In schemas/v1/policy/governance_policy.schema.json around lines 66 to 81, the rules item allows "action":"pointerize" without requiring capability or location; add a JSON Schema conditional (inside the items object) using "if": { "properties": { "action": { "const": "pointerize" } } } and a corresponding "then" that adds "required": ["capability", "location"] so pointerize rules must include those fields (keep existing required ["select","action"], additionalProperties: false, and existing formats intact).

coderabbitai · 2025-11-10T18:10:19Z

schemas/v1/privacy/opaque_pointer.schema.json

+  "required": ["kind","algo","location","capability"],
+  "anyOf": [
+    {
+      "required": ["digest"],
+      "properties": {
+        "digest": { "type": "string", "pattern": "^blake3:[a-f0-9]{64}$" }
+      }
+    },
+    {
+      "required": ["ciphertext_digest"],
+      "properties": {
+        "ciphertext_digest": { "type": "string", "pattern": "^blake3:[a-f0-9]{64}$" }
+      }
+    }
+  ],


⚠️ Potential issue | 🟠 Major

Documentation inconsistency: digest optionality.

The schema correctly allows either digest OR ciphertext_digest (or both) via the anyOf constraint, supporting ciphertext-only pointers for low-entropy data per ADR-0004 §2 (lines 104-105).

However, docs/SPEC.md line 326 states: "digest: The **REQUIRED** blake3 hash of the plaintext". This contradicts the schema, which makes digest optional when ciphertext_digest is present.

The ADR is correct: low-entropy privacy classes MUST use ciphertext_digest and MUST NOT expose plaintext digest. The schema aligns with this requirement, but SPEC.md needs correction.

The SPEC.md documentation should be updated to clarify that digest is conditionally required based on privacy class, not universally required. Consider revising SPEC.md line 326 to match the schema's anyOf semantics.

🤖 Prompt for AI Agents

In docs/SPEC.md around line 326, the doc currently states "digest: The REQUIRED blake3 hash of the plaintext" which contradicts the JSON schema that allows either digest OR ciphertext_digest; update this line to state that digest is conditionally required only for privacy classes that reveal plaintext (plaintext-available classes), and that low-entropy privacy classes MUST use ciphertext_digest and MUST NOT expose digest (refer ADR-0004 §2); rephrase to make clear the schema's anyOf semantics (either digest or ciphertext_digest must be present depending on privacy class) and link or cite ADR-0004 §2 for authority.

coderabbitai · 2025-11-10T18:10:19Z

schemas/v1/privacy/opaque_pointer.schema.json

+  "anyOf": [
+    {
+      "required": ["digest"],
+      "properties": {
+        "digest": { "type": "string", "pattern": "^blake3:[a-f0-9]{64}$" }
+      }
+    },
+    {
+      "required": ["ciphertext_digest"],
+      "properties": {
+        "ciphertext_digest": { "type": "string", "pattern": "^blake3:[a-f0-9]{64}$" }
+      }
+    }
+  ],


🛠️ Refactor suggestion | 🟠 Major

Simplify anyOf: remove redundant property definitions.

The anyOf branches (lines 18-22 and 24-28) redefine properties that are already declared in the root properties section (lines 9-10). This is redundant and violates the DRY principle.

The anyOf should only specify which fields are required, not redefine their schema:

"anyOf": [ { - "required": ["digest"], - "properties": { - "digest": { "type": "string", "pattern": "^blake3:[a-f0-9]{64}$" } - } + "required": ["digest"] }, { - "required": ["ciphertext_digest"], - "properties": { - "ciphertext_digest": { "type": "string", "pattern": "^blake3:[a-f0-9]{64}$" } - } + "required": ["ciphertext_digest"] } ],

This makes the schema more maintainable: if you need to update the pattern, you only change it in one place (the root properties).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

"anyOf": [

{

"required": ["digest"],

"properties": {

"digest": { "type": "string", "pattern": "^blake3:[a-f0-9]{64}$" }

}

},

{

"required": ["ciphertext_digest"],

"properties": {

"ciphertext_digest": { "type": "string", "pattern": "^blake3:[a-f0-9]{64}$" }

}

}

],

"anyOf": [

{

"required": ["digest"]

},

{

"required": ["ciphertext_digest"]

}

],

🤖 Prompt for AI Agents

In schemas/v1/privacy/opaque_pointer.schema.json around lines 17 to 30, the anyOf branches redundantly re-declare the "digest" and "ciphertext_digest" property schemas that are already defined in the root properties; remove the duplicate property definitions inside each anyOf branch and leave only the required arrays (e.g., {"required":["digest"]} and {"required":["ciphertext_digest"]}) so the schema references the root property definitions for type/pattern and adheres to DRY.

coderabbitai · 2025-11-10T18:10:20Z

scripts/validate_schemas.sh

  "schemas/v1/governance/revocation.schema.json"
  "schemas/v1/governance/proof_of_consensus_envelope.schema.json"
  "schemas/v1/policy/governance_policy.schema.json"
+  "schemas/v1/privacy/opaque_pointer.schema.json"


🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

Verify whether opaque_pointer.schema.json requires common/ids.schema.json reference.

The script adds schemas/v1/privacy/opaque_pointer.schema.json to the SCHEMAS array, which will be compiled with -r "$AJV_COMMON_REF" (line 29). This assumes the opaque pointer schema references definitions from schemas/v1/common/ids.schema.json.

Review the opaque pointer schema to confirm whether it actually uses any $ref to ids.schema.json definitions. If it doesn't reference the common ids schema, the -r flag is unnecessary (though harmless).

🏁 Script executed:

#!/bin/bash # Check if opaque_pointer.schema.json references the ids schema echo "Checking for references to ids.schema.json in opaque_pointer.schema.json:" grep -n '\$ref.*ids\.schema\.json' schemas/v1/privacy/opaque_pointer.schema.json || echo "No references found"

Length of output: 93

Remove opaque_pointer.schema.json from AJV common reference compilation or add to exclusion conditional.

The verification confirms opaque_pointer.schema.json contains no $ref to ids.schema.json. The -r flag applied at line 29 is unnecessary—update line 26 to exclude this schema alongside governance_policy.schema.json:

if [[ "$schema" == "schemas/v1/common/ids.schema.json" ]] || [[ "$schema" == "schemas/v1/governance/governance_policy.schema.json" ]] || [[ "$schema" == "schemas/v1/privacy/opaque_pointer.schema.json" ]]; then

This eliminates wasteful compilation overhead.

🤖 Prompt for AI Agents

In scripts/validate_schemas.sh around line 21 (and the conditional at line 26), the privacy/opaque_pointer.schema.json is being needlessly included in the AJV common reference compilation; update the exclusion conditional to also match "schemas/v1/privacy/opaque_pointer.schema.json" so it is skipped (i.e., add an OR check for that path alongside the existing ids.schema.json and governance_policy.schema.json checks) to avoid the redundant -r compilation for that file.

flyingrobots added 25 commits November 10, 2025 05:43

docs: Clarify commit trailer format in F9-US-SEC acceptance criteria

46ec339

Updated the acceptance criteria for F9-US-SEC in FEATURES.md to explicitly mention the and commit trailers, improving clarity for developers.

docs: Clarify actor-id format in SPEC.md

676b6fa

docs: Clarify size unit for OpaquePointer in SPEC.md

7c35665

docs: Clarify error handling for digest mismatch in SPEC.md Pointer R…

fce8bb1

…esolution

docs: Clarify PrivateStore participant as interface in TECH-SPEC.md d…

3d6bcbd

…iagram

docs: Add details for Projection Determinism test suite in TECH-SPEC.md

935ee36

docs: Update F9-US-DEV user story to BDD format in FEATURES.md

f643aea

docs: Mark gatos-compute as planned in SPEC.md system diagram

b39373c

docs: Change PoC envelope storage requirement from SHOULD to MUST in …

412b9b9

…SPEC.md

docs: Clarify purpose of gatos-kv crate in TECH-SPEC.md

c63785e

docs: Mark chart data as illustrative in TECH-SPEC.md Performance Gui…

1ca5ec6

…dance

docs: Refine F9-US-DEV acceptance criteria to be more granular

b5235c4

docs: Simplify PrivateStore participant name in TECH-SPEC.md diagram

8805e1b

docs: Remove redundant acceptance criteria for F9-US-DEV

cc8d49e

docs: Add missing field description for digest in OpaquePointer

f1e6f8b

docs: Clarify interaction of expiration dates in Consensus Governance

8cc6091

docs: Specify HTTP GET method for pointer resolution endpoint

f8de6f2

docs: Correct type casing in ADR-0004 OpaquePointer diagram

e00b578

docs: Further refine F9-US-DEV acceptance criteria with BDD-style gra…

e48b7f9

…nularity

docs: Remove redundant acceptance criteria for F9-US-DEV in FEATURES.md

b8b9504

docs: Clarify sorting order for approvals in SPEC.md PoC section

5dca809

docs: Correct inconsistent endpoint URL in ADR-0004

a7b1a21

flyingrobots changed the title ~~decisions/ADR 0004~~ GATOS Privacy: ADR‑0004 + SPEC/TECH‑SPEC + Pointer Schema + gatos‑privacy crate Nov 10, 2025

chatgpt-codex-connector bot reviewed Nov 10, 2025

View reviewed changes

flyingrobots mentioned this pull request Nov 10, 2025

Schema: digest vs ciphertext_digest (oneOf/anyOf) #53

Open

coderabbitai bot requested changes Nov 10, 2025

View reviewed changes

		+string digest: "blake3:<hex>" // plaintext digest
		+string ciphertext_digest: "blake3:<hex>" // MAY be present

	"min_entropy_bits": { "type": "integer", "minimum": 0 },
	"min_entropy_bits": { "type": "integer", "minimum": 1 },

GATOS Privacy: ADR‑0004 + SPEC/TECH‑SPEC + Pointer Schema + gatos‑privacy crate #47

Are you sure you want to change the base?

GATOS Privacy: ADR‑0004 + SPEC/TECH‑SPEC + Pointer Schema + gatos‑privacy crate #47

Conversation

flyingrobots commented Nov 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot commented Nov 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Pre-merge checks and finishing touches

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Nov 10, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 10, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 10, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 10, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 10, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 10, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 10, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 10, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 10, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 10, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 10, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

flyingrobots commented Nov 10, 2025 •

edited

Loading

coderabbitai bot commented Nov 10, 2025 •

edited

Loading