Skip to content

Conversation

@flyingrobots
Copy link
Owner

@flyingrobots flyingrobots commented Nov 10, 2025

This PR formalizes the hybrid privacy model (ADR‑0004) and wires the supporting schema/tooling.

Highlights

  • POST + Bearer JWT resolver (optional HTTP Message Signatures profile).
  • Opaque Pointer schema (draft 2020‑12) with ciphertext_digest and extensions.
  • RFC 8785 JCS canonicalization (normative) + non‑JSON key ordering.
  • Policy schema extended with privacy.classes and privacy.rules (optional), with examples.
  • Makefile and scripts compile/validate schemas including privacy.
  • New gatos-privacy crate (typed OpaquePointer envelope).

Notable details

  • Low‑entropy privacy class gates public plaintext digests; requires ciphertext_digest in public pointers.
  • Resolver response digest headers reflect the response body (plaintext or ciphertext).
  • Error taxonomy harmonized: DigestMismatch is 422.
  • Namespacing clarified: private overlay indices in refs, blobs in external stores.
  • Pointer rotation flow documented with Privacy-Pointer-Rotations trailer.

Follow‑ups to be tracked as issues

  1. Projection Determinism golden vectors (JCS across OS/arch)
  2. Low‑entropy PII torture vector (ciphertext‑only exposure)
  3. Pointer rotation example + CI trailer gate
  4. capability_ref indirection example
  5. Governance PoC canonical storage harmonization (ADR‑0003 and SPEC now aligned to MUST)

Receipts

Ready for review.

This commit addresses 8 specific issues raised during review of the initial ADR-0004 documentation. The changes improve clarity, consistency, and technical precision across the specification.

Fixes include:
- **FEATURES.md**:
  - Rewrote acceptance criteria for F9-US-DEV to be specific and verifiable.
  - Removed the superseded F6 feature to avoid confusion with F9.

- **SPEC.md**:
  - Corrected Mermaid diagram type casing ( -> ) to align with JSON standards.
  - Clarified that private blob decryption during pointer resolution is conditional, not assumed.

- **TECH-SPEC.md**:
  - Removed redundant participant aliases in the projection sequence diagram.
  - Replaced the unconventional  endpoint with a more RESTful  URL.
  - Improved diagram precision by changing 'node' to 'field path'.
  - Specified JWS with a detached payload as the required authentication mechanism for pointer resolution.
Updated the acceptance criteria for F9-US-SEC in FEATURES.md to explicitly mention the  and  commit trailers, improving clarity for developers.
…ional HTTP Message Signatures profile; tighten determinism (RFC 8785 JCS + key ordering); clarify namespaces and error taxonomy; add pointer rotation trailer\nschema(privacy): add ciphertext_digest + extensions; bump to draft 2020-12\nschema(policy): add optional privacy.classes + rules\nexamples: add privacy_min and updated opaque_pointer example\nscripts: validate privacy schema/examples in CI\nmake: fix tab indentation for lint/fix targets\nchore: add gatos-privacy crate with OpaquePointer type and notes
…onse digest headers reflect response body\nschema(privacy): allow ciphertext-only pointers via anyOf; keep kind/algo/location/capability required\ndocs(ADR-0004): fix colon in class diagram;\ndocs(ADR-0003): PoC storage level to MUST
@flyingrobots flyingrobots changed the title decisions/ADR 0004 GATOS Privacy: ADR‑0004 + SPEC/TECH‑SPEC + Pointer Schema + gatos‑privacy crate Nov 10, 2025
@coderabbitai
Copy link

coderabbitai bot commented Nov 10, 2025

Summary by CodeRabbit

  • New Features

    • Added privacy and opaque pointer support for securing sensitive data.
    • Hybrid privacy model enables separation of public and private state with configurable policies.
    • Policy-driven redaction and pointerization of protected content.
    • Opaque pointer resolution protocol for accessing encrypted private data.
  • Documentation

    • Added comprehensive privacy architecture and implementation guides.
    • New use case documentation for verifiable, compliant PII management.

Walkthrough

Introduces a hybrid privacy model to GATOS comprising: a new gatos-privacy crate defining OpaquePointer types, schema definitions for privacy policies and opaque pointers, normative ADR-0004 formalizing privacy projection and pointer resolution mechanics, extended specification and implementation docs, build tooling updates for schema validation, and minimal example configurations.

Changes

Cohort / File(s) Summary
Privacy Crate Foundation
Cargo.toml, crates/gatos-privacy/Cargo.toml, crates/gatos-privacy/src/lib.rs
Workspace member registration and new Rust crate with OpaquePointer struct (kind, algo, digest, ciphertext_digest, size, location, capability, extensions) and supporting enums Kind (OpaquePointer), Algo (Blake3); derives serde annotations for JSON serialization with deny_unknown_fields and optional field handling.
Schema & Validation Infrastructure
schemas/v1/privacy/opaque_pointer.schema.json, schemas/v1/policy/governance_policy.schema.json, scripts/validate_schemas.sh, Makefile
Adds JSON schema for opaque pointers (with anyOf conditional requirements for digest/ciphertext_digest), expands governance policy schema with privacy classes/rules object, extends schema compilation/validation targets, and updates validation script to include privacy schema example checks.
Example Artifacts
examples/v1/privacy/opaque_pointer_min.json, examples/v1/policy/privacy_min.json
Minimal opaque pointer instance (blake3 hashes, location/capability URIs) and minimal privacy policy (pii_low_entropy class, pointerize rule with capability/location).
Specification & Narrative
docs/SPEC.md, docs/TECH-SPEC.md, docs/FEATURES.md, docs/USE-CASES.md
Expanded SPEC with privacy/opaque pointer section, projection model, and resolution protocol; TECH-SPEC extended with privacy projection/resolution flows, component responsibilities, and determinism assertions; FEATURES replaced F6 with F9 (Hybrid Privacy Model); USE-CASES added PII management use case.
Decision & Architecture Records
docs/decisions/ADR-0004/DECISION.md, docs/decisions/ADR-0003/DECISION.md, docs/decisions/README.md
New ADR-0004 formalizing hybrid privacy model (Projection Functor, opaque pointers, resolution protocol, policy hooks, trailers, namespacing); ADR-0003 refined PoC envelope approval ordering to lexicographic sort; README updated with ADR-0004 entry.

Sequence Diagram(s)

sequenceDiagram
    participant Echo as gatos-echo<br/>(State Engine)
    participant Policy as gatos-policy<br/>(Policy Service)
    participant PrivateStore as PrivateStore<br/>(Backend)
    participant Ledger as gatos-ledger<br/>(Refs)
    
    Note over Echo,Ledger: Privacy Projection Flow
    Echo->>Echo: Fold history → UnifiedState
    Echo->>Policy: Fetch privacy policy (policy.yaml)
    Policy-->>Echo: Privacy config (classes, rules)
    Echo->>Echo: Apply redact/pointerize<br/>Project UnifiedState → PublicState + PrivateBlobs
    Echo->>PrivateStore: Persist PrivateBlobs (encrypted)
    PrivateStore-->>Echo: OK (storage refs)
    Echo->>Ledger: Commit PublicState (with opaque pointers)<br/>+ Trailers (Privacy-Redactions, Privacy-Pointers)
    Ledger-->>Echo: Refs updated
Loading
sequenceDiagram
    participant Client
    participant gatosd as gatosd<br/>(Daemon)
    participant Ledger as gatos-ledger
    participant PrivateStore
    
    Note over Client,PrivateStore: Opaque Pointer Resolution Flow
    Client->>gatosd: POST /gatos/private/blobs/resolve<br/>(pointer, JWT auth)
    gatosd->>Ledger: Fetch pointer metadata<br/>+ access policy
    Ledger-->>gatosd: Pointer + policy
    gatosd->>gatosd: Verify JWT (auth)<br/>Evaluate policy
    alt Authorization Success
        gatosd->>PrivateStore: Retrieve encrypted blob<br/>(using location/capability)
        PrivateStore-->>gatosd: Encrypted blob
        gatosd->>gatosd: Decrypt blob (capability key)<br/>Verify digest & ciphertext_digest
        gatosd-->>Client: Decrypted blob (HTTP 200)
    else Authorization Denied
        gatosd-->>Client: Error (Unauthorized/Forbidden)
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • ADR-0004 normative specifications — Dense formalization of projection functor, pointer resolution protocol, policy hooks, and error taxonomy; requires careful validation of correctness against spec intent and RFC references (RFC 8785 JCS, RFC 9421).
  • OpaquePointer struct design — Field semantics (digest vs ciphertext_digest conditional logic, size/extensions optionality, URI format for location/capability) must align with schema constraints and resolution protocol expectations.
  • Schema conditional logic — anyOf requirements for digest/ciphertext_digest in opaque_pointer.schema.json need verification; governance policy schema privacy section structure must be compatible with policy evaluation rules.
  • Projection determinism claims — TECH-SPEC asserts byte-for-byte determinism across platforms; this requires critical review of canonicalization rules, key ordering, and whether all sources of non-determinism are accounted for.
  • Integration points — Verify component responsibilities (Echo projection, Policy rule evaluation, PrivateStore persistence, gatosd resolution) are correctly sequenced and error handling is complete.

Possibly related PRs

Poem

🔐 A private whisper wrapped in blake3's embrace,
Opaque pointers hide truth in a cryptic space,
Where policies govern what the world may see—
Public projection keeps secrets locked and free! 🗝️✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title precisely captures the core changes: ADR-0004 formalization, SPEC/TECH-SPEC updates, privacy schema introduction, and the new gatos-privacy crate. It is specific, directly related to the main changeset, and clearly summarizes the primary contribution.
Description check ✅ Passed The description is comprehensive and directly related to all aspects of the changeset, detailing specific implementation choices (POST + JWT, RFC 8785 JCS, opaque pointer schema, policy extensions), architectural considerations, and follow-up items. It effectively conveys the purpose and scope of the changes.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch decisions/ADR-0004

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +17 to +22
pub struct OpaquePointer {
pub kind: Kind,
pub algo: Algo,
pub digest: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub ciphertext_digest: Option<String>,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Allow ciphertext-only pointers without digest

The new JSON schema explicitly permits opaque pointers that omit digest when ciphertext_digest is provided, but the Rust struct still models digest as a mandatory String. Any ciphertext-only pointer produced according to the schema will fail deserialization because serde expects the digest field. Either digest should be optional (Option<String>) or the schema should require it so that the type and schema remain aligned.

Useful? React with 👍 / 👎.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 16

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 38be8f1 and 4bbbccb.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (16)
  • Cargo.toml (1 hunks)
  • Makefile (3 hunks)
  • crates/gatos-privacy/Cargo.toml (1 hunks)
  • crates/gatos-privacy/src/lib.rs (1 hunks)
  • docs/FEATURES.md (1 hunks)
  • docs/SPEC.md (6 hunks)
  • docs/TECH-SPEC.md (4 hunks)
  • docs/USE-CASES.md (1 hunks)
  • docs/decisions/ADR-0003/DECISION.md (1 hunks)
  • docs/decisions/ADR-0004/DECISION.md (1 hunks)
  • docs/decisions/README.md (1 hunks)
  • examples/v1/policy/privacy_min.json (1 hunks)
  • examples/v1/privacy/opaque_pointer_min.json (1 hunks)
  • schemas/v1/policy/governance_policy.schema.json (1 hunks)
  • schemas/v1/privacy/opaque_pointer.schema.json (1 hunks)
  • scripts/validate_schemas.sh (3 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/decisions/ADR-0004/DECISION.md

[style] ~123-~123: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...cy Policy** (.gatos/policy.yaml). 3. It traverses the UnifiedState tree, appl...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~129-~129: ‘prior to’ might be wordy. Consider a shorter alternative.
Context: ...MUST be canonicalized with RFC 8785 JCS prior to hashing. - When non‑JSON maps are mater...

(EN_WORDINESS_PREMIUM_PRIOR_TO)

docs/TECH-SPEC.md

[grammar] ~222-~222: Ensure spelling is correct
Context: ...on that: (1) fetches; (2) decrypts; (3) re‑encrypts; (4) stores; (5) emits an audit event up...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: audit
🔇 Additional comments (19)
Cargo.toml (1)

10-10: LGTM! Clean addition of gatos-privacy to workspace members with proper ordering.

crates/gatos-privacy/Cargo.toml (1)

6-12: LGTM! Minimal, focused dependency set aligned with OpaquePointer serialization and crypto needs. Workspace coordination is clean.

docs/decisions/README.md (1)

23-23: LGTM! ADR-0004 entry follows established format and is correctly positioned in the decision log.

examples/v1/privacy/opaque_pointer_min.json (1)

1-9: LGTM! Minimal example correctly demonstrates all required OpaquePointer fields with appropriate placeholder digest values and URI formats.

docs/FEATURES.md (1)

198-213: F9-US-SEC user story is well-structured.

Acceptance criteria are specific and measurable (capability validation, digest verification, trailer accuracy). Test plan covers golden/edge/failure scenarios appropriately. ADR-0004 cross-reference adds context.

docs/USE-CASES.md (1)

89-95: LGTM! Use case 9 is well-articulated, cohesive with ADR-0004 concepts, and maintains format consistency with existing use cases 1-8. Clearly communicates the value of capability-gated, auditable PII handling.

docs/decisions/ADR-0003/DECISION.md (1)

89-92: Excellent normative clarifications on PoC envelope determinism.

Line 89's explicit "lexicographically sorted list of approvals by the lowercase ASCII of each approval's Signer value" pins down sorting order (critical for BLAKE3 reproducibility). Line 92's storage requirement (persisting canonical JSON blob under refs/gatos/audit/proofs/governance/<proposal-id>) provides auditability and enables offline verification. Both reinforce RFC 8785 JCS determinism goals and integrate well with ADR-0004's privacy/determinism narrative.

Makefile (2)

47-48: Schema-compile correctly extended for privacy schema.

Privacy schema compilation follows established pattern (draft2020, strict, ajv-formats, ids reference). Positioning before governance_policy compilation is logical. No missing dependencies or references.


61-62: Schema-validate correctly adds privacy example validation.

Validates opaque_pointer_min.json against opaque_pointer.schema.json with proper ids reference. Syntax mirrors existing validations (governance, job) and integrates cleanly into error-halting chain.

scripts/validate_schemas.sh (1)

57-58: LGTM: Privacy example validation correctly extends the validation chain.

The addition properly validates privacy_min.json against the governance policy schema, confirming that the governance schema's new privacy section is exercised. This creates a clean validation chain:

  1. Standard governance example (line 56)
  2. Privacy-focused governance example (lines 57-58)

Both validate against the same schema but exercise different schema surfaces.

docs/TECH-SPEC.md (3)

104-108: LGTM: Crate responsibilities updated to reflect privacy capabilities.

The crate summary accurately reflects the new privacy-related responsibilities:

  • gatos-echo: Privacy projection logic for folding UnifiedState into PublicState
  • gatos-policy: Privacy rule evaluation
  • gatosd: Opaque Pointer resolution endpoint

These align with the architectural flow described in ADR-0004.


284-288: LGTM: Projection Determinism test suite properly integrated.

The addition of a "Projection Determinism" test suite (line 284) with clear requirements (line 287-288) ensures that privacy projections produce byte-for-byte identical results across platforms. This is essential for verifiable privacy guarantees.

The test coverage aligns with the determinism requirements specified in ADR-0004 (lines 128-130).


173-192: Fix diagram participant label for naming consistency; verify PrivateStore trait implementation status.

The diagram lists "StorageBackend (Interface)" as a participant (line 178) but references PrivateStore in the sequence flow (line 187) and subsequent documentation (line 194). Either rename the participant to PrivateStore or rename all references to StorageBackend for consistency.

Additionally, PrivateStore is described as "a pluggable trait" (line 194) but no trait/interface definition exists in the codebase—verify whether this is aspirational design documentation or if the implementation is planned for a separate branch/crate. If planned, document the implementation status.

crates/gatos-privacy/src/lib.rs (1)

1-11: Excellent documentation and architectural notes.

The module-level documentation clearly explains the crate's purpose, references the normative ADR-0004, and provides critical guidance about JCS canonicalization requirements without introducing unnecessary dependencies.

The explicit note about delegating JCS implementation to higher layers (lines 7-10) is a good design choice for keeping the crate lean.

docs/decisions/ADR-0004/DECISION.md (2)

1-11: LGTM: ADR metadata complete and well-structured.

The front matter includes all required fields: Status, Date, ADR number, Authors, dependency relationships (Requires/Related), Tags, and Schema references. The Accepted status and date (2025-11-10) align with the PR timeline.


277-286: LGTM: Error taxonomy is complete and well-defined.

The normative error taxonomy (lines 277-286) covers all expected failure modes with appropriate HTTP status codes:

  • Authentication failures (401)
  • Authorization failures (403)
  • Missing resources (404)
  • Integrity failures (422 DigestMismatch)
  • Availability failures (503 CapabilityUnavailable)

The error codes align with HTTP semantics and provide clear guidance for implementers.

docs/SPEC.md (3)

74-74: Good: Explicitly marking compute crate as planned.

Adding "(planned)" to gatos-compute (line 74) sets clear expectations that this component is not yet implemented. This prevents confusion for early implementers or integrators.


682-683: Excellent clarification on approval expiration semantics.

The comment on line 682 clearly explains that an approval's Expires-At cannot extend the proposal's expiration—it can only further restrict validity. This prevents confusion about expiration precedence and potential security issues.


698-701: Good: PoC envelope storage made normative with MUST.

Line 701 correctly elevates PoC envelope storage from a recommendation to a requirement by changing "should" to "MUST". This ensures auditability and verifiability of consensus decisions.

The lexicographic ordering by signer (line 698) ensures deterministic PoC computation, which is critical for consensus verification.

Comment on lines +17 to +29
pub struct OpaquePointer {
pub kind: Kind,
pub algo: Algo,
pub digest: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub ciphertext_digest: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub size: Option<u64>,
pub location: String,
pub capability: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub extensions: Option<Value>,
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

CRITICAL: Rust struct violates schema's anyOf requirement.

Line 20 declares digest as a non-optional String, making it always required. However, schemas/v1/privacy/opaque_pointer.schema.json (lines 17-30) uses anyOf to require EITHER digest OR ciphertext_digest, allowing ciphertext-only pointers for low-entropy data per ADR-0004.

This mismatch means:

  1. Deserialization will fail for valid ciphertext-only pointers
  2. The Rust type cannot represent low-entropy privacy classes that MUST NOT expose plaintext digest

Both digest and ciphertext_digest must be Option<String>, with validation logic ensuring at least one is Some.

 #[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
 #[serde(deny_unknown_fields)]
 pub struct OpaquePointer {
     pub kind: Kind,
     pub algo: Algo,
-    pub digest: String,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub digest: Option<String>,
     #[serde(skip_serializing_if = "Option::is_none")]
     pub ciphertext_digest: Option<String>,
     #[serde(skip_serializing_if = "Option::is_none")]
     pub size: Option<u64>,
     pub location: String,
     pub capability: String,
     #[serde(skip_serializing_if = "Option::is_none")]
     pub extensions: Option<Value>,
 }

Then add a validation method:

impl OpaquePointer {
    pub fn validate(&self) -> Result<(), &'static str> {
        if self.digest.is_none() && self.ciphertext_digest.is_none() {
            return Err("At least one of digest or ciphertext_digest must be present");
        }
        Ok(())
    }
}

Comment on lines +31 to +35
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum Kind {
OpaquePointer,
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider: Single-variant enum may be premature.

The Kind enum has only one variant (OpaquePointer). Unless additional pointer kinds are planned in the near future, this could be simplified to a constant string or removed entirely, with the field being a simple &'static str or validated string.

However, if the enum is intentional for future extensibility, consider adding a doc comment explaining the planned variants.

If future kinds are planned:

/// Pointer envelope kinds.
/// 
/// Future variants may include `TransparentPointer`, `NullPointer`, etc.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum Kind {
    OpaquePointer,
    // Future: TransparentPointer, NullPointer, ...
}

If not:

// Replace enum with constant
pub const KIND: &str = "opaque_pointer";

// In struct:
pub struct OpaquePointer {
    #[serde(default = "default_kind")]
    pub kind: String,
    // ...
}

fn default_kind() -> String {
    KIND.to_string()
}
🤖 Prompt for AI Agents
In crates/gatos-privacy/src/lib.rs around lines 31 to 35, the Kind enum only has
a single variant (OpaquePointer) which is premature; if you intend to add more
pointer kinds later, keep the enum and add a doc comment above it describing
planned future variants (e.g., TransparentPointer, NullPointer) so intent is
clear; otherwise remove the enum and replace usages with a constant string
(e.g., pub const KIND: &str = "opaque_pointer") and change the struct field to a
String or &'static str with a serde default helper (add a default_kind() that
returns KIND.to_string() and annotate the field with #[serde(default =
"default_kind")]) so serialization and defaults behave the same.

Comment on lines +37 to +41
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum Algo {
Blake3,
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Same consideration for Algo enum: future-proof or simplify.

The Algo enum has a single variant (Blake3). Similar to Kind, if no other algorithms are planned, consider whether an enum is necessary. However, cryptographic agility is often important, so the enum may be justified for future hash function migration.

If the enum is intentional for crypto-agility, add documentation explaining the design choice.

/// Hash algorithm for pointer digests.
/// 
/// Currently only BLAKE3 is supported. Future variants may include
/// SHA3-256 or other cryptographic hash functions as the ecosystem evolves.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum Algo {
    Blake3,
    // Future: Sha3, ...
}
🤖 Prompt for AI Agents
In crates/gatos-privacy/src/lib.rs around lines 37 to 41, the Algo enum
currently has a single variant (Blake3) which is fine but needs explicit
documentation for crypto-agility; add a doc comment above the enum explaining
that this represents the hash algorithm for pointer digests, that BLAKE3 is the
currently supported algorithm, and that additional algorithms (e.g., SHA3-256)
may be added in the future to support migration, and optionally include a
commented placeholder for future variants so intent is clear to readers and
maintainers.

Comment on lines +95 to +96
+string digest: "blake3:<hex>" // plaintext digest
+string ciphertext_digest: "blake3:<hex>" // MAY be present
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Ambiguity: "MAY be present" conflicts with conditional MUST requirement.

Line 96 states ciphertext_digest "MAY be present", but line 105 contradicts this with "the public pointer MUST include ciphertext_digest" for low-entropy privacy classes.

The correct phrasing should clarify that presence is conditional on privacy class:

--   **`ciphertext_digest`**: The content-address of the stored ciphertext (`blake3(ciphertext_bytes)`). For low‑entropy privacy classes (see Policy Hooks), the public pointer **MUST** include `ciphertext_digest` and policy **MUST NOT** expose the plaintext digest publicly.
+-   **`ciphertext_digest`**: The content-address of the stored ciphertext (`blake3(ciphertext_bytes)`). For low‑entropy privacy classes (see Policy Hooks), this field is **REQUIRED** and the public pointer **MUST NOT** expose the plaintext digest. For high-entropy data, this field is **OPTIONAL**.

Replace "MAY be present" with a conditional requirement tied to privacy class to eliminate ambiguity.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In docs/decisions/ADR-0004/DECISION.md around lines 95 to 96, the field
description currently says `ciphertext_digest: "blake3:<hex>"  // MAY be
present`, which conflicts with the later statement that low-entropy privacy
classes MUST include it; change this line to state the presence is conditional
on privacy class (e.g., "MUST be present for low-entropy privacy classes;
otherwise MAY be omitted") so the requirement is explicit and unambiguous.

Comment on lines +147 to +157
### 4. Pointer Resolution Protocol (Normative)

Authentication semantics are aligned with HTTP. We adopt a simple, interoperable model (JWT default; HTTP Message Signatures optional):

- **Endpoint**: `POST /gatos/private/blobs/resolve`
- **Request Body (application/json; JCS canonical form)**:
`{ "digest": "blake3:<hex>", "want": "plaintext"|"ciphertext" }`
- **Authorization**: `Authorization: Bearer <JWT>`
- Claims MUST include: `sub` (ed25519:<pubkey>), `aud` (node id or URL), `method` ("POST"), `path` ("/gatos/private/blobs/resolve"), `exp`, and `nbf`.
- Clock skew tolerance: ±300 seconds.
- On missing/invalid token: `401 Unauthorized`. On policy denial: `403 Forbidden`.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Authentication mechanism underspecified for JWT validation.

Lines 154-157 specify JWT claims but don't clarify:

  1. Signature algorithm: Which signing algorithm is required? (e.g., EdDSA with Ed25519 per the GATOS trust model?)
  2. Public key discovery: How does the server obtain sub's public key to verify the JWT signature?
  3. Token binding: How is the JWT cryptographically bound to the request body to prevent token substitution?

Without these details, implementations may be incompatible or insecure.

Add a normative subsection after line 157:

#### JWT Validation (Normative)

Servers MUST validate JWTs as follows:

1. **Algorithm**: Only EdDSA (Ed25519) is supported. The `alg` header MUST be `"EdDSA"`. Reject tokens with other algorithms (including `none`).
2. **Public Key**: Resolve the signing public key from the `sub` claim via the trust graph. The `sub` MUST be of the form `ed25519:<pubkey>`.
3. **Signature**: Verify the JWT signature using the resolved public key.
4. **Claims**:
   - `digest` claim MUST match the `digest` field in the request body (cryptographic binding).
   - `method` and `path` MUST match the HTTP request method and path.
   - `exp` and `nbf` MUST be valid within ±300s clock skew.
   - `aud` MUST match the server's node identifier.
5. **Replay protection**: Servers SHOULD implement short-lived tokens (≤60s expiry) or maintain a `jti` nonce cache to prevent replay attacks.
🤖 Prompt for AI Agents
docs/decisions/ADR-0004/DECISION.md around lines 147 to 157: the JWT validation
behavior is underspecified — add a normative "JWT Validation" subsection after
line 157 that mandates algorithm, public-key discovery, signature verification,
claim checks, cryptographic binding to the request body, clock skew rules,
audience matching, and replay protection; specifically require EdDSA (Ed25519)
only with `alg: "EdDSA"`, require `sub` be `ed25519:<pubkey>` and instruct
servers to resolve the public key via the trust graph, verify the JWT signature
with that key, require `digest` claim to match the request body `digest`, ensure
`method` and `path` claims match the HTTP request, enforce `exp`/`nbf` with
±300s skew and `aud` matching the server node id, and recommend short-lived
tokens (≤60s) or `jti` nonce caching for replay protection.

"type": "object",
"additionalProperties": false,
"properties": {
"min_entropy_bits": { "type": "integer", "minimum": 0 },
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Consider: Minimum entropy of 0 bits allows degenerate case.

Line 60 allows min_entropy_bits: 0, which represents zero-entropy (completely deterministic/constant) data. Protecting zero-entropy data with pointer indirection provides no privacy benefit, as the plaintext digest alone uniquely identifies the value.

For meaningful privacy protection, consider "minimum": 1 to require at least 1 bit of entropy. Alternatively, if 0 is intentionally allowed for testing or edge cases, document this rationale in the schema description.

-              "min_entropy_bits": { "type": "integer", "minimum": 0 },
+              "min_entropy_bits": { "type": "integer", "minimum": 1 },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"min_entropy_bits": { "type": "integer", "minimum": 0 },
"min_entropy_bits": { "type": "integer", "minimum": 1 },
🤖 Prompt for AI Agents
In schemas/v1/policy/governance_policy.schema.json around line 60, the property
"min_entropy_bits" currently permits 0 which allows a degenerate zero-entropy
case; change the schema to require a minimum of 1 (i.e., set "minimum": 1) to
prevent zero-entropy values, or if 0 is intentionally allowed add a descriptive
"description" explaining why 0 is permitted (e.g., for testing/edge cases) so
the intent is explicit.

Comment on lines +66 to +81
"rules": {
"type": "array",
"items": {
"type": "object",
"additionalProperties": false,
"properties": {
"select": { "type": "string" },
"action": { "type": "string", "enum": ["redact", "pointerize"] },
"class": { "type": "string" },
"capability": { "type": "string", "format": "uri" },
"location": { "type": "string", "format": "uri" }
},
"required": ["select", "action"]
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Missing conditional validation for pointerize action.

The rules schema allows action: "pointerize" without requiring capability or location fields. According to ADR-0004 (lines 214-218), when action is "pointerize", both capability and location are REQUIRED to specify where and how to store the private blob.

Currently, line 78 only requires ["select", "action"], making capability/location universally optional.

Add conditional validation using if/then to enforce that pointerize actions include required fields:

         "rules": {
           "type": "array",
           "items": {
             "type": "object",
             "additionalProperties": false,
             "properties": {
               "select": { "type": "string" },
               "action": { "type": "string", "enum": ["redact", "pointerize"] },
               "class": { "type": "string" },
               "capability": { "type": "string", "format": "uri" },
               "location": { "type": "string", "format": "uri" }
             },
-            "required": ["select", "action"]
+            "required": ["select", "action"],
+            "if": {
+              "properties": { "action": { "const": "pointerize" } }
+            },
+            "then": {
+              "required": ["select", "action", "capability", "location"]
+            }
           }
         }

This ensures schema validation catches invalid configurations at policy load time rather than runtime.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"rules": {
"type": "array",
"items": {
"type": "object",
"additionalProperties": false,
"properties": {
"select": { "type": "string" },
"action": { "type": "string", "enum": ["redact", "pointerize"] },
"class": { "type": "string" },
"capability": { "type": "string", "format": "uri" },
"location": { "type": "string", "format": "uri" }
},
"required": ["select", "action"]
}
}
}
"rules": {
"type": "array",
"items": {
"type": "object",
"additionalProperties": false,
"properties": {
"select": { "type": "string" },
"action": { "type": "string", "enum": ["redact", "pointerize"] },
"class": { "type": "string" },
"capability": { "type": "string", "format": "uri" },
"location": { "type": "string", "format": "uri" }
},
"required": ["select", "action"],
"if": {
"properties": { "action": { "const": "pointerize" } }
},
"then": {
"required": ["select", "action", "capability", "location"]
}
}
}
🤖 Prompt for AI Agents
In schemas/v1/policy/governance_policy.schema.json around lines 66 to 81, the
rules item allows "action":"pointerize" without requiring capability or
location; add a JSON Schema conditional (inside the items object) using "if": {
"properties": { "action": { "const": "pointerize" } } } and a corresponding
"then" that adds "required": ["capability", "location"] so pointerize rules must
include those fields (keep existing required ["select","action"],
additionalProperties: false, and existing formats intact).

Comment on lines +16 to +30
"required": ["kind","algo","location","capability"],
"anyOf": [
{
"required": ["digest"],
"properties": {
"digest": { "type": "string", "pattern": "^blake3:[a-f0-9]{64}$" }
}
},
{
"required": ["ciphertext_digest"],
"properties": {
"ciphertext_digest": { "type": "string", "pattern": "^blake3:[a-f0-9]{64}$" }
}
}
],
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Documentation inconsistency: digest optionality.

The schema correctly allows either digest OR ciphertext_digest (or both) via the anyOf constraint, supporting ciphertext-only pointers for low-entropy data per ADR-0004 §2 (lines 104-105).

However, docs/SPEC.md line 326 states: "digest: The **REQUIRED** blake3 hash of the plaintext". This contradicts the schema, which makes digest optional when ciphertext_digest is present.

The ADR is correct: low-entropy privacy classes MUST use ciphertext_digest and MUST NOT expose plaintext digest. The schema aligns with this requirement, but SPEC.md needs correction.

The SPEC.md documentation should be updated to clarify that digest is conditionally required based on privacy class, not universally required. Consider revising SPEC.md line 326 to match the schema's anyOf semantics.

🤖 Prompt for AI Agents
In docs/SPEC.md around line 326, the doc currently states "digest: The REQUIRED
blake3 hash of the plaintext" which contradicts the JSON schema that allows
either digest OR ciphertext_digest; update this line to state that digest is
conditionally required only for privacy classes that reveal plaintext
(plaintext-available classes), and that low-entropy privacy classes MUST use
ciphertext_digest and MUST NOT expose digest (refer ADR-0004 §2); rephrase to
make clear the schema's anyOf semantics (either digest or ciphertext_digest must
be present depending on privacy class) and link or cite ADR-0004 §2 for
authority.

Comment on lines +17 to +30
"anyOf": [
{
"required": ["digest"],
"properties": {
"digest": { "type": "string", "pattern": "^blake3:[a-f0-9]{64}$" }
}
},
{
"required": ["ciphertext_digest"],
"properties": {
"ciphertext_digest": { "type": "string", "pattern": "^blake3:[a-f0-9]{64}$" }
}
}
],
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Simplify anyOf: remove redundant property definitions.

The anyOf branches (lines 18-22 and 24-28) redefine properties that are already declared in the root properties section (lines 9-10). This is redundant and violates the DRY principle.

The anyOf should only specify which fields are required, not redefine their schema:

   "anyOf": [
     {
-      "required": ["digest"],
-      "properties": {
-        "digest": { "type": "string", "pattern": "^blake3:[a-f0-9]{64}$" }
-      }
+      "required": ["digest"]
     },
     {
-      "required": ["ciphertext_digest"],
-      "properties": {
-        "ciphertext_digest": { "type": "string", "pattern": "^blake3:[a-f0-9]{64}$" }
-      }
+      "required": ["ciphertext_digest"]
     }
   ],

This makes the schema more maintainable: if you need to update the pattern, you only change it in one place (the root properties).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"anyOf": [
{
"required": ["digest"],
"properties": {
"digest": { "type": "string", "pattern": "^blake3:[a-f0-9]{64}$" }
}
},
{
"required": ["ciphertext_digest"],
"properties": {
"ciphertext_digest": { "type": "string", "pattern": "^blake3:[a-f0-9]{64}$" }
}
}
],
"anyOf": [
{
"required": ["digest"]
},
{
"required": ["ciphertext_digest"]
}
],
🤖 Prompt for AI Agents
In schemas/v1/privacy/opaque_pointer.schema.json around lines 17 to 30, the
anyOf branches redundantly re-declare the "digest" and "ciphertext_digest"
property schemas that are already defined in the root properties; remove the
duplicate property definitions inside each anyOf branch and leave only the
required arrays (e.g., {"required":["digest"]} and
{"required":["ciphertext_digest"]}) so the schema references the root property
definitions for type/pattern and adheres to DRY.

"schemas/v1/governance/revocation.schema.json"
"schemas/v1/governance/proof_of_consensus_envelope.schema.json"
"schemas/v1/policy/governance_policy.schema.json"
"schemas/v1/privacy/opaque_pointer.schema.json"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

Verify whether opaque_pointer.schema.json requires common/ids.schema.json reference.

The script adds schemas/v1/privacy/opaque_pointer.schema.json to the SCHEMAS array, which will be compiled with -r "$AJV_COMMON_REF" (line 29). This assumes the opaque pointer schema references definitions from schemas/v1/common/ids.schema.json.

Review the opaque pointer schema to confirm whether it actually uses any $ref to ids.schema.json definitions. If it doesn't reference the common ids schema, the -r flag is unnecessary (though harmless).


🏁 Script executed:

#!/bin/bash
# Check if opaque_pointer.schema.json references the ids schema
echo "Checking for references to ids.schema.json in opaque_pointer.schema.json:"
grep -n '\$ref.*ids\.schema\.json' schemas/v1/privacy/opaque_pointer.schema.json || echo "No references found"

Length of output: 93


Remove opaque_pointer.schema.json from AJV common reference compilation or add to exclusion conditional.

The verification confirms opaque_pointer.schema.json contains no $ref to ids.schema.json. The -r flag applied at line 29 is unnecessary—update line 26 to exclude this schema alongside governance_policy.schema.json:

if [[ "$schema" == "schemas/v1/common/ids.schema.json" ]] || [[ "$schema" == "schemas/v1/governance/governance_policy.schema.json" ]] || [[ "$schema" == "schemas/v1/privacy/opaque_pointer.schema.json" ]]; then

This eliminates wasteful compilation overhead.

🤖 Prompt for AI Agents
In scripts/validate_schemas.sh around line 21 (and the conditional at line 26),
the privacy/opaque_pointer.schema.json is being needlessly included in the AJV
common reference compilation; update the exclusion conditional to also match
"schemas/v1/privacy/opaque_pointer.schema.json" so it is skipped (i.e., add an
OR check for that path alongside the existing ids.schema.json and
governance_policy.schema.json checks) to avoid the redundant -r compilation for
that file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants