-
Notifications
You must be signed in to change notification settings - Fork 0
GATOS Privacy: ADR‑0004 + SPEC/TECH‑SPEC + Pointer Schema + gatos‑privacy crate #47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This commit addresses 8 specific issues raised during review of the initial ADR-0004 documentation. The changes improve clarity, consistency, and technical precision across the specification. Fixes include: - **FEATURES.md**: - Rewrote acceptance criteria for F9-US-DEV to be specific and verifiable. - Removed the superseded F6 feature to avoid confusion with F9. - **SPEC.md**: - Corrected Mermaid diagram type casing ( -> ) to align with JSON standards. - Clarified that private blob decryption during pointer resolution is conditional, not assumed. - **TECH-SPEC.md**: - Removed redundant participant aliases in the projection sequence diagram. - Replaced the unconventional endpoint with a more RESTful URL. - Improved diagram precision by changing 'node' to 'field path'. - Specified JWS with a detached payload as the required authentication mechanism for pointer resolution.
Updated the acceptance criteria for F9-US-SEC in FEATURES.md to explicitly mention the and commit trailers, improving clarity for developers.
…ional HTTP Message Signatures profile; tighten determinism (RFC 8785 JCS + key ordering); clarify namespaces and error taxonomy; add pointer rotation trailer\nschema(privacy): add ciphertext_digest + extensions; bump to draft 2020-12\nschema(policy): add optional privacy.classes + rules\nexamples: add privacy_min and updated opaque_pointer example\nscripts: validate privacy schema/examples in CI\nmake: fix tab indentation for lint/fix targets\nchore: add gatos-privacy crate with OpaquePointer type and notes
…onse digest headers reflect response body\nschema(privacy): allow ciphertext-only pointers via anyOf; keep kind/algo/location/capability required\ndocs(ADR-0004): fix colon in class diagram;\ndocs(ADR-0003): PoC storage level to MUST
Summary by CodeRabbit
WalkthroughIntroduces a hybrid privacy model to GATOS comprising: a new Changes
Sequence Diagram(s)sequenceDiagram
participant Echo as gatos-echo<br/>(State Engine)
participant Policy as gatos-policy<br/>(Policy Service)
participant PrivateStore as PrivateStore<br/>(Backend)
participant Ledger as gatos-ledger<br/>(Refs)
Note over Echo,Ledger: Privacy Projection Flow
Echo->>Echo: Fold history → UnifiedState
Echo->>Policy: Fetch privacy policy (policy.yaml)
Policy-->>Echo: Privacy config (classes, rules)
Echo->>Echo: Apply redact/pointerize<br/>Project UnifiedState → PublicState + PrivateBlobs
Echo->>PrivateStore: Persist PrivateBlobs (encrypted)
PrivateStore-->>Echo: OK (storage refs)
Echo->>Ledger: Commit PublicState (with opaque pointers)<br/>+ Trailers (Privacy-Redactions, Privacy-Pointers)
Ledger-->>Echo: Refs updated
sequenceDiagram
participant Client
participant gatosd as gatosd<br/>(Daemon)
participant Ledger as gatos-ledger
participant PrivateStore
Note over Client,PrivateStore: Opaque Pointer Resolution Flow
Client->>gatosd: POST /gatos/private/blobs/resolve<br/>(pointer, JWT auth)
gatosd->>Ledger: Fetch pointer metadata<br/>+ access policy
Ledger-->>gatosd: Pointer + policy
gatosd->>gatosd: Verify JWT (auth)<br/>Evaluate policy
alt Authorization Success
gatosd->>PrivateStore: Retrieve encrypted blob<br/>(using location/capability)
PrivateStore-->>gatosd: Encrypted blob
gatosd->>gatosd: Decrypt blob (capability key)<br/>Verify digest & ciphertext_digest
gatosd-->>Client: Decrypted blob (HTTP 200)
else Authorization Denied
gatosd-->>Client: Error (Unauthorized/Forbidden)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes
Possibly related PRs
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| pub struct OpaquePointer { | ||
| pub kind: Kind, | ||
| pub algo: Algo, | ||
| pub digest: String, | ||
| #[serde(skip_serializing_if = "Option::is_none")] | ||
| pub ciphertext_digest: Option<String>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allow ciphertext-only pointers without
digest
The new JSON schema explicitly permits opaque pointers that omit digest when ciphertext_digest is provided, but the Rust struct still models digest as a mandatory String. Any ciphertext-only pointer produced according to the schema will fail deserialization because serde expects the digest field. Either digest should be optional (Option<String>) or the schema should require it so that the type and schema remain aligned.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 16
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (16)
Cargo.toml(1 hunks)Makefile(3 hunks)crates/gatos-privacy/Cargo.toml(1 hunks)crates/gatos-privacy/src/lib.rs(1 hunks)docs/FEATURES.md(1 hunks)docs/SPEC.md(6 hunks)docs/TECH-SPEC.md(4 hunks)docs/USE-CASES.md(1 hunks)docs/decisions/ADR-0003/DECISION.md(1 hunks)docs/decisions/ADR-0004/DECISION.md(1 hunks)docs/decisions/README.md(1 hunks)examples/v1/policy/privacy_min.json(1 hunks)examples/v1/privacy/opaque_pointer_min.json(1 hunks)schemas/v1/policy/governance_policy.schema.json(1 hunks)schemas/v1/privacy/opaque_pointer.schema.json(1 hunks)scripts/validate_schemas.sh(3 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/decisions/ADR-0004/DECISION.md
[style] ~123-~123: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...cy Policy** (.gatos/policy.yaml). 3. It traverses the UnifiedState tree, appl...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~129-~129: ‘prior to’ might be wordy. Consider a shorter alternative.
Context: ...MUST be canonicalized with RFC 8785 JCS prior to hashing. - When non‑JSON maps are mater...
(EN_WORDINESS_PREMIUM_PRIOR_TO)
docs/TECH-SPEC.md
[grammar] ~222-~222: Ensure spelling is correct
Context: ...on that: (1) fetches; (2) decrypts; (3) re‑encrypts; (4) stores; (5) emits an audit event up...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: audit
🔇 Additional comments (19)
Cargo.toml (1)
10-10: LGTM! Clean addition ofgatos-privacyto workspace members with proper ordering.crates/gatos-privacy/Cargo.toml (1)
6-12: LGTM! Minimal, focused dependency set aligned with OpaquePointer serialization and crypto needs. Workspace coordination is clean.docs/decisions/README.md (1)
23-23: LGTM! ADR-0004 entry follows established format and is correctly positioned in the decision log.examples/v1/privacy/opaque_pointer_min.json (1)
1-9: LGTM! Minimal example correctly demonstrates all required OpaquePointer fields with appropriate placeholder digest values and URI formats.docs/FEATURES.md (1)
198-213: F9-US-SEC user story is well-structured.Acceptance criteria are specific and measurable (capability validation, digest verification, trailer accuracy). Test plan covers golden/edge/failure scenarios appropriately. ADR-0004 cross-reference adds context.
docs/USE-CASES.md (1)
89-95: LGTM! Use case 9 is well-articulated, cohesive with ADR-0004 concepts, and maintains format consistency with existing use cases 1-8. Clearly communicates the value of capability-gated, auditable PII handling.docs/decisions/ADR-0003/DECISION.md (1)
89-92: Excellent normative clarifications on PoC envelope determinism.Line 89's explicit "lexicographically sorted list of approvals by the lowercase ASCII of each approval's
Signervalue" pins down sorting order (critical for BLAKE3 reproducibility). Line 92's storage requirement (persisting canonical JSON blob underrefs/gatos/audit/proofs/governance/<proposal-id>) provides auditability and enables offline verification. Both reinforce RFC 8785 JCS determinism goals and integrate well with ADR-0004's privacy/determinism narrative.Makefile (2)
47-48: Schema-compile correctly extended for privacy schema.Privacy schema compilation follows established pattern (draft2020, strict, ajv-formats, ids reference). Positioning before governance_policy compilation is logical. No missing dependencies or references.
61-62: Schema-validate correctly adds privacy example validation.Validates opaque_pointer_min.json against opaque_pointer.schema.json with proper ids reference. Syntax mirrors existing validations (governance, job) and integrates cleanly into error-halting chain.
scripts/validate_schemas.sh (1)
57-58: LGTM: Privacy example validation correctly extends the validation chain.The addition properly validates
privacy_min.jsonagainst the governance policy schema, confirming that the governance schema's new privacy section is exercised. This creates a clean validation chain:
- Standard governance example (line 56)
- Privacy-focused governance example (lines 57-58)
Both validate against the same schema but exercise different schema surfaces.
docs/TECH-SPEC.md (3)
104-108: LGTM: Crate responsibilities updated to reflect privacy capabilities.The crate summary accurately reflects the new privacy-related responsibilities:
gatos-echo: Privacy projection logic for folding UnifiedState into PublicStategatos-policy: Privacy rule evaluationgatosd: Opaque Pointer resolution endpointThese align with the architectural flow described in ADR-0004.
284-288: LGTM: Projection Determinism test suite properly integrated.The addition of a "Projection Determinism" test suite (line 284) with clear requirements (line 287-288) ensures that privacy projections produce byte-for-byte identical results across platforms. This is essential for verifiable privacy guarantees.
The test coverage aligns with the determinism requirements specified in ADR-0004 (lines 128-130).
173-192: Fix diagram participant label for naming consistency; verify PrivateStore trait implementation status.The diagram lists
"StorageBackend (Interface)"as a participant (line 178) but referencesPrivateStorein the sequence flow (line 187) and subsequent documentation (line 194). Either rename the participant toPrivateStoreor rename all references toStorageBackendfor consistency.Additionally,
PrivateStoreis described as "a pluggable trait" (line 194) but no trait/interface definition exists in the codebase—verify whether this is aspirational design documentation or if the implementation is planned for a separate branch/crate. If planned, document the implementation status.crates/gatos-privacy/src/lib.rs (1)
1-11: Excellent documentation and architectural notes.The module-level documentation clearly explains the crate's purpose, references the normative ADR-0004, and provides critical guidance about JCS canonicalization requirements without introducing unnecessary dependencies.
The explicit note about delegating JCS implementation to higher layers (lines 7-10) is a good design choice for keeping the crate lean.
docs/decisions/ADR-0004/DECISION.md (2)
1-11: LGTM: ADR metadata complete and well-structured.The front matter includes all required fields: Status, Date, ADR number, Authors, dependency relationships (Requires/Related), Tags, and Schema references. The Accepted status and date (2025-11-10) align with the PR timeline.
277-286: LGTM: Error taxonomy is complete and well-defined.The normative error taxonomy (lines 277-286) covers all expected failure modes with appropriate HTTP status codes:
- Authentication failures (401)
- Authorization failures (403)
- Missing resources (404)
- Integrity failures (422 DigestMismatch)
- Availability failures (503 CapabilityUnavailable)
The error codes align with HTTP semantics and provide clear guidance for implementers.
docs/SPEC.md (3)
74-74: Good: Explicitly marking compute crate as planned.Adding "(planned)" to
gatos-compute(line 74) sets clear expectations that this component is not yet implemented. This prevents confusion for early implementers or integrators.
682-683: Excellent clarification on approval expiration semantics.The comment on line 682 clearly explains that an approval's
Expires-Atcannot extend the proposal's expiration—it can only further restrict validity. This prevents confusion about expiration precedence and potential security issues.
698-701: Good: PoC envelope storage made normative with MUST.Line 701 correctly elevates PoC envelope storage from a recommendation to a requirement by changing "should" to "MUST". This ensures auditability and verifiability of consensus decisions.
The lexicographic ordering by signer (line 698) ensures deterministic PoC computation, which is critical for consensus verification.
| pub struct OpaquePointer { | ||
| pub kind: Kind, | ||
| pub algo: Algo, | ||
| pub digest: String, | ||
| #[serde(skip_serializing_if = "Option::is_none")] | ||
| pub ciphertext_digest: Option<String>, | ||
| #[serde(skip_serializing_if = "Option::is_none")] | ||
| pub size: Option<u64>, | ||
| pub location: String, | ||
| pub capability: String, | ||
| #[serde(skip_serializing_if = "Option::is_none")] | ||
| pub extensions: Option<Value>, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CRITICAL: Rust struct violates schema's anyOf requirement.
Line 20 declares digest as a non-optional String, making it always required. However, schemas/v1/privacy/opaque_pointer.schema.json (lines 17-30) uses anyOf to require EITHER digest OR ciphertext_digest, allowing ciphertext-only pointers for low-entropy data per ADR-0004.
This mismatch means:
- Deserialization will fail for valid ciphertext-only pointers
- The Rust type cannot represent low-entropy privacy classes that MUST NOT expose plaintext digest
Both digest and ciphertext_digest must be Option<String>, with validation logic ensuring at least one is Some.
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
#[serde(deny_unknown_fields)]
pub struct OpaquePointer {
pub kind: Kind,
pub algo: Algo,
- pub digest: String,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ pub digest: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub ciphertext_digest: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub size: Option<u64>,
pub location: String,
pub capability: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub extensions: Option<Value>,
}Then add a validation method:
impl OpaquePointer {
pub fn validate(&self) -> Result<(), &'static str> {
if self.digest.is_none() && self.ciphertext_digest.is_none() {
return Err("At least one of digest or ciphertext_digest must be present");
}
Ok(())
}
}| #[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)] | ||
| #[serde(rename_all = "snake_case")] | ||
| pub enum Kind { | ||
| OpaquePointer, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Consider: Single-variant enum may be premature.
The Kind enum has only one variant (OpaquePointer). Unless additional pointer kinds are planned in the near future, this could be simplified to a constant string or removed entirely, with the field being a simple &'static str or validated string.
However, if the enum is intentional for future extensibility, consider adding a doc comment explaining the planned variants.
If future kinds are planned:
/// Pointer envelope kinds.
///
/// Future variants may include `TransparentPointer`, `NullPointer`, etc.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum Kind {
OpaquePointer,
// Future: TransparentPointer, NullPointer, ...
}If not:
// Replace enum with constant
pub const KIND: &str = "opaque_pointer";
// In struct:
pub struct OpaquePointer {
#[serde(default = "default_kind")]
pub kind: String,
// ...
}
fn default_kind() -> String {
KIND.to_string()
}🤖 Prompt for AI Agents
In crates/gatos-privacy/src/lib.rs around lines 31 to 35, the Kind enum only has
a single variant (OpaquePointer) which is premature; if you intend to add more
pointer kinds later, keep the enum and add a doc comment above it describing
planned future variants (e.g., TransparentPointer, NullPointer) so intent is
clear; otherwise remove the enum and replace usages with a constant string
(e.g., pub const KIND: &str = "opaque_pointer") and change the struct field to a
String or &'static str with a serde default helper (add a default_kind() that
returns KIND.to_string() and annotate the field with #[serde(default =
"default_kind")]) so serialization and defaults behave the same.
| #[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)] | ||
| #[serde(rename_all = "lowercase")] | ||
| pub enum Algo { | ||
| Blake3, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Same consideration for Algo enum: future-proof or simplify.
The Algo enum has a single variant (Blake3). Similar to Kind, if no other algorithms are planned, consider whether an enum is necessary. However, cryptographic agility is often important, so the enum may be justified for future hash function migration.
If the enum is intentional for crypto-agility, add documentation explaining the design choice.
/// Hash algorithm for pointer digests.
///
/// Currently only BLAKE3 is supported. Future variants may include
/// SHA3-256 or other cryptographic hash functions as the ecosystem evolves.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum Algo {
Blake3,
// Future: Sha3, ...
}🤖 Prompt for AI Agents
In crates/gatos-privacy/src/lib.rs around lines 37 to 41, the Algo enum
currently has a single variant (Blake3) which is fine but needs explicit
documentation for crypto-agility; add a doc comment above the enum explaining
that this represents the hash algorithm for pointer digests, that BLAKE3 is the
currently supported algorithm, and that additional algorithms (e.g., SHA3-256)
may be added in the future to support migration, and optionally include a
commented placeholder for future variants so intent is clear to readers and
maintainers.
| +string digest: "blake3:<hex>" // plaintext digest | ||
| +string ciphertext_digest: "blake3:<hex>" // MAY be present |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ambiguity: "MAY be present" conflicts with conditional MUST requirement.
Line 96 states ciphertext_digest "MAY be present", but line 105 contradicts this with "the public pointer MUST include ciphertext_digest" for low-entropy privacy classes.
The correct phrasing should clarify that presence is conditional on privacy class:
-- **`ciphertext_digest`**: The content-address of the stored ciphertext (`blake3(ciphertext_bytes)`). For low‑entropy privacy classes (see Policy Hooks), the public pointer **MUST** include `ciphertext_digest` and policy **MUST NOT** expose the plaintext digest publicly.
+- **`ciphertext_digest`**: The content-address of the stored ciphertext (`blake3(ciphertext_bytes)`). For low‑entropy privacy classes (see Policy Hooks), this field is **REQUIRED** and the public pointer **MUST NOT** expose the plaintext digest. For high-entropy data, this field is **OPTIONAL**.Replace "MAY be present" with a conditional requirement tied to privacy class to eliminate ambiguity.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In docs/decisions/ADR-0004/DECISION.md around lines 95 to 96, the field
description currently says `ciphertext_digest: "blake3:<hex>" // MAY be
present`, which conflicts with the later statement that low-entropy privacy
classes MUST include it; change this line to state the presence is conditional
on privacy class (e.g., "MUST be present for low-entropy privacy classes;
otherwise MAY be omitted") so the requirement is explicit and unambiguous.
| ### 4. Pointer Resolution Protocol (Normative) | ||
|
|
||
| Authentication semantics are aligned with HTTP. We adopt a simple, interoperable model (JWT default; HTTP Message Signatures optional): | ||
|
|
||
| - **Endpoint**: `POST /gatos/private/blobs/resolve` | ||
| - **Request Body (application/json; JCS canonical form)**: | ||
| `{ "digest": "blake3:<hex>", "want": "plaintext"|"ciphertext" }` | ||
| - **Authorization**: `Authorization: Bearer <JWT>` | ||
| - Claims MUST include: `sub` (ed25519:<pubkey>), `aud` (node id or URL), `method` ("POST"), `path` ("/gatos/private/blobs/resolve"), `exp`, and `nbf`. | ||
| - Clock skew tolerance: ±300 seconds. | ||
| - On missing/invalid token: `401 Unauthorized`. On policy denial: `403 Forbidden`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Authentication mechanism underspecified for JWT validation.
Lines 154-157 specify JWT claims but don't clarify:
- Signature algorithm: Which signing algorithm is required? (e.g., EdDSA with Ed25519 per the GATOS trust model?)
- Public key discovery: How does the server obtain
sub's public key to verify the JWT signature? - Token binding: How is the JWT cryptographically bound to the request body to prevent token substitution?
Without these details, implementations may be incompatible or insecure.
Add a normative subsection after line 157:
#### JWT Validation (Normative)
Servers MUST validate JWTs as follows:
1. **Algorithm**: Only EdDSA (Ed25519) is supported. The `alg` header MUST be `"EdDSA"`. Reject tokens with other algorithms (including `none`).
2. **Public Key**: Resolve the signing public key from the `sub` claim via the trust graph. The `sub` MUST be of the form `ed25519:<pubkey>`.
3. **Signature**: Verify the JWT signature using the resolved public key.
4. **Claims**:
- `digest` claim MUST match the `digest` field in the request body (cryptographic binding).
- `method` and `path` MUST match the HTTP request method and path.
- `exp` and `nbf` MUST be valid within ±300s clock skew.
- `aud` MUST match the server's node identifier.
5. **Replay protection**: Servers SHOULD implement short-lived tokens (≤60s expiry) or maintain a `jti` nonce cache to prevent replay attacks.🤖 Prompt for AI Agents
docs/decisions/ADR-0004/DECISION.md around lines 147 to 157: the JWT validation
behavior is underspecified — add a normative "JWT Validation" subsection after
line 157 that mandates algorithm, public-key discovery, signature verification,
claim checks, cryptographic binding to the request body, clock skew rules,
audience matching, and replay protection; specifically require EdDSA (Ed25519)
only with `alg: "EdDSA"`, require `sub` be `ed25519:<pubkey>` and instruct
servers to resolve the public key via the trust graph, verify the JWT signature
with that key, require `digest` claim to match the request body `digest`, ensure
`method` and `path` claims match the HTTP request, enforce `exp`/`nbf` with
±300s skew and `aud` matching the server node id, and recommend short-lived
tokens (≤60s) or `jti` nonce caching for replay protection.
| "type": "object", | ||
| "additionalProperties": false, | ||
| "properties": { | ||
| "min_entropy_bits": { "type": "integer", "minimum": 0 }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider: Minimum entropy of 0 bits allows degenerate case.
Line 60 allows min_entropy_bits: 0, which represents zero-entropy (completely deterministic/constant) data. Protecting zero-entropy data with pointer indirection provides no privacy benefit, as the plaintext digest alone uniquely identifies the value.
For meaningful privacy protection, consider "minimum": 1 to require at least 1 bit of entropy. Alternatively, if 0 is intentionally allowed for testing or edge cases, document this rationale in the schema description.
- "min_entropy_bits": { "type": "integer", "minimum": 0 },
+ "min_entropy_bits": { "type": "integer", "minimum": 1 },📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "min_entropy_bits": { "type": "integer", "minimum": 0 }, | |
| "min_entropy_bits": { "type": "integer", "minimum": 1 }, |
🤖 Prompt for AI Agents
In schemas/v1/policy/governance_policy.schema.json around line 60, the property
"min_entropy_bits" currently permits 0 which allows a degenerate zero-entropy
case; change the schema to require a minimum of 1 (i.e., set "minimum": 1) to
prevent zero-entropy values, or if 0 is intentionally allowed add a descriptive
"description" explaining why 0 is permitted (e.g., for testing/edge cases) so
the intent is explicit.
| "rules": { | ||
| "type": "array", | ||
| "items": { | ||
| "type": "object", | ||
| "additionalProperties": false, | ||
| "properties": { | ||
| "select": { "type": "string" }, | ||
| "action": { "type": "string", "enum": ["redact", "pointerize"] }, | ||
| "class": { "type": "string" }, | ||
| "capability": { "type": "string", "format": "uri" }, | ||
| "location": { "type": "string", "format": "uri" } | ||
| }, | ||
| "required": ["select", "action"] | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Missing conditional validation for pointerize action.
The rules schema allows action: "pointerize" without requiring capability or location fields. According to ADR-0004 (lines 214-218), when action is "pointerize", both capability and location are REQUIRED to specify where and how to store the private blob.
Currently, line 78 only requires ["select", "action"], making capability/location universally optional.
Add conditional validation using if/then to enforce that pointerize actions include required fields:
"rules": {
"type": "array",
"items": {
"type": "object",
"additionalProperties": false,
"properties": {
"select": { "type": "string" },
"action": { "type": "string", "enum": ["redact", "pointerize"] },
"class": { "type": "string" },
"capability": { "type": "string", "format": "uri" },
"location": { "type": "string", "format": "uri" }
},
- "required": ["select", "action"]
+ "required": ["select", "action"],
+ "if": {
+ "properties": { "action": { "const": "pointerize" } }
+ },
+ "then": {
+ "required": ["select", "action", "capability", "location"]
+ }
}
}This ensures schema validation catches invalid configurations at policy load time rather than runtime.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "rules": { | |
| "type": "array", | |
| "items": { | |
| "type": "object", | |
| "additionalProperties": false, | |
| "properties": { | |
| "select": { "type": "string" }, | |
| "action": { "type": "string", "enum": ["redact", "pointerize"] }, | |
| "class": { "type": "string" }, | |
| "capability": { "type": "string", "format": "uri" }, | |
| "location": { "type": "string", "format": "uri" } | |
| }, | |
| "required": ["select", "action"] | |
| } | |
| } | |
| } | |
| "rules": { | |
| "type": "array", | |
| "items": { | |
| "type": "object", | |
| "additionalProperties": false, | |
| "properties": { | |
| "select": { "type": "string" }, | |
| "action": { "type": "string", "enum": ["redact", "pointerize"] }, | |
| "class": { "type": "string" }, | |
| "capability": { "type": "string", "format": "uri" }, | |
| "location": { "type": "string", "format": "uri" } | |
| }, | |
| "required": ["select", "action"], | |
| "if": { | |
| "properties": { "action": { "const": "pointerize" } } | |
| }, | |
| "then": { | |
| "required": ["select", "action", "capability", "location"] | |
| } | |
| } | |
| } |
🤖 Prompt for AI Agents
In schemas/v1/policy/governance_policy.schema.json around lines 66 to 81, the
rules item allows "action":"pointerize" without requiring capability or
location; add a JSON Schema conditional (inside the items object) using "if": {
"properties": { "action": { "const": "pointerize" } } } and a corresponding
"then" that adds "required": ["capability", "location"] so pointerize rules must
include those fields (keep existing required ["select","action"],
additionalProperties: false, and existing formats intact).
| "required": ["kind","algo","location","capability"], | ||
| "anyOf": [ | ||
| { | ||
| "required": ["digest"], | ||
| "properties": { | ||
| "digest": { "type": "string", "pattern": "^blake3:[a-f0-9]{64}$" } | ||
| } | ||
| }, | ||
| { | ||
| "required": ["ciphertext_digest"], | ||
| "properties": { | ||
| "ciphertext_digest": { "type": "string", "pattern": "^blake3:[a-f0-9]{64}$" } | ||
| } | ||
| } | ||
| ], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation inconsistency: digest optionality.
The schema correctly allows either digest OR ciphertext_digest (or both) via the anyOf constraint, supporting ciphertext-only pointers for low-entropy data per ADR-0004 §2 (lines 104-105).
However, docs/SPEC.md line 326 states: "digest: The **REQUIRED** blake3 hash of the plaintext". This contradicts the schema, which makes digest optional when ciphertext_digest is present.
The ADR is correct: low-entropy privacy classes MUST use ciphertext_digest and MUST NOT expose plaintext digest. The schema aligns with this requirement, but SPEC.md needs correction.
The SPEC.md documentation should be updated to clarify that digest is conditionally required based on privacy class, not universally required. Consider revising SPEC.md line 326 to match the schema's anyOf semantics.
🤖 Prompt for AI Agents
In docs/SPEC.md around line 326, the doc currently states "digest: The REQUIRED
blake3 hash of the plaintext" which contradicts the JSON schema that allows
either digest OR ciphertext_digest; update this line to state that digest is
conditionally required only for privacy classes that reveal plaintext
(plaintext-available classes), and that low-entropy privacy classes MUST use
ciphertext_digest and MUST NOT expose digest (refer ADR-0004 §2); rephrase to
make clear the schema's anyOf semantics (either digest or ciphertext_digest must
be present depending on privacy class) and link or cite ADR-0004 §2 for
authority.
| "anyOf": [ | ||
| { | ||
| "required": ["digest"], | ||
| "properties": { | ||
| "digest": { "type": "string", "pattern": "^blake3:[a-f0-9]{64}$" } | ||
| } | ||
| }, | ||
| { | ||
| "required": ["ciphertext_digest"], | ||
| "properties": { | ||
| "ciphertext_digest": { "type": "string", "pattern": "^blake3:[a-f0-9]{64}$" } | ||
| } | ||
| } | ||
| ], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Simplify anyOf: remove redundant property definitions.
The anyOf branches (lines 18-22 and 24-28) redefine properties that are already declared in the root properties section (lines 9-10). This is redundant and violates the DRY principle.
The anyOf should only specify which fields are required, not redefine their schema:
"anyOf": [
{
- "required": ["digest"],
- "properties": {
- "digest": { "type": "string", "pattern": "^blake3:[a-f0-9]{64}$" }
- }
+ "required": ["digest"]
},
{
- "required": ["ciphertext_digest"],
- "properties": {
- "ciphertext_digest": { "type": "string", "pattern": "^blake3:[a-f0-9]{64}$" }
- }
+ "required": ["ciphertext_digest"]
}
],This makes the schema more maintainable: if you need to update the pattern, you only change it in one place (the root properties).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "anyOf": [ | |
| { | |
| "required": ["digest"], | |
| "properties": { | |
| "digest": { "type": "string", "pattern": "^blake3:[a-f0-9]{64}$" } | |
| } | |
| }, | |
| { | |
| "required": ["ciphertext_digest"], | |
| "properties": { | |
| "ciphertext_digest": { "type": "string", "pattern": "^blake3:[a-f0-9]{64}$" } | |
| } | |
| } | |
| ], | |
| "anyOf": [ | |
| { | |
| "required": ["digest"] | |
| }, | |
| { | |
| "required": ["ciphertext_digest"] | |
| } | |
| ], |
🤖 Prompt for AI Agents
In schemas/v1/privacy/opaque_pointer.schema.json around lines 17 to 30, the
anyOf branches redundantly re-declare the "digest" and "ciphertext_digest"
property schemas that are already defined in the root properties; remove the
duplicate property definitions inside each anyOf branch and leave only the
required arrays (e.g., {"required":["digest"]} and
{"required":["ciphertext_digest"]}) so the schema references the root property
definitions for type/pattern and adheres to DRY.
| "schemas/v1/governance/revocation.schema.json" | ||
| "schemas/v1/governance/proof_of_consensus_envelope.schema.json" | ||
| "schemas/v1/policy/governance_policy.schema.json" | ||
| "schemas/v1/privacy/opaque_pointer.schema.json" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
🧩 Analysis chain
Verify whether opaque_pointer.schema.json requires common/ids.schema.json reference.
The script adds schemas/v1/privacy/opaque_pointer.schema.json to the SCHEMAS array, which will be compiled with -r "$AJV_COMMON_REF" (line 29). This assumes the opaque pointer schema references definitions from schemas/v1/common/ids.schema.json.
Review the opaque pointer schema to confirm whether it actually uses any $ref to ids.schema.json definitions. If it doesn't reference the common ids schema, the -r flag is unnecessary (though harmless).
🏁 Script executed:
#!/bin/bash
# Check if opaque_pointer.schema.json references the ids schema
echo "Checking for references to ids.schema.json in opaque_pointer.schema.json:"
grep -n '\$ref.*ids\.schema\.json' schemas/v1/privacy/opaque_pointer.schema.json || echo "No references found"Length of output: 93
Remove opaque_pointer.schema.json from AJV common reference compilation or add to exclusion conditional.
The verification confirms opaque_pointer.schema.json contains no $ref to ids.schema.json. The -r flag applied at line 29 is unnecessary—update line 26 to exclude this schema alongside governance_policy.schema.json:
if [[ "$schema" == "schemas/v1/common/ids.schema.json" ]] || [[ "$schema" == "schemas/v1/governance/governance_policy.schema.json" ]] || [[ "$schema" == "schemas/v1/privacy/opaque_pointer.schema.json" ]]; thenThis eliminates wasteful compilation overhead.
🤖 Prompt for AI Agents
In scripts/validate_schemas.sh around line 21 (and the conditional at line 26),
the privacy/opaque_pointer.schema.json is being needlessly included in the AJV
common reference compilation; update the exclusion conditional to also match
"schemas/v1/privacy/opaque_pointer.schema.json" so it is skipped (i.e., add an
OR check for that path alongside the existing ids.schema.json and
governance_policy.schema.json checks) to avoid the redundant -r compilation for
that file.
This PR formalizes the hybrid privacy model (ADR‑0004) and wires the supporting schema/tooling.
Highlights
ciphertext_digestandextensions.privacy.classesandprivacy.rules(optional), with examples.gatos-privacycrate (typed OpaquePointer envelope).Notable details
ciphertext_digestin public pointers.DigestMismatchis 422.Privacy-Pointer-Rotationstrailer.Follow‑ups to be tracked as issues
Receipts
decisions/ADR-0004Ready for review.