title: KCP Protocol Specification
description: Full specification of the Knowledge Context Protocol (KCP), Layer 8 for persistent governance, discovery, and lineage tracking of AI-generated knowledge.
tags: kcp, protocol, specification, layer8, knowledge-governance, lineage, ed25519, ai-protocol, open-standard
version: 0.2
status: draft
updated: 2026-03-22

KCP Protocol Specification v0.2

Status: Draft
Date: March 2026
Author: Thiago Silva (contato@kcp-protocol.org)


Abstract

This document specifies the Knowledge Context Protocol (KCP), an application-layer protocol for persistent governance, discovery, and lineage tracking of knowledge outputs generated by AI agents. KCP introduces a new layer in the network stack (Layer 8: Context & Knowledge) that sits above the OSI Application Layer (Layer 7).


1. Introduction

1.1 Motivation

The proliferation of AI agents (LLMs, code assistants, analytics tools) has created an explosion of generated knowledge (reports, analyses, insights). However, current infrastructure treats these outputs as ephemeral data:

  • Knowledge disappears when sessions end
  • No mechanism for discovery ("has this been analyzed before?")
  • No lineage tracking (source → analysis → decision)
  • No multi-tenant governance (who can see what, based on business context)

KCP addresses these gaps by defining:

  1. A standard payload format for knowledge artifacts
  2. A protocol for publishing, discovering, and retrieving knowledge
  3. A multi-tenant governance model
  4. A lineage tracking mechanism
  5. A federated P2P storage architecture

1.2 Terminology

  • Knowledge Artifact: Any output generated by an AI agent (report, analysis, visualization, code, etc.)
  • Tenant: An organization or isolated context (e.g., company, open-source project)
  • Team: A subgroup within a tenant (e.g., engineering team, data science team)
  • Lineage: The chain from data sources → queries → insights → decisions
  • Visibility Tier: Access control level (public, org, team, private)

2. Protocol Overview

2.1 Layer Model

┌─────────────────────────────────────────┐
│  Layer 8: Context & Knowledge (KCP)    │  ← New Layer
├─────────────────────────────────────────┤
│  Layer 7: Application (HTTP, etc.)     │
├─────────────────────────────────────────┤
│  Layers 1-6: Traditional Stack         │
└─────────────────────────────────────────┘

2.2 Core Operations

| Operation | Method | Endpoint | Description |
|-----------|--------|----------|-------------|
| Publish | POST | /kcp/v1/artifacts | Submit a knowledge artifact |
| Search | GET | /kcp/v1/artifacts?q=... | Search by keywords/tags |
| Retrieve | GET | /kcp/v1/artifacts/{id} | Get artifact metadata |
| Download | GET | /kcp/v1/artifacts/{id}/content | Get artifact content |
| Delete | DELETE | /kcp/v1/artifacts/{id} | Soft-delete artifact |

3. Payload Format

3.1 Core Schema

{
  "id": "uuid-v4",
  "version": "1",
  "user_id": "string",
  "tenant_id": "string",
  "team": "string (optional)",
  "tags": ["string"],
  "source": "string (agent identifier)",
  "timestamp": "ISO 8601 datetime",
  "format": "html | json | markdown | pdf | png",
  "visibility": "public | org | team | private",
  "title": "string",
  "summary": "string (max 500 chars)",
  "lineage": {
    "query": "string (human-readable description)",
    "data_sources": ["uri"],
    "agent": "string",
    "parent_reports": ["uuid"] 
  },
  "content_url": "uri (ipfs:// | https:// | file://)",
  "content_hash": "sha256 hex",
  "embeddings": ["float (optional, for semantic search)"],
  "signature": "ed25519 signature",
  "acl": {
    "allowed_tenants": ["string"],
    "allowed_users": ["string"],
    "allowed_teams": ["string"]
  }
}

3.2 Field Descriptions

  • id: Unique identifier (UUID v4)
  • version: Protocol version (currently "1")
  • user_id: Creator's identifier (email, username, or DID)
  • tenant_id: Organization/project identifier
  • team: Optional subgroup within tenant
  • tags: List of keywords for discovery
  • source: Agent that generated the artifact (name + version)
  • timestamp: Creation time (UTC, ISO 8601)
  • format: Content format (one of the values listed in the schema: html, json, markdown, pdf, png; not a full MIME type)
  • visibility: Access control tier (see section 4)
  • title: Human-readable title
  • summary: Brief description (used in search results)
  • lineage: Provenance information
    • query: What question was answered
    • data_sources: Input data URIs
    • agent: Agent that performed analysis
    • parent_reports: Reports this builds upon
  • content_url: Where content is stored
  • content_hash: SHA-256 of content (for integrity)
  • embeddings: Vector representation (optional, for semantic search)
  • signature: Ed25519 signature of payload (excluding signature field)
  • acl: Fine-grained access control (overrides visibility)
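
As an illustration of how the fields above fit together, the following minimal sketch assembles an unsigned payload using only the Python standard library. The helper name build_payload, the agent identifier example-agent-v0.1, and the example values are assumptions made for this example; they are not defined by the specification.

```python
import hashlib
import uuid
from datetime import datetime, timezone


def build_payload(content: bytes, title: str, summary: str,
                  user_id: str, tenant_id: str, content_url: str) -> dict:
    """Assemble an unsigned artifact payload following the section 3.1 schema."""
    if len(summary) > 500:
        raise ValueError("summary exceeds 500 characters")
    agent = "example-agent-v0.1"  # hypothetical agent identifier
    return {
        "id": str(uuid.uuid4()),                      # UUID v4
        "version": "1",                               # payload version
        "user_id": user_id,
        "tenant_id": tenant_id,
        "tags": [],
        "source": agent,
        "timestamp": datetime.now(timezone.utc).isoformat(),  # UTC, ISO 8601
        "format": "markdown",
        "visibility": "private",
        "title": title,
        "summary": summary,
        "lineage": {
            "query": "",
            "data_sources": [],
            "agent": agent,
            "parent_reports": [],
        },
        "content_url": content_url,
        "content_hash": hashlib.sha256(content).hexdigest(),  # SHA-256 hex
    }


payload = build_payload(b"# Report\n", "Example report", "A short summary.",
                        "alice@example.com", "acme-corp",
                        "file:///tmp/report.md")
```

The signature field is deliberately left out here; it is added after the rest of the payload is final, as described in section 8.5.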

4. Multi-Tenant Governance

4.1 Visibility Tiers

| Tier | Access Rule | Example Use Case |
|------|-------------|------------------|
| public | Anyone can read | Whitepapers, open documentation |
| org | Anyone in tenant_id can read | Internal architecture docs |
| team | Anyone in tenant_id + team can read | Squad metrics, postmortems |
| private | Only user_id + explicit ACL can read | Draft analyses, sensitive data |

4.2 Access Control List (ACL)

For fine-grained control beyond visibility tiers:

"acl": {
  "allowed_tenants": ["acme-corp", "partner-org"],
  "allowed_users": ["alice@example.com", "bob@example.com"],
  "allowed_teams": ["team:engineering", "team:data-science"]
}

Rules:

  • If acl is present and non-empty, it overrides visibility
  • Access granted if user matches any of: allowed_users, allowed_teams, or allowed_tenants
  • Empty ACL = no additional permissions (fall back to visibility)
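
The rules above leave some room for interpretation, so the following is only one possible reading, sketched in Python. It assumes that a non-empty ACL decides access on its own, that an empty or missing ACL falls back to the visibility tier, and that the creator can always read their own artifact. The Reader type and the can_read function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Reader:
    """Hypothetical description of the caller requesting access."""
    user_id: str
    tenant_id: str
    teams: list = field(default_factory=list)


def can_read(artifact: dict, reader: Reader) -> bool:
    # Assumption: the creator can always read their own artifact.
    if reader.user_id == artifact["user_id"]:
        return True

    acl = artifact.get("acl") or {}
    acl_keys = ("allowed_users", "allowed_teams", "allowed_tenants")
    if any(acl.get(key) for key in acl_keys):
        # Non-empty ACL overrides visibility: a match on any list grants access.
        return (
            reader.user_id in acl.get("allowed_users", [])
            or reader.tenant_id in acl.get("allowed_tenants", [])
            or any(team in acl.get("allowed_teams", []) for team in reader.teams)
        )

    # Empty or missing ACL: fall back to the visibility tier.
    visibility = artifact["visibility"]
    if visibility == "public":
        return True
    same_tenant = reader.tenant_id == artifact["tenant_id"]
    if visibility == "org":
        return same_tenant
    if visibility == "team":
        return same_tenant and artifact.get("team") in reader.teams
    return False  # private: only the creator, handled above


artifact = {"user_id": "alice@example.com", "tenant_id": "acme-corp",
            "team": "engineering", "visibility": "team"}
bob = Reader("bob@example.com", "acme-corp", ["engineering"])
carol = Reader("carol@example.com", "partner-org")
shared = dict(artifact, acl={"allowed_tenants": ["partner-org"]})
```

Under these assumptions bob can read artifact through the team tier, while carol can read only shared, where the ACL names her tenant. Note that bob loses access to shared because the ACL replaces the tier check; an implementation that prefers ACLs to be purely additive would combine the two checks with "or" instead.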

5. Lineage Tracking

5.1 Data Flow Model

Data Sources → Query/Agent → Knowledge Artifact → Decision
     ↓              ↓                 ↓               ↓
  [URIs]      [source field]    [this artifact]  [parent_reports]

5.2 Lineage Example

{
  "lineage": {
    "query": "Calculate average response time for API endpoints",
    "data_sources": [
      "prometheus://prod-cluster/metrics",
      "grafana://api/dashboards/xyz"
    ],
    "agent": "monitoring-agent-v2.1",
    "parent_reports": [
      "ab12cd34-...",  // Previous week's report
      "ef56gh78-..."   // Baseline performance report
    ]
  }
}

Traversal: By following parent_reports chains, systems can reconstruct full provenance graphs.
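
A minimal sketch of such a traversal is shown below. It assumes artifacts are available in an in-memory dictionary keyed by id; the function name provenance_graph and the short placeholder ids are hypothetical and stand in for real UUIDs and a real storage lookup.

```python
def provenance_graph(root_id: str, artifacts: dict) -> dict:
    """Return {artifact_id: [parent ids]} for everything reachable from root_id."""
    graph = {}
    stack = [root_id]
    while stack:
        current = stack.pop()
        if current in graph:
            continue  # already visited; this also guards against cycles
        lineage = artifacts.get(current, {}).get("lineage", {})
        parents = list(lineage.get("parent_reports", []))
        graph[current] = parents
        stack.extend(parents)
    return graph


# Placeholder ids for illustration only; real payloads use UUIDs.
store = {
    "weekly-3": {"lineage": {"parent_reports": ["weekly-2", "baseline"]}},
    "weekly-2": {"lineage": {"parent_reports": ["baseline"]}},
    "baseline": {"lineage": {"parent_reports": []}},
}
graph = provenance_graph("weekly-3", store)
```

Parents that are referenced but not present in the store appear in the result with an empty parent list, which lets a caller detect missing ancestors and fetch them from peers.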


6. Discovery & Search

6.1 Query Syntax

GET /kcp/v1/artifacts?q=<keywords>&tenant_id=<tenant>&team=<team>&tags=<tag1,tag2>&from=<date>&to=<date>

Parameters:

  • q (optional): Full-text search query
  • tenant_id (optional): Filter by tenant
  • team (optional): Filter by team
  • tags (optional): Comma-separated tags
  • from, to (optional): Date range (ISO 8601)
  • limit, offset (optional): Pagination
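
The query string can be assembled with the Python standard library, as in the sketch below. The helper name build_search_url and the host kcp.example.com are assumptions made for this example; only the path and parameter names come from this section.

```python
from urllib.parse import urlencode


def build_search_url(base_url, q=None, tenant_id=None, team=None, tags=None,
                     date_from=None, date_to=None, limit=None, offset=None):
    """Build a search URL, omitting every parameter that was not supplied."""
    params = {
        "q": q,
        "tenant_id": tenant_id,
        "team": team,
        "tags": ",".join(tags) if tags else None,  # comma-separated tags
        "from": date_from,
        "to": date_to,
        "limit": limit,
        "offset": offset,
    }
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base_url}/kcp/v1/artifacts" + (f"?{query}" if query else "")


url = build_search_url("https://kcp.example.com", q="api latency",
                       tags=["metrics", "weekly"], limit=10)
```

urlencode percent-encodes the comma between tags and turns spaces into "+", so servers are expected to decode the query string before splitting the tags value.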

6.2 Response Format

{
  "results": [
    {
      "id": "uuid",
      "title": "string",
      "summary": "string",
      "created_at": "ISO 8601",
      "relevance": 0.94,
      "preview": "string (first 200 chars)"
    }
  ],
  "total": 42,
  "query_time_ms": 12
}

6.3 Semantic Search (Optional)

If embeddings are provided in the payload:

  1. Client generates embedding for search query
  2. Server performs vector similarity search
  3. Results ranked by cosine similarity
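
The ranking step can be sketched in plain Python as follows. This is a brute-force illustration, not a recommendation for production, where a vector index would normally replace the linear scan. The function names and the two-dimensional example vectors are assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_embedding, artifacts, limit=10):
    """Return (score, id) pairs sorted by descending similarity."""
    scored = [
        (cosine_similarity(query_embedding, artifact["embeddings"]), artifact["id"])
        for artifact in artifacts
        if artifact.get("embeddings")  # artifacts without embeddings are skipped
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


artifacts = [
    {"id": "a", "embeddings": [1.0, 0.0]},
    {"id": "b", "embeddings": [0.6, 0.8]},
    {"id": "c"},  # no embeddings: only reachable through keyword search
]
ranked = rank([1.0, 0.0], artifacts)
```

Query and artifact embeddings must come from the same model and have the same dimension for the scores to be meaningful; the specification does not currently state how that is negotiated.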

7. Storage Layer

7.1 Requirements

  • Distributed: No single point of failure
  • Content-Addressed: Content hash = retrieval key
  • Encrypted: At-rest encryption (AES-256-GCM)
  • Signed: All artifacts must have valid Ed25519 signatures
  • Efficient: Support for large files (videos, datasets)

7.2 Supported Backends

Option A: IPFS + libp2p (Recommended for Phase 3)

  • Content-addressed by default
  • P2P discovery via DHT
  • Pinning for availability

Option B: libsql + SQLCipher (Recommended for MVP)

  • SQLite fork with replication
  • Encryption via SQLCipher
  • Familiar SQL interface

Option C: Custom KCP Native Format

  • Append-only log (Git-like)
  • Single-file database (.kcp extension)
  • Merkle tree for integrity
  • Full spec to be published in a future appendix

8. Security

8.1 Threat Model

Threats:

  1. Unauthorized access to private artifacts
  2. Data tampering (modifying existing artifacts)
  3. Impersonation (publishing as another user)
  4. Replay attacks
  5. Denial of service

8.2 Mitigations

| Threat | Mitigation |
|--------|------------|
| Unauthorized access | Multi-tenant ACLs + encryption at rest |
| Tampering | Content hashing (SHA-256) + signature verification |
| Impersonation | Ed25519 signatures (user keypair) |
| Replay | Timestamp validation (reject if > 5 min old) |
| DoS | Rate limiting (per tenant_id, per user_id) |
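
The replay mitigation can be sketched as a freshness check on the payload timestamp. The function name is_fresh is hypothetical, and two choices below are assumptions rather than requirements of this specification: timestamps without a timezone are treated as UTC, and timestamps from the future are rejected.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(minutes=5)  # window taken from the mitigation table


def is_fresh(timestamp: str, now=None) -> bool:
    """Return True if the ISO 8601 timestamp is at most MAX_AGE old."""
    now = now or datetime.now(timezone.utc)
    # Older Python versions do not accept a trailing "Z" in fromisoformat().
    created = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    if created.tzinfo is None:
        created = created.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    age = now - created
    # Assumption: future timestamps are rejected as well.
    return timedelta(0) <= age <= MAX_AGE


reference = datetime(2026, 3, 22, 12, 0, tzinfo=timezone.utc)
recent = is_fresh("2026-03-22T11:57:00+00:00", reference)
stale = is_fresh("2026-03-22T11:50:00Z", reference)
```

A timestamp window alone does not stop a replay inside the five minutes. The 409 Conflict response for an existing artifact ID (section 11.1) is what would reject a duplicate publish in that case, and a real deployment would probably also allow a small clock-skew tolerance.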

8.3 Key Management

User Keypair:

Private Key: ed25519:secret (NEVER transmitted)
Public Key: ed25519:public (stored in user profile)
Node ID: hex(public_key) (32 bytes = 64 hex chars)

Key Generation Methods:

  1. Mnemonic-based (Recommended) — BIP-39 compatible recovery phrase
  2. Random generation — Cryptographically secure random 32 bytes
  3. Import from backup — Restore from encrypted backup file

8.4 Identity Recovery (Mnemonic)

Users can generate their keypair from a 12-word recovery phrase (BIP-39 standard). The phrase below only illustrates the format; never use a published phrase for a real identity:

abandon ability able about above absent absorb abstract absurd abuse access accident

Derivation Process:

from mnemonic import Mnemonic
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# 1. Generate mnemonic (128 bits entropy = 12 words)
m = Mnemonic("english")
mnemonic = m.generate(128)  # "word1 word2 ... word12"

# 2. Derive 64-byte seed via PBKDF2-SHA512 (BIP-39 standard)
seed = m.to_seed(mnemonic, passphrase="")  # Optional passphrase

# 3. Use first 32 bytes as Ed25519 private key
private_key = Ed25519PrivateKey.from_private_bytes(seed[:32])
public_key = private_key.public_key()

# 4. Node ID = hex(public_key)
node_id = public_key.public_bytes_raw().hex()

Recovery:

  • Same mnemonic + passphrase = same keypair = same Node ID
  • Users can move between devices by memorizing/storing their 12 words
  • Security: Anyone with the mnemonic has full access to the identity

CLI Commands:

kcp identity create     # Generate new identity with recovery phrase
kcp identity recover    # Restore identity from recovery phrase
kcp identity show       # Display current Node ID and fingerprint
kcp identity export     # Export to encrypted backup file
kcp identity import     # Import from backup file

8.5 Signature Generation

import json

# Assumes `payload` is the artifact dict and `user_private_key` is an
# Ed25519PrivateKey from the `cryptography` package (see section 8.4).

# 1. Remove 'signature' field from payload
payload_without_sig = {k: v for k, v in payload.items() if k != 'signature'}

# 2. Canonical JSON (sorted keys, no whitespace)
canonical = json.dumps(payload_without_sig, sort_keys=True, separators=(',', ':'))

# 3. Sign with user's private key (returns a 64-byte signature)
signature = user_private_key.sign(canonical.encode('utf-8'))

# 4. Add hex-encoded signature to payload
payload['signature'] = signature.hex()

8.6 Signature Verification

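import json
from cryptography.exceptions import InvalidSignature

# Server retrieves user's public key (an Ed25519PublicKey);
# get_user_public_key is an implementation-specific lookup
user_public_key = get_user_public_key(payload['user_id'])

# Reconstruct canonical payload
canonical = json.dumps({k: v for k, v in payload.items() if k != 'signature'},
                       sort_keys=True, separators=(',', ':'))

# Verify signature; verify() returns None and raises InvalidSignature on failure
try:
    user_public_key.verify(bytes.fromhex(payload['signature']),
                           canonical.encode('utf-8'))
    signature_valid = True
except InvalidSignature:
    signature_valid = False  # respond with 401 Unauthorized (section 11.1)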

9. Federation & P2P Sync

9.1 Architecture

┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│  Node A      │◄────►│  Node B      │◄────►│  Node C      │
│  (acme-corp) │      │  (beta-inc)  │      │  (gamma-llc) │
└──────────────┘      └──────────────┘      └──────────────┘
       ↕                      ↕                      ↕
   [DHT: Distributed Hash Table for discovery]

9.2 Peer Discovery

  1. Each node announces itself to DHT: "Node A has reports with tags [X, Y, Z]"
  2. When searching, client queries DHT: "Who has reports matching tag X?"
  3. DHT returns list of peers
  4. Client fetches directly from peers (P2P)

9.3 Sync Protocol (Simplified)

1. Node A publishes report with ID=abc123
2. Node A announces to DHT: "I have abc123"
3. Node B queries DHT: "Who has abc123?"
4. DHT responds: "Node A"
5. Node B fetches from Node A (IPFS or HTTP)
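
The five steps above can be simulated in a single process to make the roles concrete. The sketch below replaces the real DHT and network transport with in-memory dictionaries; the class names ToyDHT and Node and the peers lookup table are hypothetical and do not correspond to any libp2p or IPFS API.

```python
from collections import defaultdict


class ToyDHT:
    """In-memory stand-in for the DHT: maps artifact IDs to provider node IDs."""

    def __init__(self):
        self._providers = defaultdict(set)

    def announce(self, node_id, artifact_id):
        self._providers[artifact_id].add(node_id)

    def find_providers(self, artifact_id):
        return sorted(self._providers.get(artifact_id, set()))


class Node:
    def __init__(self, node_id, dht):
        self.node_id = node_id
        self.dht = dht
        self.artifacts = {}

    def publish(self, artifact_id, payload):
        # Steps 1-2: store locally, then announce to the DHT.
        self.artifacts[artifact_id] = payload
        self.dht.announce(self.node_id, artifact_id)

    def fetch(self, artifact_id, peers):
        # Steps 3-5: ask the DHT who has it, then copy from the first provider.
        for provider_id in self.dht.find_providers(artifact_id):
            peer = peers.get(provider_id)
            if peer is not None and artifact_id in peer.artifacts:
                self.artifacts[artifact_id] = peer.artifacts[artifact_id]
                return self.artifacts[artifact_id]
        return None


dht = ToyDHT()
node_a = Node("node-a", dht)
node_b = Node("node-b", dht)
peers = {"node-a": node_a, "node-b": node_b}

node_a.publish("abc123", {"title": "Example report"})
fetched = node_b.fetch("abc123", peers)
```

In a real node the fetched payload would still need its signature and content hash verified (sections 8.6 and 3.2) and its access rules checked before being stored or served.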

10. Versioning

10.1 Protocol Versioning

  • Current specification version: v0.2 (draft)
  • Version in URL: /kcp/v1/artifacts (major version only)
  • Backward compatibility: Servers MUST support all v1.x clients

10.2 Payload Versioning

  • Each payload has "version": "1" field
  • Future breaking changes increment version (2, 3, etc.)
  • Servers MUST reject unsupported versions with 400 Bad Request
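
A server-side version gate might look like the sketch below. The function name check_payload_version and the error code UNSUPPORTED_VERSION are assumptions made for this example; the specification only fixes the 400 status and the general error body shape shown in section 11.2.

```python
SUPPORTED_VERSIONS = {"1"}


def check_payload_version(payload: dict):
    """Return None if the version is supported, else a (status, body) error pair."""
    version = payload.get("version")
    if isinstance(version, str) and version in SUPPORTED_VERSIONS:
        return None
    body = {
        "error": {
            "code": "UNSUPPORTED_VERSION",  # hypothetical error code
            "message": "Payload version is not supported by this server",
            "details": {
                "version": version,
                "supported": sorted(SUPPORTED_VERSIONS),
            },
        }
    }
    return 400, body


ok = check_payload_version({"version": "1"})
rejected = check_payload_version({"version": "2"})
```

Here ok is None and rejected is a (400, body) pair. Returning the list of supported versions in details is a design choice that lets clients downgrade without a separate capability request.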

11. Error Handling

11.1 HTTP Status Codes

| Code | Meaning | Example |
|------|---------|---------|
| 200 | Success | Report retrieved |
| 201 | Created | Report published |
| 400 | Bad Request | Invalid payload format |
| 401 | Unauthorized | Invalid signature |
| 403 | Forbidden | ACL violation |
| 404 | Not Found | Report doesn't exist |
| 409 | Conflict | Report ID already exists |
| 429 | Rate Limited | Too many requests |
| 500 | Server Error | Internal failure |

11.2 Error Response Format

{
  "error": {
    "code": "INVALID_SIGNATURE",
    "message": "Ed25519 signature verification failed",
    "details": {
      "user_id": "alice@example.com",
      "report_id": "abc123"
    }
  }
}

12. Future Extensions

  • Collaborative Editing: Multiple users edit same artifact (CRDT-based)
  • Notifications: Subscribe to new reports matching tags
  • Analytics: Usage metrics (most viewed, most cited)
  • AI-to-AI Discovery: Agents autonomously discover relevant artifacts

Appendix A: Example Implementations

See /sdk directory for reference implementations in Python, TypeScript, and Go.


Appendix B: Comparison with Related Protocols

Feature KCP HTTP Git IPFS RDF/SPARQL
Knowledge artifacts
Lineage tracking Partial
Multi-tenant ACL
Semantic search
P2P federation
Content-addressed

Appendix C: References

  1. OSI Model — ISO/IEC 7498-1:1994
  2. Ed25519 — RFC 8032
  3. IPFS — https://docs.ipfs.tech
  4. libp2p — https://libp2p.io
  5. Semantic Web — https://www.w3.org/standards/semanticweb/

Document Status: Experimental Draft
Next Review: May 2026
Feedback: Open an issue at https://github.com/kcp-protocol/kcp