KCP Protocol Specification v0.2

Status: Draft
Date: March 2026
Author: Thiago Silva (contato@kcp-protocol.org)

Abstract

This document specifies the Knowledge Context Protocol (KCP), an application-layer protocol for persistent governance, discovery, and lineage tracking of knowledge outputs generated by AI agents. KCP introduces a new layer in the network stack (Layer 8: Context & Knowledge) that sits above the OSI Application Layer (Layer 7).

1. Introduction

1.1 Motivation

The proliferation of AI agents (LLMs, code assistants, analytics tools) has created an explosion of generated knowledge (reports, analyses, insights). However, current infrastructure treats these outputs as ephemeral data:

Knowledge disappears when sessions end
No mechanism for discovery ("has this been analyzed before?")
No lineage tracking (source → analysis → decision)
No multi-tenant governance (who can see what, based on business context)

KCP addresses these gaps by defining:

A standard payload format for knowledge artifacts
A protocol for publishing, discovering, and retrieving knowledge
A multi-tenant governance model
A lineage tracking mechanism
A federated P2P storage architecture

1.2 Terminology

Knowledge Artifact: Any output generated by an AI agent (report, analysis, visualization, code, etc.)
Tenant: An organization or isolated context (e.g., company, open-source project)
Team: A subgroup within a tenant (e.g., engineering team, data science team)
Lineage: The chain from data sources → queries → insights → decisions
Visibility Tier: Access control level (public, org, team, private)

2. Protocol Overview

2.1 Layer Model

┌─────────────────────────────────────────┐
│  Layer 8: Context & Knowledge (KCP)    │  ← New Layer
├─────────────────────────────────────────┤
│  Layer 7: Application (HTTP, etc.)     │
├─────────────────────────────────────────┤
│  Layers 1-6: Traditional Stack         │
└─────────────────────────────────────────┘

2.2 Core Operations

Operation	Method	Endpoint	Description
Publish	POST	`/kcp/v1/artifacts`	Submit a knowledge artifact
Search	GET	`/kcp/v1/artifacts?q=...`	Search by keywords/tags
Retrieve	GET	`/kcp/v1/artifacts/{id}`	Get artifact metadata
Download	GET	`/kcp/v1/artifacts/{id}/content`	Get artifact content
Delete	DELETE	`/kcp/v1/artifacts/{id}`	Soft-delete artifact

3. Payload Format

3.1 Core Schema

{
  "id": "uuid-v4",
  "version": "1",
  "user_id": "string",
  "tenant_id": "string",
  "team": "string (optional)",
  "tags": ["string"],
  "source": "string (agent identifier)",
  "timestamp": "ISO 8601 datetime",
  "format": "html | json | markdown | pdf | png",
  "visibility": "public | org | team | private",
  "title": "string",
  "summary": "string (max 500 chars)",
  "lineage": {
    "query": "string (human-readable description)",
    "data_sources": ["uri"],
    "agent": "string",
    "parent_reports": ["uuid"] 
  },
  "content_url": "uri (ipfs:// | https:// | file://)",
  "content_hash": "sha256 hex",
  "embeddings": ["float (optional, for semantic search)"],
  "signature": "ed25519 signature",
  "acl": {
    "allowed_tenants": ["string"],
    "allowed_users": ["string"],
    "allowed_teams": ["string"]
  }
}

3.2 Field Descriptions

id: Unique identifier (UUID v4)
version: Payload schema version (currently "1"; see section 10.2)
user_id: Creator's identifier (email, username, or DID)
tenant_id: Organization/project identifier
team: Optional subgroup within tenant
tags: List of keywords for discovery
source: Agent that generated the artifact (name + version)
timestamp: Creation time (UTC, ISO 8601)
format: Content format of the artifact (one of html, json, markdown, pdf, png)
visibility: Access control tier (see section 4)
title: Human-readable title
summary: Brief description (used in search results)
lineage: Provenance information
- query: What question was answered
- data_sources: Input data URIs
- agent: Agent that performed analysis
- parent_reports: Reports this builds upon
content_url: Where content is stored
content_hash: SHA-256 of content (for integrity)
embeddings: Vector representation (optional, for semantic search)
signature: Ed25519 signature of payload (excluding signature field)
acl: Fine-grained access control (overrides visibility)
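
The schema and field descriptions above can be exercised with a short builder. This is a minimal sketch using only the Python standard library; the function name `build_artifact`, its parameters, and the example URL are hypothetical and not part of the specification. It produces an unsigned payload (signing is covered in section 8.5) and leaves out the optional team, embeddings, and acl fields.

```python
import hashlib
import uuid
from datetime import datetime, timezone

FORMATS = {"html", "json", "markdown", "pdf", "png"}
VISIBILITIES = {"public", "org", "team", "private"}


def build_artifact(content: bytes, *, title: str, summary: str, user_id: str,
                   tenant_id: str, source: str, content_url: str,
                   fmt: str = "markdown", visibility: str = "private",
                   tags=None) -> dict:
    """Assemble an unsigned KCP payload for the given content bytes."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    if visibility not in VISIBILITIES:
        raise ValueError(f"unsupported visibility: {visibility}")
    if len(summary) > 500:
        raise ValueError("summary exceeds 500 characters")
    return {
        "id": str(uuid.uuid4()),
        "version": "1",
        "user_id": user_id,
        "tenant_id": tenant_id,
        "tags": list(tags or []),
        "source": source,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "format": fmt,
        "visibility": visibility,
        "title": title,
        "summary": summary,
        # Lineage starts empty here; callers fill it in before publishing.
        "lineage": {"query": "", "data_sources": [], "agent": source,
                    "parent_reports": []},
        "content_url": content_url,
        # SHA-256 of the raw content, hex-encoded (64 characters).
        "content_hash": hashlib.sha256(content).hexdigest(),
    }


artifact = build_artifact(
    b"# Weekly latency report\n",
    title="Weekly latency report",
    summary="Average API response time for the past week.",
    user_id="alice@example.com",
    tenant_id="acme-corp",
    source="monitoring-agent-v2.1",
    content_url="https://kcp.example.invalid/content/latency.md",  # hypothetical
)
```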

4. Multi-Tenant Governance

4.1 Visibility Tiers

Tier	Access Rule	Example Use Case
public	Anyone can read	Whitepapers, open documentation
org	Anyone in `tenant_id` can read	Internal architecture docs
team	Anyone in `tenant_id` + `team` can read	Squad metrics, postmortems
private	Only `user_id` + explicit ACL can read	Draft analyses, sensitive data

4.2 Access Control List (ACL)

For fine-grained control beyond visibility tiers:

"acl": {
  "allowed_tenants": ["acme-corp", "partner-org"],
  "allowed_users": ["alice@example.com", "bob@example.com"],
  "allowed_teams": ["team:engineering", "team:data-science"]
}

Rules:

If acl is present, it overrides visibility
Access granted if user matches any of: allowed_users, allowed_teams, or allowed_tenants
Empty ACL = no additional permissions (fall back to visibility)
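
One possible reading of the tier and ACL rules is sketched below. It rests on several assumptions: the `reader` dict shape is hypothetical, the creator is assumed to always have read access, an ACL counts as "present" only when at least one of its lists is non-empty, and team entries in the ACL may carry the `team:` prefix shown in the example above.

```python
def can_read(artifact: dict, reader: dict) -> bool:
    """Decide read access from the visibility tier and the optional ACL.

    `reader` is a hypothetical dict with "user_id", "tenant_id", and "team".
    """
    # Assumption: the creator can always read their own artifact.
    if reader["user_id"] == artifact["user_id"]:
        return True

    acl = artifact.get("acl") or {}
    allowed_users = acl.get("allowed_users", [])
    allowed_teams = acl.get("allowed_teams", [])
    allowed_tenants = acl.get("allowed_tenants", [])

    # A non-empty ACL overrides the visibility tier.
    if allowed_users or allowed_teams or allowed_tenants:
        # Assumption: strip an optional "team:" prefix from ACL team entries.
        team_names = {t.split(":", 1)[-1] for t in allowed_teams}
        return (
            reader["user_id"] in allowed_users
            or reader.get("team") in team_names
            or reader["tenant_id"] in allowed_tenants
        )

    # Empty or missing ACL: fall back to the visibility tier.
    same_tenant = reader["tenant_id"] == artifact["tenant_id"]
    tier = artifact["visibility"]
    if tier == "public":
        return True
    if tier == "org":
        return same_tenant
    if tier == "team":
        return (same_tenant
                and artifact.get("team") is not None
                and reader.get("team") == artifact["team"])
    return False  # "private": only the creator, handled above
```

Under this reading, a private artifact with `allowed_tenants: ["partner-org"]` becomes readable by any user of partner-org, while other members of the creator's own tenant stay locked out.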

5. Lineage Tracking

5.1 Data Flow Model

Data Sources → Query/Agent → Knowledge Artifact → Decision
     ↓              ↓                 ↓               ↓
  [URIs]      [source field]    [this artifact]  [parent_reports]

5.2 Lineage Example

{
  "lineage": {
    "query": "Calculate average response time for API endpoints",
    "data_sources": [
      "prometheus://prod-cluster/metrics",
      "grafana://api/dashboards/xyz"
    ],
    "agent": "monitoring-agent-v2.1",
    "parent_reports": [
      "ab12cd34-...",  // Previous week's report
      "ef56gh78-..."   // Baseline performance report
    ]
  }
}

Traversal: By following parent_reports chains, systems can reconstruct full provenance graphs.
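
The traversal can be sketched as a breadth-first walk over parent_reports. The `lookup` callable and the in-memory `store` below are hypothetical stand-ins for whatever retrieval mechanism a node uses; the visited check also protects against accidental cycles.

```python
from collections import deque


def provenance_graph(artifact_id: str, lookup) -> dict:
    """Return {artifact_id: [parent ids]} for the artifact and all ancestors.

    `lookup` maps an artifact ID to its payload dict, or returns None when
    the artifact is unknown to this node.
    """
    graph = {}
    queue = deque([artifact_id])
    while queue:
        current = queue.popleft()
        if current in graph:  # already visited; also guards against cycles
            continue
        payload = lookup(current)
        if payload is None:
            parents = []  # unknown artifact: record it as a leaf
        else:
            parents = payload.get("lineage", {}).get("parent_reports", [])
        graph[current] = list(parents)
        queue.extend(parents)
    return graph


# Hypothetical store with short IDs for readability.
store = {
    "report-3": {"lineage": {"parent_reports": ["report-2", "baseline"]}},
    "report-2": {"lineage": {"parent_reports": ["baseline"]}},
    "baseline": {"lineage": {"parent_reports": []}},
}
graph = provenance_graph("report-3", store.get)
```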

6. Discovery & Search

6.1 Query Syntax

GET /kcp/v1/artifacts?q=<keywords>&tenant_id=<tenant>&team=<team>&tags=<tag1,tag2>&from=<date>&to=<date>

Parameters:

q (optional): Full-text search query
tenant_id (optional): Filter by tenant
team (optional): Filter by team
tags (optional): Comma-separated tags
from, to (optional): Date range (ISO 8601)
limit, offset (optional): Pagination
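
As an illustration of the query syntax, the helper below assembles a search URL from the parameters listed above. The function name `build_search_url`, its keyword names, and the base URL are hypothetical; only the path and the query parameter names come from this section.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(base_url, q=None, tenant_id=None, team=None, tags=None,
                     date_from=None, date_to=None, limit=None, offset=None):
    """Build a GET /kcp/v1/artifacts URL, omitting parameters left as None."""
    params = {
        "q": q,
        "tenant_id": tenant_id,
        "team": team,
        "tags": ",".join(tags) if tags else None,  # comma-separated per 6.1
        "from": date_from,
        "to": date_to,
        "limit": limit,
        "offset": offset,
    }
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = base_url.rstrip("/") + "/kcp/v1/artifacts"
    return url + "?" + query if query else url


url = build_search_url(
    "https://kcp.example.invalid",  # hypothetical server
    q="api latency",
    tenant_id="acme-corp",
    tags=["sre", "weekly"],
    limit=10,
)
decoded = parse_qs(urlsplit(url).query)  # round-trip to inspect the parameters
```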

6.2 Response Format

{
  "results": [
    {
      "id": "uuid",
      "title": "string",
      "summary": "string",
      "created_at": "ISO 8601",
      "relevance": 0.94,
      "preview": "string (first 200 chars)"
    }
  ],
  "total": 42,
  "query_time_ms": 12
}

6.3 Semantic Search (Optional)

If embeddings are provided in the payload:

Client generates embedding for search query
Server performs vector similarity search
Results ranked by cosine similarity
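
The ranking step can be sketched in pure Python. A production server would more likely use a vector index; the function names below are hypothetical, and artifacts without embeddings (or with a different dimension than the query) are simply skipped, which is an assumption rather than specified behavior.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat as unrelated
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_embedding, artifacts, limit=10):
    """Return [(artifact_id, score)] sorted by descending cosine similarity."""
    scored = [
        (cosine_similarity(query_embedding, a["embeddings"]), a["id"])
        for a in artifacts
        if a.get("embeddings") and len(a["embeddings"]) == len(query_embedding)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(artifact_id, score) for score, artifact_id in scored[:limit]]
```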

7. Storage Layer

7.1 Requirements

Distributed: No single point of failure
Content-Addressed: Content hash = retrieval key
Encrypted: At-rest encryption (AES-256-GCM)
Signed: All artifacts must have valid Ed25519 signatures
Efficient: Support for large files (videos, datasets)
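
To make the content-addressed requirement concrete, the sketch below uses an in-memory dictionary as a stand-in for a real backend. The class name `ContentStore` is hypothetical, and the sketch ignores encryption, signing, and distribution; it only shows the content hash acting as the retrieval key and being re-checked on read.

```python
import hashlib


class ContentStore:
    """In-memory stand-in for a content-addressed backend."""

    def __init__(self):
        self._blobs = {}

    def put(self, content: bytes) -> str:
        """Store content and return its SHA-256 hex digest as the key."""
        key = hashlib.sha256(content).hexdigest()
        self._blobs[key] = content
        return key

    def get(self, key: str) -> bytes:
        """Fetch content by key, verifying integrity before returning it."""
        content = self._blobs[key]  # raises KeyError if the key is unknown
        if hashlib.sha256(content).hexdigest() != key:
            raise ValueError("stored content does not match its hash")
        return content


store = ContentStore()
key = store.put(b"quarterly analysis")
```

Because the key is derived from the content, storing the same bytes twice yields the same key, which gives deduplication for free.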

7.2 Supported Backends

Option A: IPFS + libp2p (Recommended for Phase 3)

Content-addressed by default
P2P discovery via DHT
Pinning for availability

Option B: libsql + SQLCipher (Recommended for MVP)

SQLite fork with replication
Encryption via SQLCipher
Familiar SQL interface

Option C: Custom KCP Native Format

Append-only log (Git-like)
Single-file database (.kcp extension)
Merkle tree for integrity
Full spec planned for a future appendix

8. Security

8.1 Threat Model

Threats:

Unauthorized access to private artifacts
Data tampering (modifying existing artifacts)
Impersonation (publishing as another user)
Replay attacks
Denial of service

8.2 Mitigations

Threat	Mitigation
Unauthorized access	Multi-tenant ACLs + encryption at rest
Tampering	Content hashing (SHA-256) + signature verification
Impersonation	Ed25519 signatures (user keypair)
Replay	Timestamp validation (reject if > 5 min old)
DoS	Rate limiting (per tenant_id, per user_id)
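
The replay mitigation can be sketched as a freshness check on the payload timestamp. The function name `is_fresh` is hypothetical. Two choices below are assumptions, not specified behavior: timestamps in the future are rejected, and a timestamp without an offset is interpreted as UTC.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(minutes=5)  # threshold from the mitigation table


def is_fresh(timestamp: str, now: datetime = None) -> bool:
    """Return True if an ISO 8601 timestamp is at most MAX_AGE old."""
    now = now or datetime.now(timezone.utc)
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
    created = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    if created.tzinfo is None:
        created = created.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    age = now - created
    # Assumption: future timestamps (negative age) are rejected as well.
    return timedelta(0) <= age <= MAX_AGE
```

A deployment would likely allow a small clock-skew tolerance for future timestamps; that is left out here to keep the check minimal.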

8.3 Key Management

User Keypair:

Private Key: ed25519:secret (NEVER transmitted)
Public Key: ed25519:public (stored in user profile)
Node ID: hex(public_key) (32 bytes = 64 hex chars)

Key Generation Methods:

Mnemonic-based (Recommended) — BIP-39 compatible recovery phrase
Random generation — Cryptographically secure random 32 bytes
Import from backup — Restore from encrypted backup file

8.4 Identity Recovery (Mnemonic)

Users can generate their keypair from a 12-word recovery phrase (BIP-39 standard):

abandon ability able about above absent absorb abstract absurd abuse access accident

Derivation Process:

from mnemonic import Mnemonic
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# 1. Generate mnemonic (128 bits entropy = 12 words)
m = Mnemonic("english")
mnemonic = m.generate(128)  # "word1 word2 ... word12"

# 2. Derive 64-byte seed via PBKDF2-SHA512 (BIP-39 standard)
seed = m.to_seed(mnemonic, passphrase="")  # Optional passphrase

# 3. Use first 32 bytes as Ed25519 private key
private_key = Ed25519PrivateKey.from_private_bytes(seed[:32])
public_key = private_key.public_key()

# 4. Node ID = hex(public_key)
node_id = public_key.public_bytes_raw().hex()

Recovery:

Same mnemonic + passphrase = same keypair = same Node ID
Users can move between devices by memorizing/storing their 12 words
Security: Anyone with the mnemonic has full access to the identity

CLI Commands:

kcp identity create     # Generate new identity with recovery phrase
kcp identity recover    # Restore identity from recovery phrase
kcp identity show       # Display current Node ID and fingerprint
kcp identity export     # Export to encrypted backup file
kcp identity import     # Import from backup file

8.5 Signature Generation

import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# `payload` is the artifact dict; `user_private_key` is an Ed25519PrivateKey
# (for example, the key derived in section 8.4).

# 1. Remove 'signature' field from payload
payload_without_sig = {k: v for k, v in payload.items() if k != 'signature'}

# 2. Canonical JSON (sorted keys, no whitespace)
canonical = json.dumps(payload_without_sig, sort_keys=True, separators=(',', ':'))

# 3. Sign with user's private key (returns a 64-byte signature)
signature = user_private_key.sign(canonical.encode('utf-8'))

# 4. Add signature to payload as hex
payload['signature'] = signature.hex()

8.6 Signature Verification

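import json
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

# Server retrieves user's public key (assumed here to be the raw 32 bytes)
user_public_key = Ed25519PublicKey.from_public_bytes(
    get_user_public_key(payload['user_id']))

# Reconstruct canonical payload
canonical = json.dumps({k: v for k, v in payload.items() if k != 'signature'},
                       sort_keys=True, separators=(',', ':'))

# Verify signature; verify() raises InvalidSignature on mismatch
try:
    user_public_key.verify(bytes.fromhex(payload['signature']),
                           canonical.encode('utf-8'))
    signature_valid = True
except InvalidSignature:
    signature_valid = False  # respond with 401 (see section 11.1)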
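
Signing and verification only agree if both sides serialize the payload to exactly the same bytes. The standard-library sketch below isolates that canonicalization step; the helper name `canonical_bytes` is hypothetical, and the two example payloads are invented for illustration.

```python
import json


def canonical_bytes(payload: dict) -> bytes:
    """Serialize a payload deterministically, excluding the signature field."""
    unsigned = {k: v for k, v in payload.items() if k != "signature"}
    # Sorted keys and compact separators remove ordering and whitespace
    # differences between producers.
    return json.dumps(unsigned, sort_keys=True, separators=(",", ":")).encode("utf-8")


# Same logical payload, different key order, one already carrying a signature.
signed = {"title": "Latency", "id": "abc123", "tags": ["sre"], "signature": "00ff"}
unsigned = {"id": "abc123", "tags": ["sre"], "title": "Latency"}

same = canonical_bytes(signed) == canonical_bytes(unsigned)
```

Implementations in other languages need to match Python's defaults here, in particular the escaping of non-ASCII characters and the formatting of floating-point values such as embeddings.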

9. Federation & P2P Sync

9.1 Architecture

┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│  Node A      │◄────►│  Node B      │◄────►│  Node C      │
│  (acme-corp) │      │  (beta-inc)  │      │  (gamma-llc) │
└──────────────┘      └──────────────┘      └──────────────┘
       ↕                      ↕                      ↕
   [DHT: Distributed Hash Table for discovery]

9.2 Peer Discovery

Each node announces itself to DHT: "Node A has reports with tags [X, Y, Z]"
When searching, client queries DHT: "Who has reports matching tag X?"
DHT returns list of peers
Client fetches directly from peers (P2P)

9.3 Sync Protocol (Simplified)

1. Node A publishes report with ID=abc123
2. Node A announces to DHT: "I have abc123"
3. Node B queries DHT: "Who has abc123?"
4. DHT responds: "Node A"
5. Node B fetches from Node A (IPFS or HTTP)
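
The five steps can be simulated in memory to show the control flow. Everything in this sketch is a toy: `ToyDHT`, `Node`, and the shared `network` registry are hypothetical stand-ins for a real DHT and transport, and no networking, signing, or access control is involved.

```python
class ToyDHT:
    """In-memory stand-in for a DHT: artifact ID -> set of node names."""

    def __init__(self):
        self._providers = {}

    def announce(self, node_name, artifact_id):
        self._providers.setdefault(artifact_id, set()).add(node_name)

    def find_providers(self, artifact_id):
        return sorted(self._providers.get(artifact_id, set()))


class Node:
    def __init__(self, name, dht, network):
        self.name = name
        self.dht = dht
        self.network = network  # hypothetical registry: node name -> Node
        self.artifacts = {}
        network[name] = self

    def publish(self, artifact_id, payload):
        self.artifacts[artifact_id] = payload      # step 1: publish locally
        self.dht.announce(self.name, artifact_id)  # step 2: announce to DHT

    def fetch(self, artifact_id):
        # Steps 3-4: ask the DHT who has the artifact.
        for peer_name in self.dht.find_providers(artifact_id):
            peer = self.network[peer_name]
            if artifact_id in peer.artifacts:
                # Step 5: fetch directly from the peer and keep a local copy.
                self.artifacts[artifact_id] = peer.artifacts[artifact_id]
                return self.artifacts[artifact_id]
        return None


dht, network = ToyDHT(), {}
node_a = Node("node-a", dht, network)
node_b = Node("node-b", dht, network)
node_a.publish("abc123", {"title": "Report"})
fetched = node_b.fetch("abc123")
```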

10. Versioning

10.1 Protocol Versioning

Current version: v0.2
Version in URL: /kcp/v1/artifacts (major version only)
Backward compatibility: Servers MUST support all v1.x clients

10.2 Payload Versioning

Each payload has "version": "1" field
Future breaking changes increment version (2, 3, etc.)
Servers MUST reject unsupported versions with 400 Bad Request

11. Error Handling

11.1 HTTP Status Codes

Code	Meaning	Example
200	Success	Report retrieved
201	Created	Report published
400	Bad Request	Invalid payload format
401	Unauthorized	Invalid signature
403	Forbidden	ACL violation
404	Not Found	Report doesn't exist
409	Conflict	Report ID already exists
429	Rate Limited	Too many requests
500	Server Error	Internal failure

11.2 Error Response Format

{
  "error": {
    "code": "INVALID_SIGNATURE",
    "message": "Ed25519 signature verification failed",
    "details": {
      "user_id": "alice@example.com",
      "report_id": "abc123"
    }
  }
}
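
A server might build these responses with a small helper like the one sketched below. Only INVALID_SIGNATURE appears in this specification; the other error codes in the mapping are illustrative assumptions, as is the helper name `error_response`.

```python
import json

# Assumption: apart from INVALID_SIGNATURE, these code names are illustrative.
STATUS_BY_CODE = {
    "INVALID_PAYLOAD": 400,
    "UNSUPPORTED_VERSION": 400,
    "INVALID_SIGNATURE": 401,
    "ACL_VIOLATION": 403,
    "NOT_FOUND": 404,
    "ID_CONFLICT": 409,
    "RATE_LIMITED": 429,
}


def error_response(code, message, **details):
    """Return (http_status, json_body) in the error format of section 11.2."""
    status = STATUS_BY_CODE.get(code, 500)  # unknown codes map to 500
    body = {"error": {"code": code, "message": message, "details": details}}
    return status, json.dumps(body)


status, body = error_response(
    "INVALID_SIGNATURE",
    "Ed25519 signature verification failed",
    user_id="alice@example.com",
    report_id="abc123",
)
```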

12. Future Extensions

Collaborative Editing: Multiple users edit same artifact (CRDT-based)
Notifications: Subscribe to new reports matching tags
Analytics: Usage metrics (most viewed, most cited)
AI-to-AI Discovery: Agents autonomously discover relevant artifacts

Appendix A: Example Implementations

See /sdk directory for reference implementations in Python, TypeScript, and Go.

Appendix B: Comparison with Related Protocols

Feature	KCP	HTTP	Git	IPFS	RDF/SPARQL
Knowledge artifacts	✅	❌	❌	❌	❌
Lineage tracking	✅	❌	Partial	❌	❌
Multi-tenant ACL	✅	❌	❌	❌	❌
Semantic search	✅	❌	❌	❌	✅
P2P federation	✅	❌	❌	✅	❌
Content-addressed	✅	❌	✅	✅	❌

Appendix C: References

OSI Model — ISO/IEC 7498-1:1994
Ed25519 — RFC 8032
IPFS — https://docs.ipfs.tech
libp2p — https://libp2p.io
Semantic Web — https://www.w3.org/standards/semanticweb/

Document Status: Experimental Draft
Next Review: May 2026
Feedback: Open an issue at https://github.com/kcp-protocol/kcp
