| title | Data Models |
|---|---|
| version | 1.0.0-draft |
| last_updated | 2026-01-17 |
| status | Draft |
| category | Data Schema |
This document defines the data schemas for all entities in the LEAF Specification. Schemas are written in a JSON Schema-style notation, with each field mapped to a description of its type and constraints. They are implementation-agnostic: you can map them to any database, ORM, or data structure.
Implementation Note: These are logical schemas defining the data contract. Internal storage format, database schema, indexing, and optimization are implementation details.
Represents a user account in the system.
{
"id": "string (unique identifier)",
"email": "string (valid email, unique)",
"displayName": "string (optional, defaults to email)",
"passwordHash": "string (never exposed in API responses)",
"preferences": {
"defaultDocumentScope": "string (enum: all|recent|tagged)",
"summaryLength": "string (enum: short|medium|long)",
"streamingEnabled": "boolean"
},
"memory": {
"facts": ["string (learned facts about user)"]
},
"usage": {
"documentsCount": "integer",
"conversationsCount": "integer",
"totalTokensUsed": "integer"
},
"createdAt": "string (ISO 8601 datetime)",
"updatedAt": "string (ISO 8601 datetime)"
}
Field Requirements:
- id: Required, unique, immutable
- email: Required, unique, valid email format
- displayName: Optional, defaults to email if not provided
- passwordHash: Required for storage, never returned in API responses
- preferences: Optional, defaults to system defaults
- memory.facts: Optional, empty array by default
- usage: System-calculated, read-only
- createdAt: Required, set on creation
- updatedAt: Required, updated on modification
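The field rules above can be illustrated with a small serialization helper. This is only a sketch: the function name `to_public_user` and the values in `DEFAULT_PREFERENCES` are assumptions made for illustration, since the specification does not state what the system defaults are.

```python
from copy import deepcopy

# Hypothetical system defaults: the spec says only that preferences
# fall back to "system defaults", not what those defaults are.
DEFAULT_PREFERENCES = {
    "defaultDocumentScope": "all",
    "summaryLength": "medium",
    "streamingEnabled": True,
}


def to_public_user(stored: dict) -> dict:
    """Build the API view of a stored User record without mutating it."""
    user = deepcopy(stored)
    # passwordHash is required for storage but never returned by the API.
    user.pop("passwordHash", None)
    # displayName defaults to the email address when it is not provided.
    if not user.get("displayName"):
        user["displayName"] = user["email"]
    # Missing preferences are filled in from the (assumed) defaults.
    user["preferences"] = {**DEFAULT_PREFERENCES, **user.get("preferences", {})}
    # memory.facts is an empty array by default.
    user.setdefault("memory", {}).setdefault("facts", [])
    return user
```

Copying the record before editing it keeps the stored version, including the password hash, intact for later persistence.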
Example:
{
"id": "usr_a8f3c92b",
"email": "alice@example.com",
"displayName": "Alice",
"preferences": {
"defaultDocumentScope": "all",
"summaryLength": "medium",
"streamingEnabled": true
},
"memory": {
"facts": [
"Researching machine learning for thesis",
"Prefers technical summaries"
]
},
"usage": {
"documentsCount": 23,
"conversationsCount": 5,
"totalTokensUsed": 45230
},
"createdAt": "2024-01-10T08:00:00Z",
"updatedAt": "2024-01-15T11:30:00Z"
}
Represents an uploaded or created document.
{
"id": "string (unique identifier)",
"userId": "string (foreign key to User.id)",
"title": "string (1-200 characters)",
"content": "string (full text content, optional for binary files)",
"contentType": "string (MIME type)",
"size": "integer (bytes)",
"status": "string (enum: processing|ready|failed)",
"tags": ["string"],
"url": "string (URL to file if stored externally)",
"metadata": {
"author": "string (optional)",
"pages": "integer (optional)",
"language": "string (optional, ISO 639-1)"
},
"chunkCount": "integer (number of embedded chunks)",
"createdAt": "string (ISO 8601 datetime)",
"updatedAt": "string (ISO 8601 datetime)",
"processedAt": "string (ISO 8601 datetime, nullable)"
}
Field Requirements:
- id: Required, unique, immutable
- userId: Required, must reference valid User
- title: Required, 1-200 characters
- content: Optional (may not exist for binary files like PDFs)
- contentType: Required (e.g., text/markdown, application/pdf)
- size: Required, file size in bytes
- status: Required, state machine: processing → ready or failed
- tags: Optional, array of strings for categorization
- url: Optional, URL if file stored externally
- metadata: Optional, extracted metadata from document
- chunkCount: Required after processing, null during processing
- processedAt: Null until processing complete
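The status state machine can be sketched as a transition table. The helper name `advance_status` is hypothetical, and two points are assumptions rather than rules from the specification: `ready` and `failed` are treated as terminal states (no retry), and `processedAt` is left untouched when processing fails.

```python
# Allowed transitions from the spec: processing -> ready or failed.
# Treating ready/failed as terminal is an assumption of this sketch.
ALLOWED_TRANSITIONS = {
    "processing": {"ready", "failed"},
    "ready": set(),
    "failed": set(),
}


def advance_status(document, new_status, chunk_count=None, processed_at=None):
    """Return a copy of the document moved to new_status, or raise ValueError."""
    current = document["status"]
    if new_status not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new_status}")
    updated = {**document, "status": new_status}
    if new_status == "ready":
        # chunkCount is required after processing; processedAt stops being null.
        if chunk_count is None or processed_at is None:
            raise ValueError("ready documents need chunk_count and processed_at")
        updated["chunkCount"] = chunk_count
        updated["processedAt"] = processed_at
    return updated
```

An implementation that supports reprocessing failed documents would add `"failed": {"processing"}` to the table.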
Supported Content Types:
- text/plain
- text/markdown
- application/pdf
- application/vnd.openxmlformats-officedocument.wordprocessingml.document (DOCX)
- Others at implementation discretion
Example:
{
"id": "doc_b7e2f91a",
"userId": "usr_a8f3c92b",
"title": "Neural Networks Fundamentals",
"contentType": "application/pdf",
"size": 2456789,
"status": "ready",
"tags": ["machine-learning", "research", "thesis"],
"url": "https://storage.example.com/documents/doc_b7e2f91a.pdf",
"metadata": {
"author": "Dr. Jane Smith",
"pages": 45,
"language": "en"
},
"chunkCount": 89,
"createdAt": "2024-01-12T14:20:00Z",
"updatedAt": "2024-01-12T14:20:00Z",
"processedAt": "2024-01-12T14:22:30Z"
}
Represents a chat conversation thread.
{
"id": "string (unique identifier)",
"userId": "string (foreign key to User.id)",
"title": "string (1-200 characters)",
"documentIds": ["string (array of Document.id)"],
"messageCount": "integer",
"createdAt": "string (ISO 8601 datetime)",
"updatedAt": "string (ISO 8601 datetime)"
}
Field Requirements:
- id: Required, unique, immutable
- userId: Required, must reference valid User
- title: Required, 1-200 characters
- documentIds: Optional, empty array means search all documents
- messageCount: System-calculated, read-only
- createdAt: Required
- updatedAt: Required, updated when messages added
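The "empty array means search all documents" rule for documentIds might be resolved as follows. The function name and the `user_document_ids` parameter are hypothetical, and dropping IDs that the user does not own is an assumption of this sketch, not a stated requirement.

```python
def resolve_search_scope(conversation: dict, user_document_ids: list) -> list:
    """Return the Document IDs a conversation should search."""
    selected = conversation.get("documentIds") or []
    if not selected:
        # Empty or missing documentIds: search all of the user's documents.
        return list(user_document_ids)
    # Assumption: silently ignore IDs that are not among the user's documents.
    owned = set(user_document_ids)
    return [doc_id for doc_id in selected if doc_id in owned]
```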
Example:
{
"id": "conv_9d4c3f2a",
"userId": "usr_a8f3c92b",
"title": "Questions about Neural Networks",
"documentIds": ["doc_b7e2f91a", "doc_c3e1a45b"],
"messageCount": 8,
"createdAt": "2024-01-14T09:00:00Z",
"updatedAt": "2024-01-14T09:45:00Z"
}
Represents a single message in a conversation.
{
"id": "string (unique identifier)",
"conversationId": "string (foreign key to Conversation.id)",
"role": "string (enum: user|assistant|system)",
"content": "string (message text)",
"reasoning": "string (optional, AI reasoning/thinking process)",
"citations": ["Citation (optional, only for assistant messages)"],
"relatedDocuments": ["string (array of Document.id, optional)"],
"tokenUsage": "TokenUsage (optional, only for assistant messages)",
"createdAt": "string (ISO 8601 datetime)"
}
Field Requirements:
- id: Required, unique, immutable
- conversationId: Required, must reference valid Conversation
- role: Required, one of: user, assistant, system
- content: Required, message text (markdown supported)
- reasoning: Optional, AI reasoning/thinking process (assistant only). See note below.
- citations: Optional, array of Citation objects (assistant only)
- relatedDocuments: Optional, suggested related documents
- tokenUsage: Optional, token metrics (assistant only)
- createdAt: Required, immutable
Reasoning Field: Implementations MAY store the AI's reasoning/thinking process separately from the final response content. This enables UI patterns like "Thought Accordion" where reasoning can be shown collapsed or hidden. Support depends on the underlying LLM's capabilities (e.g., OpenAI o1, Claude with extended thinking). Implementations without reasoning support simply omit this field.
Role Descriptions:
- user: Message from the user
- assistant: AI-generated response
- system: System messages (e.g., "Conversation started")
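A minimal validation sketch for the role rules above, assuming a hypothetical `validate_message` helper that returns a list of error strings rather than raising. It checks only field presence and the assistant-only restriction, not field types.

```python
REQUIRED_FIELDS = ("id", "conversationId", "role", "content", "createdAt")
VALID_ROLES = {"user", "assistant", "system"}
# Fields the spec marks as "assistant only".
ASSISTANT_ONLY_FIELDS = ("reasoning", "citations", "tokenUsage")


def validate_message(message: dict) -> list:
    """Return a list of problems found; an empty list means the message passes."""
    errors = []
    for field in REQUIRED_FIELDS:
        if field not in message:
            errors.append(f"missing required field: {field}")
    role = message.get("role")
    if role is not None and role not in VALID_ROLES:
        errors.append(f"invalid role: {role}")
    if role != "assistant":
        for field in ASSISTANT_ONLY_FIELDS:
            if field in message:
                errors.append(f"{field} is only allowed on assistant messages")
    return errors
```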
Example (User Message):
{
"id": "msg_1a2b3c4d",
"conversationId": "conv_9d4c3f2a",
"role": "user",
"content": "What are the main components of a neural network?",
"createdAt": "2024-01-14T09:15:00Z"
}
Example (Assistant Message):
{
"id": "msg_2b3c4d5e",
"conversationId": "conv_9d4c3f2a",
"role": "assistant",
"content": "Based on your documents, the main components of a neural network are:\n\n1. **Input Layer** - Receives the initial data\n2. **Hidden Layers** - Process information through weighted connections\n3. **Output Layer** - Produces the final prediction\n\nEach layer consists of neurons that apply activation functions to transform the data.",
"citations": [
{
"documentId": "doc_b7e2f91a",
"documentTitle": "Neural Networks Fundamentals",
"chunkId": "chunk_42",
"excerpt": "A neural network consists of an input layer, one or more hidden layers, and an output layer. Each layer contains neurons that apply activation functions...",
"relevanceScore": 0.94,
"page": 12
}
],
"relatedDocuments": ["doc_c3e1a45b"],
"tokenUsage": {
"prompt": 1450,
"completion": 120,
"total": 1570
},
"createdAt": "2024-01-14T09:15:05Z"
}
Example (Assistant Message with Reasoning):
{
"id": "msg_3c4d5e6f",
"conversationId": "conv_9d4c3f2a",
"role": "assistant",
"reasoning": "The user is asking about neural network components. Let me search the uploaded documents... Found 3 relevant chunks in 'Neural Networks Fundamentals'. Chunk 42 on page 12 has the most direct answer about architecture layers.",
"content": "Based on your documents, the main components of a neural network are:\n\n1. **Input Layer** - Receives the initial data\n2. **Hidden Layers** - Process information through weighted connections\n3. **Output Layer** - Produces the final prediction",
"citations": [
{
"documentId": "doc_b7e2f91a",
"documentTitle": "Neural Networks Fundamentals",
"chunkId": "chunk_42",
"excerpt": "A neural network consists of an input layer, one or more hidden layers, and an output layer...",
"relevanceScore": 0.94,
"page": 12
}
],
"tokenUsage": {
"prompt": 1450,
"completion": 180,
"total": 1630
},
"createdAt": "2024-01-14T09:20:05Z"
}
Represents a reference to source material in a document.
{
"documentId": "string (foreign key to Document.id)",
"documentTitle": "string",
"chunkId": "string (identifier for specific chunk)",
"excerpt": "string (relevant text excerpt)",
"relevanceScore": "number (0-1, similarity score)",
"page": "integer (optional, page number if applicable)",
"section": "string (optional, section/chapter name)",
"metadata": {
"startChar": "integer (optional)",
"endChar": "integer (optional)"
}
}
Field Requirements:
- documentId: Required, must reference valid Document
- documentTitle: Required for display
- chunkId: Required, identifies specific chunk in vector store
- excerpt: Required, 50-500 characters of relevant text
- relevanceScore: Required, 0.0-1.0 similarity score
- page: Optional, page number if document has pages
- section: Optional, section/chapter if available
- metadata: Optional, additional context
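The Citation constraints can be checked with a short validator. The name `validate_citation` is hypothetical, and the sketch does not verify that documentId actually refers to an existing Document, which would require a data-store lookup.

```python
REQUIRED_CITATION_FIELDS = (
    "documentId", "documentTitle", "chunkId", "excerpt", "relevanceScore",
)


def validate_citation(citation: dict) -> list:
    """Return a list of problems found; an empty list means the citation passes."""
    errors = []
    for field in REQUIRED_CITATION_FIELDS:
        if field not in citation:
            errors.append(f"missing required field: {field}")
    excerpt = citation.get("excerpt")
    if isinstance(excerpt, str) and not 50 <= len(excerpt) <= 500:
        errors.append("excerpt must be 50-500 characters")
    score = citation.get("relevanceScore")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, (int, float)) and not isinstance(score, bool):
        if not 0.0 <= score <= 1.0:
            errors.append("relevanceScore must be between 0.0 and 1.0")
    elif score is not None:
        errors.append("relevanceScore must be a number")
    return errors
```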
Example:
{
"documentId": "doc_b7e2f91a",
"documentTitle": "Neural Networks Fundamentals",
"chunkId": "chunk_42",
"excerpt": "A neural network consists of an input layer, one or more hidden layers, and an output layer. Each layer contains neurons that apply activation functions to transform the incoming data.",
"relevanceScore": 0.94,
"page": 12,
"section": "Chapter 2: Architecture",
"metadata": {
"startChar": 3450,
"endChar": 3680
}
}
Represents a search result from semantic search.
{
"documentId": "string (foreign key to Document.id)",
"documentTitle": "string",
"chunkId": "string",
"content": "string (chunk content)",
"relevanceScore": "number (0-1)",
"metadata": {
"page": "integer (optional)",
"section": "string (optional)"
}
}
Field Requirements:
- documentId: Required
- documentTitle: Required for display
- chunkId: Required
- content: Required, full chunk text
- relevanceScore: Required, 0.0-1.0
- metadata: Optional, context information
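Search results and citations share most of their fields, so one plausible way to produce a Citation is to derive it from a search result. The specification does not define this mapping; the function below, including truncating the chunk content to 500 characters for the excerpt, is an assumption for illustration.

```python
def citation_from_search_result(result: dict, max_excerpt: int = 500) -> dict:
    """Derive a Citation-shaped dict from a search result (hypothetical mapping)."""
    citation = {
        "documentId": result["documentId"],
        "documentTitle": result["documentTitle"],
        "chunkId": result["chunkId"],
        # Citation excerpts are capped at 500 characters; chunks may be longer.
        "excerpt": result["content"][:max_excerpt],
        "relevanceScore": result["relevanceScore"],
    }
    # page and section sit under metadata in a search result
    # but at the top level of a Citation.
    metadata = result.get("metadata") or {}
    for key in ("page", "section"):
        if key in metadata:
            citation[key] = metadata[key]
    return citation
```

Chunks shorter than 50 characters would still violate the excerpt minimum, so a real implementation would need a policy for that case.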
Example:
{
"documentId": "doc_b7e2f91a",
"documentTitle": "Neural Networks Fundamentals",
"chunkId": "chunk_56",
"content": "Backpropagation is the key algorithm used to train neural networks. It calculates gradients by propagating errors backward through the network, allowing weights to be adjusted to minimize loss.",
"relevanceScore": 0.89,
"metadata": {
"page": 28,
"section": "Chapter 4: Training"
}
}
Represents a generated summary of one or more documents.
{
"id": "string (unique identifier)",
"userId": "string (foreign key to User.id)",
"documentIds": ["string (array of Document.id)"],
"query": "string (optional, focus query)",
"content": "string (summary text)",
"length": "string (enum: short|medium|long)",
"focus": "string (enum: general|key_points|technical|conclusions)",
"citations": ["Citation (optional)"],
"tokenUsage": "TokenUsage",
"createdAt": "string (ISO 8601 datetime)"
}
Field Requirements:
- id: Required, unique
- userId: Required
- documentIds: Required, array of documents summarized
- query: Optional, if summary focused on specific question
- content: Required, summary text
- length: Required, summary size
- focus: Required, summary type
- citations: Optional, source references
- tokenUsage: Required
- createdAt: Required
Length Guidelines:
- short: ~100-200 words
- medium: ~300-500 words
- long: ~600-1000 words
Example:
{
"id": "sum_7f8e9d0a",
"userId": "usr_a8f3c92b",
"documentIds": ["doc_b7e2f91a"],
"content": "This document provides a comprehensive overview of neural network fundamentals. Key topics include network architecture (input, hidden, and output layers), activation functions (ReLU, sigmoid, tanh), and training algorithms (backpropagation, gradient descent). The document emphasizes practical applications in image recognition and natural language processing.",
"length": "short",
"focus": "key_points",
"tokenUsage": {
"prompt": 3200,
"completion": 95,
"total": 3295
},
"createdAt": "2024-01-14T10:00:00Z"
}
Represents LLM token consumption for an operation.
{
"prompt": "integer (input tokens)",
"completion": "integer (output tokens)",
"total": "integer (prompt + completion)"
}
Field Requirements:
- prompt: Required, non-negative integer
- completion: Required, non-negative integer
- total: Required, must equal prompt + completion
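Because total must always equal prompt + completion, it is safest to compute it rather than accept it from callers. A minimal sketch, with hypothetical helper names:

```python
def make_token_usage(prompt: int, completion: int) -> dict:
    """Build a TokenUsage object, deriving total from its two parts."""
    if prompt < 0 or completion < 0:
        raise ValueError("token counts must be non-negative")
    return {"prompt": prompt, "completion": completion, "total": prompt + completion}


def is_valid_token_usage(usage: dict) -> bool:
    """Check the three TokenUsage rules on an existing object."""
    try:
        prompt, completion, total = usage["prompt"], usage["completion"], usage["total"]
    except KeyError:
        return False
    if not all(type(value) is int for value in (prompt, completion, total)):
        return False
    return prompt >= 0 and completion >= 0 and total == prompt + completion
```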
Example:
{
"prompt": 1450,
"completion": 120,
"total": 1570
}
Standard pagination metadata for list endpoints.
{
"total": "integer (total items available)",
"limit": "integer (items per page)",
"offset": "integer (starting position)",
"hasMore": "boolean (more items available)"
}
Field Requirements:
- total: Required, total count of all items
- limit: Required, max items in current response
- offset: Required, starting position (0-indexed)
- hasMore: Required, true if more pages exist
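The pagination fields can be derived from an offset-based slice. This sketch paginates an in-memory list for simplicity; a real implementation would push limit and offset down to the data store. The `paginate` name is hypothetical, and the 1-100 bound and default of 20 follow the page limits given in the validation rules of this document.

```python
def paginate(items: list, limit: int = 20, offset: int = 0):
    """Return (page, pagination_metadata) for a 0-indexed offset."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    page = items[offset:offset + limit]
    total = len(items)
    metadata = {
        "total": total,
        "limit": limit,
        "offset": offset,
        # More pages exist if items remain after the end of this page.
        "hasMore": offset + len(page) < total,
    }
    return page, metadata
```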
Example:
{
"total": 156,
"limit": 20,
"offset": 40,
"hasMore": true
}
- Email: Valid email format (RFC 5322)
- Username: 3-30 characters, alphanumeric + _ and -
- Title: 1-200 characters
- Content: 1-1000000 characters (1MB text limit)
- Tags: Each tag 1-50 characters, max 20 tags per document
- File Size: Max 50MB per document (implementation may vary)
- Page Limits: 1-100 for lists (default 20)
- Relevance Score: 0.0-1.0 (floating point)
- All dates in ISO 8601 format: YYYY-MM-DDTHH:mm:ssZ
- UTC timezone required
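A few of these rules, sketched with the Python standard library. All function names are hypothetical. The timestamp helpers accept only the exact `YYYY-MM-DDTHH:mm:ssZ` form shown above, which is stricter than ISO 8601 in general (no fractional seconds, no numeric offsets).

```python
import re
from datetime import datetime, timezone

USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{3,30}")
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # the trailing Z is matched literally


def is_valid_username(name: str) -> bool:
    """3-30 characters: letters, digits, underscore, hyphen."""
    return USERNAME_PATTERN.fullmatch(name) is not None


def validate_tags(tags: list) -> list:
    """Return a list of problems: max 20 tags, each 1-50 characters."""
    errors = []
    if len(tags) > 20:
        errors.append("at most 20 tags per document")
    for tag in tags:
        if not 1 <= len(tag) <= 50:
            errors.append(f"tag must be 1-50 characters: {tag!r}")
    return errors


def parse_timestamp(value: str) -> datetime:
    """Parse a UTC timestamp; raises ValueError on any other format."""
    return datetime.strptime(value, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


def format_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as a UTC timestamp string."""
    if moment.tzinfo is None:
        raise ValueError("naive datetimes are ambiguous; attach a timezone first")
    return moment.astimezone(timezone.utc).strftime(TIMESTAMP_FORMAT)
```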
User 1:N Document
User 1:N Conversation
User 1:N Summary
Conversation 1:N Message
Conversation N:M Document (via documentIds)
Message N:M Citation
Citation N:1 Document
- IDs: Use any unique identifier format (UUID, nanoid, ULID, auto-increment). Prefix with entity type is recommended (e.g., doc_, usr_) for debugging.
- Chunks: Document chunking is implementation-specific. Store chunk metadata to support citations.
- Embeddings: Not explicitly modeled - internal to vector store implementation.
- Soft Deletes: Consider soft delete patterns for Documents and Conversations to support undo/recovery.
- Indexes: Implementations should index frequently queried fields (userId, status, createdAt, etc.).
- Content Storage: Binary document files can be stored separately from metadata (blob storage, filesystem, etc.).
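As one way to follow the ID recommendation, the sketch below combines an entity prefix with a random UUID. The prefixes mirror those used in the examples throughout this document (usr_, doc_, conv_, msg_, sum_); the `new_id` helper and the choice of UUID4 are assumptions, and any unique identifier scheme would do.

```python
import uuid

# Prefixes as they appear in this document's examples.
ENTITY_PREFIXES = {
    "user": "usr",
    "document": "doc",
    "conversation": "conv",
    "message": "msg",
    "summary": "sum",
}


def new_id(entity: str) -> str:
    """Generate a prefixed identifier such as 'doc_<32 hex characters>'."""
    prefix = ENTITY_PREFIXES[entity]  # KeyError for unknown entity types
    return f"{prefix}_{uuid.uuid4().hex}"
```

The examples in this document show shorter 8-character suffixes; truncating a UUID that far raises the collision risk, so the full hex string is kept here.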
Next: See required-features.md for feature requirements.