| title | Required Features |
|---|---|
| version | 1.0.0-draft |
| last_updated | 2026-01-11 |
| status | Draft |
| category | Compliance Requirements |
Version: 1.0.0-draft
This document defines the minimum set of features that every LEAF spec implementation MUST support to be considered compliant with the specification.
Philosophy: Like the original RealWorld, this spec defines what features must exist, not how to implement them. All implementations must support these capabilities, but can use any technology stack, architecture, or algorithms.
- ✅ MUST - Required for compliance
- 🔶 SHOULD - Highly recommended, affects user experience
- ⭕ MAY - Optional enhancements
Requirements:
- Users can register with email and password
- Users can log in with email and password
- Authentication returns a token for API access
- Token must be included in subsequent requests
- Tokens should expire (recommended: 7-30 days)
API Endpoints:
POST /api/auth/registerPOST /api/auth/login
Test Criteria:
- Can create new user account
- Cannot register duplicate email
- Can log in with valid credentials
- Receive JWT or equivalent token
- Token works for authenticated endpoints
Requirements:
- View current user profile
- Update display name and preferences
- Track basic usage metrics (document count, token usage)
API Endpoints:
GET /api/userPUT /api/user
Requirements:
- Upload files via multipart/form-data
- Support at minimum: PDF, TXT, MD (Markdown)
- Maximum file size: 50MB (may be lower)
- Return document ID immediately
- Process asynchronously (status: processing → ready)
API Endpoints:
POST /api/documents
Test Criteria:
- Can upload PDF file
- Can upload text file
- Can upload markdown file
- Document has unique ID
- Document status transitions correctly
- Rejected if file too large
Requirements:
- Create document from text/markdown via JSON
- Same processing workflow as uploaded files
- Support titles and tags
API Endpoints:
POST /api/documents(JSON body)
Test Criteria:
- Can create text document via API
- Content stored correctly
- Processed and ready for queries
Requirements:
- Extract text from uploaded files (PDF parsing)
- Split documents into chunks for embedding
- Generate embeddings for each chunk
- Store embeddings in vector database
- Update document status when complete
- Handle processing failures gracefully
Implementation Details:
- Chunking strategy is implementation-specific
- Chunk size should be 200-1000 tokens (recommended)
- Overlap between chunks (recommended: 10-20%)
- Metadata preserved (page numbers, sections)
Test Criteria:
- Document moves from
processingtoready - Chunks are searchable
- Processing failures set status to
failed - Error messages provided for failures
Requirements:
- List all user's documents with pagination
- Filter by status and tags
- Sort by creation date, title, or update date
- Retrieve single document by ID with metadata
- Include processing status and chunk count
API Endpoints:
GET /api/documentsGET /api/documents/:id
Test Criteria:
- Returns paginated document list
- Filtering works correctly
- Sorting works correctly
- Document details include all required fields
Requirements:
- Update document title and tags
- Update content triggers reprocessing
- Delete document removes file and embeddings
- Deleted documents removed from conversations
API Endpoints:
PUT /api/documents/:idDELETE /api/documents/:id
Test Criteria:
- Can update metadata without reprocessing
- Content updates trigger reprocessing
- Deletion removes all traces
- Cannot access deleted documents
Requirements:
- Retrieve relevant document chunks for user queries
- Use vector similarity search (cosine, dot product, etc.)
- Return top-k most relevant chunks (k=3-10)
- Generate AI response using retrieved context
- Include document citations in response
Core Algorithm:
- Embed user query
- Search vector database for similar chunks
- Construct prompt with retrieved chunks
- Generate response with LLM
- Return response with citations
Test Criteria:
- Responses include information from documents
- Citations reference correct documents
- Irrelevant documents not cited
- Responses stay grounded in provided context
Requirements:
- Create new conversation threads
- List user's conversations
- Retrieve conversation with message history
- Update conversation title
- Delete conversations
API Endpoints:
POST /api/conversationsGET /api/conversationsGET /api/conversations/:idPUT /api/conversations/:idDELETE /api/conversations/:id
Test Criteria:
- Can create conversation
- Messages persist in conversation
- Conversation list shows recent activity
- Deletion removes all messages
Requirements:
- Maintain conversation context across multiple turns
- Include previous messages in context window
- Summarize or truncate old messages when context limit reached
- User can ask follow-up questions that reference prior messages
Implementation Notes:
- Context management strategy is implementation-specific
- Rolling window, summarization, or hybrid approaches acceptable
- Should handle at least 10-20 turns before truncation
Test Criteria:
- Follow-up questions work correctly
- System remembers conversation context
- No errors when conversation exceeds initial context
- Graceful degradation when context limit reached
Requirements:
- Stream AI responses token-by-token in real-time
- Support Server-Sent Events (SSE) or WebSockets
- Send citations after content generation
- Include token usage in final event
- Handle stream interruptions gracefully
Stream Events:
message_start- Begin generationcontent_delta- Incremental tokenscitations- Source referencesmessage_end- Completion metadataerror- If error occurs
API Endpoints:
POST /api/conversations/:id/messages(withstream: true)
Test Criteria:
- Responses stream in real-time
- Citations arrive with response
- Stream closes properly
- Errors handled without hanging
Requirements:
- Every AI response MUST include citations
- Citations reference specific document chunks
- Citations include:
- Document ID and title
- Text excerpt (50-500 chars)
- Relevance score
- Page number (if applicable)
- Multiple citations allowed per response
- Citations clickable/linkable to source
Test Criteria:
- All answers have at least one citation (or explicit "no relevant info found")
- Citations reference correct source material
- Excerpts match actual document content
- Relevance scores are reasonable (0.5-1.0)
Requirements:
- Search across all user documents
- Use same embedding model as RAG
- Return ranked results by relevance
- Filter by document IDs (optional)
- Minimum relevance threshold (optional)
- Pagination support
API Endpoints:
POST /api/search
Test Criteria:
- Search returns relevant results
- Results ranked by relevance
- Can limit to specific documents
- Irrelevant documents ranked lower
Requirements:
- Return chunk content with search results
- Include metadata (page, section)
- Support minimum relevance filtering
Requirements:
- Generate summary of single document
- Support length options: short, medium, long
- Support focus types: general, key_points, technical
- Include token usage
- Cache summaries (optional but recommended)
API Endpoints:
POST /api/documents/:id/summarize
Test Criteria:
- Summary captures document essence
- Length parameter affects output length
- Focus parameter affects content style
- Summaries are coherent and accurate
Requirements:
- Summarize across multiple documents
- Optional query to focus summary
- Include citations from source documents
- Handle large document sets (chunking if needed)
API Endpoints:
POST /api/summaries
Test Criteria:
- Summary synthesizes information across docs
- Query parameter focuses the summary
- Citations reference correct sources
- Works with 2-10+ documents
Requirements:
- Learn facts about user from conversations
- Store user preferences (summary length, streaming, etc.)
- Use memory to personalize responses
- User can view stored memory
API Endpoints:
GET /api/user(includes memory)PUT /api/user(update preferences)
Test Criteria:
- System remembers stated user preferences
- Responses reflect user context
- Memory visible in profile
Requirements:
- Consistent error response format
- Appropriate HTTP status codes
- User-friendly error messages
- Validation errors with field details
- Handle network failures
- Handle LLM API failures
Test Criteria:
- Invalid requests return 400/422 with details
- Unauthorized requests return 401
- Not found returns 404
- Rate limits return 429
- Server errors return 500
Requirements:
- Implement rate limiting per user
- Return rate limit headers
- Return 429 with retry-after on exceeded
- Reasonable limits (see API spec)
Test Criteria:
- Rate limits enforced
- Headers include limit info
- Error message includes retry time
Requirements:
- Users can only access their own data
- Document searches scoped to user
- Conversations scoped to user
- Authorization checks on all endpoints
Test Criteria:
- Cannot access other user's documents
- Cannot access other user's conversations
- Cannot search other user's data
Requirements:
- Passwords hashed (bcrypt, argon2, etc.)
- Tokens signed and verified
- HTTPS recommended for production
- Secure token storage practices
Test Criteria:
- Passwords never returned in responses
- Invalid tokens rejected
- Token expiration enforced
Recommended Targets:
- Document upload: < 1s to return ID
- Document processing: < 30s for 10-page PDF
- Message response (streaming): < 500ms to first token
- Search: < 1s for results
- Document list: < 500ms
Recommendations:
- Support at least 100 documents per user
- Support conversations with 50+ messages
- Handle concurrent requests
- Efficient vector search at scale
Requirements:
- Document upload interface
- Document library view
- Conversation interface with streaming
- Citation display (clickable to source)
- Search interface
- Basic error messaging
Requirements:
- Works on desktop (1024px+)
- Works on tablet (768px+)
- Works on mobile (320px+)
Requirements:
- Keyboard navigation
- Screen reader support
- ARIA labels
- Color contrast (WCAG AA)
To verify compliance, implementations must:
- Pass API Tests - All endpoints return correct responses
- Pass Feature Tests - All required features work as specified
- Pass Integration Tests - End-to-end workflows complete successfully
- Pass Quality Tests - AI responses meet quality thresholds
See test-suite.md for detailed test specifications.
- Folders/Organization - Hierarchical document organization
- Tags Management - Tag creation, editing, deletion
- Export - Export conversations or documents
- Sharing - Share documents with other users
- Voice Input - Speech-to-text for queries
- Image Support - Upload and query images
- Code Execution - Run code in responses
- Agents - Multi-step reasoning with tools
- Custom Embeddings - User-trained models
- Analytics - Usage dashboards
- Integrations - External service connectors
These features are excellent additions but NOT required for spec compliance.
An implementation is compliant when it supports:
- ✅ User registration and authentication
- ✅ Document upload (PDF, TXT, MD minimum)
- ✅ Text document creation
- ✅ Document processing with embeddings
- ✅ Document listing, updating, deleting
- ✅ RAG-based question answering
- ✅ Conversation management
- ✅ Multi-turn conversations
- ✅ Response streaming (SSE or WebSocket)
- ✅ Citations in every response
- ✅ Semantic search across documents
- ✅ Single document summarization
- ✅ Multi-document summarization
- ✅ Error handling with consistent format
- ✅ Rate limiting
- ✅ Data isolation and security
- ✅ Responsive frontend with streaming UI
- ✅ Pass the automated test suite
Next: See test-suite.md for validation criteria and ui-requirements.md for frontend specifications.