An OpenAI-Compatible Letta Proxy - Stateful Gateway for Persistent AI Agents
The Librarian is a stateful, OpenAI-compatible gateway that allows clients to interface with persistent Letta agents while speaking the standard OpenAI API protocol. This means any OpenAI-compatible client (LangChain, Autogen, Cursor, etc.) can route through The Librarian and transparently gain persistent context, tool access, and self-tuning behavior.
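For example, a stock OpenAI SDK client only needs its base URL changed to talk to the gateway. A minimal sketch, assuming the gateway runs at the default address shown in the Quick Start below, and using the `librarian` model name (the bootstrap section notes that the single bootstrapped agent accepts all model names):

```python
# Minimal sketch: point the standard OpenAI SDK at The Librarian.
# The base_url assumes the default host/port from the Quick Start;
# the api_key only matters if you enable key authentication on the gateway.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8000/v1",
    api_key="unused-or-your-gateway-key",
)

response = client.chat.completions.create(
    model="librarian",  # the bootstrapped agent handles all model names
    messages=[{"role": "user", "content": "What did we discuss last session?"}],
)
print(response.choices[0].message.content)
```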
The Librarian serves as a middleware proxy that:
- Maintains Persistent Context: Uses Letta memory blocks to preserve conversation history across sessions
- Provides Tool Access: Enables SMCP/MCP toolchains through the agent interface
- Supports Self-Tuning Behavior: Leverages archival memory for pattern-aware responses
- Offers Provider Abstraction: Works with OpenAI, Anthropic, Venice, Ollama, and other LLM providers via Letta
- Maintains Full OpenAI Compatibility: Drop-in replacement for OpenAI API endpoints
Requirements:

- Python 3.10 or higher
- A self-hosted Letta server (cloud support may come in a future version)
```bash
# Clone the repository
git clone <repository-url>
cd librarian

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```
```bash
# Copy configuration template
cp config.example config

# Edit config with your self-hosted Letta server details
# Set LETTA_BASE_URL (and LETTA_API_KEY if your Letta server requires authentication)
```
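As an illustration, the edited config might look like the sketch below. Only `LETTA_BASE_URL` and `LETTA_API_KEY` are named in this README; the values are placeholders (8283 is Letta's usual default port), and config.example is the authoritative reference:

```bash
# Hypothetical config sketch -- see config.example for the full set of options.
LETTA_BASE_URL=http://localhost:8283   # your self-hosted Letta server
LETTA_API_KEY=your-letta-key           # only if your Letta server requires auth
```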
Before running The Librarian, you need to create the agents in your Letta server:

```bash
# Bootstrap agents in Letta server
cd bootstrap
python bootstrap_librarian.py --config bootstrap.env
```

This creates a single agent:
- `librarian` - The Librarian agent that handles all model names and dynamically switches between Worker Mode (procedural tasks) and Persona Mode (expressive responses) based on the request context
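To sanity-check the bootstrap, you can list the agents on your Letta server. A hedged sketch using the `letta-client` Python package; the exact client interface is an assumption, so adjust to whatever client version you run:

```python
# Hedged sketch: verify the bootstrapped agent exists on the Letta server.
# Assumes the letta-client package and a server at the default address.
from letta_client import Letta

client = Letta(base_url="http://localhost:8283")
names = [agent.name for agent in client.agents.list()]
assert "librarian" in names, "bootstrap did not create the librarian agent"
print("agents:", names)
```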
```bash
# Start the server
python main.py

# Or with uvicorn directly
uvicorn main:app --host 127.0.0.1 --port 8000
```

The server will be available at http://127.0.0.1:8000.
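Once it is up, a quick smoke test against the health endpoint (documented under API Endpoints below) confirms the gateway is reachable:

```bash
curl http://127.0.0.1:8000/health
```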
```bash
# Run integration tests
python tests/test_librarian_integration.py

# Test configuration
python tests/validate_config.py
```

Documentation:

- Usage Guide - How to use The Librarian with OpenAI clients
- Configuration Guide - Complete configuration reference
- API Reference - OpenAI-compatible API endpoints
- Architecture - System architecture and design decisions
- Development Guide - Contributing and development setup
- Deployment Guide - Production deployment instructions
- Letta API Reference - Letta API integration details
- OpenAI-Letta Mapping - How OpenAI requests map to Letta
- Security Configuration - Security settings and best practices
Features:

- OpenAI API Compatibility: Full compatibility with `/v1/models`, `/v1/chat/completions`, and `/v1/completions` endpoints
- Streaming Support: Real-time streaming responses via Server-Sent Events (SSE); see the sketch after this list
- Dual-Mode Operation: Automatic switching between Worker Mode (procedural) and Persona Mode (expressive)
- Persistent Memory: Conversation history maintained across sessions via Letta memory blocks
- Tool Synchronization: Dynamic tool attachment and management
- Load Management: Automatic request queuing and agent duplication for high concurrency
- Token Management: Accurate token counting and context window management
- Error Handling: Comprehensive error handling with automatic retry and summarization
- Context Window Management: Automatic context window adjustment and conversation summarization
- Per-Request Configuration: Dynamic temperature and max_tokens configuration per request
- Request Queuing: Buffered request queues with semaphore-based concurrency control
- Auto-Duplication: Automatic agent cloning for high-load scenarios
- API Call Indicators: All requests marked with an `[API]` indicator for agent awareness
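Because the gateway speaks standard SSE streaming, the OpenAI SDK's streaming interface works unchanged, and per-request parameters ride along as usual. A minimal sketch, assuming the default gateway address from the Quick Start and the `librarian` model name:

```python
# Streaming sketch: iterate over SSE chunks exactly as with the OpenAI API.
# temperature and max_tokens demonstrate the per-request configuration above.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8000/v1",
    api_key="unused-or-your-gateway-key",
)

stream = client.chat.completions.create(
    model="librarian",
    messages=[{"role": "user", "content": "Summarize our last session."}],
    temperature=0.7,   # per-request configuration
    max_tokens=256,    # per-request configuration
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```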
```
External Client (OpenAI SDK / LangChain / Cursor)
  ↓ standard /v1/chat/completions
The Librarian Gateway (FastAPI middleware)
  ↓ persistent Letta agent (The Librarian)
  ↓ memory, reasoning, tools, archival store
  ↓ downstream LLM (OpenAI / Anthropic / Venice / etc.)
```
The Librarian acts as a transparent proxy, translating OpenAI API requests into Letta agent interactions while maintaining full compatibility with existing OpenAI clients.
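To make the translation step concrete, here is a toy sketch of the proxy pattern. This is not The Librarian's actual code; `send_to_agent` is a hypothetical stand-in for the Letta round trip, and the response envelope simply mirrors the OpenAI chat-completion shape:

```python
# Toy sketch of the gateway pattern only -- not The Librarian's implementation.
import time
import uuid

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    model: str
    messages: list[dict]
    stream: bool = False

async def send_to_agent(text: str) -> str:
    """Hypothetical placeholder for the Letta agent round trip."""
    return f"(agent reply to: {text})"

@app.post("/v1/chat/completions")
async def chat_completions(req: ChatRequest):
    # Forward the latest user message to the agent, then wrap the reply
    # in an OpenAI-shaped response so existing clients parse it unchanged.
    reply = await send_to_agent(req.messages[-1]["content"])
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": req.model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": reply},
            "finish_reason": "stop",
        }],
        "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
    }
```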
- `GET /v1/models` - List available models
- `GET /v1/models/{model_id}` - Get model information
- `POST /v1/chat/completions` - Create chat completion (streaming and non-streaming)
- `POST /v1/completions` - Legacy completion endpoint
- `GET /health` - Health check endpoint
- `GET /` - Root endpoint with service information
All endpoints maintain full OpenAI API compatibility. See API Reference for detailed documentation.
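For instance, listing models is a plain OpenAI-style call; the model names returned depend on your configuration, so the id in the second line is illustrative:

```bash
curl http://127.0.0.1:8000/v1/models
curl http://127.0.0.1:8000/v1/models/librarian   # model id is illustrative
```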
The Librarian is configured via environment variables. See config.example for all available options.
Key configuration areas:
- Server Configuration: Host, port, debug mode
- Letta Server: Base URL of your self-hosted Letta server (API key if authentication is required). Note: The Librarian currently requires a self-hosted Letta server; cloud/hosted Letta support may be added in a future version.
- Agent Configuration: Agent IDs and model mappings
- Security: IP filtering, API key authentication
- Performance: Concurrency limits, queue settings
- Logging: Log levels and formats
See Configuration Guide for complete details.
The Librarian supports multiple security features:
- IP Filtering: Allow/block specific IP addresses or ranges
- API Key Authentication: Optional API key requirement
- Rate Limiting: Configurable rate limits
- Request Validation: Input validation and sanitization
- Security Logging: Audit logging for security events
See Security Configuration for setup instructions.
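From the client side, enabling the optional API key authentication just means supplying the key the usual OpenAI way. A sketch, assuming the gateway reads the standard Authorization bearer token; whether it does is an assumption, so check the Security Configuration guide for the actual mechanism:

```python
# Hedged sketch: pass the gateway API key as a normal OpenAI key.
# That the gateway validates the Bearer token this way is an assumption.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8000/v1",
    api_key="your-gateway-api-key",  # sent as "Authorization: Bearer ..."
)
```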
Testing:

```bash
# Run all unit tests (excludes integration/E2E to avoid burning tokens)
pytest -k "not integration and not e2e" tests/ -v

# Run all tests including integration/E2E (requires running server)
pytest tests/ -v

# Run only integration/E2E tests (requires running server)
pytest -m integration tests/ -v

# Run with coverage report
pytest --cov=src --cov-report=term-missing -k "not integration and not e2e" tests/

# Validate configuration
python tests/validate_config.py
```

Test Coverage: 93.97% unit test coverage. See Test Coverage Report for details.
Project structure:

```
librarian/
├── main.py                      # FastAPI application entry point
├── src/librarian/               # Core library components
│   ├── model_registry.py        # Model-to-agent mapping
│   ├── message_translator.py    # OpenAI-to-Letta message conversion
│   ├── response_formatter.py    # Letta-to-OpenAI response formatting
│   ├── token_counter.py         # Token counting and usage calculation
│   ├── tool_synchronizer.py     # Tool attachment and management
│   └── load_manager.py          # Request queuing and load management
├── bootstrap/                   # Agent bootstrap scripts
│   └── bootstrap_librarian.py   # Agent creation script
├── tests/                       # Test suites
├── docs/                        # Documentation
└── config.example               # Configuration template
```
Contributions are welcome! Please see Development Guide for:
- Development setup
- Code style guidelines
- Testing requirements
- Pull request process
- Code: Licensed under AGPL-3.0
- Documentation: Licensed under CC-BY-SA-4.0
See the LICENSE files for full terms.
The Librarian is part of the Sanctum and Animus ecosystem, providing persistent intelligence and context continuity for AI applications. This project is built on and integrates with the Letta ecosystem.
For issues, questions, or contributions:
- Check the documentation first
- Review existing issues
- Open a new issue with detailed information
The Librarian - Preserving context, maintaining continuity, enabling persistent intelligence.