feat: multi-tenant gRPC storage with embedded Rust in NativeAOT #48
Merged
Define the StorageService gRPC interface for multi-tenant document storage:
- Session lifecycle (load, save, delete, list, exists)
- Index operations for session metadata
- WAL operations with streaming support for large entries
- Checkpoint management
- Distributed lock operations with TTL
All operations include TenantContext for multi-tenant isolation.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New docx-mcp-storage crate implementing multi-tenant storage:
- Cargo workspace setup with proto compilation (tonic-build)
- StorageBackend trait with LocalStorage implementation
- LockManager trait with FileLock implementation
- gRPC service supporting TCP and Unix socket transports
- Tenant-aware file organization: {base}/{tenant_id}/sessions/
Supports all storage operations: sessions, WAL, checkpoints, index, locks.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
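A minimal sketch of what a tenant-aware `StorageBackend` trait could look like. The method names and signatures are hypothetical (the commit does not show the real trait), and an in-memory map stands in for `LocalStorage`. The point it illustrates is that every call carries a tenant ID, so two tenants can reuse the same session ID without colliding.

```rust
use std::collections::HashMap;

/// Hypothetical shape of a tenant-aware storage trait; the real
/// `StorageBackend` in docx-mcp-storage may differ.
pub trait StorageBackend {
    fn save_session(&mut self, tenant_id: &str, session_id: &str, data: Vec<u8>);
    fn load_session(&self, tenant_id: &str, session_id: &str) -> Option<&[u8]>;
    fn delete_session(&mut self, tenant_id: &str, session_id: &str) -> bool;
    fn list_sessions(&self, tenant_id: &str) -> Vec<String>;
}

/// In-memory stand-in for `LocalStorage`: entries are keyed by
/// (tenant_id, session_id), mirroring {base}/{tenant_id}/sessions/.
#[derive(Default)]
pub struct InMemoryStorage {
    sessions: HashMap<(String, String), Vec<u8>>,
}

impl StorageBackend for InMemoryStorage {
    fn save_session(&mut self, tenant_id: &str, session_id: &str, data: Vec<u8>) {
        self.sessions
            .insert((tenant_id.to_string(), session_id.to_string()), data);
    }

    fn load_session(&self, tenant_id: &str, session_id: &str) -> Option<&[u8]> {
        self.sessions
            .get(&(tenant_id.to_string(), session_id.to_string()))
            .map(|v| v.as_slice())
    }

    fn delete_session(&mut self, tenant_id: &str, session_id: &str) -> bool {
        self.sessions
            .remove(&(tenant_id.to_string(), session_id.to_string()))
            .is_some()
    }

    /// Lists only the calling tenant's sessions, sorted for stable output.
    fn list_sessions(&self, tenant_id: &str) -> Vec<String> {
        let mut ids: Vec<String> = self
            .sessions
            .keys()
            .filter(|(t, _)| t == tenant_id)
            .map(|(_, s)| s.clone())
            .collect();
        ids.sort();
        ids
    }
}
```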
Major refactor of .NET components for multi-tenant architecture:
New DocxMcp.Grpc project:
- IStorageClient interface for storage abstraction
- StorageClient implementation with gRPC streaming
- GrpcLauncher for auto-launching local gRPC server
- TenantContextHelper with AsyncLocal for per-request tenant
- StorageClientOptions for configuration
SessionManager rewrite:
- All operations now tenant-aware via TenantContextHelper
- Delegates storage to IStorageClient (no local persistence)
- Removed direct file system access
Removed local storage code:
- Deleted SessionStore.cs (replaced by gRPC)
- Deleted MappedWal.cs (WAL managed by storage server)
- Deleted SessionLock.cs (locks managed by storage server)
CLI updates:
- Global --tenant flag support
- Auto-launch gRPC server via Unix socket
Test infrastructure:
- MockStorageClient for unit testing without gRPC
- Updated project references
Version bump to 1.6.0 across all projects.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New docx-mcp-proxy crate for remote MCP client access:
- Axum-based HTTP server with Streamable HTTP transport
- Configuration for D1 database credentials
- Environment-based configuration for PAT validation
- Placeholder for D1 PAT validation and tenant routing
The proxy validates Bearer tokens against Cloudflare D1 and extracts tenant_id for multi-tenant request routing.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Multi-stage Dockerfiles for production deployment:
Main Dockerfile (docx-mcp + storage):
- Rust builder stage for docx-mcp-storage
- .NET builder stage with NativeAOT
- Runtime stage with all binaries
docx-mcp-storage/Dockerfile:
- Standalone gRPC storage server
- TCP transport on port 50051
- Health check via grpc_health_probe
docx-mcp-proxy/Dockerfile:
- SSE/HTTP proxy server
- HTTP port 8080
- Health check via curl
All images use non-root users for security.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Docker Compose configuration for local development:
Services:
- storage: gRPC storage server (port 50051)
- mcp: MCP stdio server (interactive)
- cli: CLI tool (profile: cli)
- proxy: SSE/HTTP proxy (profile: proxy, port 8080)
Volumes:
- storage-data: persistent session storage
- sessions-data: MCP session data
Usage examples for common scenarios are included as comments.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit moves locking from the client (.NET) to the server (Rust) by replacing explicit lock RPCs with atomic index operations that handle concurrency internally.
Changes:
- Remove AcquireLock/ReleaseLock/RenewLock RPCs from proto
- Add atomic index operations: AddSessionToIndex, UpdateSessionInIndex, RemoveSessionFromIndex (server acquires/releases locks internally)
- Remove WithLockedIndex methods and _holderId/_indexLock from SessionManager
- Rename DTO types with Dto suffix to avoid proto-generated type conflicts (SessionInfoDto, WalEntryDto, CheckpointInfoDto, SessionIndexEntryDto)
- Fix GrpcLauncher to find Rust binary via correct relative paths
- Update .gitignore for Rust artifacts and Claude Code files
This fixes the ParallelCreation_NoLostSessions race condition, where parallel session creation could lose sessions due to races in the client-side lock/load/save/unlock sequence. Server-side atomic operations ensure index updates are serialized correctly.
All 428 tests pass.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
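Why server-side atomic operations remove the race can be sketched as a simplified in-process model. Here a `Mutex` stands in for the storage server's internal lock and `IndexEntry` is a hypothetical, trimmed-down entry; this is not the server's actual code. Because each read-modify-write runs entirely under one lock, concurrent creations cannot overwrite each other's additions.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical index entry; the real SessionIndexEntry has more fields.
#[derive(Clone, Debug, PartialEq)]
pub struct IndexEntry {
    pub id: String,
}

/// Server-side index: the lock is acquired and released inside each
/// operation, so callers never hold it across an RPC round trip.
#[derive(Default)]
pub struct SessionIndexStore {
    entries: Mutex<Vec<IndexEntry>>,
}

impl SessionIndexStore {
    /// Atomic load + modify + save; returns false if the id already exists.
    pub fn add_session_to_index(&self, id: &str) -> bool {
        let mut entries = self.entries.lock().unwrap();
        if entries.iter().any(|e| e.id == id) {
            return false;
        }
        entries.push(IndexEntry { id: id.to_string() });
        true
    }

    /// Atomic removal; returns true if an entry was removed.
    pub fn remove_session_from_index(&self, id: &str) -> bool {
        let mut entries = self.entries.lock().unwrap();
        let before = entries.len();
        entries.retain(|e| e.id != id);
        entries.len() != before
    }

    pub fn len(&self) -> usize {
        self.entries.lock().unwrap().len()
    }
}

/// Mimics ParallelCreation_NoLostSessions: many threads add sessions at once.
pub fn create_in_parallel(store: Arc<SessionIndexStore>, n: usize) {
    let handles: Vec<_> = (0..n)
        .map(|i| {
            let store = Arc::clone(&store);
            thread::spawn(move || {
                store.add_session_to_index(&format!("session-{i}"));
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
}
```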
- Add build-storage job that builds docx-mcp-storage for 6 targets: linux-x64, linux-arm64, macos-x64, macos-arm64, windows-x64, windows-arm64
- Tests now download the linux-x64 storage server before running
- Windows installer downloads platform-specific storage server
- macOS installer downloads both arch binaries and creates universal binary
- Implement fork/join semantics: parent kills child via ProcessExit event
- Add unique PID-based socket paths to prevent conflicts
- Add parent death monitoring (prctl on Linux, polling fallback on macOS/Windows)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
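The PID-based socket naming and a polling check for parent death can be sketched in Rust. The file-name pattern and both function names are hypothetical. The re-parenting check is one common polling technique on Unix-like systems such as macOS, not necessarily what the launcher does; Linux uses prctl per the commit, and Windows would need a handle-based wait instead.

```rust
use std::path::{Path, PathBuf};

/// Builds a per-process socket path so that several docx-mcp instances
/// (and their child storage servers) never bind the same socket.
/// The file-name pattern is an illustrative assumption.
pub fn socket_path_for(dir: &Path, parent_pid: u32) -> PathBuf {
    dir.join(format!("docx-mcp-storage-{parent_pid}.sock"))
}

/// Polling fallback: when the parent exits, a Unix child is re-parented,
/// so its parent PID no longer matches the one recorded at startup.
#[cfg(unix)]
pub fn parent_has_exited(original_parent_pid: u32) -> bool {
    std::os::unix::process::parent_id() != original_parent_pid
}
```

A child would record `parent_id()` at startup and call `parent_has_exited` on a timer, shutting down once it returns true.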
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Only trigger the Build, Test & Release workflow when:
- src/, tests/, crates/ code changes
- Cargo.toml/Cargo.lock changes
- Dockerfile, docker-compose files change
- installers/ or publish.sh changes
- Workflow itself changes
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add protobuf compiler installation for all platforms:
- Linux: apt-get install protobuf-compiler
- macOS: brew install protobuf
- Windows: choco install protoc
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add #[cfg(unix)] to UnixListener import and Unix transport handling
- Define SYNCHRONIZE constant locally to avoid Windows feature issues
- Return error on Windows when Unix transport is requested
- Add Win32_Security feature to windows-sys
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
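The `#[cfg(unix)]` gating described above can be sketched with a hypothetical `Transport` enum. On Unix both transports are accepted; on other targets the Unix arm is compiled differently and returns an error at runtime, so the crate still builds on Windows.

```rust
use std::io;

/// Transport requested on the command line (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Transport {
    Tcp,
    Unix,
}

/// Validates the requested transport for the current target.
pub fn check_transport(t: Transport) -> io::Result<()> {
    match t {
        Transport::Tcp => Ok(()),
        // Only compiled on Unix, where a UnixListener can be bound.
        #[cfg(unix)]
        Transport::Unix => Ok(()),
        // Compiled everywhere else: reject instead of failing to build.
        #[cfg(not(unix))]
        Transport::Unix => Err(io::Error::new(
            io::ErrorKind::Unsupported,
            "Unix socket transport is not supported on this platform",
        )),
    }
}
```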
The native GitHub paths filter evaluates against the entire PR diff, not per push. Use dorny/paths-filter to check the actual changes in each push and skip the website build when unrelated files change.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Ensure session storage path is consistent between .NET and Rust:
- Add LocalStorageDir to StorageClientOptions
- Support both LOCAL_STORAGE_DIR and DOCX_SESSIONS_DIR env vars
- Pass --local-storage-dir when launching storage server
- Default: LocalApplicationData/docx-mcp/sessions
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
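The directory resolution above can be sketched as a pure function. The commit says both variables are supported but not which one wins, so the precedence here (LOCAL_STORAGE_DIR first, then DOCX_SESSIONS_DIR, then the default) is an assumption, as is treating an empty value as unset. `local_app_data` stands in for .NET's LocalApplicationData folder.

```rust
use std::path::PathBuf;

/// Treats an empty environment value as unset (an assumption).
fn non_empty(v: Option<String>) -> Option<String> {
    v.filter(|s| !s.is_empty())
}

/// Resolves the sessions directory shared by the .NET client and the
/// Rust server. Assumed precedence: LOCAL_STORAGE_DIR, DOCX_SESSIONS_DIR,
/// then <local_app_data>/docx-mcp/sessions.
pub fn resolve_sessions_dir(
    local_storage_dir: Option<String>,
    docx_sessions_dir: Option<String>,
    local_app_data: PathBuf,
) -> PathBuf {
    non_empty(local_storage_dir)
        .or_else(|| non_empty(docx_sessions_dir))
        .map(PathBuf::from)
        .unwrap_or_else(|| local_app_data.join("docx-mcp").join("sessions"))
}

/// Reads both environment variables and resolves the directory.
pub fn resolve_from_env(local_app_data: PathBuf) -> PathBuf {
    resolve_sessions_dir(
        std::env::var("LOCAL_STORAGE_DIR").ok(),
        std::env::var("DOCX_SESSIONS_DIR").ok(),
        local_app_data,
    )
}
```

The resolved path would then be passed to the storage server as `--local-storage-dir`, so both processes agree on one location.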
Tests now use a unique temp directory for session storage, ensuring complete isolation from production data and other test runs. The temp directory is cleaned up when DisposeStorageAsync is called.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The file doesn't exist in the repo; it was likely removed previously.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…llers
- Windows: Add docx-mcp-storage.exe to Inno Setup script
- macOS: Add docx-mcp-storage to PKG installer and sign it
- Update documentation in installers to mention storage server
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…collision
Fixes Docker build error due to unstable_name_collisions warning.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Change default tenant from "local" to "" (empty string) so sessions
are stored directly in {base_dir}/sessions/ rather than
{base_dir}/local/sessions/, maintaining compatibility with the
legacy session storage layout.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
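The resulting path mapping can be sketched as a small function. The name `sessions_dir` is hypothetical; the two layouts are the ones the commit message states.

```rust
use std::path::{Path, PathBuf};

/// Maps a tenant to its sessions directory. An empty tenant ID (the new
/// default) keeps the legacy layout {base_dir}/sessions/, while any other
/// tenant gets {base_dir}/{tenant_id}/sessions/.
pub fn sessions_dir(base_dir: &Path, tenant_id: &str) -> PathBuf {
    if tenant_id.is_empty() {
        base_dir.join("sessions")
    } else {
        base_dir.join(tenant_id).join("sessions")
    }
}
```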
The Rust gRPC storage server now correctly reads and writes files in the exact same format as the .NET code:
**WAL format (.wal files)**:
- 8-byte little-endian i64 header = data length (NOT including header)
- JSONL content: each entry is a JSON line ending with \n
- Raw .NET WalEntry JSON bytes are stored/retrieved as-is
**Session/Checkpoint DOCX files**:
- Strip 8-byte .NET header prefix when loading (detects PK signature)
- Returns pure DOCX content starting with PK\x03\x04
**Session Index (index.json)**:
- Changed from HashMap to Vec<SessionIndexEntry> to match .NET format
- Added version field, id per entry, docx_file, wal_count, cursor_position
- Uses serde aliases for field name compatibility (modified_at/last_modified_at)
**Other changes**:
- Added serde_bytes for efficient binary serialization of patch_json
- Added tonic-reflection for gRPC service introspection
- Allow empty tenant_id for backward compatibility with legacy paths
- Comprehensive tests for .NET format compatibility
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
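The WAL layout and the DOCX prefix handling described above can be sketched with plain byte slices. The function names are hypothetical. Checking for the signature at offset 8 is an assumption about how "detects PK signature" works, and entries are treated as opaque JSON bytes, matching "stored/retrieved as-is".

```rust
use std::io;

const HEADER_LEN: usize = 8;
const ZIP_SIG: &[u8] = b"PK\x03\x04";

/// Writes entries as JSONL preceded by an 8-byte little-endian i64 holding
/// the payload length (the header itself is not counted).
pub fn encode_wal(entries: &[&[u8]]) -> Vec<u8> {
    let mut payload = Vec::new();
    for entry in entries {
        payload.extend_from_slice(entry);
        payload.push(b'\n');
    }
    let mut out = Vec::with_capacity(HEADER_LEN + payload.len());
    out.extend_from_slice(&(payload.len() as i64).to_le_bytes());
    out.extend_from_slice(&payload);
    out
}

/// Reads the raw JSON lines back, honouring the length header so that any
/// bytes past the declared length are ignored.
pub fn decode_wal(bytes: &[u8]) -> io::Result<Vec<Vec<u8>>> {
    let bad = |msg: &str| io::Error::new(io::ErrorKind::InvalidData, msg.to_string());
    if bytes.len() < HEADER_LEN {
        return Err(bad("WAL shorter than its header"));
    }
    let mut header = [0u8; HEADER_LEN];
    header.copy_from_slice(&bytes[..HEADER_LEN]);
    let len = i64::from_le_bytes(header);
    let body = &bytes[HEADER_LEN..];
    if len < 0 || len as usize > body.len() {
        return Err(bad("WAL length header out of range"));
    }
    Ok(body[..len as usize]
        .split(|&b| b == b'\n')
        .filter(|line| !line.is_empty())
        .map(|line| line.to_vec())
        .collect())
}

/// Returns pure DOCX bytes: if the ZIP signature sits after an 8-byte
/// prefix, the prefix is stripped; otherwise the input is returned as-is.
pub fn strip_docx_prefix(bytes: &[u8]) -> &[u8] {
    if bytes.starts_with(ZIP_SIG) {
        return bytes;
    }
    match bytes.get(HEADER_LEN..) {
        Some(rest) if rest.starts_with(ZIP_SIG) => rest,
        _ => bytes,
    }
}
```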
- SessionIndex now uses List<SessionIndexEntry> instead of Dictionary
- Added Id field to SessionIndexEntry for array-based format
- Added helper methods: GetById, TryGetValue, ContainsKey, Upsert, Remove
- Fixed checkpoint positions to use int (matching .NET WAL format)
- Added WalPosition property for backward compatibility
- StorageClientOptions: clarified base directory vs sessions directory
- SessionManager: handle legacy WAL formats gracefully during restore
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
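The list-based index with id-keyed helpers can be sketched as follows. The commit's code is C# (`List<SessionIndexEntry>` with `GetById`, `Upsert`, `Remove`, ...); this Rust version only illustrates the same idea, with a trimmed field set and method names adapted to Rust conventions. Lookups become linear scans over the list, and `upsert` keeps insertion order.

```rust
/// Trimmed-down entry; the real one also carries docx_file, timestamps, etc.
#[derive(Debug, Clone, PartialEq)]
pub struct SessionIndexEntry {
    pub id: String,
    pub wal_count: i32,
    pub cursor_position: i32,
}

/// Array-based index: a version plus an ordered list of entries.
#[derive(Debug, Default)]
pub struct SessionIndex {
    pub version: u32,
    pub sessions: Vec<SessionIndexEntry>,
}

impl SessionIndex {
    pub fn get_by_id(&self, id: &str) -> Option<&SessionIndexEntry> {
        self.sessions.iter().find(|e| e.id == id)
    }

    pub fn contains_key(&self, id: &str) -> bool {
        self.get_by_id(id).is_some()
    }

    /// Replaces the entry with the same id in place, or appends a new one.
    pub fn upsert(&mut self, entry: SessionIndexEntry) {
        if let Some(pos) = self.sessions.iter().position(|e| e.id == entry.id) {
            self.sessions[pos] = entry;
        } else {
            self.sessions.push(entry);
        }
    }

    /// Removes the entry with this id; returns true if one was removed.
    pub fn remove(&mut self, id: &str) -> bool {
        let before = self.sessions.len();
        self.sessions.retain(|e| e.id != id);
        self.sessions.len() != before
    }
}
```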
The truncate_wal function was using "keep_from" semantics (keep entries from position N onwards), but .NET expected "keep_count" semantics (keep the first N entries).
This caused the Undo_ThenNewPatch_DiscardsRedoHistory test to fail because:
- After undo to position 1, cursor = 1
- Applying a new patch called truncate_wal(1)
- Old behavior: keep entries with position >= 1 (all entries kept)
- New behavior: keep entries with position <= 1 (only first entry kept)
Changes:
- Renamed parameter from keep_from to keep_count
- Changed partition logic: keep entries where position <= keep_count
- Updated test to use correct value (1 instead of 2)
- Updated trait documentation
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
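The corrected `keep_count` behaviour can be sketched with a simplified in-memory `truncate_wal`. The real function operates on the .wal file; the `WalEntry` fields and returning the discarded redo tail are assumptions made for the illustration. Positions are 1-based, as the test scenario above implies.

```rust
/// WAL entry with a 1-based position (fields trimmed for the example).
#[derive(Debug, Clone, PartialEq)]
pub struct WalEntry {
    pub position: u64,
    pub patch_json: String,
}

/// keep_count semantics: keep entries with position <= keep_count (the
/// first keep_count entries) and return the discarded redo history.
pub fn truncate_wal(entries: &mut Vec<WalEntry>, keep_count: u64) -> Vec<WalEntry> {
    let (kept, dropped): (Vec<WalEntry>, Vec<WalEntry>) =
        entries.drain(..).partition(|e| e.position <= keep_count);
    *entries = kept;
    dropped
}
```

With three entries and the cursor at 1 after an undo, `truncate_wal(&mut wal, 1)` leaves only the first entry, which is exactly what Undo_ThenNewPatch_DiscardsRedoHistory expects before the new patch is appended.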
…ching service, sync service
Missing: the whole stack working under Docker Compose and integrating with Claude Desktop / claude.ai.
…cloudflare S3 client
Provision R2 bucket, KV namespace, D1 database, and R2 API token via Pulumi Python. Import existing resources (D1 auth, KV session). Add env-setup.sh to source all Cloudflare env vars from Pulumi outputs. Fix aws-sdk-s3 BehaviorVersion panic in storage-cloudflare.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… limits
send_with_retry wraps all KV HTTP calls with up to 5 retries, starting at 200ms and doubling each attempt. This prevents cascading failures from Cloudflare KV rate limiting under heavy load.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
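The retry schedule above (up to 5 retries, waiting 200, 400, 800, 1600 and 3200 ms) can be sketched synchronously. The real `send_with_retry` presumably wraps async HTTP calls, and the commit does not say which responses count as retryable, so this sketch simply retries on any `Err`; the function names are hypothetical.

```rust
use std::thread;
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base, 2*base, 4*base, ...
pub fn backoff_delay(base: Duration, attempt: u32) -> Duration {
    base * 2u32.pow(attempt)
}

/// Calls `op` once, then retries up to `max_retries` times on error,
/// sleeping an exponentially growing delay before each retry.
pub fn retry_with_backoff<T, E>(
    max_retries: u32,
    base: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            // Out of retries: surface the last error to the caller.
            Err(e) if attempt >= max_retries => return Err(e),
            Err(_) => {
                thread::sleep(backoff_delay(base, attempt));
                attempt += 1;
            }
        }
    }
}
```

With `max_retries = 5` and a 200 ms base, a call that keeps failing is attempted 6 times in total, with roughly 6.2 s of cumulative waiting.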
Scale-to-zero is applied via `koyeb services update --min-scale 0` because the Pulumi provider incorrectly requires routes for scale-to-zero (the Koyeb API/CLI accepts it on mesh-only services).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…s to CLAUDE.md
Add comprehensive operational documentation: Koyeb CLI cheat sheet, mcptools usage for local and production proxy testing via mcp-remote, Dockerfile local testing workflow, and Koyeb container debugging. Install grpcurl in mcp-http Dockerfile for gRPC debugging inside containers.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
/health now only checks that the proxy itself is running, with no upstream dependency. Koyeb health checks were failing when mcp-http was slow to start, causing the edge to return 502 for all traffic. Added /upstream-health for deep health checks (proxy + mcp-http backend).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace axum::serve (HTTP/1.1 only) with hyper-util auto::Builder, which negotiates HTTP/1.1 or HTTP/2 (h2c) per connection. This fixes 502 errors on Koyeb, where the edge may connect via HTTP/2. Also split /health (liveness, no upstream dep) from /upstream-health (deep check including mcp-http backend).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Root cause: all 4 services in the same Koyeb app shared route "/" on the same domain. Koyeb's edge routed traffic to the wrong service (e.g. gRPC storage instead of the HTTP proxy) → 502.
Fix: internal services (mcp-http, storage, gdrive) now use protocol=tcp with no public routes; they are only reachable via the Koyeb service mesh. Only the proxy keeps protocol=http with route "/".
- infra/__main__.py: public→http+route, internal→tcp+no routes+min=1
- infra/koyeb-fix-routes.sh: script to fix routes via Koyeb API
- CLAUDE.md: document tcp/mesh architecture, PAT token warning, 502 debug
- Cargo.lock: hyper/hyper-util/tower deps for dual-stack h2c proxy
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Complete multi-tenant architecture with gRPC storage, SSE proxy with PAT auth, Google Drive integration, and cloud source support — replacing direct filesystem access with a proper storage abstraction layer.
Architecture
- `IHistoryStorage` (sessions/WAL/index/checkpoints, can be remote R2) + `ISyncStorage` (file sync/watch, always local embedded or GDrive)
- `sync.MaybeAutoSave()` after `sessions.AppendWal()`. SessionManager and SyncManager are fully independent.
- `docx-storage-local`: local filesystem backend with tenant-aware paths, file locking, SHA256-based change detection
- `docx-storage-local` binary over TCP/Unix socket for server deployments
- `docx-storage-cloudflare`: R2-only storage with ETag-based optimistic locking (CAS), no KV dependency
Key changes
- `IStorageClient`/`StorageClient` → `IHistoryStorage`/`HistoryStorageClient` + `ISyncStorage`/`SyncStorageClient`
- `IHistoryStorage` (removed all sync/watch/tracker logic)
- `ISyncStorage`, handles `RegisterAndWatch`, `MaybeAutoSave`, `Save`, `StopWatch`
- `lib.rs` exposes C entry points (`docx_storage_init`, `docx_pipe_read/write/flush`, `docx_storage_shutdown`); .NET calls them via P/Invoke (`NativeStorage.cs`, `InMemoryPipeStream.cs`)
- `register_source` fix: creates `SessionIndexEntry` if absent (dual-server mode: `AddSessionToIndex` goes to remote, but `RegisterSource` goes to local)
Phase H — SSE Proxy multi-tenant ✅
- `X-Tenant-Id` injection
- `SessionManagerPool` for multi-tenant single-process MCP
- `proxy → mcp-http → storage` (local) or `proxy → mcp-http → storage + gdrive` (cloud)
Phase G — Google Drive gRPC server ✅
- `docx-storage-gdrive`: SourceSyncService + ExternalWatchService for Google Drive
- `oauth_connection` table, auto-refresh via `TokenManager`
- `gdrive://{connection_id}/{file_id}`
- `ConnectionsManager` component in dashboard for browsing external storage
Cloud Source Support ✅
- `DocumentTools.DocumentOpen` supports cloud sources (Google Drive) via `source_type` + `connection_id` + `file_id`
- `ResolveSourceType()` infers source type and blocks local sources in cloud mode
- `SyncManager.ReadSourceBytes()` abstracts local vs cloud file reads
- `ExternalChangeGate` + `ExternalChangeTools` work with cloud sources (not just local disk)
- `DocumentSave` preserves existing source type on save-as
- `ExternalChangeGate` state persisted in gRPC storage index (survives backend restart)
Website ✅
Pending
Phase D — Validation
Phase W — WAL/Sessions viewer
Phase K — Koyeb Deployment
Infra
Relates to
Test plan
- `STORAGE_GRPC_URL` → Cloudflare R2
🤖 Generated with Claude Code