Skip to content

alexkranias/curserve

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

7 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Curserve: High-Performance Multi-Tenant Coding Agent Infrastructure

Scale coding agents from 1 user to 100+ users on a single server

Eliminate the subprocess bottleneck. Co-locate codebases with inference. Memory-map everything.

License: MIT


🎯 The Problem

Traditional coding agent architectures suffer from fundamental performance bottlenecks:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              TRADITIONAL ARCHITECTURE                        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Client    β”‚ ◄─────────────────────► β”‚   LLM Server     β”‚
β”‚  (Laptop)   β”‚    Network Latency      β”‚   (vLLM/GPU)     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
      β”‚
      β”‚  Local file system
      β”‚  
      β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Codebase   β”‚
β”‚   (Disk)    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Problems:

  1. Network Overhead: Every tool call (grep, ls, read) requires a round-trip to the client
  2. Process Spawn Overhead: Each grep spawns a subprocess (~10-15ms overhead)
  3. No Multi-Tenancy: Can't efficiently serve 100+ users on one GPU server
  4. Poor Locality: Code is on client, inference is on server

Example: A single agent turn with 10 tool calls

LLM β†’ grep    β†’ 15ms process spawn + network RTT
LLM β†’ grep    β†’ 15ms process spawn + network RTT
LLM β†’ read    β†’ network RTT
LLM β†’ grep    β†’ 15ms process spawn + network RTT
...
Total: 150-300ms overhead per turn (just for tooling!)

πŸ’‘ The Solution: Curserve

Curserve inverts the traditional architecture by co-locating codebases with inference and eliminating subprocess overhead through memory-mapped in-process operations.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                  CURSERVE ARCHITECTURE                        β”‚
β”‚                                                               β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚              vLLM Process (GPU)                        β”‚  β”‚
β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”       β”‚  β”‚
β”‚  β”‚  β”‚  Request 1 β”‚  β”‚  Request 2 β”‚  β”‚  Request N β”‚       β”‚  β”‚
β”‚  β”‚  β”‚  (User A)  β”‚  β”‚  (User B)  β”‚  β”‚  (User C)  β”‚       β”‚  β”‚
β”‚  β”‚  β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜       β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚           β”‚               β”‚               β”‚                 β”‚
β”‚           β”‚  Unix Domain  β”‚  Unix Domain  β”‚                 β”‚
β”‚           β”‚  Socket IPC   β”‚  Socket IPC   β”‚                 β”‚
β”‚           β–Ό               β–Ό               β–Ό                 β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚        Memory Search Service (Rust Daemon)            β”‚ β”‚
β”‚  β”‚                                                        β”‚ β”‚
β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚
β”‚  β”‚  β”‚ Codebase A   β”‚  β”‚ Codebase B   β”‚  β”‚ Codebase C  β”‚ β”‚ β”‚
β”‚  β”‚  β”‚ (mmap'd RAM) β”‚  β”‚ (mmap'd RAM) β”‚  β”‚ (mmap'd RAM)β”‚ β”‚ β”‚
β”‚  β”‚  β”‚              β”‚  β”‚              β”‚  β”‚             β”‚ β”‚ β”‚
β”‚  β”‚  β”‚ β€’ In-memory  β”‚  β”‚ β€’ In-memory  β”‚  β”‚ β€’ In-memory β”‚ β”‚ β”‚
β”‚  β”‚  β”‚   grep       β”‚  β”‚   grep       β”‚  β”‚   grep      β”‚ β”‚ β”‚
β”‚  β”‚  β”‚ β€’ 0.5-3ms    β”‚  β”‚ β€’ 0.5-3ms    β”‚  β”‚ β€’ 0.5-3ms   β”‚ β”‚ β”‚
β”‚  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                       β–²
                       β”‚ SSH + Binary Invocation
                       β”‚
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚   curserve      β”‚
              β”‚   [workspace]   β”‚
              β”‚   [prompt]      β”‚
              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                     Clients

Benefits:

  • βœ… No process spawn overhead: grep/ls/read execute in-process
  • βœ… Memory-mapped I/O: 10-50x faster than subprocess ripgrep
  • βœ… Multi-tenant ready: One daemon serves 100+ concurrent agent sessions
  • βœ… Zero network latency: Tools run in the same datacenter as LLM
  • βœ… Scales to hundreds of users: Share GPU + storage infrastructure

πŸ—οΈ System Architecture

High-Level Data Flow

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 1. Client SSH Connection                                        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

ssh user@curserve-server
$ curserve ~/my-codebase "fix the authentication bug"

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 2. qwen-code-ipc Initialization                                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

qwen-code-ipc starts
β”œβ”€ Connects to /tmp/mem_search_service_requests.sock
β”œβ”€ Sends: {"type": "alloc_pid", "pid": 12345, "repo_dir_path": "..."}
└─ Memory Search Service memory-maps entire codebase into RAM

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 3. Agent Execution Loop                                         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚  vLLM (Qwen3)   β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                             β”‚
                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚ "grep for         β”‚
                    β”‚  authentication   β”‚
                    β”‚  functions"       β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                             β”‚
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚   qwen-code-ipc Tool Layer  β”‚
              β”‚                             β”‚
              β”‚ grep() intercepts process   β”‚
              β”‚ spawn, calls IPC instead    β”‚
              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                             β”‚ IPC Request
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚ Memory Search Service       β”‚
              β”‚                             β”‚
              β”‚ Searches in-memory files    β”‚
              β”‚ Returns: "auth.py:42:..."   β”‚
              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                             β”‚ 0.5-3ms
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚     qwen-code-ipc           β”‚
              β”‚                             β”‚
              β”‚  Formats response for LLM   β”‚
              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                             β”‚
                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚    vLLM           β”‚
                    β”‚ "Found auth bug   β”‚
                    β”‚  on line 42..."   β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Repeat for 10-20 tool calls per agent turn
Total overhead: ~10-30ms (vs 150-300ms traditional)

Component Breakdown

curserve/
β”‚
β”œβ”€β”€ mem-search-service/          # Rust daemon for in-memory operations
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”œβ”€β”€ lib.rs              # MmapCache: memory-mapped file management
β”‚   β”‚   β”œβ”€β”€ service.rs          # Unix socket IPC server
β”‚   β”‚   └── benchmark.rs        # Performance comparison tools
β”‚   β”œβ”€β”€ Cargo.toml              # Dependencies: ripgrep, memmap2, etc.
β”‚   └── target/release/
β”‚       └── mem-search-service  # Compiled daemon binary
β”‚
└── qwen-code-ipc/              # Modified qwen-code framework
    β”œβ”€β”€ packages/
    β”‚   β”œβ”€β”€ core/
    β”‚   β”‚   └── src/
    β”‚   β”‚       β”œβ”€β”€ tools/
    β”‚   β”‚       β”‚   └── ripGrep.ts     # Intercepted grep tool
    β”‚   β”‚       └── utils/
    β”‚   β”‚           └── ipcClient.ts   # Unix socket IPC client
    β”‚   └── cli/
    β”‚       └── src/
    β”‚           └── index.ts           # Entry point
    └── dist/
        └── cli.js                     # Compiled qwen-code binary

πŸ”§ Key Innovations

1. Memory-Mapped Codebases

Traditional coding agents spawn ripgrep subprocesses that read from disk on every search.

// mem-search-service/src/lib.rs

pub struct MmapCache {
    pub files: Vec<MmappedFile>,
}

impl MmapCache {
    pub fn new(root_path: &Path) -> Result<Self> {
        // Walk directory tree (respecting .gitignore)
        // Memory-map EVERY text file into RAM
        // ~10-20MB for typical codebases
    }
    
    pub fn search(&self, pattern: &str) -> Vec<Match> {
        // Search directly in RAM using ripgrep internals
        // No subprocess spawn, no disk I/O
        // 0.5-3ms vs 10-15ms subprocess
    }
}

Performance Impact:

Codebase Size Subprocess grep In-Memory grep Speedup
Small (100 files) ~10ms ~0.5ms 20x
Medium (500 files) ~15ms ~1ms 15x
Large (1000+ files) ~20ms ~3ms 7x

2. IPC-Based Tool Interception

We forked qwen-code and modified the tool layer to use IPC instead of spawning processes.

Before (qwen-code):

// Spawns new process for every grep call
async function grep(pattern: string) {
  const child = spawn('rg', [pattern, ...args]);
  return await collectOutput(child); // ~10-15ms overhead
}

After (qwen-code-ipc):

// packages/core/src/tools/ripGrep.ts

async function performRipgrepSearch(options) {
  try {
    // Try IPC first
    const output = await requestGrepIPC(
      workspacePath,
      pattern,
      [absolutePath],
      ipcOptions
    );
    return parseRipgrepOutput(output); // ~0.5-3ms
  } catch (error) {
    // Graceful fallback to subprocess if IPC unavailable
    return performRipgrepSearchDirect(options);
  }
}

IPC Protocol (Unix Domain Sockets):

Client                              Memory Search Service
  β”‚                                          β”‚
  β”œβ”€ Connect to /tmp/mem_search_...         β”‚
  β”‚                                          β”‚
  β”œβ”€ Send: {"type": "alloc_pid", ...} ────► β”‚
  β”‚                                          β”œβ”€ mmap codebase
  β”‚                                          β”œβ”€ create /tmp/qwen_code_response_12345.sock
  β”‚ ◄──── {"response_status": 1} ────────────
  β”‚                                          β”‚
  β”œβ”€ Send: {"type": "request_ripgrep",...}─►│
  β”‚                                          β”œβ”€ search in-memory files
  β”‚ ◄──── {"text": "file.py:42:..."} ────────
  β”‚                                          β”‚

3. Multi-Tenant Architecture

One mem-search-service daemon handles requests from 100+ concurrent qwen-code-ipc instances.

// mem-search-service/src/service.rs

struct ServiceState {
    codebases: HashMap<u32, MmapCache>,      // PID β†’ memory-mapped codebase
    response_sockets: HashMap<u32, UnixStream>, // PID β†’ response channel
}

// Three-threaded architecture:
// 1. Request listener: Accepts new connections
// 2. Connection acceptor: Manages per-client response sockets
// 3. Request worker: Executes searches in-memory

fn request_worker(rx: Receiver<Request>, state: Arc<Mutex<ServiceState>>) {
    loop {
        let (request, stream) = rx.recv().unwrap();
        
        match request {
            Request::AllocPid { pid, repo_dir_path } => {
                // Memory-map entire codebase
                let cache = MmapCache::new(&repo_dir_path)?;
                state.codebases.insert(pid, cache);
            }
            Request::RequestRipgrep { pid, pattern, .. } => {
                // Search in-memory, no I/O
                let results = state.codebases[&pid].search(&pattern)?;
                send_response(results);
            }
        }
    }
}

Resource Usage:

  • Memory: ~15-30MB per codebase (text files only, binaries skipped)
  • CPU: Shared across all users (ripgrep is already parallelized)
  • Storage: All codebases on fast NVMe (or network-attached if needed)

πŸ“Š Performance Benchmarks

Tool Call Latency

$ ./target/release/benchmark ~/linux-kernel "static inline" 100

Results:

================================================================================
Memory-Mapped Search (Curserve)
================================================================================
Average: 2.1ms
Min:     1.8ms
Max:     3.4ms
Matches: 1,247

================================================================================
Subprocess Ripgrep (Traditional)
================================================================================
Average: 14.3ms
Min:     12.1ms
Max:     18.7ms
Matches: 1,247

================================================================================
SPEEDUP: 6.8x faster
TIME SAVED: 12.2ms per search
================================================================================

Agent Turn Latency

Scenario: Fix a bug (10 grep calls, 3 file reads)

Architecture Tool Overhead LLM Inference Total Turn Time
Traditional (laptop + remote LLM) 150ms (10Γ—15ms) 500ms 650ms
Curserve (co-located) 10ms (10Γ—1ms) 500ms 510ms
Improvement 93% faster - 22% faster

Multi-Tenant Scalability

Setup: Single server with 1x H100 GPU, 100 concurrent users

Metric Traditional Curserve
Supported users ~10-20 (network bottleneck) 100+
GPU utilization 40-60% (waiting on I/O) 85-95%
Tool latency (p50) 15ms 2ms
Tool latency (p99) 80ms 8ms

πŸš€ Getting Started

Prerequisites

  • Rust (1.70+): For building mem-search-service
  • Node.js (20+): For building qwen-code-ipc
  • Git submodules: Both components are in this repo

1. Clone and Initialize

git clone https://github.com/your-org/curserve.git
cd curserve
git submodule update --init --recursive

2. Build Memory Search Service

cd mem-search-service
cargo build --release

Binary will be at: ./target/release/mem-search-service

3. Build qwen-code-ipc

cd ../qwen-code-ipc
npm install
npm run build

Binary will be at: ./dist/cli.js

4. Start the Memory Search Service

./mem-search-service/target/release/mem-search-service

Output:

================================================================================
CURSERVE Memory Search Service
================================================================================

Request listener started on /tmp/mem_search_service_requests.sock
Worker thread started
Service running. Press Ctrl+C to stop.

5. Run a Coding Agent Session

node qwen-code-ipc/dist/cli.js ~/my-codebase

Or create a shell alias:

alias curserve='node /path/to/curserve/qwen-code-ipc/dist/cli.js'

Then:

curserve ~/my-project
> Find all TODO comments and prioritize them by importance

πŸ§ͺ Testing and Validation

Test IPC Communication

cd mem-search-service
cargo test

Benchmark Grep Performance

./target/release/benchmark /path/to/codebase "search pattern" 100

Integration Test

# Terminal 1: Start daemon
./target/release/mem-search-service

# Terminal 2: Run qwen-code-ipc
cd ../qwen-code-ipc
npm test

πŸ“– IPC Protocol Specification

Socket Paths

  • Request socket (shared): /tmp/mem_search_service_requests.sock
  • Response sockets (per-client): /tmp/qwen_code_response_{pid}.sock

Request Types

1. Allocate PID

Request:

{
  "type": "alloc_pid",
  "pid": 12345,
  "repo_dir_path": "/home/user/my-codebase"
}

Response:

{
  "response_status": 1,
  "text": "Allocated 347 files"
}

2. Ripgrep Search

Request:

{
  "type": "request_ripgrep",
  "pid": 12345,
  "pattern": "fn\\s+\\w+",
  "paths": ["/home/user/my-codebase/src"],
  "options": {
    "line_number": true,
    "ignore_case": false,
    "threads": 4
  }
}

Response:

{
  "response_status": 1,
  "text": "src/main.rs:42:fn main() {\nsrc/lib.rs:10:fn search() {"
}

Error Responses

{
  "response_status": 0,
  "error": "PID 12345 has no allocated codebase. Call alloc_pid first."
}

πŸ”’ Security Considerations

Isolation

  • Per-user codebases: Each client gets an isolated memory-mapped view
  • Unix permissions: Socket access controlled by filesystem permissions
  • Process isolation: qwen-code-ipc runs as the user's process (SSH session)

Resource Limits

// Future work: Implement codebase eviction
// When RAM > threshold:
//   - Evict least-recently-used codebases
//   - Re-allocate on next grep request

Recommended Deployment

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚         Curserve Server                     β”‚
β”‚                                             β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚  mem-search-service (privileged)      β”‚ β”‚
β”‚  β”‚  β€’ Runs as root or dedicated user     β”‚ β”‚
β”‚  β”‚  β€’ Socket permissions: 0770           β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚                                             β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚  SSH Access (per-user)                β”‚ β”‚
β”‚  β”‚  β€’ Users SSH in                       β”‚ β”‚
β”‚  β”‚  β€’ Run: curserve [workspace] [prompt] β”‚ β”‚
β”‚  β”‚  β€’ qwen-code-ipc runs as their user   β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚                                             β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚  vLLM (GPU inference)                 β”‚ β”‚
β”‚  β”‚  β€’ Shared across all users            β”‚ β”‚
β”‚  β”‚  β€’ Rate limiting per user/team        β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ› οΈ Troubleshooting

Socket Already Exists

$ ./mem-search-service
Error: Address already in use

Solution:

rm /tmp/mem_search_service_requests.sock
./mem-search-service

IPC Connection Failed

qwen-code-ipc: Failed to connect to socket

Check:

  1. Is mem-search-service running?
  2. Socket permissions: ls -l /tmp/mem_search_service_requests.sock
  3. Firewall/SELinux blocking Unix sockets?

High Memory Usage

$ ps aux | grep mem-search-service
user     12345  2.5  8.3  8472192 ...

Analysis:

  • Each codebase uses ~15-30MB (text files only)
  • 100 codebases = ~2-3GB RAM (very manageable)
  • Future: Implement LRU eviction for 1000+ users

πŸ—ΊοΈ Roadmap

Phase 1: Core Infrastructure βœ…

  • Memory-mapped grep in Rust
  • Unix domain socket IPC
  • qwen-code fork with IPC integration
  • Basic multi-tenancy support

Phase 2: Production Readiness (Q2 2025)

  • File watching & auto-reload on changes
  • Codebase eviction/LRU caching
  • Rate limiting per user/team
  • Monitoring & telemetry (Prometheus/Grafana)
  • Docker deployment support

Phase 3: Advanced Features (Q3 2025)

  • Distributed codebases (multi-node)
  • Copy-on-write sharing (same codebase, multiple users)
  • Incremental updates (git pull without full reload)
  • Advanced tools: find, tree, analyze via IPC

Phase 4: Optimization (Q4 2025)

  • Suffix tree indexing for hot files
  • Predictive codebase loading
  • GPU-accelerated search (cuDF/RAPIDS)
  • Zero-copy IPC (shared memory)

πŸ“š Documentation

  • Memory Search Service: See mem-search-service/README.md
  • qwen-code-ipc: See qwen-code-ipc/README.md
  • IPC Protocol: See docs/ipc-protocol.md (coming soon)
  • Deployment Guide: See docs/deployment.md (coming soon)

🀝 Contributing

We welcome contributions! Key areas:

  • Performance: Optimize search algorithms, memory usage
  • Reliability: Error handling, crash recovery
  • Scalability: Better multi-tenancy, distributed support
  • Tools: Add more IPC-backed tools (find, tree, etc.)

See CONTRIBUTING.md for guidelines.


πŸ“„ License

MIT License - see LICENSE for details.


πŸ™ Acknowledgments

  • ripgrep by BurntSushi: Core search engine
  • qwen-code by QwenLM: Base agent framework
  • vLLM: High-performance inference engine

πŸ“§ Contact


Built with ❀️ for the coding agent community

Star ⭐ this repo if you find it useful!

About

faster tool calling for server-side multi-tenant coding agents

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors