sancho

Data-structure-powered tooling for LLM agents.

sancho is a Rust workspace that augments coding agents with classical data structures. Its primary product is sancho-mcp, an MCP server that gives Copilot (or any MCP-compatible agent) deduplication, context trimming, claim verification, pattern lookup, and session metrics, all backed by data structures described in Brass's Advanced Data Structures.

Why sancho?

| Problem | sancho solution | Data structure |
| --- | --- | --- |
| Agent re-explores files it already saw | check_seen / cache_response dedup | Cuckoo filter + Count-Min Sketch |
| Context window fills up with stale text | trim_context removes low-value tokens | Count-Min Sketch (frequency) |
| Agent claims "code does X" without checking | register_claim / verify_claim pipeline | Compressed Trie + Persistent RB-tree |
| Repeated prefix searches across files | find_pattern for O(m) lookup | Dynamic Suffix Tree (Ukkonen) |
| No checkpoint/rollback across agent turns | checkpoint / session versioning | Persistent Red-Black Tree |
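
To make the first row concrete, here is a minimal sketch of probabilistic dedup. It is not sancho's implementation: it uses a plain Bloom filter as a simpler stand-in for the Cuckoo filter, and the names BloomFilter and seen_before are hypothetical. The point it illustrates is that a membership filter can answer "probably seen" or "definitely not seen" in constant space per item.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive k bit positions by salting one hash function with the index.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def seen_before(seen: BloomFilter, text: str) -> bool:
    """Hypothetical helper: report whether text was probably seen, then record it."""
    already = text in seen
    seen.add(text)
    return already
```

Calling seen_before twice with the same string returns False and then True. A Cuckoo filter offers the same kind of query but also supports deletion, which a plain Bloom filter does not.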

Architecture

┌──────────────────────────────────────────────────┐
│  Agent (Copilot / any MCP client)                │
│  ↕ stdio JSON-RPC 2.0                           │
├──────────────────────────────────────────────────┤
│  sancho-mcp                                      │
│  ┌────────────┐ ┌──────────┐ ┌────────────────┐ │
│  │ Dedup tools│ │ Trim/Find│ │ Claim/Contract │ │
│  │  (Bloom +  │ │ (CountMin│ │ Verification   │ │
│  │  CMS)      │ │ + Suffix)│ │ (Trie + RBTree)│ │
│  └────────────┘ └──────────┘ └────────────────┘ │
├──────────────────────────────────────────────────┤
│  sancho-core  (zero I/O, no async, pure DS)      │
└──────────────────────────────────────────────────┘
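
As a rough illustration of the top layer of the diagram, the sketch below builds one JSON-RPC 2.0 request of the kind an MCP client writes to the server's stdin. The tools/call method and the name/arguments parameter shape come from the MCP specification; the helper name tool_call_request is hypothetical, and the input_hash argument is borrowed from the Python client example in this README.

```python
import json
from itertools import count

# Monotonic request ids, so responses can be matched to requests.
_ids = count(1)


def tool_call_request(tool: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request as a single newline-terminated JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message) + "\n"


line = tool_call_request("check_seen", {"input_hash": "abc123"})
```

A real session would first perform the MCP initialize handshake before sending tool calls; that step is omitted here.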

Quickstart

Prerequisites

  • Rust stable (MSRV: 1.80)

Build and test

cargo test --workspace

Run the MCP server

cargo run -p sancho-mcp

Wire into VS Code / Copilot

This repo ships .vscode/mcp.json — open the workspace and the MCP server is auto-discovered.

Or add to your editor's MCP config:

{
  "servers": {
    "sancho": {
      "type": "stdio",
      "command": "cargo",
      "args": ["run", "-p", "sancho-mcp"]
    }
  }
}

Workspace crates

| Crate | Purpose | Publish |
| --- | --- | --- |
| sancho-core | Pure data structures (suffix tree, sketches, filters, persistent trees) | ✅ crates.io |
| sancho-mcp | MCP server: 14 tools over stdio JSON-RPC 2.0 | ✅ crates.io |
| sancho-proxy | Ollama-compatible inference proxy (reference architecture) | internal |
| sancho-cli | Proxy binary entrypoint | internal |
| sancho-candle | Experimental Candle inference runner (research) | internal |

Core data structures

All implementations cite the relevant chapter from Brass, Advanced Data Structures (Cambridge University Press):

  • Cuckoo / Counting Bloom filter — [Brass Ch 11] — probabilistic membership
  • Count-Min Sketch — [Brass Ch 11] — frequency estimation
  • Compressed Trie — [Brass Ch 8.1] — Patricia / compressed prefix tree
  • Persistent Red-Black Tree — [Brass Ch 7.2] — fully persistent ordered map
  • Dynamic Suffix Tree — [Brass Ch 8.4] — Ukkonen's online construction

Every data structure is covered by property-based tests written with proptest. Where applicable, implementations also use NEON SIMD acceleration on Apple Silicon.
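
For readers unfamiliar with the Count-Min Sketch, the following is a minimal textbook-style sketch, not the sancho-core implementation. The class name, default sizes, and hashing scheme are assumptions chosen for brevity. It shows why estimates can overcount but never undercount.

```python
import hashlib


class CountMinSketch:
    """depth rows of width counters; each row uses a differently salted hash."""

    def __init__(self, width: int = 1024, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row: int, item: str) -> int:
        digest = hashlib.blake2b(
            item.encode(), salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.rows[row][self._index(row, item)] += count

    def estimate(self, item: str) -> int:
        # Collisions only inflate counters, so the row minimum is the tightest bound.
        return min(
            self.rows[row][self._index(row, item)] for row in range(self.depth)
        )
```

After adding a token n times, estimate returns at least n. Widening the rows lowers the expected overcount, and adding rows lowers the chance of a large overcount.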

MCP tools (14 total)

| Tool | What it does |
| --- | --- |
| check_seen | Dedup check: has the agent seen this input before? |
| cache_response | Store a response for future dedup hits |
| trim_context | Remove low-frequency tokens to fit context window |
| find_pattern | O(m) suffix-tree pattern search |
| classify_task | Route task to appropriate handler via trie |
| checkpoint | Save/restore session state (persistent RB-tree) |
| register_claim | Declare what code/tool does |
| register_contract | Define constraints a claim must satisfy |
| record_observed_effects | Log runtime side effects as evidence |
| ingest_trace_summary | Import execution trace as evidence |
| verify_claim | Compare claim against contract + evidence |
| explain_mismatch | Human-readable explanation of verification failures |
| set_rollout_mode | Control tool activation policy |
| session_stats | Session-level metrics and hit rates |
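
The idea behind checkpoint can be illustrated with path copying, the standard technique for persistent trees. The sketch below is an assumption-laden simplification: it uses an unbalanced binary search tree and omits the red-black balancing that the real structure provides, and the names Node, insert, and lookup are hypothetical. Each update returns a new root and leaves every earlier root readable, so saving a checkpoint amounts to keeping a reference to a root.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    key: str
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: str, value: str) -> Node:
    """Return a new root; only the nodes on the search path are copied."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        return root._replace(left=insert(root.left, key, value))
    if key > root.key:
        return root._replace(right=insert(root.right, key, value))
    return root._replace(value=value)


def lookup(root: Optional[Node], key: str) -> Optional[str]:
    """Standard BST search against one specific version of the tree."""
    while root is not None:
        if key == root.key:
            return root.value
        root = root.left if key < root.key else root.right
    return None


# Each version is just a saved root; older versions stay intact.
v1 = insert(None, "turn", "1")
v2 = insert(v1, "turn", "2")
v3 = insert(v2, "file", "src/lib.rs")
```

Rolling back means reading from v1 or v2 again. Nothing has to be undone, because no update ever mutated an existing node.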

Observer backends (runtime verification)

The verification pipeline supports multiple evidence backends:

  • inproc (recommended): unprivileged, adapter-supplied side effects
  • dtrace (optional): macOS-only, privileged local diagnostics
  • dry-run: synthetic evidence for CI and smoke testing

# In-process observation (default)
python3 scripts/mcp_observer_pipeline.py \
  --observer-backend inproc \
  --claim-id claim-1 \
  --contract-id contract-1 \
  --inproc-effect file.open:/tmp/out.txt

# Capture effects from a running command
python3 scripts/inproc_observe_command.py \
  --run-pipeline --emit-spawn-effect \
  -- python3 your_script.py

Adapter helpers for Python, TypeScript, and Node.js live in scripts/inproc_adapters/.

Python client

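import asyncio
from sancho_py import SanchoClient

async def main():
    # "async with" is only valid inside a coroutine, so wrap the call in one.
    async with SanchoClient() as client:
        return await client.call_tool("check_seen", {"input_hash": "abc123"})

result = asyncio.run(main())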

See docs/python-adapter.md for full documentation.

Stability and release policy

  • Versioning follows SemVer.
  • Breaking changes only in major versions.
  • Release notes in CHANGELOG.md.

Security

Please report security issues via GitHub Security Advisories. See SECURITY.md for details.

Contributing

See CONTRIBUTING.md and CODE_OF_CONDUCT.md.

License

MIT — see LICENSE-MIT.

About

Sancho is a Rust-powered MCP server that adds algorithmic intelligence to Copilot workflows—deduplicating prompts, trimming context, verifying claims against runtime evidence, and enforcing rollout policies with measurable quality metrics.