Data-structure-powered tooling for LLM agents.
sancho is a Rust workspace that augments coding agents with classical advanced data structures. The primary product is sancho-mcp, an MCP server that gives Copilot (and any MCP-compatible agent) deduplication, context trimming, claim verification, pattern lookup, and session metrics — all backed by battle-tested data structures from Brass's Advanced Data Structures.
| Problem | sancho solution | Data structure |
|---|---|---|
| Agent re-explores files it already saw | `check_seen` / `cache_response` dedup | Cuckoo filter + Count-Min Sketch |
| Context window fills up with stale text | `trim_context` removes low-value tokens | Count-Min Sketch (frequency) |
| Agent claims "code does X" without checking | `register_claim` / `verify_claim` pipeline | Compressed Trie + Persistent RB-tree |
| Repeated prefix searches across files | `find_pattern` for O(m) lookup | Dynamic Suffix Tree (Ukkonen) |
| No checkpoint/rollback across agent turns | `checkpoint` / session versioning | Persistent Red-Black Tree |
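
To illustrate why a persistent tree suits checkpoint/rollback, here is a minimal sketch of persistence by path copying. It is not sancho-core's implementation: it uses a plain unbalanced binary search tree instead of a red-black tree, and the names `Node`, `Tree`, `insert`, and `get` are hypothetical. Each insert returns a new root and leaves every earlier root readable, so an old version can serve as a checkpoint.

```rust
use std::cmp::Ordering;
use std::rc::Rc;

/// Immutable BST node (hypothetical type); subtrees are shared between versions.
#[derive(Debug)]
struct Node {
    key: String,
    value: String,
    left: Tree,
    right: Tree,
}

/// A version of the map is just a root pointer.
type Tree = Option<Rc<Node>>;

/// Returns a new root containing `key -> value`.
/// Only nodes on the search path are copied; all other nodes are shared.
fn insert(tree: &Tree, key: &str, value: &str) -> Tree {
    match tree {
        None => Some(Rc::new(Node {
            key: key.to_string(),
            value: value.to_string(),
            left: None,
            right: None,
        })),
        Some(node) => {
            let (mut left, mut right) = (node.left.clone(), node.right.clone());
            let mut val = node.value.clone();
            match key.cmp(node.key.as_str()) {
                Ordering::Less => left = insert(&node.left, key, value),
                Ordering::Greater => right = insert(&node.right, key, value),
                Ordering::Equal => val = value.to_string(),
            }
            Some(Rc::new(Node { key: node.key.clone(), value: val, left, right }))
        }
    }
}

/// Looks up `key` in one specific version of the tree.
fn get<'a>(tree: &'a Tree, key: &str) -> Option<&'a str> {
    let mut cur = tree;
    while let Some(node) = cur {
        match key.cmp(node.key.as_str()) {
            Ordering::Less => cur = &node.left,
            Ordering::Greater => cur = &node.right,
            Ordering::Equal => return Some(node.value.as_str()),
        }
    }
    None
}
```

Keeping the root returned by an earlier `insert` is all a rollback needs in this sketch: later inserts never modify nodes reachable from that root.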
```
┌──────────────────────────────────────────────────┐
│  Agent (Copilot / any MCP client)                │
│        ↕ stdio JSON-RPC 2.0                      │
├──────────────────────────────────────────────────┤
│  sancho-mcp                                      │
│  ┌────────────┐ ┌──────────┐ ┌────────────────┐  │
│  │ Dedup tools│ │ Trim/Find│ │ Claim/Contract │  │
│  │ (Bloom +   │ │ (CountMin│ │ Verification   │  │
│  │  CMS)      │ │ + Suffix)│ │ (Trie + RBTree)│  │
│  └────────────┘ └──────────┘ └────────────────┘  │
├──────────────────────────────────────────────────┤
│  sancho-core (zero I/O, no async, pure DS)       │
└──────────────────────────────────────────────────┘
```

- Rust stable (MSRV: 1.80)

```shell
# Run the full test suite
cargo test --workspace

# Start the MCP server
cargo run -p sancho-mcp
```

This repo ships `.vscode/mcp.json`; open the workspace and the MCP server is auto-discovered.

Or add to your editor's MCP config:

```json
{
  "servers": {
    "sancho": {
      "type": "stdio",
      "command": "cargo",
      "args": ["run", "-p", "sancho-mcp"]
    }
  }
}
```

| Crate | Purpose | Publish |
|---|---|---|
| `sancho-core` | Pure data structures (suffix tree, sketches, filters, persistent trees) | ✅ crates.io |
| `sancho-mcp` | MCP server: 25 tools over stdio JSON-RPC 2.0 | ✅ crates.io |
| `sancho-proxy` | Ollama-compatible inference proxy (reference architecture) | internal |
| `sancho-cli` | Proxy binary entrypoint | internal |
| `sancho-candle` | Experimental Candle inference runner (research) | internal |

All implementations cite the relevant chapter from Brass, Advanced Data Structures (Cambridge University Press):
- Cuckoo / Counting Bloom filter — [Brass Ch 11] — probabilistic membership
- Count-Min Sketch — [Brass Ch 11] — frequency estimation
- Compressed Trie — [Brass Ch 8.1] — Patricia / compressed prefix tree
- Persistent Red-Black Tree — [Brass Ch 7.2] — fully persistent ordered map
- Dynamic Suffix Tree — [Brass Ch 8.4] — Ukkonen's online construction
Every data structure has property-based tests via proptest and NEON SIMD acceleration on Apple Silicon where applicable.
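
For readers new to frequency sketches, the following is a minimal Count-Min Sketch using only the standard library. It is an illustrative sketch, not the sancho-core implementation: the type and method names are hypothetical, and seeding `DefaultHasher` with the row index is a simplification chosen for brevity.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical minimal Count-Min Sketch: `depth` rows of `width` counters.
struct CountMinSketch {
    width: usize,
    depth: usize,
    counters: Vec<u32>, // row-major: row * width + column
}

impl CountMinSketch {
    fn new(width: usize, depth: usize) -> Self {
        assert!(width > 0 && depth > 0);
        Self { width, depth, counters: vec![0; width * depth] }
    }

    /// Counter index for `item` in `row`; the row number acts as the hash seed.
    fn slot<T: Hash + ?Sized>(&self, item: &T, row: usize) -> usize {
        let mut hasher = DefaultHasher::new();
        row.hash(&mut hasher);
        item.hash(&mut hasher);
        row * self.width + (hasher.finish() as usize) % self.width
    }

    /// Records one occurrence of `item` in every row.
    fn add<T: Hash + ?Sized>(&mut self, item: &T) {
        for row in 0..self.depth {
            let i = self.slot(item, row);
            self.counters[i] = self.counters[i].saturating_add(1);
        }
    }

    /// Estimated count: the minimum over all rows.
    /// Collisions can only inflate a counter, so the estimate is never
    /// below the true count (barring saturation), but may exceed it.
    fn estimate<T: Hash + ?Sized>(&self, item: &T) -> u32 {
        (0..self.depth)
            .map(|row| self.counters[self.slot(item, row)])
            .min()
            .unwrap_or(0)
    }
}
```

A frequency-based trimmer could, for example, call `add` for every token it sees and treat tokens whose `estimate` stays under some threshold as low-value. How `trim_context` actually scores tokens is not specified here.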
| Tool | What it does |
|---|---|
| `check_seen` | Dedup check: has the agent seen this input before? |
| `cache_response` | Store a response for future dedup hits |
| `trim_context` | Remove low-frequency tokens to fit context window |
| `find_pattern` | O(m) suffix-tree pattern search |
| `classify_task` | Route task to appropriate handler via trie |
| `checkpoint` | Save/restore session state (persistent RB-tree) |
| `register_claim` | Declare what code/tool does |
| `register_contract` | Define constraints a claim must satisfy |
| `record_observed_effects` | Log runtime side effects as evidence |
| `ingest_trace_summary` | Import execution trace as evidence |
| `verify_claim` | Compare claim against contract + evidence |
| `explain_mismatch` | Human-readable explanation of verification failures |
| `set_rollout_mode` | Control tool activation policy |
| `session_stats` | Session-level metrics and hit rates |
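
Because the server speaks JSON-RPC 2.0 over stdio, any client can invoke these tools by writing one JSON message per line. The sketch below builds an MCP `tools/call` request for `check_seen`. The helper names `check_seen_request` and `send` are hypothetical, the `input_hash` argument is taken from the Python adapter example in this README, and a real session would first need the MCP `initialize` handshake, which is omitted.

```rust
use std::io::Write;

/// Builds a JSON-RPC 2.0 `tools/call` request for the `check_seen` tool.
/// Assumption: `input_hash` contains no characters that need JSON escaping
/// (a hex digest qualifies); no escaping is performed here.
fn check_seen_request(id: u64, input_hash: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{id},"method":"tools/call","params":{{"name":"check_seen","arguments":{{"input_hash":"{input_hash}"}}}}}}"#
    )
}

/// Writes one request as a single newline-terminated line,
/// the framing used by the MCP stdio transport.
fn send<W: Write>(out: &mut W, request: &str) -> std::io::Result<()> {
    writeln!(out, "{request}")
}
```

In practice `out` would be the child process's stdin; the response arrives as a JSON line on its stdout carrying the same `id`.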
The verification pipeline supports multiple evidence backends:
- `inproc` (recommended): unprivileged, adapter-supplied side effects
- `dtrace` (optional): macOS-only, privileged local diagnostics
- `dry-run`: synthetic evidence for CI and smoke testing

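
Whichever backend supplies the evidence, verification amounts to comparing observed effects with what a contract permits. The following is a simplified sketch of that comparison, not sancho's actual verifier: the `Contract` type, its `allowed_effects` field, and the `unexpected_effects` function are hypothetical, and effects are modelled as plain strings in the `kind:target` form used by the `--inproc-effect` flag.

```rust
use std::collections::BTreeSet;

/// Hypothetical contract: the only side effects a claim may produce.
struct Contract {
    allowed_effects: BTreeSet<String>,
}

/// Returns the observed effects that the contract does not allow,
/// deduplicated and in sorted order.
/// An empty result means the evidence is consistent with the contract.
fn unexpected_effects(contract: &Contract, observed: &[&str]) -> Vec<String> {
    let seen: BTreeSet<&str> = observed.iter().copied().collect();
    seen.into_iter()
        .filter(|effect| !contract.allowed_effects.contains(*effect))
        .map(str::to_string)
        .collect()
}
```

A contract that allows only `file.open:/tmp/out.txt` would pass evidence containing just that effect, and would flag any additional observed effect, which is the kind of mismatch `explain_mismatch` is meant to describe.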

```shell
# In-process observation (default)
python3 scripts/mcp_observer_pipeline.py \
  --observer-backend inproc \
  --claim-id claim-1 \
  --contract-id contract-1 \
  --inproc-effect file.open:/tmp/out.txt

# Capture effects from a running command
python3 scripts/inproc_observe_command.py \
  --run-pipeline --emit-spawn-effect \
  -- python3 your_script.py
```

Language adapter helpers: Python, TypeScript, and Node.js adapters live in `scripts/inproc_adapters/`.

```python
import asyncio
from sancho_py import SanchoClient

async def main():
    async with SanchoClient() as client:
        return await client.call_tool("check_seen", {"input_hash": "abc123"})

print(asyncio.run(main()))
```

See `docs/python-adapter.md` for full documentation.

- Versioning follows SemVer.
- Breaking changes only in major versions.
- Release notes in `CHANGELOG.md`.

Please report security issues via GitHub Security Advisories. See SECURITY.md for details.
See CONTRIBUTING.md and CODE_OF_CONDUCT.md.
MIT — see LICENSE-MIT.