From 8f1bab051e249cbe6df965c708d03ff94ea5c633 Mon Sep 17 00:00:00 2001 From: Lance Paine Date: Thu, 12 Mar 2026 18:44:29 +0000 Subject: [PATCH] docs: explain MCP vs CLI token tradeoff in README and explainer MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Both approaches are supported — MCP for frameworks that expect it, CLI for keeping the context window free. The lazy CLI pattern uses 94% fewer tokens (~1.2k vs ~42k) by loading tool catalog on demand. Co-Authored-By: Claude Opus 4.6 --- README.md | 26 ++++++++- docs/explainer/index.html | 117 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 140 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 88d87ad..aa8c566 100644 --- a/README.md +++ b/README.md @@ -15,9 +15,13 @@ cd spai && ./install.sh Installs `spai` and `spai-edit` to `~/.local/bin/`. Requires [babashka](https://babashka.org/) (`bb`). [ripgrep](https://github.com/BurntSushi/ripgrep) (`rg`) is optional — falls back to grep. -## Claude Code (MCP) +## Two ways to connect: MCP or CLI -spai works as a native MCP tool server for [Claude Code](https://docs.anthropic.com/en/docs/claude-code). Add to your project's `.mcp.json`: +spai supports both MCP (tool schemas) and plain CLI (shell commands). Both expose the same tools. The difference is how your agent discovers them. + +### MCP (eager loading) + +MCP dumps full tool schemas into the agent's context at session start — ~42k tokens for the full spai toolkit. Use this if your framework expects MCP or you want native tool integration. ```json { @@ -37,7 +41,23 @@ Or register globally: claude mcp add --transport stdio spai -- bb ~/.local/share/spai/spai-mcp.bb ``` -This gives Claude direct access to `shape`, `blast`, `context`, `who`, `related`, `drift`, `narrative`, and `errors_rust` as native tools — no shell pipelines, no output parsing. +### CLI (lazy loading) — recommended + +The CLI approach loads nothing upfront. 
The agent calls `spai help` when it needs the catalog (~1,200 tokens for 35+ tools), or `spai search "question"` to find the right command via natural language using a local model. 94% fewer tokens, same capabilities. + +```bash +spai help # Compact tool catalog (~1.2k tokens) +spai search "find class predicates" # NL search via local qwen (optional) +spai shape src/ # Just run the command +``` + +This is the [follow-your-nose](https://en.wikipedia.org/wiki/Follow-your-nose_(computing)) pattern from RDF/Linked Data: don't download the whole schema, follow links to what you need. The agent discovers tools on demand, not all at once. + +### Why this matters + +MCP's eager loading made sense when tool catalogs were small. At 35+ tools with rich schemas, the upfront cost is significant — tokens the agent pays whether or not it uses the tools. The CLI approach lets the agent keep its context window for thinking instead of caching tool descriptions it may never need. + +Both approaches are fully supported. Use whichever fits your setup. 
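The eager-vs-lazy tradeoff can be sanity-checked with a back-of-the-envelope calculation. This is a sketch using the token figures quoted above (~42k for the eager MCP schema dump, ~1.2k per lazy catalog fetch), not a benchmark:

```shell
# Rough break-even for eager vs lazy tool discovery,
# using the token figures quoted above.
eager=42000   # tokens MCP injects at session start
lazy=1200     # tokens one `spai help` catalog fetch costs
echo "break-even after $(( eager / lazy )) catalog fetches"
# break-even after 35 catalog fetches
```

A session would need to re-fetch the catalog dozens of times before lazy loading costs as much as the one-time eager schema dump.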
## spai — Code Exploration diff --git a/docs/explainer/index.html b/docs/explainer/index.html index 054578b..ce44eed 100644 --- a/docs/explainer/index.html +++ b/docs/explainer/index.html @@ -669,6 +669,85 @@ display: block; } + /* MCP vs CLI */ + .mcp-vs-cli { + margin: 5rem 0; + } + .mcp-compare { + display: grid; + grid-template-columns: 1fr 1fr; + gap: 2rem; + max-width: 900px; + margin: 0 auto 2rem; + } + .mcp-card { + background: var(--surface); + border: 1px solid var(--border); + border-radius: 12px; + padding: 2rem 1.75rem; + } + .mcp-card.preferred { + border-color: var(--accent); + box-shadow: 0 4px 20px var(--glow); + } + .mcp-card h3 { + font-family: 'JetBrains Mono', monospace; + font-size: 1.1rem; + margin-bottom: 0.75rem; + } + .mcp-card.preferred h3 { color: var(--accent); } + .mcp-card .stat { + font-family: 'JetBrains Mono', monospace; + font-size: 2.2rem; + font-weight: 700; + margin-bottom: 0.5rem; + display: block; + } + .mcp-card.preferred .stat { color: var(--accent); } + .mcp-card .stat-label { + font-size: 0.8rem; + color: var(--muted); + display: block; + margin-bottom: 1rem; + } + .mcp-card p, .mcp-card ul { + font-size: 0.9rem; + color: var(--muted); + line-height: 1.7; + } + .mcp-card ul { + list-style: none; + padding: 0; + } + .mcp-card ul li { + padding: 0.2rem 0; + } + .mcp-card ul li::before { + content: '>'; + font-family: 'JetBrains Mono', monospace; + color: var(--muted); + margin-right: 0.5rem; + font-size: 0.8rem; + } + .mcp-card.preferred ul li::before { + color: var(--accent); + } + .mcp-note { + max-width: 700px; + margin: 0 auto; + text-align: center; + font-size: 0.95rem; + color: var(--muted); + line-height: 1.7; + } + .mcp-note code { + color: var(--accent); + } + + @media (max-width: 768px) { + .mcp-compare { grid-template-columns: 1fr; } + } + /* FAQ */ .faq { margin: 5rem 0; @@ -1168,6 +1247,44 @@

Memory across sessions

+      <!-- MCP vs CLI -->
+      <section class="mcp-vs-cli">
+        <h2>MCP support. And why you might skip it.</h2>
+        <p>spai ships an MCP server. It also ships a cheaper alternative.</p>
+
+        <div class="mcp-compare">
+          <div class="mcp-card">
+            <h3>MCP (standard)</h3>
+            <span class="stat">~42k</span>
+            <span class="stat-label">tokens dumped upfront into every conversation</span>
+            <ul>
+              <li>Full tool schemas loaded at session start</li>
+              <li>Every tool description, every parameter, every type</li>
+              <li>Pays the cost whether or not you use the tools</li>
+              <li>Works with any MCP-compatible agent</li>
+            </ul>
+          </div>
+
+          <div class="mcp-card preferred">
+            <h3>CLI (lazy)</h3>
+            <span class="stat">~1.2k</span>
+            <span class="stat-label">tokens for the full catalog, loaded on demand</span>
+            <ul>
+              <li><code>spai help</code> — compact catalog when you need it</li>
+              <li><code>spai search "question"</code> — NL search via local model</li>
+              <li>Agent discovers tools as needed, not all at once</li>
+              <li>Works with any agent that can run shell commands</li>
+            </ul>
+          </div>
+        </div>
+
+        <p class="mcp-note">MCP loads tool schemas eagerly — the agent gets everything before it asks for anything. The CLI follows a lazier pattern: <code>spai help</code> returns a compact index (~1,200 tokens for 35+ tools), and <code>spai search</code> uses a local model to recommend the right command from natural language. Same tools, 94% fewer tokens, and the agent only pays for what it uses.</p>
+
+        <p class="mcp-note">Both approaches are supported. Use MCP if your framework expects it. Use the CLI if you'd rather keep your context window for thinking.</p>
+      </section>