The browser engine for agents.
HTML in. Semantic Object Model out.
Website · Docs · Benchmarks · Crates.io · npm · PyPI
Plasmate compiles HTML into a Semantic Object Model (SOM), a structured representation that LLMs can reason about directly. It runs JavaScript via V8, supports Puppeteer via CDP, and produces output that is 10-800x smaller than raw HTML.
| Plasmate | Lightpanda | Chrome | |
|---|---|---|---|
| Per page | 4-5 ms | 23 ms | 252 ms |
| Memory (100 pages) | ~30 MB | ~2.4 GB | ~20 GB |
| Binary | 43 MB | 59-111 MB | 300-500 MB |
| Output | SOM (10-800x smaller) | Raw HTML | Raw HTML |
| License | Apache-2.0 | AGPL-3.0 | Chromium |
curl -fsSL https://plasmate.app/install.sh | shOr via package managers:
cargo install plasmate # Rust
npm install -g plasmate # Node.js
pip install plasmate # Pythonplasmate fetch https://news.ycombinator.comReturns SOM JSON: structured regions, interactive elements with stable IDs, and content, typically 10x smaller than the raw HTML.
plasmate serve --protocol cdp --host 127.0.0.1 --port 9222Then connect with Puppeteer:
import puppeteer from 'puppeteer-core';
const browser = await puppeteer.connect({
browserWSEndpoint: 'ws://127.0.0.1:9222',
protocolTimeout: 10000,
});
const page = await browser.newPage();
await page.goto('https://example.com');
const title = await page.evaluate(() => document.title);
console.log(title);
await browser.close();plasmate serve --protocol awp --host 127.0.0.1 --port 9222AWP has 7 methods: navigate, snapshot, click, type, scroll, select, extract. That's the entire protocol.
plasmate mcpThis exposes Plasmate over stdio as MCP tools:
fetch_page- get structured SOM from any URLextract_text- get clean readable textopen_page- start an interactive session (returns session_id + SOM)evaluate- run JavaScript in the page contextclick- click elements by SOM element IDclose_page- end a session
Example Claude Desktop config:
{
"mcpServers": {
"plasmate": {
"command": "plasmate",
"args": ["mcp"]
}
}
}The DOM was built for rendering. SOM was built for reasoning.
Wikipedia homepage:
DOM → 47,000 tokens
SOM → 4,500 tokens (10.4x compression)
accounts.google.com:
DOM → ~300,000 tokens
SOM → ~350 tokens (864x compression)
SOM strips layout, styling, scripts, SVGs, and boilerplate. It keeps structure, content, and interactive elements with stable IDs that agents can reference in actions.
| Site | HTML | SOM | Compression |
|---|---|---|---|
| accounts.google.com | 1.2 MB | 1.4 KB | 864x |
| x.com | 239 KB | 1.5 KB | 159x |
| linear.app | 2.2 MB | 21 KB | 105x |
| bing.com | 157 KB | 1.7 KB | 93x |
| google.com | 194 KB | 2.6 KB | 74x |
| vercel.com | 941 KB | 22 KB | 43x |
| ebay.com | 831 KB | 33 KB | 25x |
| Wikipedia | 1.7 MB | 70 KB | 25x |
Median compression: 10.2x across 38 sites. Full results.
Plasmate embeds V8 and executes page JavaScript, including:
- Inline and external
<script>tags fetch()andXMLHttpRequestwith real HTTP requestssetTimeout/setIntervalwith timer draining- DOM mutations (createElement, appendChild, textContent, innerHTML, etc.)
- DOMContentLoaded and load events
- Promise resolution and microtask pumping
The JS pipeline runs during plasmate fetch and CDP page.goto(). The resulting DOM mutations are serialized back to HTML before SOM compilation, so JS-rendered content is captured.
Plasmate passes Lightpanda's Puppeteer benchmark (campfire-commerce). Supported CDP methods:
page.goto(),page.content(),page.title()page.evaluate(),page.waitForFunction()browser.newPage(),browser.createBrowserContext()Runtime.evaluate,Runtime.callFunctionOnDOM.getDocument,DOM.querySelector,DOM.querySelectorAllInput.dispatchMouseEvent,Input.dispatchKeyEvent- Target management (create, attach, close)
CDP is a compatibility layer. AWP is the native protocol, designed for agents rather than debuggers.
HTML → Network (reqwest) → HTML Parser (html5ever)
→ JS Pipeline (V8: scripts, fetch, XHR, timers, DOM mutations)
→ DOM Serialization → SOM Compiler → JSON output
- Network: reqwest with TLS, HTTP/2, redirects, compression; cookie jar supported, cookie APIs and proxy configuration are still limited
- JS Runtime: V8 with DOM shim (80+ methods), blocking fetch bridge
- SOM Compiler: semantic region detection, element ID generation, interactive element preservation, smart truncation, deduplication
- Protocols: AWP (native, 7 methods) and CDP (Puppeteer compatibility)
git clone https://github.com/plasmate-labs/plasmate.git
cd plasmate
cargo build --release
./target/release/plasmate fetch https://example.comRequirements: Rust 1.75+, V8 (fetched automatically by rusty_v8).
Prebuilt multi-arch images (linux/amd64 and linux/arm64) are published to GHCR:
# Server mode (CDP or AWP)
docker run --rm -p 9222:9222 ghcr.io/plasmate-labs/plasmate:latest
# One-shot fetch
docker run --rm ghcr.io/plasmate-labs/plasmate:latest fetch https://example.comBuild locally:
docker build -t plasmate .
docker run --rm -p 9222:9222 plasmatecargo test --workspace # 252 testsRun the built-in benchmark against cached pages:
cargo run --release -- bench --urls bench/urls.txtOr test against live sites:
plasmate fetch https://en.wikipedia.org/wiki/Rust_(programming_language) | jq '.regions | length'See plasmate.app/compare for the full comparison with Lightpanda and Chrome.
- MCP server mode (
plasmate mcpover stdio) - MCP Phase 2: stateful tools (open_page, click, evaluate, close_page)
- Docker image (GHCR multi-arch)
- Full V8 DOM mutation bridge (re-snapshot SOM after JS changes)
- Network interception (Fetch domain)
- Expose cookie APIs (CDP Network.getCookies/setCookies, MCP cookie import/export)
- Proxy support (per-session config, SOCKS)
- Real-world top-100 site coverage testing
- Web Platform Tests integration
Apache-2.0. See LICENSE.
Built by Plasmate Labs.
