A minimal template showing how to use Plasmate from Node.js. Fetch web pages and get back a structured Semantic Object Model (SOM) instead of raw HTML.
Install Plasmate:
```bash
cargo install plasmate
```

| Script | Description |
|---|---|
| `fetch-page.mjs` | Fetch a single URL and print the semantic content |
| `batch-fetch.mjs` | Fetch multiple URLs and save the results as JSON |
| `extract-structured-data.mjs` | Extract headings, links, images, and text from a page |
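The script sources aren't reproduced here. The following is a hypothetical sketch of what a batch run involves, not the template's actual `batch-fetch.mjs`. It assumes `plasmate fetch <url>` prints one SOM JSON document to stdout. It calls the CLI with `execFileSync`, so URLs never pass through a shell and need no quoting. It also accepts an injectable `run` function (a name introduced here) so the logic can be exercised without Plasmate installed. The `results.json` output filename is likewise an assumption.

```javascript
// Hypothetical sketch, not the template's batch-fetch.mjs.
// Assumes `plasmate fetch <url>` prints a single SOM JSON document to stdout.
import { execFileSync } from "node:child_process";
import { writeFileSync } from "node:fs";

// Invoke the Plasmate CLI directly (no shell), raising maxBuffer because
// the 1 MiB default can be too small for large pages.
function runPlasmate(url) {
  return execFileSync("plasmate", ["fetch", url], {
    encoding: "utf-8",
    maxBuffer: 64 * 1024 * 1024,
  });
}

// Fetch each URL in turn. Record the parsed SOM on success or the error
// message on failure, so one bad URL does not abort the whole batch.
function batchFetch(urls, run = runPlasmate) {
  const results = {};
  for (const url of urls) {
    try {
      results[url] = { ok: true, som: JSON.parse(run(url)) };
    } catch (err) {
      results[url] = { ok: false, error: String(err?.message ?? err) };
    }
  }
  return results;
}

// Usage: node batch-sketch.mjs https://example.com https://example.org
const urls = process.argv.slice(2);
if (urls.length > 0) {
  const results = batchFetch(urls);
  writeFileSync("results.json", JSON.stringify(results, null, 2));
  console.log(`Saved ${Object.keys(results).length} result(s) to results.json`);
}
```

Fetching one URL at a time keeps the sketch simple. For long URL lists, you could switch to the asynchronous `execFile` and run a few fetches concurrently.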
```bash
# Clone this template
gh repo create my-scraper --template plasmate-labs/quickstart-node --clone
cd my-scraper

# Fetch a page
node fetch-page.mjs https://news.ycombinator.com

# Extract structured data
node extract-structured-data.mjs https://github.com/trending

# Batch fetch
node batch-fetch.mjs https://example.com https://example.org
```

Plasmate fetches web pages and returns a Semantic Object Model: a structured JSON representation of the page content.
```js
import { execSync } from "node:child_process";

const output = execSync('plasmate fetch "https://example.com"', { encoding: "utf-8" });
const som = JSON.parse(output);

// som = {
//   title: "Example Domain",
//   lang: "en",
//   regions: [
//     {
//       role: "main",
//       id: "content",
//       elements: [
//         { role: "heading", text: "Example Domain", level: 1 },
//         { role: "text", text: "This domain is for use in illustrative examples..." },
//         { role: "link", text: "More information...", href: "https://www.iana.org/domains/example" }
//       ]
//     }
//   ]
// }
```

License: MIT
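Once you have a `som` object, pulling out the parts you care about is a plain walk over `regions[].elements[]`. Below is a minimal sketch that assumes the shape shown in the example SOM above. The `extractStructuredData` name is hypothetical, and this is not the template's actual `extract-structured-data.mjs`. Roles other than `heading`, `link`, and `text` are skipped because the example does not show how Plasmate represents them (images, for instance).

```javascript
// Hypothetical helper, not the template's extract-structured-data.mjs.
// Assumes the SOM shape from the example: { title, lang, regions: [{ role, id, elements }] },
// where each element has a `role` and may carry `text`, `level` (headings), or `href` (links).
function extractStructuredData(som) {
  const out = { title: som.title ?? "", headings: [], links: [], text: [] };
  for (const region of som.regions ?? []) {
    for (const el of region.elements ?? []) {
      switch (el.role) {
        case "heading":
          out.headings.push({ level: el.level, text: el.text });
          break;
        case "link":
          out.links.push({ text: el.text, href: el.href });
          break;
        case "text":
          out.text.push(el.text);
          break;
        default:
          // Roles not shown in the example SOM are ignored in this sketch.
          break;
      }
    }
  }
  return out;
}

// The example SOM from above, inlined so the sketch runs without Plasmate.
const exampleSom = {
  title: "Example Domain",
  lang: "en",
  regions: [
    {
      role: "main",
      id: "content",
      elements: [
        { role: "heading", text: "Example Domain", level: 1 },
        { role: "text", text: "This domain is for use in illustrative examples..." },
        { role: "link", text: "More information...", href: "https://www.iana.org/domains/example" },
      ],
    },
  ],
};

console.log(JSON.stringify(extractStructuredData(exampleSom), null, 2));
```

In practice you would pass in the `som` parsed from `plasmate fetch` instead of the inlined literal.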