A minimal template showing how to use Plasmate from Python. Fetch web pages and get back a structured Semantic Object Model (SOM) instead of raw HTML.
Install Plasmate:
cargo install plasmate

| Script | Description |
|---|---|
| fetch_page.py | Fetch a single URL and print the semantic content |
| batch_fetch.py | Fetch multiple URLs and save results as JSON |
| extract_links.py | Extract all links from a page using the SOM |
# Clone this template
gh repo create my-scraper --template plasmate-labs/quickstart-python --clone
cd my-scraper
# Fetch a page
python fetch_page.py https://news.ycombinator.com
# Extract links
python extract_links.py https://github.com/trending
# Batch fetch
python batch_fetch.py https://example.com https://example.org

Plasmate fetches web pages and returns a Semantic Object Model (SOM): a structured JSON representation of the page content, organized by semantic regions (navigation, main content, sidebars, etc.) and elements (headings, links, text, images).
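Because the SOM is plain JSON, walking it takes only a couple of nested loops. The sketch below assumes the field names that appear in this README's sample output (`regions`, `elements`, `role`, `text`, `href`); a real page may carry other roles or fields. `extract_links` here is a hypothetical helper written for illustration, not the template's `extract_links.py`, and it runs on a hand-written sample so that Plasmate does not need to be installed.

```python
from typing import Any


def extract_links(som: dict[str, Any]) -> list[dict[str, str]]:
    """Collect every link element from a SOM-shaped dict (assumed layout)."""
    links = []
    for region in som.get("regions", []):
        for element in region.get("elements", []):
            # Keep only elements tagged as links that actually carry a target.
            if element.get("role") == "link" and "href" in element:
                links.append({
                    "region": region.get("role", ""),
                    "text": element.get("text", ""),
                    "href": element["href"],
                })
    return links


# Hand-written sample mirroring the example output in this README.
sample_som = {
    "title": "Example Domain",
    "lang": "en",
    "regions": [
        {
            "role": "main",
            "id": "content",
            "elements": [
                {"role": "heading", "text": "Example Domain", "level": 1},
                {"role": "text", "text": "This domain is for use in illustrative examples..."},
                {"role": "link", "text": "More information...", "href": "https://www.iana.org/domains/example"},
            ],
        }
    ],
}

for link in extract_links(sample_som):
    print(f'[{link["region"]}] {link["text"]} -> {link["href"]}')
```

Using `.get()` with defaults keeps the walk tolerant of regions without elements or elements without text, which seems a reasonable precaution when the exact schema is not pinned down.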
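import json
import subprocess

# Run the Plasmate CLI; check=True raises CalledProcessError on a non-zero exit.
result = subprocess.run(["plasmate", "fetch", "https://example.com"], capture_output=True, text=True, check=True)
som = json.loads(result.stdout)  # parse the SOM JSON printed on stdout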
# som = {
# "title": "Example Domain",
# "lang": "en",
# "regions": [
# {
# "role": "main",
# "id": "content",
# "elements": [
# {"role": "heading", "text": "Example Domain", "level": 1},
# {"role": "text", "text": "This domain is for use in illustrative examples..."},
# {"role": "link", "text": "More information...", "href": "https://www.iana.org/domains/example"}
# ]
# }
# ]
# }

License: MIT