Web scraping, analysis & content extraction for AI agents.
Scrape pages, crawl sites, extract UI/brand/SEO data. MCP server + CLI + HTTP API. Local-first, self-hosted.
Part of the Neural* ecosystem. NeuralScraper handles web scraping & analysis — but it doesn't work alone. It pairs with NeuralVaultCore (persistent memory), NeuralVaultSkill (session automation), and NeuralVaultFlow (dev workflow orchestration). Each component has its own repository and documentation. See the Neural* Ecosystem section at the bottom.
NeuralScraper gives AI agents (and humans) a clean, structured way to extract data from the web — no fluff, no cloud dependency.
| Capability | Description |
|---|---|
| Scrape | Single-page scrape — web + PDF |
| Screenshot | Full-page PNG capture |
| Crawl | Multi-page scraping with depth and limit control |
| Map | Fast internal URL discovery |
| UI Analysis | Layout structure, components, spacing, typography |
| Brand Extraction | Dominant colors, fonts, logos |
| SEO Audit | Meta tags, headings, OG, schema markup, scoring |
| Analyze | Scrape + screenshot + UI + brand + SEO in one command |
| Search | Web search via SearXNG + scrape results |
| Extract | Structured data extraction with LLM (Ollama) and custom schema |
| Interact | Browser actions (click, type, wait) + scrape |
| Batch | Process a list of URLs from a file |
```bash
git clone https://github.com/getobyte/NeuralScraper.git
cd NeuralScraper
npm install
npx playwright install chromium
npm run build
```

Make the CLI globally available:

```bash
npm link
# Now you can run: ns scrape https://example.com
```

Start the MCP server:

```bash
node dist/mcp-server.js
```

Or run with Docker:

```bash
git clone https://github.com/getobyte/NeuralScraper.git
cd NeuralScraper
cp .env.example .env
docker compose up -d
```

The MCP server starts on port 9996 inside the `NeuralScraper` container.
Verify:

```bash
docker ps | grep NeuralScraper
docker logs NeuralScraper
```

Add the server to `~/.claude.json` or `.claude/settings.json` in your project:
```json
{
  "mcpServers": {
    "neuralscraper": {
      "command": "node",
      "args": ["D:/path/to/NeuralScraper/dist/mcp-server.js"]
    }
  }
}
```

Restart Claude Code. The following 12 tools will be available:
ns_scrape · ns_screenshot · ns_crawl · ns_map · ns_ui · ns_brand · ns_seo · ns_analyze · ns_search · ns_extract · ns_interact · ns_batch
NeuralScraper exposes a REST API when running as a server.
| Method | Endpoint |
|---|---|
| GET | `/health` |
| POST | `/scrape` |
| POST | `/screenshot` |
| POST | `/crawl` |
| POST | `/map` |
| POST | `/ui` |
| POST | `/brand` |
| POST | `/seo` |
| POST | `/analyze` |
| POST | `/search` |
| POST | `/extract` |
| POST | `/interact` |
| POST | `/batch` |
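A minimal TypeScript client for these endpoints might look like the sketch below. The base URL (port 9996, taken from the Docker setup above) and the `{ url }` request-body shape are assumptions, not a documented contract; adjust to the actual API.

```typescript
// Assumed base URL: port 9996 from the Docker setup above.
const BASE_URL = "http://localhost:9996";

// Build fetch arguments for a POST endpoint such as /scrape or /seo.
// The { url } body shape is an assumption about the request format.
function buildRequest(endpoint: string, url: string) {
  return {
    input: `${BASE_URL}${endpoint}`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ url }),
    },
  };
}

// Example: POST /scrape for a single page (requires a running server).
async function scrape(url: string): Promise<unknown> {
  const { input, init } = buildRequest("/scrape", url);
  const res = await fetch(input, init);
  if (!res.ok) throw new Error(`scrape failed: HTTP ${res.status}`);
  return res.json();
}
```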
NeuralScraper's `ns extract` command uses Ollama to run a local LLM for structured data extraction — no cloud, no API keys.
Windows / macOS: Download the installer from ollama.com/download and run it.
Linux:

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

Verify:

```bash
ollama --version
```

Pull the recommended model:

```bash
ollama pull qwen3:14b
```

`qwen3:14b` (9.3 GB, 40K context, native tool-use support) is recommended for `ns extract` flows.

Run it:

```bash
ollama run qwen3:14b
```

Ollama runs as a local API server on http://localhost:11434. No internet connection is required after the initial pull.
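To illustrate the extraction flow, a call to Ollama's `/api/generate` endpoint with `format: "json"` could look like this. The prompt wording and response handling here are assumptions for illustration; NeuralScraper's actual internals may differ.

```typescript
// Local Ollama API server (no cloud, no API keys).
const OLLAMA_URL = "http://localhost:11434/api/generate";

// Build an extraction prompt from page text and a field schema.
// The exact wording is a hypothetical sketch, not NeuralScraper's prompt.
function buildExtractionPrompt(
  pageText: string,
  schema: Record<string, string>
): string {
  return [
    "Extract the following fields from the page text and answer as JSON.",
    `Schema: ${JSON.stringify(schema)}`,
    `Page text:\n${pageText}`,
  ].join("\n");
}

// Ask the model for JSON-only output via Ollama's `format` option.
async function extract(
  pageText: string,
  schema: Record<string, string>
): Promise<unknown> {
  const res = await fetch(OLLAMA_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "qwen3:14b",
      prompt: buildExtractionPrompt(pageText, schema),
      format: "json", // constrain the model to valid JSON
      stream: false,
    }),
  });
  const data = (await res.json()) as { response: string };
  return JSON.parse(data.response);
}
```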
```bash
# Scrape a page (web or PDF)
ns scrape https://example.com

# Full-page screenshot
ns screenshot https://example.com

# Crawl a site
ns crawl https://example.com --depth 2 --limit 20

# Discover URLs
ns map https://example.com

# UI analysis
ns ui https://example.com

# Brand extraction
ns brand https://example.com

# SEO audit
ns seo https://example.com

# Full analysis (scrape + screenshot + UI + brand + SEO)
ns analyze https://example.com

# Web search via SearXNG + scrape results
ns search "best react libs" --limit 5

# Structured extraction with LLM (Ollama)
ns extract https://example.com --schema '{"price":"string"}'

# Browser automation (click, type, wait) + scrape
ns interact https://example.com --actions '[{"click":".btn"}]'

# Batch processing from a file
ns batch urls.txt
```

| Option | Commands | Default |
|---|---|---|
| `-o, --output <dir>` | all | `./ns-output` |
| `-d, --depth <n>` | crawl | 2 |
| `-l, --limit <n>` | crawl, search | 20 / 5 |
| `--no-screenshot` | scrape, crawl, batch | — |
| `-s, --schema <json>` | extract | — |
| `-p, --prompt <text>` | extract | — |
| `-a, --actions <json>` | interact | `[]` |
| `--no-scrape` | search | — |
| `--no-scrape-after` | interact | — |
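The `ns batch` command above reads URLs from a file. A minimal parser for such a file might look like this; the blank-line and `#`-comment handling is an assumption about the format, not documented behavior:

```typescript
// Parse a urls.txt-style file: one URL per line.
// Skipping blank lines and `#` comments is a hypothetical convenience,
// not a documented NeuralScraper feature.
function parseUrlFile(contents: string): string[] {
  return contents
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && !line.startsWith("#"));
}
```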
Single page scrape:
```
ns-output/
  example.com/
    2026-03-28T14-30-00/
      page.md
      page.html
      metadata.json
      links.json
      screenshot.png
      ui-analysis.json
      brand.json
      seo-audit.json
      manifest.json
```
Crawl job:
```
ns-output/
  example.com/
    crawl-2026-03-28T14-30-00/
      manifest.json
      pages.json
      pages/
        001-home/
        002-about/
        ...
```
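The directory names above follow a `<host>/<timestamp>` pattern. A sketch of how such a path could be derived (the naming scheme is inferred from the example layout, not taken from NeuralScraper's source):

```typescript
// Derive an output directory from a page URL and a timestamp,
// matching the layout shown above: host, then an ISO timestamp
// with ":" replaced by "-" so it is filesystem-safe.
function outputDir(pageUrl: string, when: Date): string {
  const host = new URL(pageUrl).hostname;
  const stamp = when.toISOString().slice(0, 19).replace(/:/g, "-");
  return `ns-output/${host}/${stamp}`;
}
```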
```
src/
  browser/
    playwright.ts    # Browser pool management
    screenshot.ts    # Full-page screenshot
  extractors/
    markdown.ts      # HTML → Markdown (readability + turndown)
    metadata.ts      # Meta tags, OG, Twitter cards
    links.ts         # Link extraction & classification
    ui-analyzer.ts   # Layout, components, spacing, fonts
    brand.ts         # Colors, fonts, logos
    seo.ts           # SEO audit with scoring
  storage/
    writer.ts        # File output & manifest generation
  tools/
    scrape.ts
    screenshot.ts
    crawl.ts
    map.ts
    ui.ts
    brand.ts
    seo.ts
    analyze.ts
    search.ts
    extract.ts
    interact.ts
    batch.ts
  cli.ts             # CLI entry point (commander)
  mcp-server.ts      # MCP server entry point (stdio)
  index.ts           # Library exports
```
| Layer | Technology |
|---|---|
| Runtime | Node.js 20+ |
| Language | TypeScript 5.8 |
| Browser | Playwright (Chromium) |
| HTML → MD | @mozilla/readability + turndown |
| HTML parsing | cheerio |
| MCP | @modelcontextprotocol/sdk |
| CLI | commander |
| Build | tsup |
NeuralScraper is a standalone tool — but it's designed to work alongside the rest of the Neural* family. Each component lives in its own repo with its own docs.
| Component | Role | Repo |
|---|---|---|
| NeuralScraper (you are here) | Web scraping & analysis | — |
| NeuralVaultCore | Persistent memory for AI agents | → GitHub |
| NeuralVaultSkill | Session memory automation | → GitHub |
| NeuralVaultFlow | Dev workflow orchestration | → GitHub |
NeuralScraper v2.0 — Cyber-Draco Legacy. Built by getobyte.