Lightweight DuckDuckGo web search for AI agents. Fetches results concurrently, extracts clean page content, and re-ranks by relevance using BM25 — all from the command line.
- DuckDuckGo search — scrapes
html.duckduckgo.comwith region and recency filters - Concurrent page fetching — async HTTP/2 with connection pooling; configurable concurrency
- Intelligent content extraction — removes navigation, ads, and boilerplate; weights headings and body text using SEO heuristics; deduplicates lines
- BM25 re-ranking — indexes full page text and re-ranks results by relevance to the original query
- Agent-friendly output — clean JSON on stdout, progress logs on stderr; pipe-safe for agent tool calls
- Skill management — installs a
SKILL.mdfor Claude Code and GitHub Copilot in VS Code so agents discover and invoke the tool automatically; tracks installations and supports clean uninstall
# System-wide with uv (recommended)
uv tool install duckduckscrape
# Or with pip
pip install duckduckscrape# Basic search — top 5 results ranked by BM25
duckduckscrape search "python dataclasses"
# Clean JSON for agent/programmatic use (progress on stderr)
duckduckscrape search "rust ownership" --output json
# Limit results
duckduckscrape search "climate change" -k 3
# Skip full-page fetching for a fast keyword-only search
duckduckscrape search "openai news" --no-fetch
# Filter by recency: d (day), w (week), m (month), y (year)
duckduckscrape search "breaking news" --time-filter d
# Filter by region
duckduckscrape search "local elections" --region us-en
# Tune performance
duckduckscrape search "machine learning" --concurrency 8 --timeout 15 --content-limit 3000Each result in --output json is an object:
| Field | Type | Description |
|---|---|---|
title |
string | Page title |
url |
string | Full URL |
display_url |
string | Shortened display URL |
snippet |
string | DuckDuckGo snippet |
content |
string | null | Extracted page text prefixed with Source: <url> |
bm25_score |
number | BM25 relevance score (omitted when --no-rank) |
duckduckscrape can install a SKILL.md so that Claude Code and GitHub Copilot in VS Code automatically know when and how to invoke it.
# Install (prompts for agent and scope)
duckduckscrape skill install
# Install non-interactively
duckduckscrape skill install --agent both --scope global
# Uninstall (interactive, reads from ~/.duckduckscrape/config.toml)
duckduckscrape skill uninstallInstallation paths are recorded in ~/.duckduckscrape/config.toml. The uninstall command reads this file to present a selection dialogue — no manual path hunting required.
| Agent | Scope | Path |
|---|---|---|
| Claude Code | project | .claude/skills/duckduckscrape/SKILL.md |
| Claude Code | global | ~/.claude/skills/duckduckscrape/SKILL.md |
| VS Code Copilot | project | .github/skills/duckduckscrape/SKILL.md |
| VS Code Copilot | global | ~/.copilot/skills/duckduckscrape/SKILL.md |
~/.claude/skills/ is recognised by both Claude Code and VS Code Copilot, so a single global Claude install covers both agents.
The SKILL.md is generated dynamically from the live CLI — options, defaults, and examples stay in sync with the installed version automatically.
- Agent web search — drop into any Claude Code or Copilot workflow; the agent calls
duckduckscrape search "..." --output jsonand gets structured, ranked results it can reason over - Research pipelines — pipe JSON output into
jq, Python scripts, or other tools for further processing - Content monitoring — combine
--time-filter dwith cron to watch a topic for recent developments - Fast keyword lookup —
--no-fetchreturns DDG snippets in under a second, useful when full content isn't needed
- POST to
https://html.duckduckgo.com/html/and parsediv.resultelements with BeautifulSoup + lxml - Fetch each result URL concurrently with a shared
httpx.AsyncClient(HTTP/2, bounded semaphore) - Extract main content: strip nav/header/footer/ads, narrow to
<main>/<article>, weight headings, deduplicate lines - Build a
BM25Okapiindex over fetched content and re-rank; discard zero-score results - Return top-k results as JSON
git clone https://github.com/your-username/duckduckscrape
cd duckduckscrape
uv sync
uv run duckduckscrape search "test"