Most content strategies are based on gut feel. This tool replaces guesswork with data — showing exactly what ranks, what's missing, and what to write.
It automates the research behind every content piece a demand gen team would produce: capturing live SERP data, scraping competitor content, and generating AI-powered content briefs with specific gap analysis and differentiation opportunities.
┌─────────────────────────────────────────────────────────┐
│ Layer 1: SERP Capture (Playwright) │
│ - Full-page screenshot │
│ - Ads, featured snippets, PAA, AI overview detection │
│ - Organic result extraction │
└──────────────────────┬──────────────────────────────────┘
│ URLs + SERP features
▼
┌─────────────────────────────────────────────────────────┐
│ Layer 2: Content Scraper (Firecrawl / BeautifulSoup) │
│ - Full page content extraction │
│ - Heading structure analysis │
│ - Word counts, meta data, link analysis │
└──────────────────────┬──────────────────────────────────┘
│ Structured content data
▼
┌─────────────────────────────────────────────────────────┐
│ Layer 3: Gap Analyzer (Claude API) │
│ - Table stakes identification │
│ - Content gap detection │
│ - Differentiation opportunities │
│ - Full content brief generation │
└─────────────────────────────────────────────────────────┘
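The diagram above can be sketched as a minimal pipeline in code. All function names and fields here are illustrative stubs, not the repo's actual API; each layer is faked so the data handoff is runnable end to end.

```python
# Illustrative sketch of the three-layer handoff; every name here is a
# stand-in, with each layer stubbed out to show the flow of data.
from dataclasses import dataclass, field

@dataclass
class SerpData:
    features: dict                        # ads, snippet, PAA, AI overview
    organic_urls: list = field(default_factory=list)

def capture_serp(keyword):                # Layer 1: Playwright (stubbed)
    return SerpData(features={"paa": []},
                    organic_urls=["https://example.com/post"])

def scrape_pages(urls):                   # Layer 2: Firecrawl / BS4 (stubbed)
    return [{"url": u, "word_count": 0, "headings": []} for u in urls]

def analyze_gaps(keyword, serp, pages):   # Layer 3: Claude API (stubbed)
    return f"# Content brief: {keyword} ({len(pages)} competitors analyzed)"

def run_pipeline(keyword):
    serp = capture_serp(keyword)               # URLs + SERP features
    pages = scrape_pages(serp.organic_urls)    # structured content data
    return analyze_gaps(keyword, serp, pages)  # markdown brief

print(run_pipeline("website visitor identification"))
```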
- Python 3.10+
- An Anthropic API key (required)
- A Firecrawl API key (optional — falls back to BeautifulSoup)
# Clone the repo
git clone https://github.com/yourusername/serp-content-gap-analyzer.git
cd serp-content-gap-analyzer
# Install dependencies
pip install -r requirements.txt
playwright install chromium
# Configure API keys
cp .env.example .env
# Edit .env with your keys

# Basic usage
python -m src.main "website visitor identification"
# With options
python -m src.main "B2B intent data" --max-results 10 --verbose
# Custom output directory
python -m src.main "demand generation strategy" --output-dir my_reports

Each run produces three files in the output directory:
| File | Description |
|---|---|
| `serp_*.png` | Full-page SERP screenshot |
| `serp_*.json` | Structured SERP data (ads, snippets, PAA, organic results) |
| `brief_*.md` | AI-generated content brief with gap analysis |
See sample_output.md for a realistic example of the tool's output analyzing the keyword "website visitor identification."
Uses Playwright with stealth mode to load Google search results and extract:
- Ads — count, domains, top ad copy (reveals commercial intent)
- Featured Snippet — text and source domain (content format signals)
- AI Overview — presence and preview text
- People Also Ask — question list (content gap signals)
- Organic Results — rank, title, URL, description
- Video Carousel, Image Pack, Knowledge Panel — presence detection
Anti-detection: randomized user agents, stealth browser fingerprinting, human-like delays. CAPTCHA detection with graceful degradation to partial results.
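A condensed sketch of the capture path, assuming Playwright's sync API. The CSS selector and CAPTCHA marker below are illustrative placeholders; the real selectors live in the repo and need periodic updating as Google's HTML changes.

```python
# Hypothetical Layer-1 sketch: randomized user agent, human-like delay,
# CAPTCHA tripwire with graceful degradation, full-page screenshot.
import random
import time
import urllib.parse

USER_AGENTS = [  # sample pool; the repo rotates a larger list
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
]

def build_search_url(keyword):
    return "https://www.google.com/search?q=" + urllib.parse.quote(keyword)

def capture_serp(keyword, out_png="serp.png"):
    from playwright.sync_api import sync_playwright   # heavy optional import
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        ctx = browser.new_context(user_agent=random.choice(USER_AGENTS))
        page = ctx.new_page()
        page.goto(build_search_url(keyword))
        time.sleep(random.uniform(1.5, 3.5))             # human-like delay
        if "unusual traffic" in page.content().lower():  # CAPTCHA tripwire
            browser.close()
            return {"captcha": True, "organic": []}      # partial result
        page.screenshot(path=out_png, full_page=True)    # full-page capture
        organic = [
            {"title": a.inner_text(), "url": a.get_attribute("href")}
            for a in page.locator("a:has(h3)").all()     # illustrative selector
        ]
        browser.close()
        return {"captcha": False, "organic": organic}
```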
Two scraping paths with automatic selection:
- Firecrawl (if API key provided) — higher quality markdown extraction with metadata
- BeautifulSoup (fallback) — direct HTML parsing with requests
Each page yields: title, meta description, word count, nested heading tree, content text, link counts. Error isolation ensures one failed scrape doesn't stop the pipeline.
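The automatic path selection could look roughly like this, assuming the `firecrawl` Python SDK and BeautifulSoup. The Firecrawl call shape is illustrative, and `summarize` stands in for the per-page stats both paths emit.

```python
# Hypothetical Layer-2 sketch: Firecrawl when a key is present, else a
# requests + BeautifulSoup fallback, with per-page error isolation.
import os

def summarize(title, text):
    """Per-page stats both scraping paths produce (illustrative subset)."""
    return {"title": title, "word_count": len(text.split())}

def scrape_page(url):
    try:
        if os.getenv("FIRECRAWL_API_KEY"):
            from firecrawl import FirecrawlApp           # optional dependency
            doc = FirecrawlApp().scrape_url(url)         # markdown + metadata
            return summarize(doc.metadata.get("title", ""), doc.markdown)
        import requests                                  # fallback path
        from bs4 import BeautifulSoup
        html = requests.get(url, timeout=15).text
        soup = BeautifulSoup(html, "html.parser")
        title = soup.title.string if soup.title else ""
        return summarize(title, soup.get_text(" ", strip=True))
    except Exception:
        return None   # error isolation: skip this page, keep the pipeline alive
```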
Sends structured SERP + content data to Claude with a specialized SEO strategist prompt. The analysis covers:
- Table Stakes — what every competitor covers (must-include)
- Content Gaps — topics competitors miss or cover poorly
- Differentiation Opportunities — unique angles, PAA coverage, commercial intent signals
- SERP Feature Opportunities — how to target snippets, PAA, AI overview
- Content Brief — title, word count target, full outline, key points, CTA strategy
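A sketch of the analysis call, assuming the official `anthropic` SDK. The model id and prompt wording are placeholders, and the ~3000-character truncation matches the token-budget limitation noted under Limitations.

```python
# Hypothetical Layer-3 sketch: build a truncated competitor-content
# prompt, then ask Claude for the gap analysis and brief.
def build_prompt(keyword, pages, limit=3000):
    # Truncate each page to ~3000 chars to stay within token limits
    chunks = [f"## {p['title']}\n{p['text'][:limit]}" for p in pages]
    return ("Keyword: " + keyword + "\n\nCompetitor content:\n\n"
            + "\n\n".join(chunks)
            + "\n\nIdentify table stakes, content gaps, differentiation "
              "opportunities, SERP feature plays, and a full brief.")

def generate_brief(keyword, pages):
    import anthropic                              # pip install anthropic
    client = anthropic.Anthropic()                # reads ANTHROPIC_API_KEY
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",         # placeholder model id
        max_tokens=4000,
        system="You are a senior SEO content strategist.",
        messages=[{"role": "user", "content": build_prompt(keyword, pages)}],
    )
    return msg.content[0].text
```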
- SERP scraping is fragile — Google frequently changes its HTML structure. Selectors may need updating.
- CAPTCHA risk — running many queries in succession may trigger Google's bot detection. The tool detects this and returns partial results.
- Rate limits — both Firecrawl and Anthropic APIs have rate limits. The tool handles 429 errors with retry/fallback logic.
- Content truncation — competitor page content is truncated to ~3000 chars per page in the analysis prompt to stay within token limits.
- No JavaScript rendering in BS4 path — the BeautifulSoup fallback won't capture JS-rendered content. Firecrawl handles this better.
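The 429 handling mentioned above follows a standard pattern: exponential backoff with a capped attempt count. In this sketch, `RateLimitError` is a stand-in for whatever exception the Firecrawl or Anthropic client raises on a 429; the repo's actual retry/fallback logic may differ.

```python
# Generic retry sketch for rate-limited API calls.
import time

class RateLimitError(Exception):
    """Stand-in for an HTTP 429 from either API."""

def with_retries(call, max_attempts=4, base_delay=1.0):
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise                                  # out of retries
            time.sleep(base_delay * 2 ** attempt)      # 1s, 2s, 4s, ...
```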