SERP Content Gap Analyzer

Most content strategies are based on gut feel. This tool replaces guesswork with data: it shows what currently ranks for a keyword, what those pages leave out, and what to write in response.

It automates the research behind every content piece a demand gen team would produce: capturing live SERP data, scraping competitor content, and generating AI-powered content briefs with specific gap analysis and differentiation opportunities.

Architecture

┌─────────────────────────────────────────────────────────┐
│  Layer 1: SERP Capture (Playwright)                     │
│  - Full-page screenshot                                 │
│  - Ads, featured snippets, PAA, AI overview detection   │
│  - Organic result extraction                            │
└──────────────────────┬──────────────────────────────────┘
                       │ URLs + SERP features
                       ▼
┌─────────────────────────────────────────────────────────┐
│  Layer 2: Content Scraper (Firecrawl / BeautifulSoup)   │
│  - Full page content extraction                         │
│  - Heading structure analysis                           │
│  - Word counts, meta data, link analysis                │
└──────────────────────┬──────────────────────────────────┘
                       │ Structured content data
                       ▼
┌─────────────────────────────────────────────────────────┐
│  Layer 3: Gap Analyzer (Claude API)                     │
│  - Table stakes identification                          │
│  - Content gap detection                                │
│  - Differentiation opportunities                        │
│  - Full content brief generation                        │
└─────────────────────────────────────────────────────────┘
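
The hand-off between the three layers can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `SerpData`, `PageContent`, and `run_pipeline` are hypothetical names, and each layer is passed in as a plain callable so the data flow is visible without Playwright, Firecrawl, or an API key.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SerpData:
    """Layer 1 output: organic URLs plus detected SERP features (hypothetical shape)."""
    keyword: str
    urls: list[str]
    features: dict[str, bool] = field(default_factory=dict)


@dataclass
class PageContent:
    """Layer 2 output for one competitor page (hypothetical shape)."""
    url: str
    word_count: int
    headings: list[str]


def run_pipeline(
    keyword: str,
    capture: Callable[[str], SerpData],
    scrape: Callable[[str], PageContent],
    analyze: Callable[[SerpData, list[PageContent]], str],
) -> str:
    """Chain the three layers; each stage only consumes the previous stage's output."""
    serp = capture(keyword)
    pages: list[PageContent] = []
    for url in serp.urls:
        try:
            pages.append(scrape(url))
        except Exception as exc:
            # One failed scrape is logged and skipped so the run can continue.
            print(f"skipping {url}: {exc}")
    return analyze(serp, pages)
```

Passing the layers in as callables is only a convenience for the sketch; it makes it easy to swap a real scraper for a stub when testing the flow.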

Prerequisites

Python 3.10+
An Anthropic API key (required)
A Firecrawl API key (optional — falls back to BeautifulSoup)

Installation

# Clone the repo
git clone https://github.com/yourusername/serp-content-gap-analyzer.git
cd serp-content-gap-analyzer

# Install dependencies
pip install -r requirements.txt
playwright install chromium

# Configure API keys
cp .env.example .env
# Edit .env with your keys

Usage

# Basic usage
python -m src.main "website visitor identification"

# With options
python -m src.main "B2B intent data" --max-results 10 --verbose

# Custom output directory
python -m src.main "demand generation strategy" --output-dir my_reports
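
For readers who want to see how the flags above might be parsed, here is a small `argparse` sketch. The flag names come from the commands above; the defaults, help strings, and the `build_parser` name are assumptions and may not match the real `src/main.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the keyword argument and the three documented flags."""
    parser = argparse.ArgumentParser(prog="src.main", description="SERP content gap analyzer")
    parser.add_argument("keyword", help="search query to analyze")
    # The defaults below are assumptions, not values confirmed by the project.
    parser.add_argument("--max-results", type=int, default=10, help="organic results to scrape")
    parser.add_argument("--output-dir", default="output", help="directory for the report files")
    parser.add_argument("--verbose", action="store_true", help="print progress details")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["B2B intent data", "--max-results", "10", "--verbose"])
```

Note that `argparse` exposes `--max-results` as `args.max_results` and `--output-dir` as `args.output_dir`.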

Output Files

Each run produces three files in the output directory:

File	Description
`serp_*.png`	Full-page SERP screenshot
`serp_*.json`	Structured SERP data (ads, snippets, PAA, organic results)
`brief_*.md`	AI-generated content brief with gap analysis

Sample Output

See sample_output.md for a realistic example of the tool's output analyzing the keyword "website visitor identification."

How It Works

Layer 1: SERP Capture

Uses Playwright with stealth mode to load Google search results and extract:

Ads — count, domains, top ad copy (reveals commercial intent)
Featured Snippet — text and source domain (content format signals)
AI Overview — presence and preview text
People Also Ask — question list (content gap signals)
Organic Results — rank, title, URL, description
Video Carousel, Image Pack, Knowledge Panel — presence detection

Anti-detection: randomized user agents, stealth browser fingerprinting, and human-like delays. If Google serves a CAPTCHA, the tool detects it and degrades gracefully, returning partial results.
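
The CAPTCHA check and the human-like delays could look something like the sketch below. It is a heuristic under stated assumptions: the marker strings are guesses at what Google's bot-check page contains and should be verified against live responses, and `looks_like_captcha` and `human_delay` are hypothetical helper names rather than functions from this repo.

```python
import random
import time

# Assumed markers of Google's bot-check interstitial; verify against live responses.
CAPTCHA_MARKERS = ("/sorry/", "unusual traffic", "g-recaptcha")


def looks_like_captcha(url: str, html: str) -> bool:
    """Return True when the final URL or the page body carries a bot-check marker."""
    haystack = f"{url}\n{html}".lower()
    return any(marker in haystack for marker in CAPTCHA_MARKERS)


def human_delay(min_s: float = 1.5, max_s: float = 4.0) -> float:
    """Sleep for a random interval within [min_s, max_s] and return the time slept."""
    pause = random.uniform(min_s, max_s)
    time.sleep(pause)
    return pause
```

A caller would run `looks_like_captcha` on the page URL and HTML after navigation and, on a hit, stop extracting and return whatever was captured so far.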

Layer 2: Content Scraping

Two scraping paths with automatic selection:

Firecrawl (if API key provided) — higher quality markdown extraction with metadata
BeautifulSoup (fallback) — direct HTML parsing with requests
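
The automatic selection can be as simple as checking whether a Firecrawl key is configured. In this sketch the variable name `FIRECRAWL_API_KEY` and the function `choose_scraper` are assumptions; check `.env.example` for the name the project actually reads.

```python
import os
from typing import Mapping


def choose_scraper(env: Mapping[str, str] = os.environ) -> str:
    """Return "firecrawl" when a non-empty key is configured, else "beautifulsoup"."""
    # FIRECRAWL_API_KEY is an assumed variable name, not confirmed by the project.
    key = env.get("FIRECRAWL_API_KEY", "").strip()
    return "firecrawl" if key else "beautifulsoup"
```

Treating a blank or whitespace-only value as "no key" avoids selecting Firecrawl when `.env` still contains an empty placeholder.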

Each page yields: title, meta description, word count, nested heading tree, content text, link counts. Error isolation ensures one failed scrape doesn't stop the pipeline.
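
To make "nested heading tree" concrete, the sketch below nests a flat, document-order list of `(level, text)` pairs, such as the `h1` to `h6` tags a parser would return, using a stack of open ancestors. The `Heading` class and `build_heading_tree` function are illustrative names, not the project's own.

```python
from dataclasses import dataclass, field


@dataclass
class Heading:
    level: int
    text: str
    children: list["Heading"] = field(default_factory=list)


def build_heading_tree(flat: list[tuple[int, str]]) -> list[Heading]:
    """Nest (level, text) pairs in document order under the nearest shallower heading."""
    roots: list[Heading] = []
    stack: list[Heading] = []  # currently open ancestors, outermost first
    for level, text in flat:
        node = Heading(level, text)
        # Close every open heading at the same depth or deeper than this one.
        while stack and stack[-1].level >= level:
            stack.pop()
        (stack[-1].children if stack else roots).append(node)
        stack.append(node)
    return roots
```

For the input `[(1, "Guide"), (2, "What it is"), (3, "Examples"), (2, "Pricing")]`, the result is a single root "Guide" with two children, where "Examples" sits under "What it is".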

Layer 3: Gap Analysis

Sends structured SERP + content data to Claude with a specialized SEO strategist prompt. The analysis covers:

Table Stakes — what every competitor covers (must-include)
Content Gaps — topics competitors miss or cover poorly
Differentiation Opportunities — unique angles, PAA coverage, commercial intent signals
SERP Feature Opportunities — how to target snippets, PAA, AI overview
Content Brief — title, word count target, full outline, key points, CTA strategy
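
As a rough idea of how the structured data might be turned into a prompt, the sketch below assembles a plain-text request that asks for the five sections listed above. The wording of the prompt, the `build_prompt` name, and the page dictionary keys (`url`, `content`) are all assumptions; the per-page cap mirrors the roughly 3000-character truncation noted under Limitations.

```python
MAX_CHARS_PER_PAGE = 3000  # approximate per-page cap described under Limitations

SECTIONS = [
    "Table Stakes",
    "Content Gaps",
    "Differentiation Opportunities",
    "SERP Feature Opportunities",
    "Content Brief",
]


def build_prompt(keyword: str, paa: list[str], pages: list[dict]) -> str:
    """Assemble one prompt from the keyword, PAA questions, and truncated page text."""
    lines = [f"Target keyword: {keyword}", "", "People Also Ask:"]
    lines += [f"- {question}" for question in paa]
    for rank, page in enumerate(pages, start=1):
        # Truncate each page so many competitors still fit in one request.
        body = page["content"][:MAX_CHARS_PER_PAGE]
        lines += ["", f"## Result {rank}: {page['url']}", body]
    lines += ["", "Respond with these sections, in order:"]
    lines += [f"{i}. {name}" for i, name in enumerate(SECTIONS, start=1)]
    return "\n".join(lines)
```

The returned string would then be sent as the user message, with the SEO strategist instructions supplied separately as the system prompt.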

Limitations

SERP scraping is fragile — Google frequently changes its HTML structure. Selectors may need updating.
CAPTCHA risk — running many queries in succession may trigger Google's bot detection. The tool detects this and returns partial results.
Rate limits — both Firecrawl and Anthropic APIs have rate limits. The tool handles 429 errors with retry/fallback logic.
Content truncation — competitor page content is truncated to ~3000 chars per page in the analysis prompt to stay within token limits.
No JavaScript rendering in BS4 path — the BeautifulSoup fallback won't capture JS-rendered content. Firecrawl handles this better.
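
The retry behaviour for rate limits can be illustrated with a small exponential-backoff helper. This is a generic sketch, not the project's implementation: `RateLimited` is a hypothetical stand-in for whatever exception the API client raises on HTTP 429, and the retry count and delays are arbitrary.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for the error an API client raises on HTTP 429."""


def with_backoff(
    call: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimited, doubling the wait after each failed attempt."""
    for attempt in range(retries + 1):
        try:
            return call()
        except RateLimited:
            if attempt == retries:
                raise  # out of retries; the caller decides how to fall back
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the helper testable without real waiting; a caller that catches the final `RateLimited` could then switch from Firecrawl to the BeautifulSoup path.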
