dayscape/serp-content-gap-analyzer

SERP Content Gap Analyzer

Most content strategies run on gut feel. This tool replaces guesswork with data: it shows what ranks, what's missing, and what to write.

It automates the research behind every content piece a demand gen team would produce: capturing live SERP data, scraping competitor content, and generating AI-powered content briefs with specific gap analysis and differentiation opportunities.

Architecture

┌─────────────────────────────────────────────────────────┐
│  Layer 1: SERP Capture (Playwright)                     │
│  - Full-page screenshot                                 │
│  - Ads, featured snippets, PAA, AI overview detection   │
│  - Organic result extraction                            │
└──────────────────────┬──────────────────────────────────┘
                       │ URLs + SERP features
                       ▼
┌─────────────────────────────────────────────────────────┐
│  Layer 2: Content Scraper (Firecrawl / BeautifulSoup)   │
│  - Full page content extraction                         │
│  - Heading structure analysis                           │
│  - Word counts, meta data, link analysis                │
└──────────────────────┬──────────────────────────────────┘
                       │ Structured content data
                       ▼
┌─────────────────────────────────────────────────────────┐
│  Layer 3: Gap Analyzer (Claude API)                     │
│  - Table stakes identification                          │
│  - Content gap detection                                │
│  - Differentiation opportunities                        │
│  - Full content brief generation                        │
└─────────────────────────────────────────────────────────┘

Prerequisites

Installation

# Clone the repo
git clone https://github.com/dayscape/serp-content-gap-analyzer.git
cd serp-content-gap-analyzer

# Install dependencies
pip install -r requirements.txt
playwright install chromium

# Configure API keys
cp .env.example .env
# Edit .env with your keys

Usage

# Basic usage
python -m src.main "website visitor identification"

# With options
python -m src.main "B2B intent data" --max-results 10 --verbose

# Custom output directory
python -m src.main "demand generation strategy" --output-dir my_reports
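
For orientation, the flags used above could be parsed with `argparse` roughly as follows. This is a minimal sketch, not the project's actual `src/main.py`: the default values and help strings are assumptions, since the README does not state them.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(
        prog="python -m src.main",
        description="Analyze the SERP for a keyword and write a content brief.",
    )
    parser.add_argument("keyword", help="search query to analyze")
    # The defaults below are assumptions; the README does not document them.
    parser.add_argument("--max-results", type=int, default=10,
                        help="number of organic results to scrape")
    parser.add_argument("--output-dir", default="output",
                        help="directory for the screenshot, JSON, and brief")
    parser.add_argument("--verbose", action="store_true",
                        help="print progress details")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["B2B intent data", "--max-results", "10", "--verbose"]
)
print(args.keyword, args.max_results, args.output_dir, args.verbose)
```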

Output Files

Each run produces three files in the output directory:

File          Description
serp_*.png    Full-page SERP screenshot
serp_*.json   Structured SERP data (ads, snippets, PAA, organic results)
brief_*.md    AI-generated content brief with gap analysis
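
The `*` in each pattern is not specified here. One plausible scheme, shown purely as an illustration, combines a slug of the keyword with a timestamp. The helper names `slugify` and `output_paths` and the exact naming format are hypothetical.

```python
import re
from datetime import datetime
from pathlib import Path


def slugify(keyword: str) -> str:
    """Lowercase the keyword and collapse non-alphanumeric runs to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", keyword.lower()).strip("_")


def output_paths(keyword: str, output_dir: str, now: datetime) -> dict[str, Path]:
    """Return the three per-run file paths (assumed naming scheme)."""
    stem = f"{slugify(keyword)}_{now:%Y%m%d_%H%M%S}"
    base = Path(output_dir)
    return {
        "screenshot": base / f"serp_{stem}.png",
        "serp_data": base / f"serp_{stem}.json",
        "brief": base / f"brief_{stem}.md",
    }


paths = output_paths("B2B intent data", "my_reports", datetime(2025, 1, 2, 3, 4, 5))
for label, path in paths.items():
    print(label, path)
```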

Sample Output

See sample_output.md for a realistic example of the tool's output analyzing the keyword "website visitor identification."

How It Works

Layer 1: SERP Capture

Uses Playwright with stealth mode to load Google search results and extract:

  • Ads — count, domains, top ad copy (reveals commercial intent)
  • Featured Snippet — text and source domain (content format signals)
  • AI Overview — presence and preview text
  • People Also Ask — question list (content gap signals)
  • Organic Results — rank, title, URL, description
  • Video Carousel, Image Pack, Knowledge Panel — presence detection

Anti-detection: randomized user agents, stealth browser fingerprinting, human-like delays. CAPTCHA detection with graceful degradation to partial results.
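
The browser automation itself is beyond a short example, but the logic around it can be sketched in plain Python. The user agent strings, the CAPTCHA marker phrases, the delay range, and every function name below are assumptions chosen for illustration. The project's real values are not shown in this README.

```python
import random

# Illustrative values only; not taken from the project.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]
# Assumed text fragments that suggest a block page rather than results.
CAPTCHA_MARKERS = ("unusual traffic", "/sorry/", "recaptcha")


def pick_user_agent(rng: random.Random) -> str:
    """Choose a user agent at random for this browser session."""
    return rng.choice(USER_AGENTS)


def human_delay(rng: random.Random, low: float = 1.5, high: float = 4.0) -> float:
    """Return a randomized pause, in seconds, to wait between page actions."""
    return rng.uniform(low, high)


def looks_like_captcha(html: str) -> bool:
    """Heuristic check: does the page contain any known block-page marker?"""
    lowered = html.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)


def capture(html: str, extract) -> dict:
    """Degrade to an empty, flagged result instead of raising on a CAPTCHA page."""
    if looks_like_captcha(html):
        return {"captcha": True, "partial": True, "organic": []}
    return {"captcha": False, "partial": False, "organic": extract(html)}
```

Passing a seeded `random.Random` makes the delays and user agent choice reproducible in tests while staying random in normal runs.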

Layer 2: Content Scraping

Two scraping paths with automatic selection:

  • Firecrawl (if API key provided) — higher quality markdown extraction with metadata
  • BeautifulSoup (fallback) — direct HTML parsing with requests
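
The selection rule can be sketched as below. The environment variable name `FIRECRAWL_API_KEY`, the function names, and falling back to BeautifulSoup when a Firecrawl call fails are assumptions. The two scrapers are passed in as plain callables so that no real network request is made.

```python
import os
from typing import Callable, Mapping

Scraper = Callable[[str], dict]


def choose_scraper(env: Mapping[str, str]) -> str:
    """Return "firecrawl" when an API key is configured, else "beautifulsoup"."""
    # FIRECRAWL_API_KEY is an assumed variable name.
    key = env.get("FIRECRAWL_API_KEY", "").strip()
    return "firecrawl" if key else "beautifulsoup"


def scrape_page(url: str, env: Mapping[str, str],
                firecrawl: Scraper, bs4_fallback: Scraper) -> dict:
    """Try the preferred path first; fall back to HTML parsing if it fails."""
    if choose_scraper(env) == "firecrawl":
        try:
            return {"source": "firecrawl", **firecrawl(url)}
        except Exception:
            pass  # Assumed behavior: any Firecrawl failure triggers the fallback.
    return {"source": "beautifulsoup", **bs4_fallback(url)}


# Report which path the current environment would select.
print(choose_scraper(os.environ))
```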

Each page yields: title, meta description, word count, nested heading tree, content text, link counts. Error isolation ensures one failed scrape doesn't stop the pipeline.
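
As a rough illustration of the nested heading tree and the error isolation, the sketch below uses the standard library's `html.parser` in place of BeautifulSoup so that it runs without extra dependencies. The node shape (`level`, `text`, `children`) and the function names are hypothetical, not the project's actual schema.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for h1..h6 in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._buf = []

    def handle_data(self, data):
        if self._level is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            text = " ".join("".join(self._buf).split())
            self.headings.append((self._level, text))
            self._level = None


def heading_tree(html: str) -> list:
    """Nest each heading under the nearest preceding heading of a lower level."""
    collector = HeadingCollector()
    collector.feed(html)
    root = {"level": 0, "text": "", "children": []}
    stack = [root]
    for level, text in collector.headings:
        node = {"level": level, "text": text, "children": []}
        while stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


def scrape_all(urls, scrape_one) -> list:
    """Scrape every URL, recording a failure instead of aborting the run."""
    results = []
    for url in urls:
        try:
            results.append({"url": url, "ok": True, **scrape_one(url)})
        except Exception as exc:
            results.append({"url": url, "ok": False, "error": str(exc)})
    return results
```

Keeping failures in the result list, rather than dropping them, lets a later stage report which competitor pages were skipped.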

Layer 3: Gap Analysis

Sends structured SERP + content data to Claude with a specialized SEO strategist prompt. The analysis covers:

  1. Table Stakes — what every competitor covers (must-include)
  2. Content Gaps — topics competitors miss or cover poorly
  3. Differentiation Opportunities — unique angles, PAA coverage, commercial intent signals
  4. SERP Feature Opportunities — how to target snippets, PAA, AI overview
  5. Content Brief — title, word count target, full outline, key points, CTA strategy
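
To make the data flow concrete, here is one way the prompt could be assembled before it is sent to Claude. The five section names and the per-page cap of about 3000 characters (see Limitations) come from this README. The wording of the prompt, the dictionary keys, and the function names are assumptions, and the API call itself is omitted.

```python
import json

MAX_CHARS_PER_PAGE = 3000  # per-page cap noted under Limitations

ANALYSIS_SECTIONS = [
    "Table Stakes",
    "Content Gaps",
    "Differentiation Opportunities",
    "SERP Feature Opportunities",
    "Content Brief",
]


def truncate(text: str, limit: int = MAX_CHARS_PER_PAGE) -> str:
    """Cut long page content and mark the cut so it is visible in the prompt."""
    if len(text) <= limit:
        return text
    return text[:limit].rstrip() + " [truncated]"


def build_prompt(keyword: str, serp: dict, pages: list) -> str:
    """Combine SERP features and truncated competitor content into one prompt."""
    lines = [
        f'Target keyword: "{keyword}"',
        "",
        "SERP features (JSON):",
        json.dumps(serp, indent=2),
        "",
        "Competitor pages:",
    ]
    for rank, page in enumerate(pages, start=1):
        # Keys "url", "word_count", and "content" are assumed field names.
        lines.append(f"--- #{rank} {page['url']} ({page['word_count']} words)")
        lines.append(truncate(page["content"]))
    lines.append("")
    lines.append("Respond with these sections, in order:")
    lines.extend(f"{i}. {name}" for i, name in enumerate(ANALYSIS_SECTIONS, start=1))
    return "\n".join(lines)
```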

Limitations

  • SERP scraping is fragile — Google frequently changes its HTML structure. Selectors may need updating.
  • CAPTCHA risk — running many queries in succession may trigger Google's bot detection. The tool detects this and returns partial results.
  • Rate limits — both Firecrawl and Anthropic APIs have rate limits. The tool handles 429 errors with retry/fallback logic.
  • Content truncation — competitor page content is truncated to ~3000 chars per page in the analysis prompt to stay within token limits.
  • No JavaScript rendering in BS4 path — the BeautifulSoup fallback won't capture JS-rendered content. Firecrawl handles this better.
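
The retry half of the rate-limit handling could look like the sketch below: exponential backoff with jitter. `RateLimitError` is a stand-in for whatever exception the Firecrawl or Anthropic client raises on HTTP 429, and the attempt count and delays are assumed values. The fallback half is not covered here.

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for an HTTP 429 response from an upstream API."""


def call_with_retry(fn, max_attempts=4, base_delay=1.0,
                    sleep=time.sleep, jitter=random.random):
    """Call fn(), retrying on RateLimitError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: let the caller decide what to do.
            # Wait 1x, 2x, 4x ... the base delay, plus up to 1 s of jitter.
            sleep(base_delay * 2 ** attempt + jitter())
```

The `sleep` and `jitter` parameters exist only so the waiting can be replaced in tests. In normal use the defaults apply.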
