
OpenClaw: Deep browser scraping for LinkedIn profiles and career pages #4

@madhavcodez


Feature

SecretAIRY has an OpenClaw integration stub (backend/app/services/research/openclaw_scrape.py), but the actual OpenClaw API integration still needs testing and hardening.

Current State

  • ✅ OpenClaw container running (openclaw-sbx-agent-main)
  • ✅ Fallback to direct HTTP fetch with HTML stripping
  • ✅ Career page URL pattern matching
  • ⚠️ OpenClaw API endpoint discovery needs verification
  • ❌ LinkedIn profile scraping (LinkedIn blocks most scrapers)
  • ❌ Org chart extraction
  • ❌ Recruiter email/phone extraction from career pages
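For reference, the two working pieces above (the HTML-stripping fallback and career page URL matching) could look roughly like the sketch below. It is not the code in openclaw_scrape.py: the names `strip_html`, `looks_like_career_url`, and `CAREER_PATH_RE`, as well as the specific URL patterns, are assumptions chosen for illustration, and only the Python standard library is used.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class _TextExtractor(HTMLParser):
    """Collects visible text and skips <script>, <style>, <noscript> bodies."""

    _SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace so the result is one clean line of text.
        return re.sub(r"\s+", " ", " ".join(self._chunks)).strip()


def strip_html(html):
    """Return the visible text of an HTML document (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Assumed patterns: a path segment such as /careers or /jobs,
# or a host such as careers.example.com.
CAREER_PATH_RE = re.compile(
    r"/(careers?|jobs?|join-us|work-with-us|hiring)(/|$)", re.IGNORECASE
)


def looks_like_career_url(url):
    """Heuristic check for career page URLs (hypothetical helper)."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith(("careers.", "jobs.")):
        return True
    return bool(CAREER_PATH_RE.search(parsed.path))
```

Keeping both helpers as pure functions means they can be unit tested without network access, which helps when hardening the fallback path.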

What OpenClaw Should Do

  1. Scrape company career pages — find recruiter contact info, team pages, hiring manager names
  2. LinkedIn profile enrichment — verify discovered contacts, get current titles, mutual connections
  3. Company org chart — understand reporting structure to identify decision makers
  4. Job posting validation — verify a role is still open before calling about it
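Items 1 and 4 can be prototyped on already-stripped page text before any OpenClaw call is wired in. The sketch below is a minimal illustration under stated assumptions: the regexes are deliberately simple (the phone pattern only targets North American style numbers), the `CLOSED_PHRASES` list is a guess at common wording, and every name here (`Contacts`, `extract_contacts`, `posting_looks_open`) is hypothetical rather than part of the existing codebase.

```python
import re
from dataclasses import dataclass, field

# Simplified patterns; real pages will need broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")

# Assumed wording for closed postings; extend as real examples are collected.
CLOSED_PHRASES = (
    "no longer accepting applications",
    "position has been filled",
    "job is no longer available",
    "posting has expired",
)


@dataclass
class Contacts:
    emails: list = field(default_factory=list)
    phones: list = field(default_factory=list)


def _unique(items):
    """Deduplicate while preserving first-seen order."""
    return list(dict.fromkeys(items))


def extract_contacts(page_text):
    """Pull candidate recruiter emails and phone numbers out of plain text."""
    emails = _unique(m.lower() for m in EMAIL_RE.findall(page_text))
    phones = _unique(PHONE_RE.findall(page_text))
    return Contacts(emails=emails, phones=phones)


def posting_looks_open(status_code, page_text):
    """Heuristic: a posting is treated as closed on 404/410 or on a closed phrase."""
    if status_code in (404, 410):
        return False
    lowered = page_text.lower()
    return not any(phrase in lowered for phrase in CLOSED_PHRASES)
```

Both functions return candidates only. Anything extracted this way would still need verification (for example through the LinkedIn enrichment step) before it is used for outreach.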

Architecture

Research Engine
  ├── Gemini + Google Search (company intel) ✅
  ├── Exa API (contact discovery on LinkedIn) ✅
  └── OpenClaw (deep browser scraping) ⚠️ IN PROGRESS
        ├── Career page scraper
        ├── LinkedIn profile enricher  
        └── Org chart extractor
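One way to wire the OpenClaw branch together with the direct HTTP fallback listed under Current State is an ordered chain of scrapers, tried in turn until one returns text. The sketch below assumes each scraper is a plain callable taking a URL and returning text or None. The OpenClaw client itself is not shown, because its API endpoints are still unverified, so it would be passed in as one of these callables. `scrape_with_fallback` is a hypothetical name.

```python
def scrape_with_fallback(url, scrapers):
    """Try each (name, scraper) pair in order; return (name, text) for the first hit.

    `scrapers` is a sequence of (name, callable) pairs. Each callable takes a URL
    and returns page text, returns None/empty on a miss, or raises on failure.
    """
    errors = []
    for name, scraper in scrapers:
        try:
            text = scraper(url)
        except Exception as exc:  # keep going: the next scraper may succeed
            errors.append(f"{name}: {exc}")
            continue
        if text:
            return name, text
        errors.append(f"{name}: empty result")
    raise RuntimeError(f"all scrapers failed for {url}: " + "; ".join(errors))
```

A call might look like `scrape_with_fallback(url, [("openclaw", openclaw_fetch), ("direct_http", http_fetch)])`, where both fetch functions are placeholders for whatever the service module provides. Returning the scraper name alongside the text makes it easy to log how often the fallback is actually used.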

Labels

feature, openclaw, research
