A comprehensive business discovery and outreach automation tool for prospecting local businesses. The tool combines web scraping, website analysis, and automated email generation to streamline business prospecting and outreach.
Enjoy!
- Multi-source Business Search: Searches for local businesses using Google Places API with configurable categories
- Grid-based Search: Implements intelligent grid search to maximize coverage and find more businesses
- Website Performance Analysis: Tests website speed and SEO using Google PageSpeed Insights API
- Domain Intelligence: Extracts domain age information for better business insights
- Email Extraction: Automatically finds email addresses from business websites using multiple techniques
- Contact Information Mining: Extracts additional contact details including phone numbers and social media links
- Data Validation: Built-in validation to ensure data quality and completeness
- AI-Powered Email Drafts: Uses OpenAI (GPT-4o-mini) or Google Gemini to generate personalized outreach emails in Dutch
- Context-Aware Messaging: Tailors email content based on website analysis (no website, slow site, medium performance)
- Professional Templates: Includes proven email templates for web development services
- Bulk Generation: Processes multiple businesses with intelligent rate limiting
- Proxy Support: Built-in proxy rotation to avoid rate limiting
- User-Agent Rotation: Random user-agent selection for better scraping reliability
- Concurrent Processing: Multi-threaded processing for faster data collection
- Rate Limiting: Intelligent rate limiting to respect API quotas
- Progress Tracking: Real-time progress indicators with tqdm
- Comprehensive Logging: Detailed logging for monitoring and debugging
- Multiple Export Formats: Saves results in Excel, CSV, and JSON formats
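The Email Extraction feature is only described at a high level. As a minimal sketch of one common technique (a regex scan over the fetched page HTML), assuming the HTML is already downloaded, it could look like this; `extract_emails` and the image-suffix filter are hypothetical and not the tool's actual implementation:

```python
import re
from typing import List

# Deliberately simple pattern: good for scraping, not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical filter: retina image names such as "logo@2x.png" also match the pattern.
IGNORED_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")


def extract_emails(html: str) -> List[str]:
    """Return the unique, lower-cased email addresses found in an HTML string.

    Addresses inside mailto: links are caught too, because ':' is not part of the pattern.
    """
    found = set()
    for match in EMAIL_RE.findall(html):
        email = match.lower()
        if not email.endswith(IGNORED_SUFFIXES):
            found.add(email)
    return sorted(found)
```

Lower-casing and de-duplicating keeps the same address from appearing twice when it shows up both in a `mailto:` link and in the visible page text.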
- Python: 3.7 or higher
- API keys:
  - GOOGLE_PLACES_API_KEY (required)
  - PAGESPEED_API_KEY (optional; you can use the same key as for Google Places)
  - OPENAI_API_KEY or GEMINI_API_KEY (one of the two is required for email generation)
- Clone the repository:
  git clone <repository-url>
  cd outreach-tool
- Install dependencies:
  pip install -r docs/requirements.txt
- Configure environment variables. Create a .env file in the project root:
  GOOGLE_PLACES_API_KEY=your_google_places_api_key
  # Optional
  PAGESPEED_API_KEY=your_pagespeed_api_key
  # For email generation, set one of these (Gemini is the better alternative, in my opinion)
  OPENAI_API_KEY=your_openai_api_key
  GEMINI_API_KEY=your_gemini_api_key
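The key requirements above can be checked before any API call is made. A minimal sketch, assuming the variables are already in the process environment (for example loaded from `.env` with a package such as python-dotenv); `missing_api_keys` and `pagespeed_key` are hypothetical helpers, and the PageSpeed fallback is an assumption based on the note that the Google Places key can be reused:

```python
import os
from typing import List, Mapping, Optional


def missing_api_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """List the required keys that are absent or empty."""
    missing = []
    if not env.get("GOOGLE_PLACES_API_KEY"):
        missing.append("GOOGLE_PLACES_API_KEY")
    # Email generation needs at least one of the two LLM keys.
    if not (env.get("OPENAI_API_KEY") or env.get("GEMINI_API_KEY")):
        missing.append("OPENAI_API_KEY or GEMINI_API_KEY")
    return missing


def pagespeed_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Assumed fallback: reuse the Google Places key when no PageSpeed key is set."""
    return env.get("PAGESPEED_API_KEY") or env.get("GOOGLE_PLACES_API_KEY")
```

Failing fast with the list of missing names gives a clearer error than the "Missing API Keys" symptom described under troubleshooting.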
Google API keys:
- Visit the Google Cloud Console
- Create a new project or select an existing one
- Enable the Places API, PageSpeed Insights API and Gemini API
- Create credentials (API key)
- Restrict the key to specific APIs for security
OpenAI API key:
- Create an account at OpenAI
- Navigate to the API keys section
- Create a new API key
- Ensure you have sufficient credits for email generation
Basic usage:
python src/business_scraper.py
Advanced options:
python src/business_scraper.py -t restaurant,cafe -r 3000 -c 10 -g 2 -o custom_output
Available parameters:
- -t, --types TYPE1,TYPE2,...: Business types to search for (default: auto-related businesses)
- -r, --radius METERS: Search radius in meters (default: 5000)
- -c, --concurrent N: Maximum concurrent workers (default: 15)
- -g, --grid-size N: Search grid size for coverage (default: 3)
- -o, --output DIRECTORY: Output directory (default: ./output)
- -h, --help: Show help message
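For reference, the documented options map onto a standard `argparse` parser like the one below. This is a hypothetical reconstruction from the parameter list, not the script's actual source; `--types` defaults to `None` here because the exact default category list is not documented, and `-h/--help` is added by `argparse` automatically:

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Local business scraper")
    # None stands in for "use the built-in default categories".
    parser.add_argument("-t", "--types", default=None,
                        help="Business types to search for (comma-separated)")
    parser.add_argument("-r", "--radius", type=int, default=5000,
                        help="Search radius in meters")
    parser.add_argument("-c", "--concurrent", type=int, default=15,
                        help="Maximum concurrent workers")
    parser.add_argument("-g", "--grid-size", type=int, default=3,
                        help="Search grid size for coverage")
    parser.add_argument("-o", "--output", default="./output",
                        help="Output directory")
    return parser


def parse_types(raw: Optional[str]) -> List[str]:
    """Split "restaurant,cafe" into ["restaurant", "cafe"], ignoring blank entries."""
    if not raw:
        return []
    return [t.strip() for t in raw.split(",") if t.strip()]
```

Parsing the advanced example from above, `build_parser().parse_args(["-t", "restaurant,cafe", "-r", "3000", "-c", "10", "-g", "2", "-o", "custom_output"])` yields `radius=3000`, `concurrent=10`, `grid_size=2` and `output="custom_output"`.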
The email generator works as a two-step process that builds on the business scraper output:
Step 1: Run the business scraper
python src/business_scraper.py
Step 2: Generate personalized emails
python src/email_generator.py
- Data Input: Automatically loads the most recent JSON file produced by the business scraper
- Business Classification: Analyzes each business and assigns it to one of these categories:
  - no_website: Businesses without a functioning website
  - slow_site: Websites with a PageSpeed score below 50
  - medium_site: Websites with a PageSpeed score of 50-80
  - skip: High-performing sites (score above 80); no email is generated
- AI-Powered Personalization: For each qualifying business, the generator:
  - Uses GPT-4o-mini (or Gemini as the alternative) to generate personalized Dutch emails
  - Tailors messaging based on the classification and website analysis
  - Includes specific performance issues found during website analysis
  - References the sender's web development experience
- Output Generation:
  - Saves individual email drafts as .txt files in output/emails/
  - Creates a tracking report in output/email_results.csv
  - Includes retry logic and error handling for API failures
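The classification thresholds translate directly into code. A minimal sketch, assuming scores are on the 0-100 scale and that a business whose site could not be reached or scored is passed as `None`; the function name `classify_business` is hypothetical:

```python
from typing import Optional


def classify_business(pagespeed_score: Optional[float]) -> str:
    """Map a PageSpeed score (0-100) to the categories used by the email generator."""
    if pagespeed_score is None:
        # Assumption: no score means there is no functioning website.
        return "no_website"
    if pagespeed_score < 50:
        return "slow_site"
    if pagespeed_score <= 80:
        return "medium_site"  # 50 and 80 are both treated as "medium"
    return "skip"  # high performer: no email is generated
```

The boundaries follow the ranges listed above: below 50 is slow, 50 through 80 is medium, and anything above 80 is skipped.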
The generator uses three distinct templates:
- No Website Template: Emphasizes missing online presence and lost customers
- Slow Site Template: References specific PageSpeed issues and performance problems
- Medium Site Template: Suggests optimization opportunities for already-functioning sites
Configuration via environment variables:
- MAX_EMAILS=50: Maximum number of emails to generate (0 = no limit)
- EMAIL_DELAY=0.5: Delay between API calls, in seconds
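A minimal sketch of how these two settings could drive the generation loop, treating the values shown (50 and 0.5) as defaults, which is an assumption; `read_email_settings` and `generate_all` are hypothetical names, and `generate` stands in for whatever function calls the LLM for one business:

```python
import os
import time
from typing import Any, Callable, Iterable, List, Mapping, Tuple


def read_email_settings(env: Mapping[str, str] = os.environ) -> Tuple[int, float]:
    """Read MAX_EMAILS (0 = no limit) and EMAIL_DELAY (seconds), assuming 50 and 0.5 as defaults."""
    max_emails = int(env.get("MAX_EMAILS", "50"))
    delay = float(env.get("EMAIL_DELAY", "0.5"))
    return max_emails, delay


def generate_all(
    businesses: Iterable[Any],
    generate: Callable[[Any], Any],
    max_emails: int,
    delay: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Any]:
    """Call `generate` once per business, honoring the limit and pausing between calls."""
    results: List[Any] = []
    for index, business in enumerate(businesses):
        if max_emails and index >= max_emails:
            break  # limit reached; a value of 0 disables this check
        if index > 0:
            sleep(delay)  # space out consecutive API calls
        results.append(generate(business))
    return results
```

Injecting `sleep` keeps the loop testable without real waiting; in normal use the default `time.sleep` applies.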
The tool generates comprehensive output in multiple formats:
Business data:
- Excel: businesses_YYYYMMDD_HHMMSS.xlsx
- CSV: businesses_YYYYMMDD_HHMMSS.csv
- JSON: businesses_YYYYMMDD_HHMMSS.json
Email generation:
- Individual drafts: output/emails/<business_name>.txt
- Campaign report: output/email_results.csv
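The `YYYYMMDD_HHMMSS` pattern corresponds to the `strftime` format `%Y%m%d_%H%M%S`. Because that timestamp is fixed-width, sorting the filenames also sorts the exports chronologically, which is one way to find "the most recent JSON file" that the email generator loads. A minimal sketch under that assumption (the tool may use file modification times instead); `draft_filename` is a hypothetical sanitizer for the `<business_name>.txt` drafts:

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional


def timestamped_name(stem: str, ext: str, now: datetime) -> str:
    """Build names like businesses_20240131_154500.json."""
    return f"{stem}_{now.strftime('%Y%m%d_%H%M%S')}.{ext}"


def latest_export(output_dir: Path) -> Optional[Path]:
    """Return the newest businesses_*.json; fixed-width timestamps sort lexicographically."""
    candidates = sorted(output_dir.glob("businesses_*.json"))
    return candidates[-1] if candidates else None


def draft_filename(business_name: str) -> str:
    """Hypothetical: replace characters that are unsafe in filenames with underscores."""
    safe = re.sub(r"[^\w\-]+", "_", business_name).strip("_")
    return f"{safe or 'business'}.txt"
```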
Each business record includes:
- Business name, category, address, phone numbers
- Website URL and performance metrics (PageSpeed scores)
- Email addresses and additional contact information
- Core Web Vitals and related Lighthouse metrics (LCP, CLS, FCP, TTI)
- SEO and accessibility scores
- Domain age and business status
- Opening hours and ratings
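The PageSpeed fields come from the PageSpeed Insights API (v5 `runPagespeed` endpoint), which reports Lighthouse category scores on a 0-1 scale. A minimal sketch of building the request URL and converting the scores to the 0-100 scale used by the email classification, assuming the tool requests the performance, SEO and accessibility categories on the mobile strategy; the helper names are hypothetical and the HTTP request itself is omitted:

```python
from typing import Dict, Optional
from urllib.parse import urlencode

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def build_psi_url(site_url: str, api_key: str, strategy: str = "MOBILE") -> str:
    """Build a runPagespeed request URL; `category` is repeated once per category."""
    params = [
        ("url", site_url),
        ("key", api_key),
        ("strategy", strategy),
        ("category", "PERFORMANCE"),
        ("category", "SEO"),
        ("category", "ACCESSIBILITY"),
    ]
    return f"{PSI_ENDPOINT}?{urlencode(params)}"


def category_scores(response: dict) -> Dict[str, Optional[int]]:
    """Convert lighthouseResult category scores (0-1) to integers on a 0-100 scale."""
    categories = response.get("lighthouseResult", {}).get("categories", {})
    scores: Dict[str, Optional[int]] = {}
    for name in ("performance", "seo", "accessibility"):
        raw = categories.get(name, {}).get("score")
        scores[name] = round(raw * 100) if raw is not None else None
    return scores
```

Missing categories come back as `None` rather than 0, so a failed audit is not mistaken for a very slow site.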
- Built-in intelligent rate limiting for all APIs
- Exponential backoff for failed requests
- Proxy rotation to distribute load
- Configurable delays and timeouts
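Exponential backoff means each retry waits roughly twice as long as the previous one. A minimal, generic sketch of the technique with "full jitter" (a random wait up to the current cap), not the tool's actual retry code; `with_backoff` and its defaults are hypothetical, and in practice `retry_on` should be narrowed to transient errors such as HTTP 429 or 5xx responses:

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying failures with exponentially growing, jittered waits."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Cap grows 1s, 2s, 4s, ... up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter keeps concurrent workers from retrying in lockstep.
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable when max_retries >= 0")
```

Jitter matters here because many concurrent workers hitting the same quota would otherwise retry at the same moment and trip the limit again.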
- Concurrent Workers: Start with 10-15 workers, adjust based on API responses
- Grid Size: Use 2-3 for dense areas, 4-5 for sparse coverage
- Timing: Run during off-peak hours for better API performance
- Monitoring: Check scraper.log for detailed execution logs
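To see what the grid size trades off, here is a sketch of one plausible layout, assuming the grid splits the square around the search circle into grid_size x grid_size cells and runs one Places query per cell center (the tool's exact layout is not documented). Splitting helps because a single Places search returns only a limited number of results. `grid_centers` and `sub_radius` are hypothetical helpers, and the meters-to-degrees conversion is a flat-earth approximation that is adequate at city scale:

```python
import math
from typing import List, Tuple

METERS_PER_DEG_LAT = 111_320.0  # approximate length of one degree of latitude


def grid_centers(lat: float, lng: float, radius_m: float, grid_size: int) -> List[Tuple[float, float]]:
    """Return the (lat, lng) center of each cell in a grid_size x grid_size grid
    covering the square that bounds the search circle."""
    if grid_size < 1:
        raise ValueError("grid_size must be >= 1")
    cell_m = 2 * radius_m / grid_size
    deg_lat_per_m = 1 / METERS_PER_DEG_LAT
    # Longitude degrees shrink with latitude.
    deg_lng_per_m = 1 / (METERS_PER_DEG_LAT * math.cos(math.radians(lat)))
    centers = []
    for row in range(grid_size):
        for col in range(grid_size):
            # Offset of this cell's center from the overall center, in meters.
            dy = -radius_m + (row + 0.5) * cell_m
            dx = -radius_m + (col + 0.5) * cell_m
            centers.append((lat + dy * deg_lat_per_m, lng + dx * deg_lng_per_m))
    return centers


def sub_radius(radius_m: float, grid_size: int) -> float:
    """Per-cell search radius that just covers a square cell (half its diagonal)."""
    return (2 * radius_m / grid_size) * math.sqrt(2) / 2
```

Under this layout the default `-g 3` means nine sub-searches, and each extra step in grid size grows the query count quadratically, which is why grid size and concurrency should be tuned together.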
"Missing API Keys"
- Verify that your .env file exists and contains valid keys
- Check API key permissions in Google Cloud Console
"Rate Limiting Errors"
- Reduce concurrent workers with the -c parameter
- Increase delays between requests
- Check API quota limits in respective consoles
"No Results Found"
- Verify location coordinates are correct
- Try different business types or larger radius
- Check if the area has businesses in your selected categories
"Email Generation Fails"
- Verify that your OPENAI_API_KEY or GEMINI_API_KEY is valid and the account has sufficient credits
- Check internet connection and API status
- Review rate limits for OpenAI API or Gemini API
General debugging:
- Check scraper.log for detailed execution information
- Use verbose logging to track API calls and responses
- Monitor the output directory for partial results
outreach-tool/
├── src/
│   ├── business_scraper.py   # Main scraping engine
│   └── email_generator.py    # AI email generation
├── output/                   # Generated data and emails
├── docs/
│   ├── main_README.md        # This file
│   └── requirements.txt      # Python dependencies
├── user_agents.txt           # User agent rotation list
└── .env                      # Environment variables (create this)
License: MIT