AndresRoosen/Outreach-Tool

Business Outreach Tool ~ by Andres Roosen

A comprehensive business discovery and outreach automation tool designed for local businesses. The tool combines web scraping, website analysis, and automated email generation to streamline business prospecting and outreach.

Enjoy!

🌟 Features

Business Discovery & Analysis

  • Multi-source Business Search: Searches for local businesses using Google Places API with configurable categories
  • Grid-based Search: Implements intelligent grid search to maximize coverage and find more businesses
  • Website Performance Analysis: Tests website speed and SEO using Google PageSpeed Insights API
  • Domain Intelligence: Extracts domain age information for better business insights
  • Email Extraction: Automatically finds email addresses from business websites using multiple techniques
  • Contact Information Mining: Extracts additional contact details including phone numbers and social media links
  • Data Validation: Built-in validation to ensure data quality and completeness
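
The grid-based search listed above can be pictured as tiling the area around a starting point with overlapping search circles. The following is a minimal sketch under stated assumptions: the helper name `grid_centers`, the square layout, and the one-radius spacing are illustrative and are not taken from the tool's source.

```python
import math

def grid_centers(lat, lng, radius_m, grid_size):
    """Return grid_size * grid_size (lat, lng) search centres around a point.

    Hypothetical helper: centres are spaced radius_m apart so that
    neighbouring search circles overlap.
    """
    meters_per_deg_lat = 111_320.0  # approximate length of one degree of latitude
    meters_per_deg_lng = meters_per_deg_lat * math.cos(math.radians(lat))
    offset = (grid_size - 1) / 2.0  # centre the grid on the starting point
    centers = []
    for row in range(grid_size):
        for col in range(grid_size):
            d_north = (row - offset) * radius_m
            d_east = (col - offset) * radius_m
            centers.append((
                lat + d_north / meters_per_deg_lat,
                lng + d_east / meters_per_deg_lng,
            ))
    return centers
```

Each returned centre would then be used as the location of one nearby-search request, so a larger grid size means more requests and more API quota consumed.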

Smart Email Generation

  • AI-Powered Email Drafts: Uses an OpenAI GPT model (GPT-4o-mini) or Gemini to generate personalized outreach emails in Dutch
  • Context-Aware Messaging: Tailors email content based on website analysis (no website, slow site, medium performance)
  • Professional Templates: Includes proven email templates for web development services
  • Bulk Generation: Processes multiple businesses with intelligent rate limiting

Technical Features

  • Proxy Support: Built-in proxy rotation to avoid rate limiting
  • User-Agent Rotation: Random user-agent selection for better scraping reliability
  • Concurrent Processing: Multi-threaded processing for faster data collection
  • Rate Limiting: Intelligent rate limiting to respect API quotas
  • Progress Tracking: Real-time progress indicators with tqdm
  • Comprehensive Logging: Detailed logging for monitoring and debugging
  • Multiple Export Formats: Saves results in Excel, CSV, and JSON formats
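
To illustrate the concurrent processing feature, here is a minimal sketch using the standard library's thread pool. The function name `process_all` and its return shape are assumptions made for this example; the scraper's own worker logic is not shown in this README.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_all(items, worker, max_workers=15):
    """Run worker(item) in a thread pool; items must be hashable.

    Returns (results, errors), both keyed by item, so that one failing
    item does not abort the whole batch.
    """
    results = {}
    errors = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                results[item] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[item] = exc
    return results, errors
```

A progress bar could be added by wrapping the loop iterable, for example `tqdm(as_completed(futures), total=len(futures))`, assuming tqdm is installed.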

📋 Requirements

  • Python: 3.7 or higher
  • API Keys Required:
    • GOOGLE_PLACES_API_KEY (Required)
    • PAGESPEED_API_KEY (Optional, can use same as Google Places)
    • OPENAI_API_KEY or GEMINI_API_KEY (one of the two is required for email generation)

🚀 Setup

  1. Clone the repository

    git clone <repository-url>
    cd outreach-tool
  2. Install dependencies

    pip install -r docs/requirements.txt
  3. Configure environment variables. Create a .env file in the project root:

    GOOGLE_PLACES_API_KEY=your_google_places_api_key
    PAGESPEED_API_KEY=your_pagespeed_api_key  # Optional
    OPENAI_API_KEY=your_openai_api_key        # For email generation
    # or, instead of OPENAI_API_KEY:
    GEMINI_API_KEY=your_gemini_api_key        # Better email generation alternative (in my opinion)
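
Before a long run it can help to check that the required keys are actually set. The sketch below is illustrative only: the helper name `missing_keys` is hypothetical, and it assumes the variables have already been loaded into the process environment (for example by a dotenv loader).

```python
import os

REQUIRED_KEYS = ["GOOGLE_PLACES_API_KEY"]
LLM_KEYS = ["OPENAI_API_KEY", "GEMINI_API_KEY"]  # at least one is needed for emails

def missing_keys(env=None):
    """Return a list describing which required settings are absent or empty."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if not any(env.get(key) for key in LLM_KEYS):
        missing.append(" or ".join(LLM_KEYS))
    return missing

if __name__ == "__main__":
    problems = missing_keys()
    if problems:
        print("Missing API keys:", ", ".join(problems))
```

PAGESPEED_API_KEY is left out of the check because it is optional.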

🔑 API Keys Setup

Google Places API

  1. Visit the Google Cloud Console
  2. Create a new project or select an existing one
  3. Enable the Places API, PageSpeed Insights API and Gemini API
  4. Create credentials (API Key)
  5. Restrict the key to specific APIs for security

OpenAI API

  1. Create an account at OpenAI
  2. Navigate to the API keys section
  3. Create a new API key
  4. Ensure you have sufficient credits for email generation

💻 Usage

Business Scraping

Basic usage:

python src/business_scraper.py

Advanced options:

python src/business_scraper.py -t restaurant,cafe -r 3000 -c 10 -g 2 -o custom_output

Available parameters:

  • -t, --types TYPE1,TYPE2,... - Business types to search for (default: automotive-related businesses)
  • -r, --radius METERS - Search radius in meters (default: 5000)
  • -c, --concurrent N - Maximum concurrent workers (default: 15)
  • -g, --grid-size N - Search grid size for coverage (default: 3)
  • -o, --output DIRECTORY - Output directory (default: ./output)
  • -h, --help - Show help message
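
The parameters above map naturally onto an `argparse` parser. This is a sketch that mirrors the documented flags and defaults, not a copy of the parser in src/business_scraper.py; `build_parser` is a hypothetical name, and the default type list is left as `None` because this README does not enumerate it.

```python
import argparse

def parse_types(value):
    """Split a comma-separated list such as 'restaurant,cafe'."""
    return [item.strip() for item in value.split(",") if item.strip()]

def build_parser():
    parser = argparse.ArgumentParser(description="Business scraper (illustrative sketch)")
    parser.add_argument("-t", "--types", type=parse_types, default=None,
                        help="comma-separated business types to search for")
    parser.add_argument("-r", "--radius", type=int, default=5000,
                        help="search radius in meters")
    parser.add_argument("-c", "--concurrent", type=int, default=15,
                        help="maximum concurrent workers")
    parser.add_argument("-g", "--grid-size", type=int, default=3,
                        help="search grid size")
    parser.add_argument("-o", "--output", default="./output",
                        help="output directory")
    return parser  # argparse adds -h/--help automatically
```

Parsing the advanced example, `build_parser().parse_args(["-t", "restaurant,cafe", "-r", "3000"])`, gives `types` as a list of two strings and `radius` as the integer 3000.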

Email Generation

The email generator works as a two-step process that builds on the business scraper output:

Step 1: Run the business scraper

python src/business_scraper.py

Step 2: Generate personalized emails

python src/email_generator.py

Email Generation Process

  1. Data Input: Automatically loads the most recent JSON file from business scraping

  2. Business Classification: Analyzes each business and categorizes them into:

    • no_website: Businesses without a functioning website
    • slow_site: Websites with PageSpeed score < 50
    • medium_site: Websites with PageSpeed score 50-80
    • skip: High-performing sites (score > 80) - no email generated
  3. AI-Powered Personalization: For each qualifying business:

    • Uses GPT-4o-mini to generate personalized Dutch emails
    • Tailors messaging based on classification and website analysis
    • Includes specific performance issues found during website analysis
    • References the business owner's web development experience
  4. Output Generation:

    • Saves individual email drafts as .txt files in output/emails/
    • Creates comprehensive tracking report in output/email_results.csv
    • Includes retry logic and error handling for API failures
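
The classification thresholds in step 2 can be expressed as a small function. This is a sketch of the documented rules only; the function name `classify` and its arguments are assumptions, and the generator may handle edge cases (such as a site with no PageSpeed result) differently.

```python
def classify(has_website, pagespeed_score):
    """Map a business to one of the four documented categories.

    pagespeed_score is expected on a 0-100 scale and is ignored
    when the business has no functioning website.
    """
    if not has_website:
        return "no_website"
    if pagespeed_score < 50:
        return "slow_site"
    if pagespeed_score <= 80:
        return "medium_site"
    return "skip"  # score above 80: no email is generated
```

Under these rules a score of exactly 80 still counts as `medium_site`, and only scores above 80 are skipped.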

Email Templates

The generator uses three distinct templates:

  • No Website Template: Emphasizes missing online presence and lost customers
  • Slow Site Template: References specific PageSpeed issues and performance problems
  • Medium Site Template: Suggests optimization opportunities for already-functioning sites

Configuration via environment variables:

  • MAX_EMAILS=50 - Limit number of emails to generate (0 = no limit)
  • EMAIL_DELAY=0.5 - Delay between API calls in seconds
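
As an illustration of how these two settings might be read and applied, consider the sketch below. The helper names are hypothetical, and using 50 and 0.5 as fallback values is an assumption based on the examples above.

```python
import os

def load_email_settings(env=None):
    """Read MAX_EMAILS and EMAIL_DELAY, falling back to the values shown above."""
    env = os.environ if env is None else env
    max_emails = int(env.get("MAX_EMAILS", "50"))
    delay_seconds = float(env.get("EMAIL_DELAY", "0.5"))
    return max_emails, delay_seconds

def limit_businesses(businesses, max_emails):
    """Apply the MAX_EMAILS cap; 0 means no limit."""
    if max_emails == 0:
        return list(businesses)
    return list(businesses)[:max_emails]
```

The delay would typically be passed to `time.sleep` between consecutive API calls.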

📊 Output Files

The tool generates comprehensive output in multiple formats:

Business Data

  • Excel: businesses_YYYYMMDD_HHMMSS.xlsx
  • CSV: businesses_YYYYMMDD_HHMMSS.csv
  • JSON: businesses_YYYYMMDD_HHMMSS.json

Email Campaigns

  • Individual Drafts: output/emails/<business_name>.txt
  • Campaign Report: output/email_results.csv
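
The timestamped naming scheme makes it easy to locate the newest export, which is what the email generator needs as input. A minimal sketch, assuming the `businesses_YYYYMMDD_HHMMSS` pattern shown above; both helper names are hypothetical.

```python
from datetime import datetime
from pathlib import Path

def timestamped_name(extension, now=None):
    """Build a name such as businesses_20240102_030405.json."""
    now = now or datetime.now()
    return f"businesses_{now:%Y%m%d_%H%M%S}.{extension}"

def latest_json(output_dir):
    """Return the newest businesses_*.json file, or None if there is none.

    Sorting by name is enough because the timestamp format sorts
    chronologically as plain text.
    """
    files = sorted(Path(output_dir).glob("businesses_*.json"))
    return files[-1] if files else None
```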

Data Fields Collected

  • Business name, category, address, phone numbers
  • Website URL and performance metrics (PageSpeed scores)
  • Email addresses and additional contact information
  • Performance metrics: Core Web Vitals (LCP, CLS) and Lighthouse lab metrics (FCP, TTI)
  • SEO and accessibility scores
  • Domain age and business status
  • Opening hours and ratings

⚡ Performance & Optimization

Rate Limiting

  • Built-in intelligent rate limiting for all APIs
  • Exponential backoff for failed requests
  • Proxy rotation to distribute load
  • Configurable delays and timeouts
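
Exponential backoff means waiting twice as long after each consecutive failure before retrying. The sketch below shows the general pattern; the function name, the defaults, and the injectable `sleep` parameter are assumptions for illustration and do not describe the tool's exact retry policy.

```python
import time

def retry_with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds, doubling the wait after each failure.

    The last exception is re-raised once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting. In practice a small random jitter is often added to the delay so that parallel workers do not retry at the same moment.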

Best Practices

  • Concurrent Workers: Start with 10-15 workers, adjust based on API responses
  • Grid Size: Use 2-3 for dense areas, 4-5 for sparse coverage
  • Timing: Run during off-peak hours for better API performance
  • Monitoring: Check scraper.log for detailed execution logs

🛠️ Troubleshooting

Common Issues

"Missing API Keys"

  • Verify your .env file exists and contains valid keys
  • Check API key permissions in Google Cloud Console

"Rate Limiting Errors"

  • Reduce concurrent workers with -c parameter
  • Increase delays between requests
  • Check API quota limits in respective consoles

"No Results Found"

  • Verify location coordinates are correct
  • Try different business types or larger radius
  • Check if the area has businesses in your selected categories

"Email Generation Fails"

  • Verify that OPENAI_API_KEY or GEMINI_API_KEY is set and that the account has sufficient credits
  • Check internet connection and API status
  • Review rate limits for the OpenAI API or Gemini API

Logs and Debugging

  • Check scraper.log for detailed execution information
  • Use verbose logging to track API calls and responses
  • Monitor output directory for partial results
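
For reference, file logging with an optional verbose level can be set up with the standard `logging` module as sketched below. The logger name `outreach`, the `verbose` flag, and the message format are assumptions; only the scraper.log file name comes from this README.

```python
import logging

def setup_logging(log_file="scraper.log", verbose=False):
    """Send log records to a file; verbose=True also records DEBUG messages."""
    logger = logging.getLogger("outreach")
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.FileHandler(log_file, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

With `verbose=True`, a call such as `logger.debug("Places API request sent")` would appear in the log file; at the default level it would be filtered out.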

📈 Project Structure

outreach-tool/
├── src/
│   ├── business_scraper.py    # Main scraping engine
│   └── email_generator.py     # AI email generation
├── output/                    # Generated data and emails
├── docs/
│   ├── main_README.md        # This file
│   └── requirements.txt      # Python dependencies
├── user_agents.txt           # User agent rotation list
└── .env                      # Environment variables (create this)

📝 License

MIT

About

Business scraper and email generator -- for cold-outreach or specific business data applications
