A powerful web-based lead generation application that automates the extraction of business contact information from public sources based on location and industry criteria.
Live Demo: https://web-production-8914.up.railway.app
Businesses and sales professionals face significant challenges when trying to generate leads:
- Manual Research is Time-Consuming: Finding potential clients requires hours of manual searching across multiple platforms (Google Maps, Yellow Pages, business directories)
- Data Fragmentation: Business information is scattered across different websites, making it difficult to compile comprehensive lead lists
- Inconsistent Data Quality: Manual data entry leads to errors, missing fields, and incomplete contact information
- Scalability Issues: Manually collecting hundreds of leads is impractical and not scalable
- No Centralized Solution: Existing tools are either expensive, require technical expertise, or don't provide a simple web interface
- Sales Teams: Need to quickly identify potential customers in specific geographic areas
- Marketing Agencies: Require comprehensive business databases for targeted campaigns
- Startups: Looking to build their initial customer base in specific markets
- Business Development: Identifying partners, suppliers, or clients in particular industries
We've built a comprehensive web application that solves these challenges by:
- Automated Web Scraping: Uses Playwright to programmatically extract business data from Google Maps and Yellow Pages
- User-Friendly Interface: Simple web form where users only need to specify:
  - Location (e.g., "New York, NY")
  - Business Type/Industry (e.g., "restaurants", "law firms", "tech companies")
- Comprehensive Data Extraction: Automatically collects:
  - Business name
  - Full address
  - Phone number
  - Email address (extracted from websites)
  - Website URL
  - Business category
  - Ratings (when available)
- Data Export: Download results in CSV or JSON format for further analysis
- Scalable Architecture: Can process multiple leads efficiently with proper rate limiting
- Backend: Flask (Python) with async Playwright for browser automation
- Frontend: Modern HTML/CSS/JavaScript with responsive design
- Scraping Engine: Playwright with intelligent selectors and error handling
- Data Processing: Automatic email extraction from business websites using regex patterns
- Export Functionality: Server-side CSV/JSON generation with proper encoding
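
The export step can be sketched with the standard library alone. This is a minimal illustration, not the code in `app.py`: the `FIELDS` key names and both function names are assumptions chosen to mirror the extracted fields, and the UTF-8 BOM is one common way to make spreadsheet tools detect the encoding.

```python
import csv
import io
import json

# Assumed column order and key names; the real app.py may use different ones.
FIELDS = ["name", "address", "phone", "email", "website", "category", "rating"]


def leads_to_csv(leads):
    """Serialize a list of lead dicts to CSV bytes (UTF-8 with BOM)."""
    buffer = io.StringIO(newline="")
    # restval fills missing fields with "", extrasaction drops unknown keys.
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(leads)
    # "utf-8-sig" prepends a BOM so Excel opens accented names correctly.
    return buffer.getvalue().encode("utf-8-sig")


def leads_to_json(leads):
    """Serialize a list of lead dicts to JSON bytes, keeping non-ASCII readable."""
    return json.dumps(leads, ensure_ascii=False, indent=2).encode("utf-8")
```

In a Flask route, bytes like these would typically be returned with a `Content-Disposition: attachment` header so the browser downloads them as a file.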
- Simple Interface: Just enter location and business type - no technical knowledge required
- Comprehensive Data: Extracts 7+ data fields per business
- Multiple Export Formats: Download as CSV or JSON
- Fast & Efficient: Optimized scraping with proper delays to avoid blocking
- Modern UI: Beautiful, responsive design that works on all devices
- Fallback Mechanisms: Automatically tries alternative sources if primary fails
- Error Handling: Robust error handling with user-friendly messages
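
The fallback behaviour listed above could look roughly like the sketch below. `scrape_with_fallback` and the `(name, scraper)` pairs are hypothetical; the real application wires its Google Maps and Yellow Pages scrapers together in its own way.

```python
def scrape_with_fallback(query, sources):
    """Try each (name, scraper) pair in order and return the first non-empty result.

    `sources` is a list of (name, callable) pairs; each callable takes the
    query and returns a list of lead dicts. Both are illustrative assumptions.
    """
    errors = []
    for name, scraper in sources:
        try:
            results = scraper(query)
        except Exception as exc:  # a real app would catch narrower exceptions
            errors.append(f"{name}: {exc}")
            continue
        if results:
            return {"source": name, "leads": results, "errors": errors}
        errors.append(f"{name}: no results")
    # Every source failed or came back empty.
    return {"source": None, "leads": [], "errors": errors}
```

Collecting the per-source errors instead of raising on the first one is what lets the interface show a user-friendly message when every source fails.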
- Python 3.8 or higher
- pip (Python package manager)
- Internet connection
- Clone the repository:
  `git clone https://github.com/yourusername/lead-scraper.git`
  `cd lead-scraper`
- Install Python dependencies:
  `pip install -r requirements.txt`
- Install Playwright browsers:
  `playwright install chromium`
- Start the server:
  `python app.py`
- Open your browser: Navigate to http://localhost:5000
- Start scraping:
  - Enter a location (e.g., "New York, NY")
  - Enter business type (e.g., "restaurants")
  - Set maximum results (default: 50)
  - Click "Start Scraping"
- Export your data:
  - View results in the table
  - Click "Export CSV" or "Export JSON" to download
| Field | Description | Example |
|---|---|---|
| Name | Business name | "Joe's Pizza" |
| Address | Full business address | "123 Main St, New York, NY 10001" |
| Phone | Contact phone number | "(555) 123-4567" |
| Email | Business email | "contact@joespizza.com" |
| Website | Business website URL | "https://www.joespizza.com" |
| Category | Business category/type | "Italian Restaurant" |
| Rating | Google Maps rating | "4.5 stars" |
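
One way to model a row of this table in Python is a small dataclass. The `Lead` class and its lowercase attribute names are illustrative assumptions; the application may simply use plain dictionaries.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Lead:
    """One scraped business; every field except the name may be missing."""
    name: str
    address: Optional[str] = None
    phone: Optional[str] = None
    email: Optional[str] = None
    website: Optional[str] = None
    category: Optional[str] = None
    rating: Optional[str] = None


# Fields that could not be extracted stay None instead of being guessed.
lead = Lead(name="Joe's Pizza", phone="(555) 123-4567", category="Italian Restaurant")
record = asdict(lead)  # plain dict, ready for JSON or CSV export
```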

    lead-scraper/
    ├── app.py              # Flask backend with scraping logic
    ├── index.html          # Frontend web interface
    ├── requirements.txt    # Python dependencies
    ├── setup.sh            # Automated setup script
    └── README.md           # This file

- Flask API (`app.py`):
  - `/` - Serves the web interface
  - `/api/scrape` - Handles scraping requests
  - `/api/export/csv` - Exports data as CSV
  - `/api/export/json` - Exports data as JSON
- LeadScraper Class:
  - `scrape_google_maps()` - Primary scraping method
  - `scrape_yellow_pages()` - Fallback scraping method
  - `extract_email_from_website()` - Email extraction logic
- Frontend (`index.html`):
  - Form for user input
  - Results table display
  - Export functionality
  - Loading states and error handling
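
For scripted use, the `/api/scrape` endpoint could be called without the web form. The sketch below only builds the request. The JSON field names (`location`, `business_type`, `max_results`) are assumptions, since the payload schema is not documented here, so check `app.py` before relying on them.

```python
import json
import urllib.request


def build_scrape_request(base_url, location, business_type, max_results=50):
    """Build (but do not send) a POST request for the /api/scrape endpoint."""
    # Hypothetical payload keys; the real endpoint may expect different names.
    payload = {
        "location": location,
        "business_type": business_type,
        "max_results": max_results,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/scrape",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_scrape_request("http://localhost:5000", "New York, NY", "restaurants")
```

Passing `request` to `urllib.request.urlopen` would send it to a running server; scraping can take a while, so a generous `timeout` is advisable.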
- User Input: User enters location and business type in the web form
- API Request: Frontend sends a POST request to the `/api/scrape` endpoint
- Browser Automation: Playwright launches a headless browser
- Search Execution: Navigates to Google Maps with search query
- Data Extraction:
  - Scrolls through results to load more listings
  - Clicks on each business to get detailed information
  - Extracts name, address, phone, website, rating, category
  - Visits business website to extract email (if available)
- Data Compilation: All extracted data is structured into JSON format
- Response: Data is sent back to frontend
- Display: Results are shown in an interactive table
- Export: User can download data in CSV or JSON format
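
The extraction loop in the steps above can be sketched with `asyncio`. Here `scrape_all` and `fetch_details` are hypothetical stand-ins for the Playwright calls that open a listing and read its details; the sketch only shows the pacing and the skip-on-failure behaviour.

```python
import asyncio


async def scrape_all(listings, fetch_details, delay_seconds=1.0):
    """Process listings one at a time, pausing between them and skipping failures.

    `fetch_details` is an async callable supplied by the caller (an assumption
    for this sketch); it returns a lead dict or raises on failure.
    """
    leads, failures = [], []
    for index, listing in enumerate(listings):
        if index:
            # Delay between listings so the target site is not hammered.
            await asyncio.sleep(delay_seconds)
        try:
            leads.append(await fetch_details(listing))
        except Exception as exc:  # one bad listing must not abort the run
            failures.append((listing, str(exc)))
    return leads, failures
```

Returning the failures alongside the leads keeps partial results usable while still letting the caller report what was skipped.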
- Async/Await: Uses Python's asyncio for efficient concurrent operations
- Smart Selectors: Multiple fallback selectors to handle website changes
- Rate Limiting: Built-in delays to respect website resources
- Error Recovery: Continues scraping even if individual businesses fail
- Email Extraction: Regex-based email finding with filtering
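
Regex-based email extraction with filtering might look like the following. The pattern, the ignore lists, and the function name `extract_emails` are illustrative assumptions, not the exact logic of `extract_email_from_website()`.

```python
import re

# A pragmatic pattern, not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Asset names such as "logo@2x.png" match the pattern but are not addresses.
IGNORED_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")
# Placeholder domains that are unlikely to be real contact addresses.
IGNORED_DOMAINS = {"example.com", "domain.com"}


def extract_emails(html):
    """Return unique, lowercased email addresses found in a page's HTML."""
    seen, emails = set(), []
    for match in EMAIL_RE.findall(html):
        email = match.lower()
        domain = email.rsplit("@", 1)[1]
        if email.endswith(IGNORED_SUFFIXES) or domain in IGNORED_DOMAINS:
            continue
        if email not in seen:
            seen.add(email)
            emails.append(email)
    return emails
```

The filtering step matters because raw HTML often contains image names and placeholder addresses that match a naive pattern.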
This tool is designed for legitimate business research purposes only. Users must:
- Respect websites' Terms of Service
- Check robots.txt before scraping
- Comply with GDPR, CCPA, and other data protection laws
- Use scraped data responsibly and ethically
- Not use the tool for spam or unsolicited marketing
- Respect rate limits and website resources
Disclaimer: This tool is for educational and legitimate business purposes. Users are responsible for ensuring their use complies with all applicable laws and regulations.
No results found:
- Try different search terms or locations
- Some locations may have limited business listings
- Check your internet connection
Slow scraping:
- This is normal - scraping includes delays to avoid being blocked
- Larger result sets take more time
- Be patient, especially for 50+ results
Missing emails:
- Not all businesses have publicly available emails
- Email extraction depends on website structure
- Some websites use contact forms instead of direct emails
Browser errors:
- Ensure Playwright browsers are installed: `playwright install chromium`
- Check that you have sufficient system resources
- Try running with `headless=False` in app.py for debugging
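
For the `headless=False` tip, a standalone check with Playwright's async API can confirm that the browser itself starts, independently of the scraper. The function names and the `slow_mo_ms` default below are assumptions for illustration; the launch code in `app.py` may be arranged differently.

```python
import asyncio


async def open_visible_browser(url, slow_mo_ms=250):
    """Open `url` in a visible Chromium window and return the page title."""
    # Imported lazily so this module still loads where Playwright is missing.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        # headless=False shows the window; slow_mo slows each action for viewing.
        browser = await p.chromium.launch(headless=False, slow_mo=slow_mo_ms)
        page = await browser.new_page()
        await page.goto(url, wait_until="domcontentloaded")
        title = await page.title()
        await browser.close()
        return title


def debug_page_title(url):
    """Synchronous convenience wrapper for quick manual checks."""
    return asyncio.run(open_visible_browser(url))
```

Calling `debug_page_title("https://www.google.com/maps")` from a Python shell should open a window briefly; if it raises instead, the problem is likely the Playwright installation rather than the scraping logic.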
Potential improvements for future versions:
- Support for multiple data sources (LinkedIn, Yelp, etc.)
- Database storage for lead management
- Email verification functionality
- Scheduled scraping jobs
- API rate limiting and queue system
- User authentication and saved searches
- Advanced filtering options
- Bulk export with custom fields
Contributions are welcome! Please feel free to submit a Pull Request.
This project is for educational and legitimate business purposes only.
Built with ❤️ for efficient lead generation
Note: Always use this tool responsibly and in compliance with all applicable laws and website terms of service.