A Python program that scrapes player statistics from Perfect Game team pages. Extracts comprehensive player data including batting statistics (At Bats, Batting Average, OPS, Slugging) and pitching statistics (ERA, WHIP, Strike %, Innings Pitched).
✅ FULLY WORKING - Successfully extracts statistics with subscription authentication!
# Scrape a team (just use the team ID!)
python3 pg_scraper.py 967917 --username "your_email" --password "your_password"
# Test a single player
python3 pg_scraper.py --test-profile "https://www.perfectgame.org/Players/Playerprofile.aspx?ID=1161417" --username "your_email" --password "your_password"
- Scrapes Perfect Game team pages for player statistics
- Supports authentication with Perfect Game credentials
- Extracts comprehensive baseball statistics: At Bats, Batting Average, OPS, Slugging, ERA, WHIP, Strike %, Innings Pitched
- Saves data to CSV format
- Command-line interface for easy usage
- Respectful scraping with delays and proper headers
- Comprehensive error handling and logging
- Clone or download this repository
- Run the installation script:
./install.sh
This will automatically create a virtual environment, install all dependencies including pandas and lxml, and set up the scraper.
- Clone or download this repository
- Create a Python virtual environment:
# Create virtual environment
python3 -m venv venv
# Activate virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
# venv\Scripts\activate
- Install the required dependencies:
pip install -r requirements.txt
- Make the script executable (macOS/Linux):
chmod +x pg_scraper.py
The scraper now accepts just the team ID instead of the full URL! Simply use the team ID number from the Perfect Game team page.
Old way (still works):
https://www.perfectgame.org/Events/Tournaments/Teams/Default.aspx?team=967917
New simplified way:
967917
The scraper automatically constructs the full URL from the team ID.
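As a minimal sketch of how a team ID could be expanded into the full URL while still accepting a full URL unchanged: the URL template is taken from the example above, while the helper name `build_team_url` and the validation rules are assumptions for illustration, not the script's actual code.

```python
from urllib.parse import urlparse

# URL pattern copied from the example above.
TEAM_URL = (
    "https://www.perfectgame.org/Events/Tournaments/Teams/Default.aspx"
    "?team={team_id}"
)


def build_team_url(team: str) -> str:
    """Return a full team-page URL for a bare team ID or an existing URL.

    Hypothetical helper: the real script may detect the two formats differently.
    """
    team = team.strip()
    if team.isdigit():
        # Bare numeric ID, e.g. "967917".
        return TEAM_URL.format(team_id=team)
    parsed = urlparse(team)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        # Already a full URL: pass it through untouched.
        return team
    raise ValueError(f"Not a team ID or URL: {team!r}")


print(build_team_url("967917"))
```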
# If using a virtual environment, make sure it's activated first:
source venv/bin/activate # On macOS/Linux
# venv\Scripts\activate # On Windows
python pg_scraper.py 967917
# If using a virtual environment, make sure it's activated first:
source venv/bin/activate # On macOS/Linux
# venv\Scripts\activate # On Windows
python pg_scraper.py 967917 -u your_email -p your_password
- team_id: The Perfect Game team ID (e.g., 967917), much simpler than the full URL
- -u, --username: Perfect Game email address for authentication (optional)
- -p, --password: Perfect Game password for authentication (optional)
- -o, --output: Output CSV filename (default: team_stats.csv)
- -v, --verbose: Enable verbose logging
- -h, --help: Show help message
Backward Compatibility: You can still provide full URLs if needed - the script detects and handles both formats.
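The options above map naturally onto the standard library's `argparse`. The following is only a sketch of what such a parser might look like; the help strings, the `build_parser` name, and the decision to make `team_id` optional so that `--test-profile` can be used on its own are assumptions, not the script's actual definitions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(
        prog="pg_scraper.py",
        description="Scrape player statistics from a Perfect Game team page.",
    )
    # Optional positional so that --test-profile can be used without a team.
    parser.add_argument("team_id", nargs="?",
                        help="team ID (e.g. 967917) or full team URL")
    parser.add_argument("-u", "--username",
                        help="Perfect Game email address (optional)")
    parser.add_argument("-p", "--password",
                        help="Perfect Game password (optional)")
    parser.add_argument("-o", "--output", default="team_stats.csv",
                        help="output CSV filename")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="enable verbose logging")
    parser.add_argument("--test-profile", metavar="URL",
                        help="scrape a single player profile URL")
    return parser


args = build_parser().parse_args(["967917", "-o", "my_team_stats.csv", "-v"])
print(args.team_id, args.output, args.verbose)
```

`-h, --help` needs no explicit definition because `argparse` adds it automatically.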
# Activate virtual environment first (if using one)
source venv/bin/activate # On macOS/Linux
# Basic scraping without authentication
python pg_scraper.py 967917
# With authentication and custom output file
python pg_scraper.py 967917 -u myemail@example.com -p mypassword -o my_team_stats.csv
# With verbose logging
python pg_scraper.py 967917 -v
The program can extract different types of data depending on the page type:
For tournament roster pages (like the example URL), the scraper now:
- Extracts roster information:
  - Name: Player's name
  - Height: Player's height
  - Weight: Player's weight
  - Bats/Throws: Batting and throwing hand (R/R, L/R, etc.)
  - Graduation Year: Expected graduation year
  - School: Current school
  - Hometown: Player's hometown
- Follows each player's profile link to get detailed statistics:
  - At Bats: Number of at-bats from player profile
  - Batting Average: Actual batting average from player profile
  - OPS: On-base Plus Slugging from player profile
  - Slugging: Slugging percentage from player profile
  - ERA: Earned Run Average from player profile (for pitchers)
  - WHIP: Walks and Hits per Inning Pitched from player profile (for pitchers)
  - Strike %: Strike percentage from player profile (for pitchers)
  - Innings Pitched: Total innings pitched from player profile (for pitchers)
Note: The scraper automatically traverses each player's profile page to extract their actual batting and pitching statistics, providing comprehensive player data beyond just roster information.
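The roster-then-profile flow described above could be sketched roughly as follows. The `"Profile URL"` key, the `add_profile_stats` name, and the injected `fetch_stats` callable are all hypothetical; the real script's data structures may differ. Missing values are filled with 'N/A'.

```python
from typing import Callable, Dict, List

STAT_FIELDS = [
    "At Bats", "Batting Average", "OPS", "Slugging",
    "ERA", "WHIP", "Strike %", "Innings Pitched",
]


def add_profile_stats(
    roster: List[Dict[str, str]],
    fetch_stats: Callable[[str], Dict[str, str]],
) -> List[Dict[str, str]]:
    """Return roster rows extended with statistics from each player's profile.

    `fetch_stats` stands in for whatever downloads and parses a profile page.
    """
    combined = []
    for player in roster:
        url = player.get("Profile URL")  # hypothetical key name
        stats = fetch_stats(url) if url else {}
        row = dict(player)  # copy so the input roster is not mutated
        for field in STAT_FIELDS:
            row[field] = stats.get(field, "N/A")
        combined.append(row)
    return combined


# Usage with a stand-in fetcher instead of real HTTP requests.
roster = [{"Name": "Player A", "Profile URL": "https://example.com/a"},
          {"Name": "Player B"}]
rows = add_profile_stats(roster, lambda url: {"At Bats": "42", "ERA": "2.10"})
print(rows[0]["At Bats"], rows[1]["At Bats"])
```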
For pages with actual game statistics, the scraper extracts:
- Name: Player's name
- At Bats: Number of at-bats
- Batting Average: Player's batting average
- OPS: On-base Plus Slugging percentage
- Slugging: Slugging percentage
- ERA: Earned Run Average (for pitchers)
- WHIP: Walks and Hits per Inning Pitched (for pitchers)
- Strike %: Strike percentage (for pitchers)
- Innings Pitched: Total innings pitched (for pitchers)
Data is saved to a CSV file and also displayed in the terminal.
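For illustration, the CSV step might look like the sketch below. The project itself lists pandas for CSV export; this example deliberately uses the standard library's `csv` module instead so that it runs without dependencies. The column order and the `write_stats_csv` name are assumptions.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

# Assumed column order; the real output file may order fields differently.
COLUMNS = [
    "Name", "At Bats", "Batting Average", "OPS", "Slugging",
    "ERA", "WHIP", "Strike %", "Innings Pitched",
]


def write_stats_csv(rows: Iterable[Dict[str, str]], out: TextIO) -> None:
    """Write player rows as CSV, filling any missing field with 'N/A'."""
    writer = csv.DictWriter(
        out,
        fieldnames=COLUMNS,
        restval="N/A",            # value used for keys missing from a row
        extrasaction="ignore",    # silently drop keys that are not columns
    )
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_stats_csv([{"Name": "Player A", "At Bats": "42"}], buffer)
print(buffer.getvalue())
```

To write to a file rather than a buffer, open it with `open("team_stats.csv", "w", newline="")` as the `csv` documentation recommends.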
- requests: For HTTP requests and session management
- beautifulsoup4: For HTML parsing with lxml backend
- pandas: For efficient data manipulation and CSV export
- lxml: High-performance XML/HTML parser (required)
- urllib3: URL handling utilities
Note: Selenium is NOT required. The scraper uses pure HTTP requests with session management for authentication.
- Authentication: If credentials are provided, the scraper attempts to log into Perfect Game
- Page Retrieval: Fetches the team page using proper browser headers
- Roster Parsing: Extracts player names and profile links from tournament roster tables
- Profile Traversal: For each player, visits their individual profile page to get detailed statistics
- Statistics Extraction:
  - Uses advanced HTML parsing to extract statistics from specific span elements
  - Batting stats: At Bats (ctl04_lblAB), Batting Average (lbl_A_PB_S_AVG), OPS (lbl_A_PB_S_OPS), Slugging (lbl_A_PB_S_SLG)
  - Pitching stats: ERA (ctl04_lblERA), WHIP (ctl04_lblWHIP), Strike % (ctl04_lblSPercent), Innings Pitched (ctl04_lblIP)
  - Robust validation to ensure only numeric statistics are captured
  - Falls back to general table parsing for other page formats
- Data Integration: Combines roster information with detailed statistics from player profiles
- Output: Saves comprehensive player data to CSV and displays results
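The statistics-extraction step above can be illustrated with a small sketch. The span IDs come from the list above; everything else is an assumption: the scraper itself uses BeautifulSoup with lxml, whereas this example uses the standard library's `html.parser` so it runs without dependencies, matches IDs by suffix in case the page prefixes them, and uses a simple regular expression as its notion of "numeric". Nested spans are not handled.

```python
import re
from html.parser import HTMLParser
from typing import Dict, Optional

STAT_SPAN_IDS = {
    "At Bats": "ctl04_lblAB",
    "Batting Average": "lbl_A_PB_S_AVG",
    "OPS": "lbl_A_PB_S_OPS",
    "Slugging": "lbl_A_PB_S_SLG",
    "ERA": "ctl04_lblERA",
    "WHIP": "ctl04_lblWHIP",
    "Strike %": "ctl04_lblSPercent",
    "Innings Pitched": "ctl04_lblIP",
}

# Assumed validation rule: digits with an optional decimal point and percent sign.
NUMERIC = re.compile(r"^\d*\.?\d+%?$")


class SpanTextCollector(HTMLParser):
    """Collect the text content of every <span> that has an id attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.text_by_id: Dict[str, str] = {}
        self._current_id: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._current_id = dict(attrs).get("id")

    def handle_endtag(self, tag):
        if tag == "span":
            self._current_id = None

    def handle_data(self, data):
        if self._current_id:
            previous = self.text_by_id.get(self._current_id, "")
            self.text_by_id[self._current_id] = previous + data


def extract_stats(html: str) -> Dict[str, str]:
    """Return each statistic's text, or 'N/A' when absent or non-numeric."""
    collector = SpanTextCollector()
    collector.feed(html)
    stats = {}
    for label, wanted in STAT_SPAN_IDS.items():
        value = "N/A"
        for span_id, text in collector.text_by_id.items():
            text = text.strip()
            if span_id.endswith(wanted) and NUMERIC.match(text):
                value = text
                break
        stats[label] = value
    return stats


sample = '<span id="ctl04_lblAB">42</span><span id="ctl04_lblERA">-</span>'
print(extract_stats(sample))
```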
- Network errors are caught and reported
- Missing data fields are marked as 'N/A'
- URLs are validated before processing
- Authentication failures are handled gracefully
- Comprehensive logging for troubleshooting
- The scraper includes delays between requests to be respectful to the Perfect Game servers
- It uses proper browser headers to avoid being blocked
- The parsing logic is designed to handle various table formats that might be used on Perfect Game
- Authentication is optional but may be required for accessing certain team pages
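A rough sketch of the "delay between requests, browser-like headers" idea follows. The header values, the one-second default delay, and the `fetch_all_politely` name are illustrative assumptions, not the values the scraper actually uses. The fetch function and the sleep function are injected so the pacing logic can be exercised without touching the network.

```python
import time
from typing import Callable, Dict, Iterable, List

# Illustrative header values only.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
}


def fetch_all_politely(
    urls: Iterable[str],
    fetch: Callable[[str, Dict[str, str]], str],
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each URL in order, pausing between consecutive requests."""
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first.
            sleep(delay_seconds)
        pages.append(fetch(url, BROWSER_HEADERS))
    return pages


# Usage with a stand-in fetcher and no real waiting.
pages = fetch_all_politely(
    ["https://example.com/a", "https://example.com/b"],
    fetch=lambda url, headers: f"<html>{url}</html>",
    sleep=lambda seconds: None,
)
print(len(pages))
```

With `requests`, the `fetch` argument could be something like `lambda url, headers: session.get(url, headers=headers, timeout=30).text` for a `requests.Session` named `session`.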
- Installation hangs on "building wheel":
  - This typically happens with pandas or lxml packages
  - lxml specific fix: If lxml fails to install, try: STATIC_DEPS=true pip3 install lxml
  - Alternative: Try installing with pre-built binaries only: pip install --only-binary=:all: -r requirements.txt
  - On macOS: Install Xcode command line tools: xcode-select --install
  - On Linux: Install build essentials: sudo apt-get install build-essential python3-dev
- Virtual environment issues:
  - Make sure to activate the virtual environment: source venv/bin/activate
  - If activation fails, try recreating it: rm -rf venv && ./install.sh
  - Use the convenience script: source activate.sh
- Import errors:
  - Ensure you've activated the virtual environment before running the script
  - Verify all dependencies are installed: pip list
  - Both pandas and lxml are now required dependencies
- No data found: Check that the URL is correct and the page contains player statistics
- Authentication issues: Verify your username and password are correct
- Network errors: Check your internet connection and try again
- Parsing errors: Use the -v flag for verbose logging to see detailed information
- This tool is for educational and personal use only
- Respect Perfect Game's robots.txt and terms of service
- Use reasonable delays between requests
- Don't overload their servers with excessive requests
- Consider reaching out to Perfect Game for official API access for commercial use
This project is for educational purposes. Please respect the terms of service of the Perfect Game website.
