This project is a simple Python script that scrapes a Wikipedia page to extract a list of the largest companies in the United States by revenue. The data is then stored in a structured CSV format for further analysis or reference.
- Python
- BeautifulSoup (bs4) – For HTML parsing and web scraping
- Pandas β For data manipulation and exporting to CSV
- Requests β For sending HTTP requests
- Sends a request to the Wikipedia page "List of largest companies in the United States by revenue"
- Parses the first HTML table on the page
- Extracts all rows and cleans the data
- Saves the result as a CSV file locally
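
The steps above can be sketched roughly as follows. This is a minimal illustration, not the script itself: the function names, the `URL` constant, and the `User-Agent` string are assumptions made for the example, and the cleaning step here only strips whitespace and skips rows whose cell count does not match the header.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Assumed address of the page named above; verify it before running.
URL = "https://en.wikipedia.org/wiki/List_of_largest_companies_in_the_United_States_by_revenue"


def fetch_html(url: str = URL) -> str:
    """Download the page and return its HTML as text."""
    # The User-Agent value is a placeholder; replace it with your own identifier.
    headers = {"User-Agent": "portfolio-scraper-example/0.1"}
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def parse_first_table(html: str) -> pd.DataFrame:
    """Turn the first <table> in the HTML into a DataFrame of cleaned strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")  # first table in document order
    if table is None:
        raise ValueError("No <table> element found in the page")

    rows = table.find_all("tr")
    if not rows:
        raise ValueError("The first table has no rows")

    # Header cells come from the first row; strip surrounding whitespace.
    headers = [cell.get_text(strip=True) for cell in rows[0].find_all("th")]

    records = []
    for row in rows[1:]:
        cells = [cell.get_text(strip=True) for cell in row.find_all(["td", "th"])]
        # Skip rows that do not line up with the header (e.g. spanning rows).
        if len(cells) == len(headers):
            records.append(cells)

    return pd.DataFrame(records, columns=headers)


def scrape_to_csv(output_path: str, url: str = URL) -> pd.DataFrame:
    """Fetch, parse, and save the table; returns the DataFrame for inspection."""
    df = parse_first_table(fetch_html(url))
    df.to_csv(output_path, index=False)
    return df
```

Calling `scrape_to_csv("companies.csv")` would run all four steps against the live page. Keeping the parsing in its own function means it can be checked against a saved HTML file without sending a request.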
The scraped data is saved as a CSV file in local storage.
You can modify the output path as needed for your environment.
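
One way to keep the output location easy to change is to pass it in as a parameter instead of hard-coding it. The sketch below assumes a hypothetical `save_csv` helper with placeholder defaults (`output` and `companies.csv`). None of these names come from the original script.

```python
from pathlib import Path

import pandas as pd


def save_csv(df: pd.DataFrame, output_dir: str = "output", filename: str = "companies.csv") -> Path:
    """Write df to output_dir/filename and return the resulting path."""
    # expanduser() lets callers pass paths such as "~/data".
    target = Path(output_dir).expanduser() / filename
    # Create the directory (and any parents) if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(target, index=False)  # index=False keeps the row index out of the file
    return target
```

For example, `save_csv(df, output_dir="~/portfolio/data")` would write to a folder under the home directory without any change to the scraping code.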
This script assumes the target table is the first one on the page. If Wikipedia changes the structure, the script may need to be updated.
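
If relying on the table position proves brittle, one possible alternative is to select the table by its column headers. This is only a sketch of that idea, not what the script currently does: `find_table_by_headers` is a hypothetical helper, and the header names you pass in (for example `"Rank"` and `"Name"`) are assumptions that should be checked against the live page.

```python
from typing import Optional

from bs4 import BeautifulSoup
from bs4.element import Tag


def find_table_by_headers(html: str, required_headers: set[str]) -> Optional[Tag]:
    """Return the first table whose <th> texts include every required header."""
    soup = BeautifulSoup(html, "html.parser")
    for table in soup.find_all("table"):
        header_texts = {th.get_text(strip=True) for th in table.find_all("th")}
        # Subset test: every required header must appear in this table.
        if required_headers <= header_texts:
            return table
    return None  # caller decides how to handle a missing table
```

A `None` result signals that the page layout has changed, which is easier to diagnose than silently scraping the wrong table.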
Always follow Wikipedia's Terms of Use when scraping.
This project is part of my data analytics portfolio and highlights Python skills relevant to data analyst roles.