
This Python project scrapes Wikipedia for the largest U.S. companies by revenue, converting the data into a clean CSV using BeautifulSoup and pandas. It simplifies data collection for analysis and research.


aakk23/wiki-webscrapper-python


📊 Wikipedia Scraper: Largest U.S. Companies by Revenue

This project is a simple Python script that scrapes a Wikipedia page to extract a list of the largest companies in the United States by revenue. The data is then stored in a structured CSV format for further analysis or reference.


🧰 Tech Stack

  • Python
  • BeautifulSoup (bs4) – For HTML parsing and web scraping
  • Pandas – For data manipulation and exporting to CSV
  • Requests – For sending HTTP requests

🔍 What It Does
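
The flow described in this README (request the page, parse the table with BeautifulSoup, load it into pandas) could look roughly like the sketch below. It is an illustration under stated assumptions, not the script shipped in this repository: the `URL` constant, the `User-Agent` string, and the function names `fetch_html` and `table_to_dataframe` are all hypothetical.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Assumption: the target page; the README does not state the exact URL.
URL = "https://en.wikipedia.org/wiki/List_of_largest_companies_in_the_United_States_by_revenue"


def fetch_html(url: str = URL) -> str:
    """Download the page and return its HTML as text."""
    # Imported here so the parsing helper below can be used offline.
    import requests

    # Hypothetical User-Agent; use one that identifies your own project.
    headers = {"User-Agent": "wiki-scraper-example/0.1"}
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.text


def table_to_dataframe(html: str, table_index: int = 0) -> pd.DataFrame:
    """Convert one HTML table (by position on the page) into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    tables = soup.find_all("table")
    if table_index >= len(tables):
        raise ValueError(f"page has only {len(tables)} table(s)")

    rows = tables[table_index].find_all("tr")
    # The first row is assumed to hold the column titles.
    columns = [th.get_text(strip=True) for th in rows[0].find_all("th")]

    records = []
    for row in rows[1:]:
        cells = [cell.get_text(strip=True) for cell in row.find_all(["td", "th"])]
        # Skip rows whose width does not match the header (e.g. spanning rows).
        if len(cells) == len(columns):
            records.append(cells)
    return pd.DataFrame(records, columns=columns)
```

With network access, `table_to_dataframe(fetch_html())` would return the table as a DataFrame of strings; numeric columns would still need cleaning before analysis.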


📁 Output

The scraped data is saved as a CSV file on local storage.

You can change the output path in the script to suit your environment.
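
One way to keep the output location easy to change is to pass it in as a parameter. The sketch below is only an illustration: the helper name `save_csv` and the example file name are hypothetical, and the repository's script may write the file differently.

```python
from pathlib import Path

import pandas as pd


def save_csv(df: pd.DataFrame, output_path) -> Path:
    """Write the DataFrame to CSV and return the resolved path."""
    path = Path(output_path).expanduser()
    # Create missing parent folders so the write does not fail.
    path.parent.mkdir(parents=True, exist_ok=True)
    # index=False keeps the pandas row index out of the file.
    df.to_csv(path, index=False)
    return path
```

For example, `save_csv(df, "output/companies.csv")` would write the file relative to the current working directory.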


📌 Notes

This script assumes the target table is the first one on the page. If Wikipedia changes the page structure (for example, by adding another table above it), the script will need to be updated.
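
If positional lookup proves fragile, one alternative is to pick the table by its column headers instead. This is a sketch of that idea, not what the script currently does; the function name `find_table_by_headers` and the header names used with it are hypothetical.

```python
from bs4 import BeautifulSoup


def find_table_by_headers(html: str, required_headers):
    """Return the first table whose header row contains every required header.

    Matching is case-insensitive. Returns None when no table qualifies.
    """
    soup = BeautifulSoup(html, "html.parser")
    wanted = {header.lower() for header in required_headers}
    for table in soup.find_all("table"):
        first_row = table.find("tr")
        if first_row is None:
            continue  # empty table, nothing to compare
        found = {th.get_text(strip=True).lower() for th in first_row.find_all("th")}
        if wanted <= found:  # all required headers are present
            return table
    return None
```

A call such as `find_table_by_headers(html, ["Rank", "Name"])` keeps working when tables are added or reordered, as long as the header text itself stays the same.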

Always follow Wikipedia's Terms of Use when scraping.

📌 Author: Aakkash Aswin

This project is a part of my data analytics portfolio and highlights my Python proficiency relevant to data analyst roles.

Connect with me on LinkedIn
