This project focuses on extracting and organizing data about U.S. National Parks from the National Park Service website. The data includes park names, categories, descriptions, addresses, phone numbers, and social media links, gathered programmatically using Python and web scraping techniques.
- Extract links to all U.S. states from the National Park Service website.
- Collect information about each park in every state, including:
  - Park Name
  - Category (e.g., National Monument, National Park)
  - Description
  - Address (split into multiple lines)
  - City, State, and Zip Code
  - Phone Number
  - Social Media Links (Facebook, Twitter, Instagram, YouTube, Flickr)
- Store the extracted data in a CSV file with a standardized schema.
- Programming Language: Python
- Libraries:
  - `requests` for sending HTTP requests.
  - `BeautifulSoup` (from `bs4`) for web scraping.
  - `pandas` for data manipulation and storage.
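All three libraries are available from PyPI; note that `BeautifulSoup` is distributed as the `beautifulsoup4` package:

```bash
pip install requests beautifulsoup4 pandas
```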
- Scraped the main National Park Service website to extract links for all states using the dropdown menu.
- For each state, navigated to its page to extract the list of parks, including:
  - Name, category, and description.
  - Links to individual park pages.
- From each park’s page, extracted detailed information such as:
  - Address (Line 1, Line 2, Line 3).
  - City, state, and zip code.
  - Phone number and available social media links.
- All data was cleaned and stored in a CSV file (`All_Parks_Data.csv`) with the following columns (a minimal sketch of the full pipeline follows this list):
  - Name
  - Category
  - Description
  - Street Address Line 1
  - Line 2
  - Line 3
  - City
  - State
  - Zip Code
  - Phone Number
  - Facebook
  - Twitter
  - Instagram
  - YouTube
  - Flickr
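The steps above can be condensed into a minimal sketch. The `/state/` URL pattern, the tag names used for park listings, and the `itemprop` selectors are assumptions about the NPS markup rather than verified selectors, so they should be checked against the live pages before being relied on:

```python
import time

import requests
from bs4 import BeautifulSoup
import pandas as pd

BASE_URL = "https://www.nps.gov"
HEADERS = {"User-Agent": "Mozilla/5.0"}  # some servers reject requests without a User-Agent

def get_state_links():
    """Collect state-page links from the homepage dropdown menu."""
    soup = BeautifulSoup(requests.get(BASE_URL, headers=HEADERS).text, "html.parser")
    # Assumption: state pages follow a /state/<code>/ URL pattern.
    hrefs = {a["href"] for a in soup.find_all("a", href=True) if "/state/" in a["href"]}
    return [h if h.startswith("http") else BASE_URL + h for h in sorted(hrefs)]

def get_parks(state_url):
    """Extract name, category, description, and detail-page link for each listed park."""
    soup = BeautifulSoup(requests.get(state_url, headers=HEADERS).text, "html.parser")
    parks = []
    for li in soup.find_all("li"):  # assumption: one <li> per park listing
        title = li.find("h3")
        if not (title and title.find("a")):
            continue
        parks.append({
            "Name": title.get_text(strip=True),
            "Category": li.find("h2").get_text(strip=True) if li.find("h2") else "",
            "Description": li.find("p").get_text(strip=True) if li.find("p") else "",
            "url": BASE_URL + title.find("a")["href"],
        })
    return parks

def get_details(park_url):
    """Pull address, phone number, and social links from an individual park page."""
    soup = BeautifulSoup(requests.get(park_url, headers=HEADERS).text, "html.parser")
    details = {}
    # Assumption: contact info carries schema.org itemprop attributes;
    # address lines 2 and 3 would be split out of the same block.
    props = [("Street Address Line 1", "streetAddress"), ("City", "addressLocality"),
             ("State", "addressRegion"), ("Zip Code", "postalCode"),
             ("Phone Number", "telephone")]
    for column, prop in props:
        tag = soup.find(attrs={"itemprop": prop})
        details[column] = tag.get_text(strip=True) if tag else ""
    for site in ("Facebook", "Twitter", "Instagram", "YouTube", "Flickr"):
        link = soup.find("a", href=lambda h: h and site.lower() in h.lower())
        details[site] = link["href"] if link else ""
    return details

rows = []
for state_url in get_state_links():
    for park in get_parks(state_url):
        park.update(get_details(park.pop("url")))
        rows.append(park)
        time.sleep(0.5)  # be polite to the server between requests

pd.DataFrame(rows).to_csv("All_Parks_Data.csv", index=False)
```

A production version would add error handling and retries around each request; the sketch omits them to keep the flow of the pipeline visible.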
- Clone the repository:
```bash
git clone https://github.com/<your-username>/National-Parks-Data-Extraction.git
```
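A typical next step is to enter the project directory and run the scraper; the script name below is hypothetical, so use the repository's actual entry point:

```bash
cd National-Parks-Data-Extraction
python scrape_parks.py  # hypothetical script name
```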