National Parks Data Extraction Project

Overview

This project focuses on extracting and organizing data about U.S. National Parks from the National Park Service website. The data includes park names, categories, descriptions, addresses, phone numbers, and social media links, gathered programmatically using Python and web scraping techniques.

Objectives

  1. Extract links to all U.S. states from the National Park Service website.
  2. Collect information about each park in every state, including:
    • Park Name
    • Category (e.g., National Monument, National Park)
    • Description
    • Address (split into multiple lines)
    • City, State, and Zip Code
    • Phone Number
    • Social Media Links (Facebook, Twitter, Instagram, YouTube, Flickr)
  3. Store the extracted data in a CSV file with a standardized schema.

Tools and Technologies

  • Programming Language: Python
  • Libraries:
    • requests for sending HTTP requests.
    • BeautifulSoup (from bs4) for parsing HTML.
    • pandas for data manipulation and storage.
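
The sketches under the steps below share a few imports and a small fetch helper along these lines. The base URL and the fetch_soup helper are illustrative assumptions, not code taken from this repository:

    import requests
    from bs4 import BeautifulSoup
    import pandas as pd

    # Assumed base URL for the National Park Service website.
    BASE_URL = "https://www.nps.gov"

    def fetch_soup(url):
        """Download a page and return it parsed with BeautifulSoup."""
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # fail loudly on HTTP errors
        return BeautifulSoup(response.text, "html.parser")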

Steps

1. Retrieve Links to States

  • Scraped the National Park Service homepage to extract links to each state's page from its dropdown menu, as sketched below.
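
A minimal sketch of this step, assuming state links appear as anchors whose href begins with /state/; the actual dropdown markup should be verified on the live page:

    def get_state_links():
        """Map each state name to the URL of its park-listing page."""
        soup = fetch_soup(BASE_URL + "/index.htm")
        links = {}
        # Assumed markup: the dropdown renders each state as
        # <a href="/state/xx/index.htm">State Name</a>.
        for a in soup.select("a[href^='/state/']"):
            name = a.get_text(strip=True)
            if name:
                links[name] = BASE_URL + a["href"]
        return links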

2. Extract Park Data

  • For each state, navigated to its page to extract the list of parks, including:
    • Name, category, and description.
    • Links to individual park pages.
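
A sketch of the per-state extraction; the li.clearfix selector and the h3/h2/p layout inside each item are assumptions about the state pages' markup:

    def get_parks_for_state(state_url):
        """List each park on a state page with its name, category,
        description, and a link to the park's own page."""
        soup = fetch_soup(state_url)
        parks = []
        for item in soup.select("li.clearfix"):  # assumed list-item selector
            heading = item.find("h3")
            link = heading.find("a") if heading else None
            if link is None:
                continue
            category = item.find("h2")
            desc = item.find("p")
            parks.append({
                "Name": heading.get_text(strip=True),
                "Category": category.get_text(strip=True) if category else "",
                "Description": desc.get_text(strip=True) if desc else "",
                "URL": BASE_URL + link["href"],
            })
        return parks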

3. Extract Detailed Park Information

  • From each park’s page, extracted detailed information such as:
    • Address (Line 1, Line 2, Line 3).
    • City, state, and zip code.
    • Phone number and available social media links.
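
A sketch of the per-park extraction. It assumes the contact section uses schema.org itemprop attributes and that social links can be found by matching their domains; both assumptions should be checked against the live markup:

    import re

    def get_park_details(park_url):
        """Pull address, phone number, and social media links from a park page."""
        soup = fetch_soup(park_url)

        def prop(name):
            tag = soup.find(attrs={"itemprop": name})
            return tag.get_text(strip=True) if tag else ""

        details = {
            "Street Address Line 1": prop("streetAddress"),
            # Lines 2 and 3 would be split out here when a page
            # renders a multi-line street address.
            "Line 2": "",
            "Line 3": "",
            "City": prop("addressLocality"),
            "State": prop("addressRegion"),
            "Zip Code": prop("postalCode"),
            "Phone Number": prop("telephone"),
        }
        # Match social links by domain rather than by page structure.
        for column, domain in [("Facebook", "facebook.com"),
                               ("Twitter", "twitter.com"),
                               ("Instagram", "instagram.com"),
                               ("YouTube", "youtube.com"),
                               ("Flickr", "flickr.com")]:
            tag = soup.find("a", href=re.compile(domain, re.I))
            details[column] = tag["href"] if tag else ""
        return details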

4. Store Data in CSV

  • All data was cleaned and stored in a CSV file (All_Parks_Data.csv) with the following columns (a sketch of this assembly step follows the list):
    • Name
    • Category
    • Description
    • Street Address Line 1
    • Line 2
    • Line 3
    • City
    • State
    • Zip Code
    • Phone Number
    • Facebook
    • Twitter
    • Instagram
    • YouTube
    • Flickr
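
Tying the hypothetical helpers from the previous steps together, the final table can be assembled and written out with pandas:

    def scrape_all_parks():
        """Combine all steps and write All_Parks_Data.csv."""
        rows = []
        for state_url in get_state_links().values():
            for park in get_parks_for_state(state_url):
                details = get_park_details(park.pop("URL"))
                rows.append({**park, **details})
        df = pd.DataFrame(rows)
        df.to_csv("All_Parks_Data.csv", index=False)
        return df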

How to Run

  1. Clone the repository:
    git clone https://github.com/Ashfadi/Web-Scraping.git
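  2. Install the dependencies (assuming pip is available):
    pip install requests beautifulsoup4 pandas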
