This project provides tools to scrape, process, and analyze weather station metadata. It extracts station names, coordinates, and other metadata from HTML pages.
DISCLAIMER: This tool is for educational purposes only. Users are responsible for ensuring compliance with the terms of service of any website they interact with. Always check robots.txt and respect rate limits when scraping websites.
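A quick way to follow that advice is Python's standard-library `urllib.robotparser`. The user-agent string and the robots.txt rules below are illustrative only:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
# Normally you would fetch the live file with rp.set_url("https://example.com/robots.txt")
# followed by rp.read(); here an illustrative robots.txt body is parsed directly.
rp.parse("""\
User-agent: *
Disallow: /private/
Crawl-delay: 5
""".splitlines())

print(rp.can_fetch("weather-data-scraper", "https://example.com/station/123"))  # True
print(rp.can_fetch("weather-data-scraper", "https://example.com/private/x"))    # False
print(rp.crawl_delay("weather-data-scraper"))  # 5
```

`crawl_delay` gives you the minimum pause (in seconds) the site requests between fetches, which you can pass to `time.sleep` in your scraping loop.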
- Web scraping of weather station metadata
- Data cleaning and processing
- Coordinate extraction and standardization
- CSV export functionality
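The repository's actual coordinate parser lives in `src/utils.py` and is not shown here; as a minimal sketch, standardization typically means converting degrees-minutes-seconds strings to decimal degrees. The helper name and format below are assumptions:

```python
import re

def dms_to_decimal(dms: str) -> float:
    """Convert a coordinate like 51°30'26"N to decimal degrees.

    Hypothetical helper for illustration; not necessarily the parser
    the project uses in src/utils.py.
    """
    m = re.match(r"""(\d+)°(\d+)'([\d.]+)"?([NSEW])""", dms.strip())
    if not m:
        raise ValueError(f"Unrecognized coordinate: {dms!r}")
    deg, mins, secs, hemi = m.groups()
    value = int(deg) + int(mins) / 60 + float(secs) / 3600
    # South and West hemispheres are negative in decimal degrees.
    return -value if hemi in "SW" else value

print(round(dms_to_decimal("51°30'26\"N"), 4))  # 51.5072
```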
```
weather-data-scraper/
├── README.md
├── requirements.txt
├── data/                 # Directory for input/output data files
├── src/                  # Source code
│   ├── scraper.py        # Web scraping functionality
│   ├── data_processor.py # Data processing and cleaning
│   └── utils.py          # Utility functions
└── tests/                # Test directory
```
- Clone this repository:

```bash
git clone https://github.com/yourusername/weather-data-scraper.git
cd weather-data-scraper
```

- Create a virtual environment (optional but recommended):

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

```python
from src.scraper import extract_data
from src.data_processor import enrich_data, clean_data

# Extract data from a single URL
data = extract_data("example-url.com/station/123")

# Process a CSV file containing station URLs
enriched_df = enrich_data("data/station_list.csv")

# Clean the data
cleaned_df = clean_data(enriched_df)

# Save to CSV
cleaned_df.to_csv("data/cleaned_station_list.csv", index=False)
```

- Prepare a CSV file with station URLs
- Run the enrichment process to extract metadata
- Clean the data to remove empty columns and standardize formats
- Analyze the resulting dataset
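The first step above assumes a CSV listing station URLs. A minimal way to prepare one with pandas — the column name `url` and the example URLs are assumptions; match whatever `enrich_data` actually expects:

```python
import os
import pandas as pd

os.makedirs("data", exist_ok=True)  # ensure the data/ directory exists

# Illustrative station URLs; replace with real ones for the site you target.
urls = [
    "example-url.com/station/123",
    "example-url.com/station/456",
]
pd.DataFrame({"url": urls}).to_csv("data/station_list.csv", index=False)
```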