Weather Station Data Scraper

Overview

This project provides tools to scrape, process, and analyze weather station metadata. It extracts information about weather stations including their names, coordinates, and other metadata from HTML pages.

DISCLAIMER: This tool is for educational purposes only. Users are responsible for ensuring compliance with the terms of service of any website they interact with. Always check robots.txt and respect rate limits when scraping websites.
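The robots.txt check mentioned above can be done with Python's standard library alone. A minimal sketch (the robots.txt body and URLs here are illustrative, not from any real site):

```python
from urllib.robotparser import RobotFileParser

def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Parse a robots.txt body and report whether url may be fetched.

    In a real scraper you would first download robots.txt from the
    site root, then pause between requests to respect rate limits.
    """
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)

# Example robots.txt that blocks a /private/ section for all agents.
robots = "User-agent: *\nDisallow: /private/\n"
print(allowed_by_robots(robots, "https://example.com/station/123"))  # True
print(allowed_by_robots(robots, "https://example.com/private/x"))    # False
```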

Features

  • Web scraping of weather station metadata
  • Data cleaning and processing
  • Coordinate extraction and standardization
  • CSV export functionality
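The coordinate-extraction feature can be sketched as follows; the regex and the decimal "lat, lon" format are illustrative assumptions, not the project's actual implementation in src/scraper.py:

```python
import re

# Matches a signed decimal pair like "40.7128, -74.0060" (assumed format).
COORD_RE = re.compile(r"(-?\d+\.\d+)\s*,\s*(-?\d+\.\d+)")

def extract_coordinates(text: str):
    """Return (lat, lon) as floats if a 'lat, lon' pair appears in text."""
    m = COORD_RE.search(text)
    if not m:
        return None
    return float(m.group(1)), float(m.group(2))
```

Standardizing to floats early makes the later CSV export and analysis steps simpler.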

Project Structure

weather-data-scraper/
├── README.md
├── requirements.txt
├── data/              # Directory for input/output data files
├── src/               # Source code
│   ├── scraper.py     # Web scraping functionality
│   ├── data_processor.py # Data processing and cleaning
│   └── utils.py       # Utility functions
└── tests/             # Test directory

Installation

  1. Clone this repository:
git clone https://github.com/yourusername/weather-data-scraper.git
cd weather-data-scraper
  2. Create a virtual environment (optional but recommended):
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt

Usage

Basic Usage

from src.scraper import extract_data
from src.data_processor import enrich_data, clean_data

# Extract data from a single URL
data = extract_data("https://example-url.com/station/123")

# Process a CSV file containing station URLs
enriched_df = enrich_data("data/station_list.csv")

# Clean the data
cleaned_df = clean_data(enriched_df)

# Save to CSV
cleaned_df.to_csv("data/cleaned_station_list.csv", index=False)

Example Workflow

  1. Prepare a CSV file with station URLs
  2. Run the enrichment process to extract metadata
  3. Clean the data to remove empty columns and standardize formats
  4. Analyze the resulting dataset
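Step 3 above (dropping empty columns and standardizing formats) can be sketched with pandas; the column names `latitude` and `longitude` are assumptions for illustration, not necessarily what clean_data in src/data_processor.py uses:

```python
import pandas as pd

def clean_station_data(df: pd.DataFrame) -> pd.DataFrame:
    """Drop all-empty columns and coerce coordinate columns to floats."""
    df = df.dropna(axis=1, how="all")  # remove columns with no data at all
    for col in ("latitude", "longitude"):  # assumed coordinate column names
        if col in df.columns:
            # Non-numeric values become NaN instead of raising an error.
            df[col] = pd.to_numeric(df[col], errors="coerce")
    return df

df = pd.DataFrame({"name": ["A"], "latitude": ["12.5"], "notes": [None]})
cleaned = clean_station_data(df)
# The all-empty "notes" column is gone and latitude is now a float.
```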

License

MIT
