A Go application that scrapes daily river flow data from H2OLine and automatically updates a PostgreSQL (Supabase) database.
It detects scheduled release days, prevents duplicate entries, and ensures the latest water flow information is always recorded.
This tool automates data collection from H2OLine and updates a Supabase database with the latest:
- Flow rate (CFS)
- Posting time
- Forecast summary
- Expiration date
- Whether today is a release day
It can be run manually or on a schedule (e.g., via cron).
- Web Scraping: Uses colly and goquery
- Smart Parsing: Extracts clean text from messy HTML
- Release Detection: Flags if the current day is a planned release
- Database Integration: Inserts or updates data in Supabase PostgreSQL
- Error Handling: Logs failed scrapes and prevents duplicate writes
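
The parsing step might look roughly like the sketch below, which works on text that has already been pulled out of the HTML. The helper names `cleanText` and `parseCFS`, the regular expression, and the sample input are assumptions for illustration; the page's real wording may need a different pattern.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// cfsPattern matches a number (optionally with thousands separators)
// followed by "cfs", case-insensitively. The pattern is an assumption
// about how the page words its flow rate.
var cfsPattern = regexp.MustCompile(`(?i)(\d[\d,]*)\s*cfs`)

// cleanText collapses every run of whitespace into a single space and
// trims the ends. strings.Fields treats newlines, tabs, and the
// non-breaking space (U+00A0) as whitespace.
func cleanText(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

// parseCFS returns the first flow rate found in text, or false if none.
func parseCFS(text string) (int, bool) {
	m := cfsPattern.FindStringSubmatch(text)
	if m == nil {
		return 0, false
	}
	n, err := strconv.Atoi(strings.ReplaceAll(m[1], ",", ""))
	if err != nil {
		return 0, false
	}
	return n, true
}

func main() {
	// Made-up input imitating text extracted from messy HTML.
	raw := "  Flow:\n\t1,250\u00a0CFS  as of 7:30 AM "
	text := cleanText(raw)
	fmt.Printf("%q\n", text)

	cfs, ok := parseCFS(text)
	fmt.Println(cfs, ok)
}
```

Returning a boolean alongside the value lets the caller log a failed scrape instead of writing a zero flow rate to the database.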
| Library | Purpose |
|---|---|
| github.com/gocolly/colly | Web scraping |
| github.com/PuerkitoBio/goquery | HTML parsing |
| github.com/jackc/pgx/v4 | PostgreSQL driver |
| github.com/joho/godotenv | Load environment variables |
- Scrape HTML using Colly
- Parse content with goquery to extract:
  - Publish date
  - Expiration date
  - Flow rate (CFS)
  - Posting time
  - Forecast text
- Compare with the most recent database row
- Update or insert based on changes
- Mark release days using a predefined map in `isRelease()`
- Split logic into smaller files (`scrape.go`, `db.go`, etc.)
- Add unit tests for parsing functions
- Add Slack or email alerts for failed scrapes
- Support multiple source URLs
- Add environment variable validation
- Run as a cron job and integrate into the larger What's the Flow application