A Go-based CLI tool to scrape electrical usage data from utility websites (NYSEG and Con Edison). Uses browser automation to extract hourly kWh data and stores it in a local SQLite database. Publishes historical data to Home Assistant for detailed energy tracking and analysis.
- Automatic authentication: Configure username/password for automatic login
- Hourly usage data: Fetches detailed hourly kWh readings (24 readings per day)
- Browser automation: Uses chromedp for JavaScript-rendered pages
- Duplicate prevention: Won't re-scrape existing data
- SQLite storage: Local database for tracking usage over time
- Home Assistant integration: Publishes historical hourly data via AppDaemon
- Smart publishing: Tracks published records to avoid re-uploading
- Statistics generation: Automatic compilation of energy and cost statistics for Home Assistant Energy dashboard
- Automated sync: Daily cron script for fetch → publish → statistics workflow
- Debug mode: Inspect pages to troubleshoot scraping issues
# Clone the repository
git clone https://github.com/jgoulah/gridscraper.git
cd gridscraper
# Build the binary
make build
# Optional: Install to system (installs binary and sync script)
sudo make install

See the Makefile for additional targets (`make help`).
Add your credentials to config.yaml:
cookies:
  # NYSEG credentials
  nyseg_username: your-username
  nyseg_password: your-password
  # Con Edison credentials
  coned_username: your-username
  coned_password: your-password
  coned_challenge_answer: "Answer to your security question"

The application will automatically log in and refresh authentication as needed when fetching data.
Alternative: Manual Login (optional)
If you prefer not to store credentials in the config file, you can manually capture cookies:
# Login to NYSEG (opens browser for manual login)
gridscraper login nyseg
# Login to Con Edison (opens browser for manual login)
gridscraper login coned

This will open a browser window, wait for you to log in, then extract and save the cookies to config.yaml.
Fetch your hourly usage data:
# Fetch NYSEG data (last 90 days by default)
gridscraper fetch nyseg
# Fetch Con Edison data
gridscraper fetch coned

This will:
- Automatically log in if needed (using credentials or existing cookies)
- Download hourly usage data (NYSEG: API calls, Con Edison: CSV export via browser automation)
- Store hourly readings (24 per day) in ./data.db
- Skip any timestamps that already exist (duplicate prevention)
- Fetch the last 90 days by default (configurable via `days_to_fetch`, `nyseg_days_to_fetch`, or `coned_days_to_fetch` in config)
- For Con Edison: download 15-minute intervals and aggregate them to hourly data
List all stored usage data:
# View all data (all services)
gridscraper list
# View only NYSEG data
gridscraper list --service nyseg
# View only Con Edison data
gridscraper list --service coned

Output example:
NYSEG Usage Data:
----------------------------------------
Date kWh
----------------------------------------
2024-11-01 45.23
2024-11-02 52.10
2024-11-03 48.75
----------------------------------------
Total: 146.08 kWh (3 records)
To publish your historical hourly data to Home Assistant, you need to set up AppDaemon:
- In Home Assistant, go to Settings → Add-ons → Add-on Store
- Search for "AppDaemon" and install it
- Start the AppDaemon add-on
Find your AppDaemon directory:
From your Home Assistant host, look for the AppDaemon add-on directory:
ls -la /addon_configs/ | grep appdaemon

It will typically be something like /addon_configs/a0d7b954_appdaemon/ (where a0d7b954 is the add-on slug).
Create the backfill script:
Copy the template script from scripts/appdaemon/backfill_state.py in this repository to /addon_configs/{addon_slug}_appdaemon/apps/backfill_state.py on your Home Assistant host.
This script provides three HTTP endpoints:
- `/api/appdaemon/backfill_state` - Stores individual hourly consumption values
- `/api/appdaemon/generate_statistics` - Generates energy statistics for the Energy dashboard
- `/api/appdaemon/generate_cost_statistics` - Generates cost statistics for the Energy dashboard
Create or edit /addon_configs/{addon_slug}_appdaemon/apps/apps.yaml:
backfill_state:
  module: backfill_state
  class: BackfillState

Restart the AppDaemon add-on to load the new script. Check the logs to verify you see:
Backfill State API endpoints registered:
- /api/appdaemon/backfill_state
- /api/appdaemon/generate_statistics
- /api/appdaemon/generate_cost_statistics
Add template sensors to your Home Assistant configuration.yaml:
template:
  - sensor:
      # NYSEG sensor
      - name: "NYSEG Energy Usage Direct"
        unique_id: "nyseg_energy_usage_direct"
        unit_of_measurement: 'kWh'
        device_class: energy
        state_class: total_increasing
        state: "{{ float(0) }}"
      # Con Edison sensor
      - name: "ConEd Energy Usage Direct"
        unique_id: "coned_energy_usage_direct"
        unit_of_measurement: 'kWh'
        device_class: energy
        state_class: total_increasing
        state: "{{ float(0) }}"

Important Notes:
- The `entity_id` is generated from the `name` field (e.g., "ConEd Energy Usage Direct" becomes `sensor.coned_energy_usage_direct`)
- The `entity_id` must match the `entity_id` in your `config.yaml`
- The `state` field is just a placeholder; historical data is backfilled directly to the database
- `state_class: total_increasing` is required for the Energy dashboard
After adding, restart Home Assistant or reload template entities (Developer Tools → YAML → Template Entities).
- Go to Settings → Dashboards → Energy
- Click "Add Consumption"
- Select your sensor (e.g., `sensor.nyseg_energy_usage_direct` or `sensor.coned_energy_usage_direct`)
- Save
This creates the necessary statistics_meta entry for the generate-stats command to work.
After fetching data, publish it to Home Assistant:
# Publish unpublished NYSEG data (default behavior)
gridscraper publish --service nyseg
# Publish to all services (nyseg and coned)
gridscraper publish
# Force republish ALL nyseg data (ignoring published flag)
gridscraper publish --service nyseg --all
# Publish data from the last 7 days
gridscraper publish --service nyseg --since 7d
# Publish specific date range
gridscraper publish --service nyseg --since 2024-01-01 --until 2024-01-31
# Publish limited number of records (for testing)
gridscraper publish --service nyseg --limit 10

The publish command:
- By default, only publishes records that haven't been published yet (tracked in local database)
- Use the `--all` flag to force republishing all records (ignoring published status)
- Sends hourly kWh readings to Home Assistant with proper timestamps
- Marks each record as published after successful upload
- Subsequent runs without `--all` are instant if there's no new data
Note: Home Assistant integration must be configured in config.yaml first (see Configuration section below).
After publishing data to Home Assistant, you need to generate statistics for the Energy dashboard:
# Generate statistics for NYSEG (default)
gridscraper generate-stats --service nyseg
# Generate statistics for Con Edison
gridscraper generate-stats --service coned
# Generate with manual cost rate (useful for new cost sensors)
gridscraper generate-stats --service nyseg --rate 0.20102749

This command:
- Calls the AppDaemon `generate_statistics` and `generate_cost_statistics` endpoints
- Compiles hourly energy and cost statistics from individual consumption values
- Auto-calculates the cost rate from existing data, or uses the manual `--rate` if provided
- Populates the Home Assistant statistics tables for the Energy dashboard
- Reads the entity_id and AppDaemon URL from your `config.yaml`
Note: For new cost sensors, use the --rate flag to specify your cost per kWh. After the first backfill, future runs can auto-calculate the rate from existing statistics.
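The auto-calculated rate amounts to dividing accumulated cost by accumulated energy from already-existing statistics. A sketch of that idea (illustrative, not the tool's exact code):

```go
package main

import "fmt"

// inferRate derives a cost-per-kWh rate from existing statistics by
// dividing total cost by total energy. With no history yet it reports
// failure, which is when the caller must supply --rate instead.
func inferRate(totalCost, totalKWh float64) (float64, bool) {
	if totalKWh == 0 {
		return 0, false
	}
	return totalCost / totalKWh, true
}

func main() {
	if rate, ok := inferRate(29.36, 146.08); ok {
		fmt.Printf("%.6f per kWh\n", rate)
	}
}
```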
For automated daily data fetching and publishing, use the provided sync script and Makefile:
# Build and install the binary and sync script
sudo make install

This installs:
- `/usr/local/bin/gridscraper` - the main binary
- `/usr/local/bin/gridscraper-sync.sh` - automated sync script (supports nyseg, coned, or both)
Copy your configuration and database to the standard location:
sudo mkdir -p /usr/local/etc/gridscraper
sudo cp config.yaml /usr/local/etc/gridscraper/config.yaml
sudo cp data.db /usr/local/etc/gridscraper/data.db

Important: Update `days_to_fetch` in your production config.yaml:
# Global default (fallback: 90)
days_to_fetch: 90
# Service-specific overrides (optional)
nyseg_days_to_fetch: 15 # Fetch last 15 days for NYSEG
coned_days_to_fetch: 365 # Fetch last year for Con Edison

Add to your crontab:
# Edit crontab
crontab -e
# Option 1: Sync all services (runs daily at 6 AM)
0 6 * * * /usr/local/bin/gridscraper-sync.sh >> /usr/local/etc/gridscraper/sync.log 2>&1
# Option 2: Sync specific services at different times
0 6 * * * /usr/local/bin/gridscraper-sync.sh nyseg >> /usr/local/etc/gridscraper/nyseg.log 2>&1
0 7 * * * /usr/local/bin/gridscraper-sync.sh coned >> /usr/local/etc/gridscraper/coned.log 2>&1

The sync script automatically:
- Fetches new data from the specified utility (or all utilities if no argument)
- Publishes new records to the appropriate Home Assistant instance
- Generates statistics for the Energy dashboard
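The fetch → publish → statistics ordering matters, since statistics are compiled from values that have already been published. A dry-run sketch of the per-service command sequence (illustrative; the installed gridscraper-sync.sh is the authoritative version):

```go
package main

import "fmt"

// syncCommands lists the gridscraper invocations the sync workflow
// chains for one service: fetch new data, publish it, then refresh
// Energy dashboard statistics.
func syncCommands(service string) []string {
	return []string{
		"gridscraper fetch " + service,
		"gridscraper publish --service " + service,
		"gridscraper generate-stats --service " + service,
	}
}

func main() {
	for _, cmd := range syncCommands("nyseg") {
		fmt.Println(cmd)
	}
}
```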
If data extraction fails, use debug mode:
# Open visible browser to inspect page
gridscraper debug nyseg --visible
# Save HTML to file for inspection
gridscraper debug nyseg --output output.html

Default: ./config.yaml (in current directory)
Override with: --config /path/to/config.yaml
Default: ./data.db (in current directory)
Override with: --db /path/to/data.db
Note: Both config.yaml and data.db are in .gitignore to avoid accidentally committing sensitive cookies or personal data.
See config.yaml.example for a complete configuration template with detailed comments explaining each field.
Key configuration sections:
- Fetch Settings: Control how many days of historical data to fetch (global and per-service)
- Authentication: NYSEG and Con Edison credentials (username/password)
- Home Assistant Integration: Separate configurations for NYSEG and Con Edison (can use different HA instances)
To get started:
cp config.yaml.example config.yaml
# Edit config.yaml with your credentials and settings

gridscraper/
├── cmd/gridscraper/ # CLI commands
│ ├── main.go # Entry point
│ ├── root.go # Root command & shared logic
│ ├── login.go # Login command
│ ├── fetch.go # Fetch command
│ ├── list.go # List command
│ ├── publish.go # Publish command (Home Assistant)
│ ├── generate_stats.go # Generate statistics command
│ └── debug.go # Debug command
├── internal/
│ ├── config/ # YAML config handling
│ │ └── config.go
│ ├── database/ # SQLite operations
│ │ └── db.go
│ ├── publisher/ # Home Assistant publishing
│ │ └── mqtt.go
│ └── scraper/ # Scraping logic
│ ├── browser.go # Cookie management
│ └── nyseg.go # NYSEG scraper
├── pkg/models/ # Data models
│ └── usage.go
├── scripts/
│ ├── appdaemon/ # AppDaemon scripts for Home Assistant
│ │ └── backfill_state.py
│ └── gridscraper-sync.sh # Automated daily sync script
├── Makefile # Build automation
├── go.mod
├── go.sum
└── README.md
- Go 1.24+
- Chrome/Chromium (for headless browser automation)
- `github.com/chromedp/chromedp` - Browser automation
- `github.com/spf13/cobra` - CLI framework
- `gopkg.in/yaml.v3` - YAML config parsing
- `modernc.org/sqlite` - Pure Go SQLite driver
# Build binary
make build
# Build and install to system
sudo make install
# Clean build artifacts
make clean

# Run tests
go test ./...

- Status: Implemented
- URL: https://energymanager.nyseg.com/insights
- Data: Hourly kWh usage (24 readings/day)
- Method: Direct API calls with browser authentication
- Home Assistant: Single instance support
- Status: Implemented
- URL: https://www.coned.com/en/accounts-billing/my-account/energy-use
- Data: 15-minute interval data (96 readings/day), aggregated to hourly
- Method: Browser automation with CSV export
- Authentication: Username, password, and security challenge question
- Home Assistant: Separate instance support (can use different HA server)
Run the login command first:

gridscraper login nyseg

Use debug mode to inspect the page:

gridscraper debug nyseg --visible

Check if the page structure has changed.

Save HTML and inspect selectors:

gridscraper debug nyseg --output page.html

Re-run the login command to refresh cookies:

gridscraper login nyseg

BSD 3-Clause License
Contributions welcome! Please open an issue or pull request.