A command-line utility to save web pages from a sitemap to the Web Archive (archive.org).
- Parse standard XML sitemaps
- Save pages to Web Archive with full captures (screenshots, outlinks)
- Filter pages by last modification date
- Concurrent processing for faster archiving
Requires Go 1.22 or later:
go install github.com/iyaki/web-archiver/v2@latest
Download pre-built binaries from the Releases page.
This program requires Web Archive API credentials as environment variables:
- WAYBACK_S3_ACCESS_KEY - Your Web Archive S3 access key
- WAYBACK_S3_SECRET_KEY - Your Web Archive S3 secret key
Get your API keys from the Web Archive S3-Like API page.
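As an illustration of how the two variables could be consumed, the sketch below reads them and builds an `Authorization` value in the `LOW <access>:<secret>` form used by the Internet Archive's S3-like keys. The helper name `authHeader` is hypothetical, and whether web-archiver assembles the header exactly this way is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// authHeader builds the Authorization header value from the two
// credentials. getenv is injected so the function is easy to test;
// in normal use pass os.Getenv.
func authHeader(getenv func(string) string) (string, error) {
	access := getenv("WAYBACK_S3_ACCESS_KEY")
	secret := getenv("WAYBACK_S3_SECRET_KEY")
	if access == "" || secret == "" {
		return "", errors.New("WAYBACK_S3_ACCESS_KEY and WAYBACK_S3_SECRET_KEY must both be set")
	}
	return "LOW " + access + ":" + secret, nil
}

func main() {
	if _, err := authHeader(os.Getenv); err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	// The header value contains the secret, so it is not printed.
	fmt.Println("credentials found")
}
```

Checking both variables up front lets the program fail with a clear message before any network request is made.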
web-archiver <sitemap_url> [<date>]
- sitemap_url (required): URL to the sitemap XML file
- date (optional): Filter date in ISO format (YYYY-MM-DD). Only URLs with lastMod newer than this date will be saved.
Save all URLs from a sitemap:
web-archiver https://example.com/sitemap.xml
Save only URLs modified since a specific date:
web-archiver https://example.com/sitemap.xml 2024-05-01