# Website Archiver

Archive entire websites for offline use with a modern GUI or CLI.

## Features
| Feature | Description |
|---|---|
| 🔄 Recursive Crawling | Configurable depth-based website traversal |
| 🧠 Smart Discovery | Automatic sitemap.xml and robots.txt parsing |
| 📦 Asset Management | Download images, CSS, JS, fonts, videos, audio |
| 🔗 Link Rewriting | Converts URLs to local paths for offline navigation |
| 🍪 Cookie Import | Paste exported cookies JSON to access login-protected pages |
| 🔐 Interactive Login | Manual login via visible browser (Brave/Chrome/Chromium) |
| 📊 Real-time Progress | Live statistics via Socket.IO |
| 💾 Flexible Export | Output to folder or ZIP archive |
| 📋 URL Reports | Export urls.json, urls.csv, urls.txt with success/fail status |
| 🗺️ Sitemap Viewer | Interactive HTML tree view of all archived pages |
## Requirements

- Node.js 18 or higher
- npm (included with Node.js)
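
Both can be verified from a terminal. The commands below are standard Node.js tooling, shown only as a quick prerequisite check:

```bash
# Confirm Node.js 18+ and npm are available
node --version   # should print v18.x or newer
npm --version
```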
## Installation

```bash
git clone https://github.com/zxcvresque/recurse.git
cd recurse
npm install
npm start
```

Then open http://localhost:3000 in your browser.
## GUI Usage

With the server running (`npm start`), follow this workflow:

- Enter the website URL
- Choose Quick Archive or Analyze First
- Configure depth, page limits, and asset options
- Click Start Archiving
- Download the archive from history
## CLI Usage

```bash
node src/cli.js <url> [options]
```

Examples:

```bash
# Basic archive
node src/cli.js https://example.com

# Custom depth and output
node src/cli.js https://docs.example.com -d 5 -p 200 -o docs.zip

# With cookies for authenticated pages
node src/cli.js https://private.site.com --cookies cookies.json
```

| Option | Default | Description |
|---|---|---|
| `-d, --depth <n>` | 3 | Maximum crawl depth |
| `-p, --pages <n>` | 50 | Maximum pages to download |
| `-o, --output <path>` | `./archive` | Output folder or .zip file |
| `-c, --cookies <file>` | - | Path to cookies JSON file |
| `--delay <ms>` | 500 | Delay between requests |
| `--timeout <ms>` | 30000 | Page load timeout |
| `--no-images` | - | Skip downloading images |
| `--no-css` | - | Skip downloading CSS |
| `--no-js` | - | Skip downloading JavaScript |
| `--visible` | - | Show browser window |
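
The flags can be combined. The sketch below uses only options from the table above; the URL and values are placeholders to adjust for your own run:

```bash
# Deeper crawl of a docs site, skipping JavaScript, with a gentler request rate
node src/cli.js https://docs.example.com \
  -d 4 -p 300 \
  -o docs-archive.zip \
  --no-js \
  --delay 1000 \
  --timeout 60000 \
  --visible
```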
## Cookie Import

- Install a browser extension like EditThisCookie
- Log into the target website
- Export the cookies as JSON (a sketch of the expected file format follows this list)
- CLI: `node src/cli.js https://site.com --cookies cookies.json`
- GUI: Paste the JSON into "Import Cookies" in Advanced Options
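
Cookie extensions generally export an array of cookie objects. The exact fields depend on the extension, so the snippet below is only an illustrative sketch with placeholder values, not the definitive schema:

```json
[
  {
    "name": "session_id",
    "value": "abc123",
    "domain": ".private.site.com",
    "path": "/",
    "secure": true,
    "httpOnly": true,
    "expirationDate": 1767225600
  }
]
```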
## Interactive Login

- Enable Interactive Login in Advanced Options
- Select browser profile (Default/Chrome/Brave)
- A visible browser window opens at the target URL
- Log in manually (handle MFA, CAPTCHA, etc.)
- Click "I've Logged In" in Re/curse to continue
## Archive Structure

```
archive.zip
├── index.html       # Auto-redirect to main page
├── sitemap.html     # Interactive tree view of all pages
├── urls.json        # Structured URL data (success/fail)
├── urls.csv         # Spreadsheet format
├── urls.txt         # Human-readable report
├── pages/           # All archived HTML pages
│   ├── index.html
│   └── blog/
│       └── post-1.html
└── assets/
    ├── images/
    ├── css/
    └── js/
```
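
Because links are rewritten to local paths, the archive can usually be browsed by opening `index.html` directly. If your browser restricts local `file://` navigation, serving the extracted folder over a local HTTP server is a simple workaround; the commands below are a generic sketch, not part of the tool itself:

```bash
# Extract the archive and serve it with any static file server
unzip archive.zip -d my-site
cd my-site
python3 -m http.server 8080   # then browse http://localhost:8080
```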
## Limitations

- JavaScript-heavy SPAs: Dynamic content loaded after page render may not be fully captured
- Infinite scroll: Content requiring scroll to load is not automatically triggered
- Login sessions: Some sites with advanced bot detection may block archiving
- Large files: Very large assets (videos, PDFs) may slow down or fail
- External domains: Only same-origin content is archived by default
- Forms & interactions: Interactive elements won't function in archived pages
- Streaming content: Live streams and real-time content cannot be archived
## License

MIT License. See the LICENSE file for details.
