Re/curse

Website Archiver

Archive entire websites for offline use with a modern GUI or CLI


✨ Features

Feature                  Description
🔄 Recursive Crawling     Configurable depth-based website traversal
🧠 Smart Discovery        Automatic sitemap.xml and robots.txt parsing
📦 Asset Management       Download images, CSS, JS, fonts, videos, audio
🔗 Link Rewriting         Converts URLs to local paths for offline navigation
🍪 Cookie Import          Paste exported cookies JSON to access login-protected pages
🔐 Interactive Login      Manual login via visible browser (Brave/Chrome/Chromium)
📊 Real-time Progress     Live statistics via Socket.IO
💾 Flexible Export        Output to folder or ZIP archive
📋 URL Reports            Export urls.json, urls.csv, urls.txt with success/fail status
🗺️ Sitemap Viewer         Interactive HTML tree view of all archived pages

📥 Installation

Prerequisites

  • Node.js 18 or higher
  • npm (included with Node.js)
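
You can verify both are available before installing:

node --version   # should print v18.0.0 or higher
npm --version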

Quick Start

git clone https://github.com/zxcvresque/recurse.git
cd recurse
npm install
npm start

Then open http://localhost:3000 in your browser.


🚀 Usage

Web GUI (Recommended)

npm start

Workflow:

  1. Enter the website URL
  2. Choose Quick Archive or Analyze First
  3. Configure depth, page limits, and asset options
  4. Click Start Archiving
  5. Download the archive from history

CLI

node src/cli.js <url> [options]

Examples:

# Basic archive
node src/cli.js https://example.com

# Custom depth and output
node src/cli.js https://docs.example.com -d 5 -p 200 -o docs.zip

# With cookies for authenticated pages
node src/cli.js https://private.site.com --cookies cookies.json

CLI Options

Option                  Default      Description
-d, --depth <n>         3            Maximum crawl depth
-p, --pages <n>         50           Maximum pages to download
-o, --output <path>     ./archive    Output folder or .zip file
-c, --cookies <file>    -            Path to cookies JSON file
--delay <ms>            500          Delay between requests
--timeout <ms>          30000        Page load timeout
--no-images             -            Skip downloading images
--no-css                -            Skip downloading CSS
--no-js                 -            Skip downloading JavaScript
--visible               -            Show browser window
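
These options can be combined. For example, a shallow HTML-and-CSS-only crawl with a slower request pace and a visible browser window, using only the flags documented above:

# Depth 2, skip images and JS, 1-second delay, visible browser
node src/cli.js https://example.com -d 2 --no-images --no-js --delay 1000 --visible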

🔐 Authentication

Method 1: Cookie Import (Headless)

  1. Install a browser extension like EditThisCookie
  2. Log into the target website
  3. Export cookies as JSON
  4. CLI: node src/cli.js https://site.com --cookies cookies.json
  5. GUI: Paste JSON into "Import Cookies" in Advanced Options
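
The exact JSON shape depends on the extension you export from. EditThisCookie, for example, typically produces an array of cookie objects roughly like the sketch below (values are illustrative, and the field names follow that extension's export format rather than anything Re/curse documents):

[
  {
    "name": "session_id",
    "value": "abc123",
    "domain": ".site.com",
    "path": "/",
    "secure": true,
    "httpOnly": true
  }
]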

Method 2: Interactive Login (Visual)

  1. Enable Interactive Login in Advanced Options
  2. Select browser profile (Default/Chrome/Brave)
  3. A visible browser window opens at the target URL
  4. Log in manually (handle MFA, CAPTCHA, etc.)
  5. Click "I've Logged In" in Re/curse to continue

📂 Output Structure

archive.zip
├── index.html          # Auto-redirect to main page
├── sitemap.html        # Interactive tree view of all pages
├── urls.json           # Structured URL data (success/fail)
├── urls.csv            # Spreadsheet format
├── urls.txt            # Human-readable report
├── pages/              # All archived HTML pages
│   ├── index.html
│   └── blog/
│       └── post-1.html
└── assets/
    ├── images/
    ├── css/
    └── js/
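
As a rough sketch of what the URL reports contain, urls.json pairs each crawled URL with its outcome. The field names below are hypothetical illustrations, not a documented schema:

[
  { "url": "https://example.com/", "status": "success" },
  { "url": "https://example.com/blog/post-1", "status": "failed", "error": "Timeout" }
]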

⚠️ Limitations

  • JavaScript-heavy SPAs: Dynamic content loaded after page render may not be fully captured
  • Infinite scroll: Content requiring scroll to load is not automatically triggered
  • Login sessions: Some sites with advanced bot detection may block archiving
  • Large files: Very large assets (videos, PDFs) may slow archiving or fail to download
  • External domains: Only same-origin content is archived by default
  • Forms & interactions: Interactive elements won't function in archived pages
  • Streaming content: Live streams and real-time content cannot be archived

📄 License

MIT License - See LICENSE file for details.
