A Progressive Web App (PWA) for downloading multiple arXiv papers as a single ZIP file.
Version: 0.1.2
- Paste & Parse: Paste tables from spreadsheets, markdown, or plain text containing arXiv links
- Smart Extraction: Automatically extracts arXiv IDs and associated paper titles, including from Google redirect URLs
- Batch Download: Download selected papers as a single ZIP file
- CORS Proxy Support: Configurable proxy with fallback chain (includes Cloudflare Worker for reliable self-hosting)
- Offline Ready: PWA with service worker — install to your device for quick access
- Progress Tracking: Real-time console logging and per-paper status indicators
- Fallback Options: "Open Failed in Browser" button for manual download when proxies fail
- Visit the app (GitHub Pages or local)
- Paste a table containing arXiv links
- Click Parse Content
- Select papers to download
- Click Download Selected as ZIP
- Markdown tables with `|` delimiters
- Tab-separated values (copy from Excel/Google Sheets)
- HTML tables
- Plain text with arXiv URLs inline
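For markdown and tab-separated input, each pasted row can be split into cells and the arXiv link paired with a title cell. The sketch below is a hypothetical illustration, not the app's actual parser. The function names `splitRow` and `parseRows`, and the rule "the title is the first non-empty cell that is not a URL", are assumptions. HTML tables would need a DOM parser (e.g. `DOMParser` in the browser) and are not covered here.

```javascript
// Hypothetical row parser for markdown (| a | b |) and TSV (a\tb) input.
// Not the app's real implementation.
function splitRow(line) {
  if (line.includes("\t")) return line.split("\t").map((c) => c.trim());
  // Markdown: strip the outer pipes, then split on the remaining ones.
  // Plain text without pipes ends up as a single cell.
  return line
    .trim()
    .replace(/^\|/, "")
    .replace(/\|$/, "")
    .split("|")
    .map((c) => c.trim());
}

function parseRows(text) {
  const rows = [];
  for (const line of text.split(/\r?\n/)) {
    // Skip markdown separator rows such as |---|---| or | :--- |
    if (/^\s*\|?\s*:?-{3,}/.test(line)) continue;
    const cells = splitRow(line);
    const linkCell = cells.find((c) => /arxiv\.org\//i.test(c));
    if (!linkCell) continue; // rows without an arXiv link (e.g. headers) are ignored
    // Assumption: the title is the first non-empty cell that is not a URL
    const title =
      cells.find((c) => c && c !== linkCell && !/^https?:\/\//i.test(c)) ?? "";
    rows.push({ title, link: linkCell });
  }
  return rows;
}
```

For example, `parseRows("| Title | Link |\n|---|---|\n| Attention Is All You Need | https://arxiv.org/abs/1706.03762 |")` yields one row whose title is the paper name; the header and separator rows are skipped.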
The parser handles various URL formats including:
- Direct arXiv links: `arxiv.org/abs/2301.12345` or `arxiv.org/pdf/2301.12345.pdf`
- Google search redirects: `google.com/search?q=https://arxiv.org/pdf/...`
- ar5iv.org alternative domain
- Legacy arXiv format: `hep-th/9901001`
- Versioned papers: `2301.12345v2`
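One way to cover these formats is a single regular expression over the (percent-decoded) pasted text. This is a sketch under assumptions: the patterns and the name `extractArxivIds` are illustrative, not the app's actual regexes. Decoding first lets encoded Google redirects (`q=https%3A%2F%2Farxiv.org...`) match the same pattern as plain links.

```javascript
// Illustrative patterns, not taken from the app's source.
const NEW_STYLE = /\d{4}\.\d{4,5}(?:v\d+)?/; // 2301.12345, 2301.12345v2
const OLD_STYLE = /[a-z-]+(?:\.[A-Z]{2})?\/\d{7}(?:v\d+)?/; // hep-th/9901001, math.GT/0309136

// arxiv.org or ar5iv.org, followed by /abs/, /pdf/ or /html/ and a captured ID.
// "arxiv.org" also matches inside www.arxiv.org and export.arxiv.org.
const ARXIV_URL = new RegExp(
  `(?:arxiv|ar5iv)\\.org/(?:abs|pdf|html)/(${NEW_STYLE.source}|${OLD_STYLE.source})`,
  "gi"
);

function extractArxivIds(text) {
  let decoded = text;
  try {
    // Google redirects often percent-encode the target URL
    decoded = decodeURIComponent(text);
  } catch {
    // Malformed escape sequence: fall back to the raw text
  }
  const ids = [];
  for (const match of decoded.matchAll(ARXIV_URL)) {
    // A trailing ".pdf" is not part of the ID pattern, so it is dropped
    if (!ids.includes(match[1])) ids.push(match[1]); // keep first-seen order, no duplicates
  }
  return ids;
}
```

Keeping the version suffix (`v2`) in the ID means a specific revision is fetched; stripping it would fetch the latest version instead.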
- Fork or clone this repository
- Enable GitHub Pages in repository settings (Settings → Pages)
- Set the source to "Deploy from a branch" and choose main (or master)
- Access at `https://yourusername.github.io/repo-name/`
The app works correctly in subdirectories — no path configuration needed.
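Subdirectory hosting works when asset URLs are relative to the page rather than to the domain root. The sketch below uses the standard `URL` constructor to show the difference on a GitHub Pages project site; the exact paths used inside `index.html` are not shown in this README, so treat this as an illustration of the principle only.

```javascript
// Example page URL from the GitHub Pages step above (placeholder user/repo).
const pageUrl = "https://yourusername.github.io/repo-name/index.html";

// Resolve an asset path the same way the browser does for this page.
function resolveAsset(path, base = pageUrl) {
  return new URL(path, base).href;
}

// Root-relative path escapes the repository subdirectory:
console.log(resolveAsset("/sw.js")); // https://yourusername.github.io/sw.js
// Page-relative path stays inside it:
console.log(resolveAsset("./sw.js")); // https://yourusername.github.io/repo-name/sw.js

// In the browser, registering with a relative URL gives the service worker
// a default scope of the directory containing sw.js (here /repo-name/):
//   navigator.serviceWorker.register("./sw.js");
```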
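Open index.html directly, or serve it with any static file server. Service workers (and therefore offline support and installation) need `http://localhost` or HTTPS, so they won't work from a `file://` URL: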
```bash
# Python
python -m http.server 8000
# Node.js
npx serve .
# PHP
php -S localhost:8000
```

arXiv doesn't send CORS headers, so the browser can't fetch PDFs from arxiv.org directly; requests must go through a proxy. The app includes fallback public proxies, but they're rate-limited and unreliable.
Deploy your own proxy in 5 minutes for reliable, fast downloads.
- Sign up at Cloudflare (free)
- Go to Workers & Pages → Create Application → Create Worker
- Name it (e.g., `arxiv-proxy`)
- Click Edit Code and paste the contents of `cloudflare-worker/worker.js`
- Click Save and Deploy
- Copy your worker URL
In the app's Settings, set CORS Proxy URL to:
`https://your-worker-name.your-subdomain.workers.dev/?url=`
Note the `?url=` at the end; it is required.
- Free tier: 100,000 requests/day
- Fast: Cloudflare's global edge network
- Secure: Only proxies arxiv.org domains
- Reliable: You control it
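To make the "only proxies arxiv.org domains" idea concrete, here is a minimal Worker-style handler. It is a hypothetical sketch, not the contents of `cloudflare-worker/worker.js`: the allow-list, the CORS header values and the function names are assumptions. It reads the target from the `?url=` parameter, rejects anything outside the allow-list, and copies the upstream response with CORS headers added. It uses only the standard `fetch`/`Request`/`Response` APIs, so it also runs under Node 18+ for local testing.

```javascript
// Hypothetical proxy handler; compare with the repo's worker.js before relying on it.
const ALLOWED_HOSTS = ["arxiv.org", "www.arxiv.org", "export.arxiv.org"]; // assumption

const CORS_HEADERS = {
  "Access-Control-Allow-Origin": "*",
  "Access-Control-Allow-Methods": "GET, OPTIONS",
};

function isAllowed(target) {
  let url;
  try {
    url = new URL(target);
  } catch {
    return false; // not a valid absolute URL
  }
  return url.protocol === "https:" && ALLOWED_HOSTS.includes(url.hostname);
}

async function handleRequest(request) {
  // Answer CORS preflight requests without contacting arXiv
  if (request.method === "OPTIONS") {
    return new Response(null, { status: 204, headers: CORS_HEADERS });
  }
  // Target comes from ?url=..., e.g. /?url=https://arxiv.org/pdf/2301.12345
  const target = new URL(request.url).searchParams.get("url");
  if (!target) {
    return new Response("Missing ?url= parameter", { status: 400, headers: CORS_HEADERS });
  }
  if (!isAllowed(target)) {
    return new Response("Only arxiv.org URLs are proxied", { status: 403, headers: CORS_HEADERS });
  }
  const upstream = await fetch(target, { redirect: "follow" });
  // Re-wrap the upstream response so its headers become mutable, then add CORS
  const response = new Response(upstream.body, upstream);
  for (const [key, value] of Object.entries(CORS_HEADERS)) response.headers.set(key, value);
  return response;
}

// In a Cloudflare Worker module this would be exported as:
//   export default { fetch: handleRequest };
```

Because the app URL-encodes nothing special here, `searchParams.get("url")` works whether the target after `?url=` is encoded or plain, as long as it contains no unencoded `&`.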
See cloudflare-worker/README.md for advanced configuration.
The app falls back to these if your configured proxy fails:
- corsproxy.io
- api.allorigins.win
- api.codetabs.com
These are rate-limited and may fail during heavy use.
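A fallback chain like this can be sketched as "try each proxy prefix in order until one returns a 2xx response". This is a minimal sketch, assuming every proxy accepts the URL-encoded target appended to a prefix (as the `?url=` setting implies). The three public prefixes are assumptions about those services' URL formats, not values read from the app; check each service's documentation. `fetchImpl` is a hypothetical parameter so the chain can be tested without network access.

```javascript
// Assumed URL formats for the public fallbacks; verify against each service.
const FALLBACK_PROXIES = [
  "https://corsproxy.io/?",
  "https://api.allorigins.win/raw?url=",
  "https://api.codetabs.com/v1/proxy?quest=",
];

async function fetchViaProxies(targetUrl, configuredProxy, fetchImpl = fetch) {
  // User's proxy first, then the public fallbacks, dropping empties and duplicates
  const chain = [configuredProxy, ...FALLBACK_PROXIES].filter(
    (p, i, all) => p && all.indexOf(p) === i
  );
  const errors = [];
  for (const prefix of chain) {
    try {
      const res = await fetchImpl(prefix + encodeURIComponent(targetUrl));
      if (res.ok) return res; // first successful proxy wins
      errors.push(`${prefix}: HTTP ${res.status}`); // e.g. 429 when rate-limited
    } catch (err) {
      errors.push(`${prefix}: ${err.message}`); // network or CORS failure
    }
  }
  throw new Error(`All proxies failed:\n${errors.join("\n")}`);
}
```

Usage: `await fetchViaProxies("https://arxiv.org/pdf/2301.12345", settings.proxyUrl)`, where `settings.proxyUrl` is a hypothetical name for the CORS Proxy URL setting.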
Access settings via the ⚙ Settings panel:
| Setting | Default | Description |
|---|---|---|
| CORS Proxy URL | `https://corsproxy.io/?` | Proxy endpoint (add your Cloudflare Worker URL here) |
| Request Delay | 1000ms | Delay between downloads to avoid rate limiting |
Settings persist in localStorage.
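A common pattern for this is to store one JSON object and merge it over the defaults on load, so a missing or corrupted entry falls back cleanly. The sketch below assumes that pattern; the storage key `arxivBatchSettings` and the field names are hypothetical, and only the default values come from the table above. `storage` is any object with `getItem`/`setItem` (pass `window.localStorage` in the browser).

```javascript
const SETTINGS_KEY = "arxivBatchSettings"; // hypothetical key name
const DEFAULT_SETTINGS = {
  proxyUrl: "https://corsproxy.io/?", // default from the settings table
  requestDelayMs: 1000, // default from the settings table
};

function loadSettings(storage) {
  try {
    // getItem returns null when nothing has been saved yet
    const saved = JSON.parse(storage.getItem(SETTINGS_KEY) ?? "{}");
    // Merge so that fields missing from older saved data keep their defaults
    return { ...DEFAULT_SETTINGS, ...saved };
  } catch {
    return { ...DEFAULT_SETTINGS }; // corrupted JSON: use defaults
  }
}

function saveSettings(storage, settings) {
  storage.setItem(SETTINGS_KEY, JSON.stringify(settings));
}
```

In the browser: `const settings = loadSettings(window.localStorage);`.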
The app can be installed as a standalone application:
Chrome/Edge:
- Click the install icon in the address bar, or
- Menu → "Install arXiv Batch Downloader"
Safari (iOS):
- Share → "Add to Home Screen"
Firefox:
- Not supported (Firefox removed PWA support on desktop)
- Deploy your own Cloudflare Worker (see above)
- Try increasing the request delay in Settings
- Use "Open Failed in Browser" button for manual download
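Raising the request delay helps because downloads are spaced out; the changelog also mentions exponential backoff after consecutive failures. The sketch below combines the two under stated assumptions: the doubling factor, the 30-second cap and the names `nextDelay`/`downloadAll` are hypothetical, not the app's actual values. `downloadOne` and `wait` are injected so the loop can be tested without a network or real timers.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// 0 consecutive failures -> base delay; each further failure doubles it, up to a cap
function nextDelay(baseDelayMs, consecutiveFailures, maxDelayMs = 30000) {
  return Math.min(baseDelayMs * 2 ** consecutiveFailures, maxDelayMs);
}

// Download papers one at a time, waiting between requests.
async function downloadAll(ids, downloadOne, baseDelayMs = 1000, wait = sleep) {
  const results = [];
  let failures = 0;
  for (const [i, id] of ids.entries()) {
    if (i > 0) await wait(nextDelay(baseDelayMs, failures)); // no wait before the first request
    try {
      results.push({ id, ok: true, data: await downloadOne(id) });
      failures = 0; // a success resets the backoff
    } catch (err) {
      results.push({ id, ok: false, error: err.message });
      failures += 1;
    }
  }
  // Entries with ok: false are the ones left to retry or open manually
  return results;
}
```

With the default 1000 ms base, two failures in a row push the next wait to 4000 ms, which gives a rate-limited proxy time to recover instead of hammering it.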
- Verify the arXiv ID is correct
- Some very old papers may not have PDFs available
- If using a Cloudflare Worker, ensure your proxy URL ends with `?url=`
- Redeploy the worker with the latest `worker.js`
- Check DevTools → Application → Manifest for errors
- Ensure you're on HTTPS (required for PWA)
- Try unregistering old service workers (DevTools → Application → Service Workers), then do a hard refresh
- Check the console for error messages
- Ensure at least one paper downloaded successfully
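An empty ZIP means no download succeeded. A guard like the one below avoids generating an archive in that case. This is a hypothetical helper: the `"<id> - <title>.pdf"` naming scheme, the 80-character title limit and the name `zipEntries` are assumptions. The commented lines show how the entries would map onto JSZip's `file` and `generateAsync` methods.

```javascript
// Hypothetical helper: build ZIP entry names from download results.
function zipEntries(results) {
  const entries = new Map(); // same name twice -> later entry overwrites earlier
  for (const r of results) {
    if (!r.ok) continue; // skip failed downloads
    // Legacy IDs contain "/", which would create a folder inside the ZIP
    const safeId = r.id.replace(/\//g, "_");
    // Drop characters that are invalid in common file systems
    const safeTitle = (r.title ?? "").replace(/[\\\/:*?"<>|]/g, "").trim().slice(0, 80);
    const name = safeTitle ? `${safeId} - ${safeTitle}.pdf` : `${safeId}.pdf`;
    entries.set(name, r.data);
  }
  if (entries.size === 0) {
    throw new Error("No papers downloaded successfully; nothing to zip");
  }
  return entries;
}

// With JSZip loaded in the page:
//   const zip = new JSZip();
//   for (const [name, data] of zipEntries(results)) zip.file(name, data);
//   const blob = await zip.generateAsync({ type: "blob" });
```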
```
├── index.html          # Main application (CSS/JS inlined)
├── manifest.json       # PWA manifest
├── sw.js               # Service worker for offline support
├── icon-192.png        # App icon (192x192)
├── icon-512.png        # App icon (512x512)
├── README.md           # This file
└── cloudflare-worker/
    ├── worker.js       # Cloudflare Worker proxy script
    ├── wrangler.toml   # Wrangler CLI configuration
    └── README.md       # Worker deployment guide
```
- No analytics or tracking
- No data sent to external servers except:
- Your configured CORS proxy
- arxiv.org (to fetch PDFs)
- Paper lists are not persisted (only settings stored in localStorage)
- Works entirely client-side
| Browser | Supported | PWA Install |
|---|---|---|
| Chrome 80+ | ✅ | ✅ |
| Edge 80+ | ✅ | ✅ |
| Firefox 75+ | ✅ | ❌ |
| Safari 14+ | ✅ | ✅ (iOS) |
- CORS dependency: Requires a proxy for arXiv downloads
- Memory usage: Large batches (50+ papers) may consume significant memory
- No resume: Progress lost if browser tab is closed mid-download
- Title extraction: Titles come from the pasted content, not from the arXiv metadata API
- Fixed PWA installation for subdirectory hosting (GitHub Pages)
- Fixed manifest and service worker paths
- Reduced console spam (Google redirect warning now shows once)
- Increased default request delay to 1000ms
- Added third fallback proxy (codetabs.com)
- Added exponential backoff on consecutive failures
- Added "Open Failed in Browser" button for manual fallback
- Added Cloudflare Worker for self-hosted proxy
- Initial release
- Markdown, TSV, HTML, and plain text parsing
- Google redirect URL extraction
- ZIP download with JSZip
- Console logging
- Offline support via service worker
MIT License — use freely for personal or commercial projects.