A Playwright-based crawler that checks for JavaScript errors across all pages in the site.
When running in the Docker development environment, use the helper script:
```bash
# Start the dev environment
docker-compose -f docker-compose.dev.yml up -d

# Crawl the dev server (default)
./scripts/crawl-dev.sh

# Crawl with options
./scripts/crawl-dev.sh http://web-dev:5000 --verbose true
./scripts/crawl-dev.sh http://web-dev:5000 --log-skipped true
./scripts/crawl-dev.sh http://web-dev:5000 --max-depth 2
```

For production or standalone crawling:
```bash
# Install dependencies
npm install

# Crawl production
node scripts/crawler.js https://lacunary.org

# Crawl local Docker container
node scripts/crawler.js http://host.docker.internal:5000
```

Options:

- `--max-depth N` - Maximum crawl depth (default: Infinity)
- `--headless false` - Show the browser window (default: true)
- `--verbose true` - Show detailed logging (default: false)
- `--log-skipped true` - Log skipped URLs (default: false)
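A minimal sketch of how these flags could be turned into an options object. The flag names and defaults come from the list above; the `parseOptions` helper itself is hypothetical, not the script's actual implementation:

```javascript
// Hypothetical sketch: parse "--flag value" pairs into an options object.
// Flag names and defaults mirror the documented options; not the real parser.
function parseOptions(argv) {
  const options = {
    maxDepth: Infinity,
    headless: true,
    verbose: false,
    logSkipped: false,
  };
  for (let i = 0; i < argv.length; i++) {
    switch (argv[i]) {
      case '--max-depth':
        options.maxDepth = Number(argv[++i]);
        break;
      case '--headless':
        options.headless = argv[++i] !== 'false';
        break;
      case '--verbose':
        options.verbose = argv[++i] === 'true';
        break;
      case '--log-skipped':
        options.logSkipped = argv[++i] === 'true';
        break;
    }
  }
  return options;
}
```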
The crawler:

- Starts at the given URL
- Creates an isolated browser page for each URL
- Waits for page to load and stabilize
- Catches any JavaScript errors
- Harvests links and continues crawling
- Stops immediately on first error
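The steps above amount to a breadth-first crawl with a visited set and a depth cut-off. A minimal sketch of that bookkeeping, with the Playwright work abstracted into a caller-supplied `visit` function (hypothetical names, not the script's actual internals):

```javascript
// Hypothetical sketch of the crawl loop: breadth-first traversal with a
// visited set and a depth limit. `visit(url)` stands in for the Playwright
// step (open an isolated page, wait for load, collect JS errors, harvest links).
async function crawl(startUrl, visit, maxDepth = Infinity) {
  const visited = new Set();
  const queue = [{ url: startUrl, depth: 0 }];
  while (queue.length > 0) {
    const { url, depth } = queue.shift();
    if (visited.has(url) || depth > maxDepth) continue;
    visited.add(url);
    const { errors, links } = await visit(url);
    if (errors.length > 0) {
      // Stop immediately on the first JavaScript error, as described above.
      throw new Error(`JS error on ${url}: ${errors[0]}`);
    }
    for (const link of links) {
      queue.push({ url: link, depth: depth + 1 });
    }
  }
  return visited;
}
```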
In development:
- `web-dev:5000` - Development server
- The crawler runs in the same Docker network
- URLs are served on port 5000
These errors usually come from the browser trying to load external resources. Check the `IGNORE_PATTERNS` list in the script.
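The `IGNORE_PATTERNS` name comes from the script; the example patterns and the filter helper below are illustrative only, assuming the list holds regular expressions matched against error messages:

```javascript
// Illustrative only: IGNORE_PATTERNS is named in the script, but these
// example patterns and the shouldIgnore helper are hypothetical.
const IGNORE_PATTERNS = [
  /^https?:\/\/fonts\.googleapis\.com\//, // external font requests
  /net::ERR_ABORTED/,                     // aborted resource loads
];

function shouldIgnore(message) {
  return IGNORE_PATTERNS.some((pattern) => pattern.test(message));
}
```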
Chrome may abort loading very large JavaScript files. This is usually not a real error.
- Use `web-dev:5000` when running inside Docker
- Use `localhost:5000` when running outside Docker
- Use `host.docker.internal:5000` from Docker to reach the host