A fully local, Docker-based tool to convert any website into LLM-ready markdown. Fork of supermemoryai/markdowner, refactored to run locally with Puppeteer and optional Bright Data proxy integration.
- Two extraction methods:
  - `html` - Direct HTTP fetch (fast, for server-side rendered pages)
  - `hydration` - Full browser rendering with Puppeteer (for JavaScript-heavy SPAs)
- Bright Data proxy integration - Rotate IPs for scraping at scale
- Local file-based caching - Avoid redundant fetches
- Subpage crawling - Recursively convert up to 10 linked pages
- Docker-ready - Easy deployment with Docker Compose
- Clone and configure:

  ```bash
  git clone https://github.com/your-repo/markdowner.git
  cd markdowner
  cp env.example .env
  # Edit .env with your settings (proxy is optional)
  ```
- Build and run:

  ```bash
  docker-compose up -d
  ```
- Convert a URL:

  ```bash
  # Simple conversion (hydration method - renders JavaScript)
  curl "http://localhost:3000/convert?url=https://example.com"

  # Fast conversion (html method - direct fetch, for SSR pages)
  curl "http://localhost:3000/convert?url=https://example.com&method=html"
  ```
- Install dependencies:

  ```bash
  npm install
  ```
- Run in development mode:

  ```bash
  npm run dev
  ```
- Build and run:

  ```bash
  npm run build
  npm start
  ```
`GET /convert` - Convert a URL to markdown.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | string | required | The website URL to convert |
| `method` | `'html' \| 'hydration'` | `'hydration'` | Extraction method |
| `enableDetailedResponse` | boolean | `false` | Include full page content instead of article extraction |
| `crawlSubpages` | boolean | `false` | Also convert linked subpages (max 10) |
| `useProxy` | boolean | `false` | Use Bright Data proxy for requests |
Response Headers:
- `Accept: application/json` → Returns JSON with metadata
- `Accept: text/plain` (default) → Returns raw markdown
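The same content negotiation applies when calling the service from code. A minimal TypeScript sketch using Node's built-in `fetch` (the JSON field names `url` and `markdown` are illustrative assumptions, not a documented response shape):

```typescript
// Request JSON output from /convert by setting the Accept header.
async function convertToMarkdown(target: string): Promise<string> {
  const endpoint = new URL("http://localhost:3000/convert");
  endpoint.searchParams.set("url", target);

  const res = await fetch(endpoint, {
    headers: { Accept: "application/json" },
  });
  if (!res.ok) throw new Error(`Conversion failed: HTTP ${res.status}`);

  // Hypothetical response shape: { url: string, markdown: string, ... }
  const body = await res.json();
  return body.markdown;
}

convertToMarkdown("https://example.com").then(console.log);
```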
Examples:

```bash
# Fast SSR conversion (no browser needed)
curl "http://localhost:3000/convert?url=https://example.com&method=html"

# Full page with JSON response
curl -H "Accept: application/json" \
  "http://localhost:3000/convert?url=https://example.com&enableDetailedResponse=true"

# With Bright Data proxy (requires configuration)
curl "http://localhost:3000/convert?url=https://example.com&useProxy=true"

# Crawl subpages (returns JSON array)
curl "http://localhost:3000/convert?url=https://example.com&crawlSubpages=true"
```

`GET /health` - Health check endpoint.
```bash
curl http://localhost:3000/health
# {"status":"ok","timestamp":"2026-01-10T..."}
```

`GET /cache/stats` - Get cache statistics.
```bash
curl http://localhost:3000/cache/stats
# {"entries":42,"sizeBytes":125000}
```

`DELETE /cache` - Clear all cached entries.
```bash
curl -X DELETE http://localhost:3000/cache
# {"cleared":42,"message":"Cleared 42 cache entries"}
```

html method:

- Best for: Server-side rendered pages, blogs, documentation sites
- How it works: Direct HTTP fetch using axios
- Speed: Very fast (~100-500ms)
- Limitations: Won't capture JavaScript-rendered content
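Conceptually, this method is axios plus the Readability and Turndown stages shown in the architecture diagram at the end of this document. A condensed sketch of that pipeline (an illustration, not the actual src/converter.ts):

```typescript
import axios from "axios";
import { JSDOM } from "jsdom";
import { Readability } from "@mozilla/readability";
import TurndownService from "turndown";

// Fetch raw HTML, extract the main article, and convert it to markdown.
async function htmlToMarkdown(url: string): Promise<string> {
  const { data: html } = await axios.get<string>(url);
  const dom = new JSDOM(html, { url });
  // Readability pulls the main article out of the page chrome.
  const article = new Readability(dom.window.document).parse();
  // Turndown converts the extracted HTML fragment to markdown.
  return new TurndownService().turndown(article?.content ?? html);
}
```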
hydration method:

- Best for: SPAs, JavaScript-heavy sites, dynamic content
- How it works: Full Chromium browser rendering via Puppeteer
- Speed: Slower (~2-10s depending on page complexity)
- Capabilities: Captures all dynamically loaded content
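The core of a hydration pass with Puppeteer looks roughly like the sketch below (it assumes the `BROWSER_HEADLESS` and `BROWSER_TIMEOUT` defaults from the configuration table; the `networkidle2` wait strategy is an assumption, not confirmed project behavior):

```typescript
import puppeteer from "puppeteer";

// Render a page in headless Chromium and return the final HTML,
// including content injected by client-side JavaScript.
async function fetchRenderedHtml(url: string): Promise<string> {
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // 30000 ms matches the BROWSER_TIMEOUT default below.
    await page.goto(url, { waitUntil: "networkidle2", timeout: 30000 });
    return await page.content();
  } finally {
    await browser.close();
  }
}
```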
For scraping at scale or bypassing geo-restrictions, configure Bright Data:
- Get credentials from the Bright Data Dashboard
- Set environment variables:

  ```bash
  BRIGHTDATA_USERNAME=your-zone-username
  BRIGHTDATA_PASSWORD=your-zone-password
  BRIGHTDATA_PROXY=brd.superproxy.io:22225
  ```
- Use the proxy:

  ```bash
  curl "http://localhost:3000/convert?url=https://example.com&useProxy=true"
  ```
The proxy uses session rotation (`session-rand{N}`) for automatic IP rotation on each request.
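To illustrate, per-request session rotation can be implemented by appending a random session suffix to the zone username. The helper below is an illustrative sketch, not the project's actual code:

```typescript
// Append a random session suffix to the zone username so Bright Data
// assigns a fresh IP per request. Illustrative only.
function buildProxyConfig() {
  const [host, port] = (process.env.BRIGHTDATA_PROXY ?? "").split(":");
  const session = Math.floor(Math.random() * 1_000_000);
  return {
    host,
    port: Number(port),
    auth: {
      // e.g. "your-zone-username-session-rand123456"
      username: `${process.env.BRIGHTDATA_USERNAME}-session-rand${session}`,
      password: process.env.BRIGHTDATA_PASSWORD ?? "",
    },
  };
}

// Usage with axios, the html method's HTTP client:
// const res = await axios.get(url, { proxy: buildProxyConfig() });
```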
| Variable | Default | Description |
|---|---|---|
| `PORT` | `3000` | Server port |
| `CACHE_ENABLED` | `true` | Enable file-based caching |
| `CACHE_TTL_SECONDS` | `3600` | Cache TTL (1 hour) |
| `CACHE_DIR` | `./cache` | Cache directory |
| `BROWSER_HEADLESS` | `true` | Run browser in headless mode |
| `BROWSER_TIMEOUT` | `30000` | Page load timeout (ms) |
| `RATE_LIMIT_WINDOW_MS` | `60000` | Rate limit window (1 minute) |
| `RATE_LIMIT_MAX` | `30` | Max requests per window |
| `BRIGHTDATA_USERNAME` | - | Bright Data username |
| `BRIGHTDATA_PASSWORD` | - | Bright Data password |
| `BRIGHTDATA_PROXY` | - | Bright Data proxy `host:port` |
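For convenience, a complete `.env` reflecting the defaults above (the Bright Data entries are placeholders and may be omitted when the proxy is not used):

```bash
PORT=3000
CACHE_ENABLED=true
CACHE_TTL_SECONDS=3600
CACHE_DIR=./cache
BROWSER_HEADLESS=true
BROWSER_TIMEOUT=30000
RATE_LIMIT_WINDOW_MS=60000
RATE_LIMIT_MAX=30
# Optional - only needed when useProxy=true
BRIGHTDATA_USERNAME=your-zone-username
BRIGHTDATA_PASSWORD=your-zone-password
BRIGHTDATA_PROXY=brd.superproxy.io:22225
```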
```bash
# Build the image
docker build -t markdowner .

# Run with environment file
docker run -p 3000:3000 --env-file .env markdowner

# Run with Docker Compose
docker-compose up -d

# View logs
docker-compose logs -f

# Stop
docker-compose down
```
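For reference, a minimal `docker-compose.yml` consistent with the commands above might look like this (a sketch; the repository's actual compose file may differ, and the `/app/cache` container path is an assumption):

```yaml
services:
  markdowner:
    build: .
    ports:
      - "3000:3000"
    env_file:
      - .env
    volumes:
      # Persist the file cache across restarts (CACHE_DIR default)
      - ./cache:/app/cache
```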
Architecture:

```
┌───────────────────────────────────────────────────────────────┐
│                        Express Server                         │
│                        (src/server.ts)                        │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│   ┌───────────┐   ┌───────────┐   ┌───────────┐               │
│   │ /convert  │   │ /health   │   │ /cache    │               │
│   └─────┬─────┘   └───────────┘   └───────────┘               │
│         │                                                     │
│         ▼                                                     │
│   ┌─────────────────────────────────────────────────────┐     │
│   │            Converter (src/converter.ts)            │     │
│   ├─────────────────────────┬───────────────────────────┤     │
│   │ method='html'           │ method='hydration'        │     │
│   │  ┌──────────┐           │  ┌──────────────────┐     │     │
│   │  │  Axios   │           │  │    Puppeteer     │     │     │
│   │  │  (HTTP)  │           │  │(Chromium browser)│     │     │
│   │  └──────────┘           │  └──────────────────┘     │     │
│   └─────────────────────────┴───────────────────────────┘     │
│            │                             │                    │
│            ▼                             ▼                    │
│   ┌─────────────────────────────────────────────────────┐     │
│   │      Readability + Turndown (HTML → Markdown)       │     │
│   └─────────────────────────────────────────────────────┘     │
│                              │                                │
│                              ▼                                │
│   ┌─────────────────────────────────────────────────────┐     │
│   │              File Cache (src/cache.ts)              │     │
│   └─────────────────────────────────────────────────────┘     │
│                                                               │
└───────────────────────────────────────────────────────────────┘
```
License: MIT