Markdowner Local 🔖

A fully local, Docker-based tool to convert any website into LLM-ready markdown. Fork of supermemoryai/markdowner, refactored to run locally with Puppeteer and optional Bright Data proxy integration.

Features

  • Two extraction methods:
    • html - Direct HTTP fetch (fast, for server-side rendered pages)
    • hydration - Full browser rendering with Puppeteer (for JavaScript-heavy SPAs)
  • Bright Data proxy integration - Rotate IPs for scraping at scale
  • Local file-based caching - Avoid redundant fetches
  • Subpage crawling - Recursively convert up to 10 linked pages
  • Docker-ready - Easy deployment with Docker Compose

Quick Start

Using Docker Compose (Recommended)

  1. Clone and configure:

    git clone https://github.com/your-repo/markdowner.git
    cd markdowner
    cp env.example .env
    # Edit .env with your settings (proxy is optional)
  2. Build and run:

    docker-compose up -d
  3. Convert a URL:

    # Simple conversion (hydration method - renders JavaScript)
    curl "http://localhost:3000/convert?url=https://example.com"
    
    # Fast conversion (html method - direct fetch, for SSR pages)
    curl "http://localhost:3000/convert?url=https://example.com&method=html"

Local Development

  1. Install dependencies:

    npm install
  2. Run in development mode:

    npm run dev
  3. Build and run:

    npm run build
    npm start

API Reference

GET /convert

Convert a URL to markdown.

Parameter               Type                   Default      Description
----------------------  ---------------------  -----------  --------------------------------------------------------
url                     string                 required     The website URL to convert
method                  'html' | 'hydration'   'hydration'  Extraction method
enableDetailedResponse  boolean                false        Include full page content instead of article extraction
crawlSubpages           boolean                false        Also convert linked subpages (max 10)
useProxy                boolean                false        Use Bright Data proxy for requests

Content Negotiation (via the Accept request header):

  • Accept: application/json → Returns JSON with metadata
  • Accept: text/plain (default) → Returns raw markdown

Examples:

# Fast SSR conversion (no browser needed)
curl "http://localhost:3000/convert?url=https://example.com&method=html"

# Full page with JSON response
curl -H "Accept: application/json" \
  "http://localhost:3000/convert?url=https://example.com&enableDetailedResponse=true"

# With Bright Data proxy (requires configuration)
curl "http://localhost:3000/convert?url=https://example.com&useProxy=true"

# Crawl subpages (returns JSON array)
curl "http://localhost:3000/convert?url=https://example.com&crawlSubpages=true"

GET /health

Health check endpoint.

curl http://localhost:3000/health
# {"status":"ok","timestamp":"2026-01-10T..."}

GET /cache/stats

Get cache statistics.

curl http://localhost:3000/cache/stats
# {"entries":42,"sizeBytes":125000}

DELETE /cache

Clear all cached entries.

curl -X DELETE http://localhost:3000/cache
# {"cleared":42,"message":"Cleared 42 cache entries"}

Extraction Methods

html (Fast)

  • Best for: Server-side rendered pages, blogs, documentation sites
  • How it works: Direct HTTP fetch using axios
  • Speed: Very fast (~100-500ms)
  • Limitations: Won't capture JavaScript-rendered content

hydration (Full Rendering)

  • Best for: SPAs, JavaScript-heavy sites, dynamic content
  • How it works: Full Chromium browser rendering via Puppeteer
  • Speed: Slower (~2-10s depending on page complexity)
  • Capabilities: Captures all dynamically loaded content
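
A practical pattern is to try the fast html method first and fall back to hydration when the result looks like an empty JavaScript shell. A sketch using a naive length heuristic (the 200-character threshold is an arbitrary assumption):

// fallback.ts - try the cheap html method first; if the markdown is
// suspiciously short (likely a JS-rendered shell page), retry with
// full browser rendering via the hydration method.
async function convertWithFallback(url: string): Promise<string> {
  const base = 'http://localhost:3000/convert';
  const fast = await fetch(`${base}?url=${encodeURIComponent(url)}&method=html`);
  const markdown = await fast.text();
  if (fast.ok && markdown.trim().length > 200) return markdown;

  const full = await fetch(`${base}?url=${encodeURIComponent(url)}&method=hydration`);
  if (!full.ok) throw new Error(`conversion failed: HTTP ${full.status}`);
  return full.text();
}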

Bright Data Proxy Configuration

For scraping at scale or bypassing geo-restrictions, configure Bright Data:

  1. Get credentials from Bright Data Dashboard

  2. Set environment variables:

    BRIGHTDATA_USERNAME=your-zone-username
    BRIGHTDATA_PASSWORD=your-zone-password
    BRIGHTDATA_PROXY=brd.superproxy.io:22225
  3. Use the proxy:

    curl "http://localhost:3000/convert?url=https://example.com&useProxy=true"

The proxy uses session rotation: a random session ID (session-rand{N}) is appended to the proxy username, so each request exits through a different IP.
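
In code, that session rotation boils down to building a proxy URL with a fresh random suffix per request. A sketch of the idea (the exact username syntax depends on your Bright Data zone configuration):

// proxy-url.ts - build a Bright Data proxy URL with a per-request
// "-session-rand{N}" suffix so each request exits through a new IP.
// Verify the exact suffix format against your zone's settings.
function brightDataProxyUrl(): string {
  const user = process.env.BRIGHTDATA_USERNAME!; // e.g. your zone username
  const pass = process.env.BRIGHTDATA_PASSWORD!;
  const host = process.env.BRIGHTDATA_PROXY!;    // e.g. brd.superproxy.io:22225
  const session = `rand${Math.floor(Math.random() * 1_000_000)}`;
  return `http://${user}-session-${session}:${encodeURIComponent(pass)}@${host}`;
}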

Environment Variables

Variable              Default  Description
--------------------  -------  -------------------------------------------
PORT                  3000     Server port
CACHE_ENABLED         true     Enable file-based caching
CACHE_TTL_SECONDS     3600     Cache TTL in seconds (default: 1 hour)
CACHE_DIR             ./cache  Cache directory
BROWSER_HEADLESS      true     Run browser in headless mode
BROWSER_TIMEOUT       30000    Page load timeout (ms)
RATE_LIMIT_WINDOW_MS  60000    Rate limit window in ms (default: 1 minute)
RATE_LIMIT_MAX        30       Max requests per window
BRIGHTDATA_USERNAME   -        Bright Data username
BRIGHTDATA_PASSWORD   -        Bright Data password
BRIGHTDATA_PROXY      -        Bright Data proxy host:port
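
As a rough illustration of how these variables map to runtime settings, a config-loading sketch with the defaults from the table (the actual loader in src/ may be organized differently):

// config-sketch.ts - env parsing with defaults mirroring the table above.
const env = (key: string, fallback: string): string =>
  process.env[key] ?? fallback;

export const config = {
  port: Number(env('PORT', '3000')),
  cacheEnabled: env('CACHE_ENABLED', 'true') === 'true',
  cacheTtlSeconds: Number(env('CACHE_TTL_SECONDS', '3600')),
  cacheDir: env('CACHE_DIR', './cache'),
  browserHeadless: env('BROWSER_HEADLESS', 'true') === 'true',
  browserTimeout: Number(env('BROWSER_TIMEOUT', '30000')),
  rateLimitWindowMs: Number(env('RATE_LIMIT_WINDOW_MS', '60000')),
  rateLimitMax: Number(env('RATE_LIMIT_MAX', '30')),
};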

Docker Commands

# Build the image
docker build -t markdowner .

# Run with environment file
docker run -p 3000:3000 --env-file .env markdowner

# Run with Docker Compose
docker-compose up -d

# View logs
docker-compose logs -f

# Stop
docker-compose down

Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Express Server                          │
│                    (src/server.ts)                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐      │
│  │   /convert  │    │   /health   │    │   /cache    │      │
│  └──────┬──────┘    └─────────────┘    └─────────────┘      │
│         │                                                   │
│         ▼                                                   │
│  ┌─────────────────────────────────────────────────────┐    │
│  │              Converter (src/converter.ts)           │    │
│  ├──────────────────────┬──────────────────────────────┤    │
│  │   method='html'      │   method='hydration'         │    │
│  │   ┌────────────┐     │   ┌────────────────────┐     │    │
│  │   │   Axios    │     │   │     Puppeteer      │     │    │
│  │   │  (HTTP)    │     │   │  (Chromium browser)│     │    │
│  │   └────────────┘     │   └────────────────────┘     │    │
│  └──────────────────────┴──────────────────────────────┘    │
│         │                         │                         │
│         ▼                         ▼                         │
│  ┌─────────────────────────────────────────────────────┐    │
│  │        Readability + Turndown (HTML → Markdown)     │    │
│  └─────────────────────────────────────────────────────┘    │
│         │                                                   │
│         ▼                                                   │
│  ┌─────────────────────────────────────────────────────┐    │
│  │              File Cache (src/cache.ts)              │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                             │
└─────────────────────────────────────────────────────────────┘
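
To make the diagram concrete, a heavily simplified sketch of how the pieces might wire together (the real src/server.ts differs; this only mirrors the flow shown above):

// server-sketch.ts - illustrative skeleton of the request flow only.
import express from 'express';

const app = express();

app.get('/convert', async (req, res) => {
  const { url, method = 'hydration' } = req.query as Record<string, string>;
  // 1. Check the file cache (src/cache.ts) for a fresh entry.
  // 2. Fetch HTML via axios (method=html) or Puppeteer (method=hydration).
  // 3. Extract the article with Readability, convert it with Turndown.
  // 4. Store the markdown in the cache, then respond as text or JSON
  //    depending on the Accept header.
  res.type('text/plain').send(`# converted ${url} via ${method}`);
});

app.listen(Number(process.env.PORT ?? 3000));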

License

MIT

About

A fast tool to convert any website into LLM-ready markdown data. Built by https://supermemory.ai, adapted by https://hirebase.org.
