"Get that content over here... as markdown!" π¦
A Go CLI tool that fetches web pages and converts them to clean markdown, optimized for feeding into Agents/LLMs.
- Clean Content Extraction: Uses Mozilla's Readability algorithm to extract main article content
- Metadata Extraction: Captures title, author, publish date, and description
- Smart Caching: Local file-based caching with 24-hour TTL at `~/.config/md-over-here/cache`
- Multiple URLs: Process multiple URLs in a single command
- Flexible Output: Write to stdout or file
- Robust Error Handling: Partial success; processing continues even if some URLs fail
- Interface-Based Architecture: Designed for future extensibility (headless Chrome support planned)
Install with Homebrew:

```bash
brew install EstebanForge/tap/md-over-here
```

Or build from source:

```bash
git clone https://github.com/EstebanForge/md-over-here
cd md-over-here
make build
```

Or using Go directly:

```bash
go build -o md-over-here ./cmd/md-over-here
```

To install:

```bash
make install
```

Or using Go directly:

```bash
go install ./cmd/md-over-here
```

On first run, the tool automatically creates a symlink at `~/.local/bin/mdoh`, so you can use the shorter `mdoh` command instead of `md-over-here`. This works for both Homebrew and manual installations.
Make sure ~/.local/bin is in your PATH. If it's not already, add this to your shell profile (~/.bashrc, ~/.zshrc, etc.):
```bash
export PATH="$HOME/.local/bin:$PATH"
```

For development:

```bash
# Run all development checks (format, lint, test, build)
make dev

# Run individual commands
make test # Run tests
make lint # Run linter
make fmt # Format code
make test-coverage # Run tests with coverage

# See all available commands
make help
```

Note: After the first run, you can use `mdoh` as a shorthand for `md-over-here`.
```bash
# Single URL (outputs to stdout, for agents/LLMs)
md-over-here https://example.com/article

# Save to file (single or multiple URLs combined)
md-over-here --save article.md https://example.com/article

# Multiple URLs to stdout
md-over-here https://example.com/article-1 https://example.com/article-2

# Save multiple URLs to one file
md-over-here --save combined.md https://example.com/article-1 https://example.com/article-2

# Bypass cache
md-over-here --no-cache https://example.com/article

# Verbose mode
md-over-here -v https://example.com/article

# Cache management
md-over-here cache stats # Show cache statistics
md-over-here cache clear # Clear all cached content
```

| Flag | Description |
|---|---|
| `-s, --save <file>` | Save to file (combines multiple URLs with separators) |
| `--no-cache` | Disable caching for this request |
| `--cache-dir <path>` | Custom cache directory (default: `~/.config/md-over-here/cache`) |
| `-v, --verbose` | Show metadata and cache status |
| `--timeout <duration>` | HTTP timeout (default: `30s`) |
| `--user-agent <string>` | Custom User-Agent header |
| `-h, --help` | Show help message |

Cache subcommands:

| Command | Description |
|---|---|
| `md-over-here cache stats` | Display cache statistics (entries, size, location) |
| `md-over-here cache clear` | Remove all cached content |
Both cache subcommands support the --cache-dir flag to specify a custom cache directory.
Output for each article looks like this:

```markdown
# Article Title

**URL:** https://example.com/article
**Author:** John Doe
**Published:** 2025-01-15
**Description:** Article description here

---

[Clean article content in markdown...]

---
<!-- Fetched: 2026-01-03T16:47:00Z -->
```

When processing multiple URLs, articles are separated by:

```markdown
---

## Next Article

---
```

Cached content is stored at `~/.config/md-over-here/cache/` by default.
```text
~/.config/md-over-here/
├── cache/
│   ├── <sha256-hash>.json
│   └── <sha256-hash>.json
└── config.toml (future: user preferences)
```
Each cached entry is stored as a JSON file:

```json
{
  "url": "https://example.com/article",
  "fetchedAt": "2026-01-03T16:46:53Z",
  "markdown": "# Article Title\n\n...",
  "metadata": {
    "Title": "Article Title",
    "Author": "John Doe",
    "PublishDate": "2025-01-15",
    "Description": "Article description"
  }
}
```

- URLs are normalized (lowercase scheme/host, sorted query params, no fragment)
- SHA256 hash of normalized URL is used as cache filename
- TTL: 24 hours based on file modification time
```bash
# Show cache statistics
md-over-here cache stats

# Clear all cached content
md-over-here cache clear

# Use custom cache directory
md-over-here cache stats --cache-dir /custom/path
md-over-here cache clear --cache-dir /custom/path
```

Output to stdout:

```bash
# Single URL
md-over-here https://example.com/article

# Multiple URLs
md-over-here https://example.com/article-1 https://example.com/article-2
```

Save to a file:

```bash
# Single URL to file
md-over-here --save article.md https://example.com/article

# Multiple URLs combined to one file
md-over-here --save research.md \
  https://blog.example.com/post-1 \
  https://blog.example.com/post-2 \
  https://blog.example.com/post-3
```

```bash
# Parent directories are created automatically
md-over-here --save articles/2025/article.md https://example.com/article

# Absolute paths work too
md-over-here --save /path/to/docs/article.md https://example.com/article
```

Increase the timeout for slow sites:

```bash
md-over-here --timeout 60s https://slow-site.com/article
```

Bypass the cache to get fresh content:

```bash
md-over-here --no-cache https://news.example.com/breaking-story
```

Design:

- Interface-based Fetcher: Designed to support a future Chrome/headless browser backend
- Graceful Degradation: Falls back to full HTML if content extraction fails
- Partial Success: Processes all URLs even if some fail
- Simple Caching: JSON files for debuggability and simplicity

Dependencies:

- `github.com/spf13/cobra`: CLI framework
- `github.com/JohannesKaufmann/html-to-markdown`: HTML to Markdown conversion
- `codeberg.org/readeck/go-readability/v2`: Content extraction

Planned features:

- Headless Chrome Support: `--use-chrome` flag for JS-heavy sites (SPAs, lazy-loaded content)
- Parallel Processing: `--parallel` flag for faster batch operations
- Image downloading and embedding
- Rate limiting for politeness

Troubleshooting:

```
Error: dial tcp: lookup example.com: no such host
```

Solution: Check network connectivity and DNS resolution.

```
Error: HTTP 404: 404 Not Found
```

Solution: Verify that the URL is correct and accessible. The tool continues with the remaining URLs in the batch.
If Readability extraction fails, the tool falls back to converting the full HTML page to markdown.
If cache directory creation fails, the tool continues without caching and shows a warning in verbose mode.
See CHANGELOG.md for a list of changes in each version.
MIT License - See LICENSE file for details