URL Fetcher MCP Server

An internal MCP server for Savant that fetches web content, with support for both static and JavaScript-rendered pages.

Features

  • Dual Engine Architecture:
    • HTTP Engine: fast fetching of static content using standard HTTP
    • Chrome Engine: full browser rendering for JavaScript-heavy sites
  • Multiple Output Formats:
    • Text: clean text extraction (default)
    • HTML: cleaned HTML with dangerous elements removed
    • Markdown: content converted to Markdown
  • Smart Features:
    • In-memory caching with configurable TTL (sketched just after this list)
    • Chrome browser pool for performance
    • Smart wait strategies for dynamic content
    • Security features (SSRF protection, content size limits)
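
The cache mentioned above can be pictured as a map of URL to payload plus expiry. A minimal sketch, assuming nothing about the server's actual pkg/cache API (all names here are illustrative):

package cache

import (
    "sync"
    "time"
)

// entry pairs a cached payload with its expiry time.
type entry struct {
    value   []byte
    expires time.Time
}

// Cache is a TTL-bounded in-memory cache keyed by URL.
type Cache struct {
    mu    sync.RWMutex
    ttl   time.Duration
    items map[string]entry
}

func New(ttl time.Duration) *Cache {
    return &Cache{ttl: ttl, items: make(map[string]entry)}
}

// Get returns the cached payload, treating expired entries as misses.
func (c *Cache) Get(url string) ([]byte, bool) {
    c.mu.RLock()
    defer c.mu.RUnlock()
    e, ok := c.items[url]
    if !ok || time.Now().After(e.expires) {
        return nil, false
    }
    return e.value, true
}

// Set stores a payload stamped with the configured TTL.
func (c *Cache) Set(url string, value []byte) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.items[url] = entry{value: value, expires: time.Now().Add(c.ttl)}
}

Expired entries are dropped lazily on read here; a production cache would likely also sweep the map periodically.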

Installation

Prerequisites

  1. Go 1.21 or later
  2. Chrome/Chromium (optional) - For JavaScript-rendered content
    • macOS: brew install --cask google-chrome
    • Ubuntu: sudo apt install chromium-browser
    • Or any Chromium-based browser

Build from Source

# Clone the repository (if not using as submodule)
git clone https://github.com/gomcpgo/url_fetcher.git
cd url_fetcher

# Build the server
./run.sh build

# Test the installation
./run.sh test

Pre-built Binary

Download the latest binary from the releases page or build locally using the steps above.

Configuration

Configure via environment variables:

Variable                    Default  Description
FETCH_URL_BLOCK_LOCAL       true     Block requests to local/private IPs
FETCH_URL_CHROME_POOL_SIZE  3        Number of Chrome instances in the pool
FETCH_URL_CACHE_TTL         3600     Cache TTL in seconds (1 hour)
FETCH_URL_TIMEOUT           30       Request timeout in seconds
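
Inside the server, these variables are presumably read once at startup. A sketch of how such loading typically looks in Go (helper and type names are invented, not taken from pkg/config):

package config

import (
    "os"
    "strconv"
)

// Config mirrors the environment variables in the table above.
type Config struct {
    BlockLocal     bool
    ChromePoolSize int
    CacheTTL       int // seconds
    Timeout        int // seconds
}

// envInt reads an integer variable, falling back to def when unset or invalid.
func envInt(key string, def int) int {
    if v := os.Getenv(key); v != "" {
        if n, err := strconv.Atoi(v); err == nil {
            return n
        }
    }
    return def
}

func Load() Config {
    return Config{
        // Unset means the default of true; only an explicit "false" disables blocking.
        BlockLocal:     os.Getenv("FETCH_URL_BLOCK_LOCAL") != "false",
        ChromePoolSize: envInt("FETCH_URL_CHROME_POOL_SIZE", 3),
        CacheTTL:       envInt("FETCH_URL_CACHE_TTL", 3600),
        Timeout:        envInt("FETCH_URL_TIMEOUT", 30),
    }
}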

Usage

Running the Server

The run script provides multiple commands for different workflows:

# Build the server binary
./run.sh build

# Run the server
./run.sh run

# Test with sample URLs
./run.sh test

# Development mode with auto-restart
./run.sh dev

# Show version information
./run.sh version

# Show all available commands
./run.sh help

Testing Commands

# Unit tests only (fast)
./run.sh test-unit

# Full test suite including real websites
./run.sh test-full

# Comprehensive test suite with reporting
./run.sh test-suite

# Clean build artifacts
./run.sh clean

Configuration Examples

# Run with custom configuration
FETCH_URL_BLOCK_LOCAL=false ./run.sh run
FETCH_URL_CHROME_POOL_SIZE=5 ./run.sh test
FETCH_URL_CACHE_TTL=1800 ./run.sh run

MCP Tool Interface

fetch_url

Fetches content from a URL with various options.

Parameters:

  • url (required): URL to fetch
  • engine: "http" (default) or "chrome"
  • format: "text" (default), "html", or "markdown"
  • max_content_length: Maximum content length in bytes (default: 10MB)

Example Request:

{
  "tool": "fetch_url",
  "arguments": {
    "url": "https://example.com",
    "engine": "chrome",
    "format": "markdown"
  }
}

Example Response:

{
  "url": "https://example.com",
  "engine": "chrome",
  "status_code": 200,
  "content_type": "text/html",
  "content": "# Example Domain\n\nThis domain is for use in illustrative examples...",
  "format": "markdown",
  "title": "Example Domain",
  "fetch_time_ms": 1234,
  "chrome_available": true
}
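
A client consuming this tool can decode the response into a struct whose fields mirror the example JSON above; this is an illustration, not a type exported by the server:

package main

import "encoding/json"

// FetchResult mirrors the fields shown in the example response.
type FetchResult struct {
    URL             string `json:"url"`
    Engine          string `json:"engine"`
    StatusCode      int    `json:"status_code"`
    ContentType     string `json:"content_type"`
    Content         string `json:"content"`
    Format          string `json:"format"`
    Title           string `json:"title"`
    FetchTimeMS     int64  `json:"fetch_time_ms"`
    ChromeAvailable bool   `json:"chrome_available"`
}

func parseResult(data []byte) (FetchResult, error) {
    var r FetchResult
    err := json.Unmarshal(data, &r)
    return r, err
}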

Integration with MCP Clients

Claude Desktop

To use the URL Fetcher with Claude Desktop, add this configuration to your claude_desktop_config.json:

Config file locations:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json
  • Linux: ~/.config/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "url-fetcher": {
      "command": "/path/to/url_fetcher/bin/url-fetcher",
      "env": {
        "FETCH_URL_BLOCK_LOCAL": "true",
        "FETCH_URL_CHROME_POOL_SIZE": "3",
        "FETCH_URL_CACHE_TTL": "3600",
        "FETCH_URL_TIMEOUT": "30"
      }
    }
  }
}

Configuration Examples

Basic setup (HTTP only):

{
  "mcpServers": {
    "url-fetcher": {
      "command": "/path/to/url_fetcher/bin/url-fetcher"
    }
  }
}

Performance optimized:

{
  "mcpServers": {
    "url-fetcher": {
      "command": "/path/to/url_fetcher/bin/url-fetcher",
      "env": {
        "FETCH_URL_CHROME_POOL_SIZE": "5",
        "FETCH_URL_CACHE_TTL": "7200",
        "FETCH_URL_TIMEOUT": "45"
      }
    }
  }
}

Development/testing setup:

{
  "mcpServers": {
    "url-fetcher": {
      "command": "/path/to/url_fetcher/bin/url-fetcher",
      "env": {
        "FETCH_URL_BLOCK_LOCAL": "false",
        "FETCH_URL_CACHE_TTL": "300"
      }
    }
  }
}

Other MCP Clients

For other MCP clients, use the following connection details:

  • Server Command: ./bin/url-fetcher or go run cmd/main.go
  • Protocol: Model Context Protocol (MCP) over stdio
  • Tool Available: fetch_url
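
For a quick manual smoke test over stdio, a throwaway client can pipe newline-delimited JSON-RPC into the binary. This sketch assumes MCP's standard initialize handshake and framing; a real client would also parse responses and handle errors:

package main

import (
    "bufio"
    "fmt"
    "os/exec"
)

func main() {
    cmd := exec.Command("./bin/url-fetcher")
    stdin, _ := cmd.StdinPipe()
    stdout, _ := cmd.StdoutPipe()
    if err := cmd.Start(); err != nil {
        panic(err)
    }

    // Initialize handshake, then call the fetch_url tool.
    fmt.Fprintln(stdin, `{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke-test","version":"0.0.1"}}}`)
    fmt.Fprintln(stdin, `{"jsonrpc":"2.0","method":"notifications/initialized"}`)
    fmt.Fprintln(stdin, `{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"fetch_url","arguments":{"url":"https://example.com"}}}`)

    // Print each JSON-RPC response line. Note that large fetched pages may
    // exceed bufio.Scanner's default 64KB token limit.
    out := bufio.NewScanner(stdout)
    for out.Scan() {
        fmt.Println(out.Text())
    }
}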

Usage in Conversations

Once configured, you can use the URL fetcher in your conversations:

"Fetch the content from https://example.com and summarize it"

"Get the latest documentation from https://golang.org/doc/ in markdown format"

"Use the Chrome engine to fetch https://example-spa.com since it requires JavaScript"

The assistant will automatically use the fetch_url tool with appropriate parameters.

Troubleshooting

Server not starting:

  1. Check that the binary path in config is correct and absolute
  2. Ensure the binary has execute permissions: chmod +x bin/url-fetcher
  3. Test the server manually: ./bin/url-fetcher -version

Chrome engine not working:

  1. Verify Chrome/Chromium is installed: google-chrome --version
  2. Check Chrome pool size isn't too large for your system
  3. The server will automatically fall back to the HTTP engine if Chrome is unavailable

Slow performance:

  1. Increase Chrome pool size: FETCH_URL_CHROME_POOL_SIZE=5
  2. Adjust cache TTL for your use case: FETCH_URL_CACHE_TTL=7200
  3. Use HTTP engine for static content instead of Chrome

Security issues:

  1. Enable local IP blocking: FETCH_URL_BLOCK_LOCAL=true
  2. Reduce timeout for faster failure: FETCH_URL_TIMEOUT=15
  3. Lower content size limits if needed

Engine Details

HTTP Engine

  • Automatic retry mechanism for server errors (5xx status codes)
  • Compression support (gzip, deflate, br)
  • Configurable timeout and security validation
  • Falls back gracefully when sites block HTTP requests
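
The 5xx retry behavior could be implemented along these lines; a simplified sketch with invented names rather than the engine's actual code:

package fetcher

import (
    "fmt"
    "net/http"
    "time"
)

// fetchWithRetry retries on 5xx responses with a simple linear backoff.
func fetchWithRetry(client *http.Client, url string, attempts int) (*http.Response, error) {
    var lastErr error
    for i := 0; i < attempts; i++ {
        resp, err := client.Get(url)
        if err != nil {
            lastErr = err
        } else if resp.StatusCode < 500 {
            return resp, nil // success or a non-retryable client error
        } else {
            resp.Body.Close() // drain the 5xx response before retrying
            lastErr = fmt.Errorf("server error: %s", resp.Status)
        }
        time.Sleep(time.Duration(i+1) * 500 * time.Millisecond)
    }
    return nil, lastErr
}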

Chrome Engine

  • Automatically detects Chrome/Chromium availability
  • Falls back to the HTTP engine if Chrome is not available
  • Blocks unnecessary resources (images, fonts, CSS) for performance
  • Uses smart wait strategy:
    • Waits for network idle (500ms)
    • Waits for DOM stability (500ms)
    • Maximum wait time: 15 seconds
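
Driving headless Chrome from Go is commonly done with chromedp (assumed here; the source does not name its browser library). A stripped-down fetch with the 15-second ceiling might look like:

package fetcher

import (
    "context"
    "time"

    "github.com/chromedp/chromedp"
)

// renderPage navigates to url in a headless Chrome tab and returns the
// rendered HTML. The fixed Sleep is a crude stand-in for the smarter
// network-idle/DOM-stability waits described above.
func renderPage(url string) (string, error) {
    ctx, cancel := chromedp.NewContext(context.Background())
    defer cancel()
    ctx, cancel = context.WithTimeout(ctx, 15*time.Second) // maximum wait
    defer cancel()

    var html string
    err := chromedp.Run(ctx,
        chromedp.Navigate(url),
        chromedp.Sleep(500*time.Millisecond),
        chromedp.OuterHTML("html", &html),
    )
    return html, err
}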

Security Features

  • URL validation prevents SSRF attacks
  • Configurable blocking of local/private IPs
  • Content size limits (default 10MB)
  • No cookie/session persistence
  • Safe default headers
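
The SSRF protection listed above typically amounts to resolving the host and rejecting loopback, private, and link-local addresses before any request is made. A minimal sketch, not the server's actual validator:

package fetcher

import (
    "fmt"
    "net"
    "net/url"
)

// validateURL rejects non-HTTP schemes and URLs whose host resolves to a
// loopback, private, or link-local address.
func validateURL(raw string) error {
    u, err := url.Parse(raw)
    if err != nil {
        return err
    }
    if u.Scheme != "http" && u.Scheme != "https" {
        return fmt.Errorf("unsupported scheme: %s", u.Scheme)
    }
    ips, err := net.LookupIP(u.Hostname())
    if err != nil {
        return err
    }
    for _, ip := range ips {
        if ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() {
            return fmt.Errorf("blocked local/private address: %s", ip)
        }
    }
    return nil
}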

Development

Running Tests

The URL Fetcher includes comprehensive test suites for different scenarios:

Quick Tests (Unit Tests Only)

go test ./test/... -v -short

Full Test Suite (Including Real Websites)

go test ./test/... -v
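
The -short flag works because integration tests gate themselves on testing.Short(); a typical pattern (illustrative, using a test name from the categories below):

package test

import "testing"

func TestRealWebsiteIntegration(t *testing.T) {
    if testing.Short() {
        t.Skip("skipping real-website tests in -short mode")
    }
    // ... fetch live sites such as example.com and assert on the content
}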

Comprehensive Test Suite

./test_suite.sh

Individual Test Categories

Basic functionality:

go test ./test/... -v -run="TestHTTPEngine|TestContentProcessor|TestCache"

Real website integration:

go test ./test/... -v -run="TestRealWebsiteIntegration"

Format conversion testing:

go test ./test/... -v -run="TestFormatConversion"

Chrome engine testing:

go test ./test/... -v -run="TestChrome"

Test Mode (Interactive Testing)

go run cmd/main.go -test

This runs predefined test cases against real websites and shows the formatted output.

Test Coverage

The test suite covers:

  • HTTP Engine: Static content fetching with validation and security
  • Chrome Engine: JavaScript-rendered content with browser pool
  • Content Processing: Text extraction, HTML cleaning, Markdown conversion
  • Format Conversion: All three output formats (text, HTML, markdown)
  • Real Websites: Wikipedia, GitHub, Hacker News, MDN, RFC documents
  • Security: URL validation, SSRF protection, content size limits
  • Performance: Caching, concurrent requests, timeout handling
  • Configuration: All environment variables and settings
  • Error Handling: Network errors, invalid URLs, Chrome fallback

Test Websites Used

  • Wikipedia: Rich content with complex HTML structure
  • GitHub: Developer platform with modern web technologies
  • Hacker News: News aggregation with simple HTML
  • MDN Web Docs: Technical documentation with detailed content
  • RFC Documents: Plain text technical specifications
  • Example.com: Basic HTML for baseline testing

Project Structure

url_fetcher/
├── cmd/
│   └── main.go              # MCP server implementation
├── pkg/
│   ├── cache/               # In-memory caching
│   ├── config/              # Configuration management
│   ├── fetcher/             # HTTP and Chrome engines
│   ├── processor/           # Content processing (text, HTML, markdown)
│   └── types/               # Common types and constants
└── test/                    # Integration tests
