🚀 OpenFetcher

OpenFetcher is an LLM-oriented crawling engine that converts entire domains into structured Markdown, keeping images alongside the text they belong to.

Built for RAG pipelines, it streams results page by page, so agentic workflows and large-scale knowledge ingestion can consume them as they arrive.

✨ Features

Deep Semantic Crawling: Recursively discovers and maps entire domain structures using sitemap parsing and link extraction.
LLM-Ready Markdown: Converts complex HTML into clean, structured Markdown optimized for RAG context windows.
Visual Context Preservation: Captures images and preserves their semantic relationship with the surrounding text.
Real-Time Telemetry: Reports per-page execution timing in a live NDJSON output stream.
Parallel Architecture: Runs concurrent headless browsers for rapid, large-scale extraction.
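The discovery step described above (sitemap parsing plus link extraction) can be illustrated with a minimal standard-library sketch. This is not OpenFetcher's actual code: the names LinkCollector, extract_same_domain_links, and parse_sitemap are hypothetical, and the real engine may discover links differently.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_same_domain_links(html, base_url):
    """Return absolute, de-duplicated links on the same host as base_url."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    seen, links = set(), []
    for href in collector.hrefs:
        # Resolve relative links and drop #fragments so duplicates collapse.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).netloc == host and absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


def parse_sitemap(xml_text):
    """Return the <loc> URLs listed in a sitemap XML document."""
    root = ET.fromstring(xml_text)
    # Compare the local tag name so the sitemap namespace does not matter.
    return [
        el.text.strip()
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "loc" and el.text
    ]
```

Both functions return plain lists of URLs, so their results could be merged into a single crawl queue.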

🏗️ Architecture

graph TD
    A[Start URL] --> B[OpenFetcher Engine]
    B --> C{Discovery Phase}
    C -->|Sitemaps| D[Crawl Queue]
    C -->|Link Extraction| D
    D --> E[Parallel Worker Pool]
    E -->|Selenium Headless| F[Content Extraction]
    F -->|Markdownify| G[LLM-Ready Markdown]
    G --> H[Streaming Response / Webhook]
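The flow in the diagram (crawl queue, parallel worker pool, streamed output) can be sketched roughly as follows. This is an assumption-laden illustration, not the project's implementation: the crawl function, the fetch_markdown callback, and the record fields url, markdown, and seconds are all hypothetical, and a thread pool stands in for the headless-browser workers. Only the constant names come from this README.

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_CONCURRENT_BROWSERS = 3  # constant name from this README; value is illustrative
PAGE_LIMIT = 15              # constant name from this README; value is illustrative


def crawl(urls, fetch_markdown,
          max_workers=MAX_CONCURRENT_BROWSERS, page_limit=PAGE_LIMIT):
    """Yield one NDJSON line per page as soon as that page finishes."""

    def timed_fetch(url):
        # Time each page individually so the record carries its own duration.
        started = time.perf_counter()
        markdown = fetch_markdown(url)
        return {
            "url": url,
            "markdown": markdown,
            "seconds": round(time.perf_counter() - started, 3),
        }

    queue = list(urls)[:page_limit]  # cap the crawl at page_limit pages
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(timed_fetch, url) for url in queue]
        # as_completed yields in finish order, not submission order.
        for future in as_completed(futures):
            yield json.dumps(future.result()) + "\n"
```

Because crawl is a generator, a web framework could hand it directly to a streaming response, and callers see each page as soon as its worker completes.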

🚀 Getting Started

1️⃣ Prerequisites

Python 3.10+
Google Chrome and a matching ChromeDriver
Recommended: 8 GB+ RAM for high-concurrency local runs

2️⃣ Local Installation

git clone https://github.com/PremChaurasiya07/OpenFetcher.git
cd OpenFetcher

python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

pip install -r requirements.txt

3️⃣ Run Locally

uvicorn main:app --host 0.0.0.0 --port 3003

⚙️ Increasing Limits (Local Power Users)

Edit the following in scraper_engine.py:

Setting	Render	Local
MAX_CONCURRENT_BROWSERS	1–5	10–15
PAGE_LIMIT	15	100–500
time.sleep delay	2.0 s	0.5–1.0 s
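To get a feel for how these settings interact, here is a rough back-of-the-envelope estimate of the time spent purely in per-page sleeps. It assumes exactly one time.sleep call per page and evenly loaded browsers, neither of which is confirmed by this README; sleep_floor_seconds is a hypothetical helper, and real runs also pay for page loads and extraction.

```python
import math


def sleep_floor_seconds(pages, browsers, delay):
    """Lower bound on wall-clock seconds spent in per-page sleeps.

    Assumes one sleep per page and pages spread evenly across browsers.
    """
    rounds = math.ceil(pages / browsers)  # pages each browser handles in turn
    return rounds * delay


# Table values: 15 pages, 1 browser, 2.0 s delay
print(sleep_floor_seconds(15, 1, 2.0))    # 30.0
# Table values: 500 pages, 10 browsers, 0.5 s delay
print(sleep_floor_seconds(500, 10, 0.5))  # 25.0
```

Under these assumptions, a local run with higher concurrency and a shorter delay can cover far more pages in a similar sleep budget.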

📡 API Usage

Endpoint

POST /scrape

Payload

{
  "url": "https://supermemory.ai"
}
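A minimal client sketch for this endpoint, using only the standard library. It assumes the server from "Run Locally" is listening on port 3003 and that the response body is NDJSON (one JSON object per line), as the Features section suggests. The record fields are not documented here, so the sketch yields each decoded object without assuming its shape; iter_ndjson and scrape are hypothetical helper names.

```python
import json
import urllib.request


def iter_ndjson(lines):
    """Decode an iterable of NDJSON lines (bytes or str), skipping blanks."""
    for raw in lines:
        text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        text = text.strip()
        if text:
            yield json.loads(text)


def scrape(target_url, api_base="http://localhost:3003"):
    """POST the target URL to /scrape and yield each streamed record."""
    request = urllib.request.Request(
        f"{api_base}/scrape",
        data=json.dumps({"url": target_url}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # The response is iterable line by line, so records are handled
        # as they arrive (subject to buffering along the way).
        yield from iter_ndjson(response)
```

With the server running, usage would look like: for record in scrape("https://supermemory.ai"): print(record)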

📄 License

MIT License.

