| name | import-web-markdown-with-gather |
|---|---|
| description | Import web pages as clean markdown using the local gather CLI. Use when the user asks to fetch a URL as markdown, clip a page into notes, archive readable article text, or convert web content into markdown for context. |

Use gather as the default local tool for converting a URL into readable markdown.

Run gather with these settings unless the user asks otherwise:

    gather --metadata-yaml --inline-links --no-paragraph-links "<url>"

Rationale:

- `--metadata-yaml`: Adds title/date/source in front matter for downstream indexing.
- `--inline-links`: Keeps links close to text for RAG/chunk readability.
- `--no-paragraph-links`: Avoids repeated reference blocks after each paragraph.

- Validate input:
  - Accept only `http://` or `https://` URLs.
  - If input is not a URL, ask for one.
- Run gather:
  - Primary command:

        gather --metadata-yaml --inline-links --no-paragraph-links "<url>"

- On failure, retry with fallback mode:
  - First fallback:

        gather --metadata-yaml --inline-links --no-paragraph-links \
          --no-readability "<url>"

  - If the page still fails and raw HTML is available, pass HTML directly:

        printf "%s" "$HTML" | gather --html --stdin --metadata-yaml \
          --inline-links --no-paragraph-links

- Return markdown text as the main result.
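
The validate, run, and fallback steps above could be wired together roughly as follows. This is a minimal sketch, not part of gather itself: the function names `is_http_url` and `import_url`, the `USED_FALLBACK` variable, and the convention of writing to an output file are all assumptions made for illustration. Only the gather flags are taken from this document.

```shell
# Hypothetical helpers; only the gather flags come from this document.

# Succeeds only when the argument starts with http:// or https://.
is_http_url() {
  case "$1" in
    http://?*|https://?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: import_url URL OUTFILE
# Writes markdown to OUTFILE and sets USED_FALLBACK to true or false.
# If the caller has set raw page HTML in $HTML, it is tried as a last resort.
import_url() {
  url=$1
  out=$2
  USED_FALLBACK=false

  if ! is_http_url "$url"; then
    echo "error: expected an http:// or https:// URL" >&2
    return 2
  fi

  # Primary command.
  if gather --metadata-yaml --inline-links --no-paragraph-links "$url" >"$out"; then
    return 0
  fi

  # First fallback: skip readability extraction.
  USED_FALLBACK=true
  if gather --metadata-yaml --inline-links --no-paragraph-links \
      --no-readability "$url" >"$out"; then
    return 0
  fi

  # Second fallback: feed raw HTML on stdin, if any was provided.
  [ -n "${HTML:-}" ] || return 1
  printf '%s' "$HTML" | gather --html --stdin --metadata-yaml \
    --inline-links --no-paragraph-links >"$out"
}
```

A caller might run `import_url "https://example.com/article" page.md` and then read `$USED_FALLBACK`. Writing to a file instead of capturing stdout keeps that variable visible, since a command substitution would run the function in a subshell.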

When successful, return:

- `url`: original URL
- `title`: extracted title when available
- `markdown`: full markdown body
- `used_fallback`: `true` if the `--no-readability` or `--html` path was used

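
One way to assemble those four fields from a saved markdown file is sketched below. It assumes the front matter written by `--metadata-yaml` is delimited by `---` lines and carries a top-level `title:` key. That layout is a guess based on common YAML front matter and should be checked against real gather output. `extract_title` and `print_result` are hypothetical names.

```shell
# Prints the value of a top-level `title:` key from YAML front matter.
# Assumption: front matter starts on line 1 and is fenced by `---` lines.
extract_title() {
  awk '
    NR == 1 && $0 != "---" { exit }   # no front matter at all
    NR > 1  && $0 == "---" { exit }   # reached the end of front matter
    /^title:[ \t]*/ {
      sub(/^title:[ \t]*/, "")        # drop the key
      gsub(/^"|"$/, "")               # drop surrounding double quotes
      print
      exit
    }
  ' "$1"
}

# Usage: print_result URL MARKDOWN_FILE USED_FALLBACK
# Emits the four result fields; title is empty when none was found.
print_result() {
  printf 'url: %s\n' "$1"
  printf 'title: %s\n' "$(extract_title "$2")"
  printf 'used_fallback: %s\n' "$3"
  printf 'markdown:\n'
  cat "$2"
}
```

The title is left empty rather than guessed when no `title:` key is found, which matches "extracted title when available".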
- Do not execute JavaScript from pages.
- Do not follow login-only pages automatically.
- Preserve the original URL in output metadata.
- If output is empty or too short, report a partial extraction warning.
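
The last guardrail can be approximated with a simple size check. The sketch below is an illustration under stated assumptions: the document does not define "too short", so the `MIN_CHARS` threshold of 200 is hypothetical, as is the function name `check_extraction`.

```shell
# MIN_CHARS is a hypothetical threshold; tune it for your content.
MIN_CHARS=${MIN_CHARS:-200}

# Usage: check_extraction MARKDOWN_FILE URL
# Returns 1 and prints a warning to stderr when the file is empty or short.
check_extraction() {
  # wc -c counts bytes (front matter included), enough for a rough check.
  size=$(wc -c <"$1" | tr -d ' ')
  if [ "$size" -lt "$MIN_CHARS" ]; then
    echo "warning: partial extraction ($size bytes from $2)" >&2
    return 1
  fi
}
```

Because the count includes any YAML front matter, a page with metadata but almost no body can still pass. Subtracting the front matter length would make the check stricter.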

Basic import:

    gather --metadata-yaml --inline-links --no-paragraph-links "https://example.com/article"

Fallback when readability extraction fails:

    gather --metadata-yaml --inline-links --no-paragraph-links --no-readability "https://example.com/article"

- Add title only:

      gather --title-only "<url>"

- Plain body without source/title injection:

      gather --no-include-source --no-include-title "<url>"