feat(proxy): Markdown for Agents — Accept: text/markdown content negotiation by dviejokfs · Pull Request #12 · gotempsh/temps

dviejokfs · 2026-02-18T13:00:28Z

Summary

Implements Markdown for Agents content negotiation in the Pingora reverse proxy, compatible with Cloudflare's emerging standard for AI-native content delivery.

When an AI agent or HTTP client sends Accept: text/markdown, Temps now converts the upstream HTML response to Markdown on the fly before delivering it — reducing token waste and improving response quality for AI systems consuming deployed apps.

How it works

curl https://your-app.temps.sh/ -H "Accept: text/markdown"
# → Content-Type: text/markdown; charset=utf-8
# → Vary: Accept
# → X-Markdown-Tokens: <estimated count>
# → [Markdown body]

Filter chain

Stage	What happens
`early_request_filter`	Detects `Accept: text/markdown`; sets `ctx.wants_markdown = true`; disables upstream compression
`upstream_response_filter`	Confirms upstream is `text/html` (not SSE/WS); adds `Vary: Accept`; cancels conversion otherwise
`response_filter`	Rewrites `Content-Type → text/markdown; charset=utf-8`; removes `Content-Length` and `Content-Encoding`; adds `X-Markdown-Tokens: 0` placeholder
`response_body_filter`	Buffers chunks; converts HTML → Markdown via `htmd` on `end_of_stream`; falls back to passthrough on error or size limit exceeded
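
The detection step in `early_request_filter` can be pictured with a small standalone helper. This is a minimal sketch, not the PR's code: the function name `accepts_markdown` is hypothetical, and treating `q=0` as "not acceptable" is an assumption based on the "quality values" and "negative" test cases listed below.

```rust
/// Hypothetical helper: returns true when an Accept header value lists
/// `text/markdown` with a non-zero quality value.
pub fn accepts_markdown(accept: &str) -> bool {
    accept.split(',').any(|item| {
        // Each item looks like `type/subtype;param=value;...`.
        let mut parts = item.split(';');
        let media_type = parts.next().unwrap_or("").trim();
        if !media_type.eq_ignore_ascii_case("text/markdown") {
            return false;
        }
        // Use the first parseable `q` parameter; a missing or malformed
        // one falls back to the default quality of 1.0.
        let q = parts
            .filter_map(|p| {
                let (k, v) = p.split_once('=')?;
                if k.trim().eq_ignore_ascii_case("q") {
                    v.trim().parse::<f32>().ok()
                } else {
                    None
                }
            })
            .next()
            .unwrap_or(1.0);
        // Assumption: q=0 means the client explicitly rejects Markdown.
        q > 0.0
    })
}
```

In a proxy, the result of such a check would be what sets a flag like `ctx.wants_markdown` before the request goes upstream.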

Safety guards

SSE/WebSocket: conversion is cancelled if ctx.is_sse or ctx.is_websocket is set — streaming responses always pass through unchanged
Non-HTML: only text/html upstream responses are converted; JSON, images, etc. pass through as-is
2 MB limit: responses larger than 2 MB fall back to passthrough, mirroring Cloudflare's own constraint
Conversion failure: if htmd fails, the original HTML bytes are returned so the client always gets something
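
The size guard and conversion-failure fallback can be sketched as a small buffer type. Everything here is illustrative: `MarkdownBuffer`, `push`, and the exact byte value of the limit are assumptions, and the converter is passed in as a closure so the sketch does not depend on `htmd`.

```rust
/// Assumed byte value for the "2 MB" limit named in the PR.
pub const MAX_MARKDOWN_BODY: usize = 2 * 1024 * 1024;

/// Hypothetical per-request buffer; the PR's real context struct is not shown.
pub struct MarkdownBuffer {
    buf: Vec<u8>,
    limit: usize,
    active: bool,
}

impl MarkdownBuffer {
    pub fn new(limit: usize) -> Self {
        Self { buf: Vec::new(), limit, active: true }
    }

    /// Feeds one body chunk. Returns the bytes to forward to the client
    /// now, or `None` while the body is still being accumulated.
    pub fn push<F>(&mut self, chunk: &[u8], end_of_stream: bool, convert: F) -> Option<Vec<u8>>
    where
        F: FnOnce(&str) -> Result<String, String>,
    {
        if !self.active {
            // Passthrough mode: forward the chunk untouched.
            return Some(chunk.to_vec());
        }
        self.buf.extend_from_slice(chunk);
        if self.buf.len() > self.limit {
            // Size guard tripped: flush what was buffered and stop converting.
            self.active = false;
            return Some(std::mem::take(&mut self.buf));
        }
        if !end_of_stream {
            // Hold the bytes back until the whole body has arrived.
            return None;
        }
        let original = std::mem::take(&mut self.buf);
        let converted = std::str::from_utf8(&original)
            .map_err(|e| e.to_string())
            .and_then(convert);
        match converted {
            Ok(markdown) => Some(markdown.into_bytes()),
            // Conversion failed (or body was not UTF-8): return the original bytes.
            Err(_) => Some(original),
        }
    }
}
```

A caller would construct it with `MarkdownBuffer::new(MAX_MARKDOWN_BODY)` and pass each chunk from the body filter, along with whatever HTML-to-Markdown function is in use.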

Tests

18 unit tests added in proxy::markdown_tests:

Accept header parsing (exact, quality values, case-insensitive, negative)
Content-type gating (HTML converts, JSON/SSE/WS do not)
Multi-chunk body accumulation
Size guard (>2 MB disables conversion)
HTML→Markdown conversion correctness
SSE passthrough safety
Token estimation

Dependencies

htmd 0.5 — pure Rust HTML→Markdown converter, no C dependencies

Implements Markdown for Agents content negotiation in the Pingora proxy, compatible with Cloudflare's emerging standard for AI-native content delivery.

When an HTTP client sends 'Accept: text/markdown', the proxy detects the preference in early_request_filter and, if the upstream returns text/html, buffers the response body and converts it to Markdown via htmd before forwarding it to the client.

Key behaviours:

- Detection: Accept header parsed in early_request_filter; compression disabled for markdown requests to receive raw HTML bytes
- Gating: upstream_response_filter cancels conversion for non-HTML content types, SSE streams, and WebSocket upgrades; adds Vary: Accept
- Conversion: response_body_filter accumulates chunks and converts the full body on end_of_stream using htmd
- Size guard: responses larger than 2 MB fall back to passthrough, mirroring Cloudflare's limit
- Headers: Content-Type rewritten to text/markdown; charset=utf-8, Content-Length and Content-Encoding removed, X-Markdown-Tokens set as a best-effort placeholder
- Token estimation: word-count heuristic (words * 4 / 3) matching the rough estimate in Cloudflare's x-markdown-tokens header

18 unit tests cover: Accept header parsing, content-type gating, SSE/WebSocket passthrough safety, multi-chunk accumulation, size guard, HTML-to-Markdown conversion, and token estimation.
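
The word-count heuristic (words * 4 / 3) is simple enough to show directly. A minimal sketch, assuming whitespace-separated words and integer arithmetic; the function name `estimate_tokens` is hypothetical.

```rust
/// Hypothetical helper: estimates a token count from Markdown text
/// using the words * 4 / 3 heuristic described in the commit message.
pub fn estimate_tokens(markdown: &str) -> usize {
    // Count whitespace-separated words, then scale with integer division.
    let words = markdown.split_whitespace().count();
    words * 4 / 3
}
```

For example, a three-word string yields 4 and a five-word string yields 6, since the division truncates.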

…d update changelog

Without pre-extraction, htmd converted the entire page document, including inlined <script> and <style> blocks, nav, sidebars and footers, producing output 12x larger than Cloudflare's equivalent (115 KB vs 9.5 KB).

Now uses scraper to find the first <main> element (document order / shallowest depth) before passing to htmd, with fallback to <body> and then the raw HTML. This matches Cloudflare's Markdown for Agents approach: only the article content node is converted, not the full browser-rendered document shell.

Result on the Cloudflare docs reference page:

Before: 115,020 bytes (full page + CSS/JS noise)
After: 15,658 bytes (article content only)

5 new extraction unit tests added covering: <main> preferred over surrounding nav/footer, <body> fallback, first-of-multiple-main wins, script/style outside <main> excluded, and fragment passthrough.
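
The fallback order (first <main>, then <body>, then the raw input) can be illustrated without any HTML parser. The sketch below is a deliberately naive, std-only string scan standing in for the scraper-based lookup the commit describes; it is not the PR's code, does not handle nested elements of the same name, and the names `extract_content` and `element_slice` are hypothetical.

```rust
/// Hypothetical helper: picks the slice of `html` to hand to the converter.
/// Order: first <main> element, else <body>, else the input unchanged.
pub fn extract_content(html: &str) -> &str {
    element_slice(html, "main")
        .or_else(|| element_slice(html, "body"))
        .unwrap_or(html)
}

/// Naive scan for the first `<tag ...>` up to the next `</tag>`.
/// A real implementation would use an HTML parser instead.
fn element_slice<'a>(html: &'a str, tag: &str) -> Option<&'a str> {
    // ASCII lowercasing keeps byte offsets identical to the original string.
    let lower = html.to_ascii_lowercase();
    let open = format!("<{tag}");
    let close = format!("</{tag}>");
    let mut from = 0;
    while let Some(rel) = lower[from..].find(&open) {
        let start = from + rel;
        let after = start + open.len();
        // Require '>' or whitespace after the name so "<mainframe" is skipped.
        match lower.as_bytes().get(after).copied() {
            Some(b'>') | Some(b' ') | Some(b'\t') | Some(b'\n') | Some(b'\r') => {
                let end = after + lower[after..].find(&close)? + close.len();
                return Some(&html[start..end]);
            }
            _ => from = after,
        }
    }
    None
}
```

The returned slice still contains the element's own tags, so anything outside it (nav, footer, page-level script and style) never reaches the converter.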

Extract gate and header-rewrite logic into free functions (apply_markdown_upstream_gate, apply_markdown_response_headers) so they can be tested without a live Pingora session. Gate now cancels conversion for: non-2xx status codes (4xx/5xx/3xx), missing Content-Type, non-HTML content types, uppercase Content-Type (TEXT/HTML), SSE, and WebSocket. 23 new pipeline tests cover every edge case end-to-end (gate → header rewrite → body filter).
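
A gate written as a free function might look like the sketch below. The signature of the PR's `apply_markdown_upstream_gate` is not shown, so the struct `UpstreamInfo` and the function `should_convert` are hypothetical. The case-sensitive comparison is an assumption that follows the commit message's statement that an uppercase `TEXT/HTML` cancels conversion.

```rust
/// Hypothetical snapshot of what the gate needs to know about the upstream response.
#[derive(Clone, Copy)]
pub struct UpstreamInfo<'a> {
    pub status: u16,
    pub content_type: Option<&'a str>,
    pub is_sse: bool,
    pub is_websocket: bool,
}

/// Hypothetical gate: returns true only when conversion should proceed.
pub fn should_convert(wants_markdown: bool, up: &UpstreamInfo<'_>) -> bool {
    // Streaming responses and clients that did not ask for Markdown pass through.
    if !wants_markdown || up.is_sse || up.is_websocket {
        return false;
    }
    // Only 2xx responses are converted; 3xx/4xx/5xx pass through.
    if !(200..300).contains(&up.status) {
        return false;
    }
    match up.content_type {
        Some(ct) => {
            // Drop parameters such as `; charset=utf-8` before comparing.
            let media_type = ct.split(';').next().unwrap_or("").trim();
            // Assumption: exact lowercase match, so `TEXT/HTML` cancels conversion.
            media_type == "text/html"
        }
        // A missing Content-Type cancels conversion.
        None => false,
    }
}
```

Because the function takes plain values instead of a session, each edge case can be exercised in a unit test without a running proxy.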

dviejokfs added 4 commits February 18, 2026 14:00

fix(proxy): fix clippy unnecessary_literal_unwrap in markdown test an…

45ef12a

…d update changelog

dviejokfs merged commit 8f99026 into main Feb 18, 2026
9 checks passed

dviejokfs merged 4 commits into main from feat/markdown-for-agents
