cscheid commented Jan 21, 2026

Add support for generating llms.txt and .llms.md files for Quarto websites, providing LLM-friendly markdown versions of HTML pages.

Features:

  • New llms-txt: true option in website config
  • Generates .llms.md companion files alongside HTML output
  • Creates llms.txt index file linking to all markdown pages
  • Converts HTML to clean markdown using Pandoc with Lua filter
  • Handles callouts (blockquotes with bold type markers)
  • Converts images to markdown syntax
  • Converts internal links from .html to .llms.md
  • Respects draft settings (excludes drafts from output)
  • Cleans listing pages (removes empty links, category badges)
  • Matches sitemap behavior for incremental builds

New files:

  • src/project/types/website/website-llms.ts
  • src/resources/filters/llms/llms.lua

Test coverage:

  • Basic file generation
  • Content conversion (callouts, code, tables, links)
  • Draft handling
  • Listing page cleanup
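
As an illustration of the "converts internal links from .html to .llms.md" item, here is a minimal sketch of the rewrite rule. It is not the PR's code (the actual conversion is done by the Lua filter); the function name `toLlmsLink` and the exact rules for what counts as an internal link are assumptions made for this example.

```typescript
// Hypothetical helper, not taken from the PR: rewrite an internal .html link
// so that it points at the .llms.md companion file instead.
function toLlmsLink(href: string): string {
  // Leave external links (scheme or protocol-relative) and pure fragments untouched.
  if (/^([a-z][a-z0-9+.-]*:|\/\/|#)/i.test(href)) {
    return href;
  }
  // Split off any ?query or #fragment so it survives the rewrite.
  const match = href.match(/^([^?#]*)(.*)$/);
  const path = match ? match[1] : href;
  const suffix = match ? match[2] : "";
  // Only .html targets are rewritten; images and other assets are left as-is.
  if (!path.endsWith(".html")) {
    return href;
  }
  return path.slice(0, -".html".length) + ".llms.md" + suffix;
}

console.log(toLlmsLink("guide/intro.html#setup")); // guide/intro.llms.md#setup
console.log(toLlmsLink("https://example.com/a.html")); // https://example.com/a.html
```

Keeping the fragment means in-page anchors keep pointing at the same section, provided the markdown output preserves the heading identifiers.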


posit-snyk-bot commented Jan 21, 2026

Snyk checks have passed. No issues have been found so far.

| Scanner | Critical | High | Medium | Low | Total (0) |
|---|---|---|---|---|---|
| Open Source Security | 0 | 0 | 0 | 0 | 0 issues |
| Licenses | 0 | 0 | 0 | 0 | 0 issues |


Add support for generating llms.txt and .llms.md files for Quarto websites,
providing LLM-friendly markdown versions of HTML pages.

Features:
- New `llms-txt: true` option in website config
- Generates .llms.md companion files alongside HTML output
- Creates llms.txt index file linking to all markdown pages
- Converts HTML to clean markdown using Pandoc with Lua filter
- Handles callouts (blockquotes with bold type markers)
- Converts images to markdown syntax
- Converts internal links from .html to .llms.md
- Respects draft settings (excludes drafts from output)
- Cleans listing pages (removes empty links, category badges)
- Matches sitemap behavior for incremental builds

New files:
- src/project/types/website/website-llms.ts
- src/resources/filters/llms/llms.lua

Test coverage:
- Basic file generation
- Content conversion (callouts, code, tables, links)
- Draft handling
- Listing page cleanup

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…tests

- Add **/*.llms.md to projectHiddenIgnoreGlob() to prevent cascading
  renders of llms.txt companion files
- Fix ensureLlmsTxt* test functions to use dirname(htmlFile) instead
  of treating file path as directory
- Update llms-txt test files to use correct two-element array format
  for regex matches [matches, no_matches]
- Add render-project: true where needed for llms.txt generation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@cscheid cscheid requested a review from cderv January 21, 2026 22:53
…tibility

Use pathWithForwardSlashes() to ensure paths in llms.txt use forward
slashes on all platforms. Also adds changelog entry for the llms-txt
feature.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
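
To illustrate why the forward-slash normalization matters, here is a small sketch. It does not reproduce quarto-cli's `pathWithForwardSlashes()`; `toForwardSlashes` and `llmsLinkTarget` are hypothetical stand-ins that show the intended effect on a Windows-style relative path.

```typescript
// Stand-in for the behaviour described in the commit message; the real
// pathWithForwardSlashes() lives in quarto-cli and may differ in detail.
function toForwardSlashes(path: string): string {
  return path.replace(/\\/g, "/");
}

// Hypothetical helper: turn an output-relative HTML path into the link
// target that would be written to llms.txt.
function llmsLinkTarget(relativeHtmlPath: string): string {
  const normalized = toForwardSlashes(relativeHtmlPath);
  return normalized.replace(/\.html$/, ".llms.md");
}

console.log(llmsLinkTarget("docs\\guide\\intro.html")); // docs/guide/intro.llms.md
console.log(llmsLinkTarget("docs/guide/intro.html")); // docs/guide/intro.llms.md
```

Both inputs produce the same link, so an llms.txt generated on Windows matches one generated on Linux or macOS.
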
@cderv cderv self-assigned this Jan 23, 2026

cderv commented Jan 26, 2026

Overall, it looks good! I think this is a great way to do it.

I had a look at the pkgdown implementation to compare:

  • They use an llm-docs: false field to opt out.

  • It also generates an llms.txt that combines the content of README.md (for the description part) with an index of all the pages. In pkgdown, README.md is usually the content of index.html, so it is included as such. I am not sure it makes sense for us to add the index.html content. 🤔

  • All .html pages become just .md, not .llms.md. I don't know if there is a convention. Websites like the Claude docs all use .md (example: https://code.claude.com/docs/en/settings.md)

  • Links are absolute paths in pkgdown: they contain the site URL. Same for the Claude docs website: https://code.claude.com/docs/llms.txt

  • After each link there is a description based on the source file header, the optional : <description> part. That would be good to have. I wonder if we should add title and description fields under llm-docs to customize that? Or find a way to use a default if one is provided (like open-graph.description?). Otherwise, description-meta is already available in the HTML template if set:

    $if(description-meta)$
    <meta name="description" content="$description-meta$" />
    $endif$

  • They do some special processing of certain content, probably based on what a pkgdown site can contain. We will probably add more cases as people use this feature and we find ones we missed. Just one example below.

For example, definition lists are converted to bullet lists, probably because GFM does not support them:

❯ pandoc -f html -t gfm
<dl>
<dt>Term 1</dt>
<dd>
<p>Definition 1</p>
</dd>
</dl>
<p>Term 2 with <em>inline markup</em></p>
^Z
Term 1
Definition 1

Term 2 with *inline markup*

unless the extension is enabled, but is this really GFM syntax?

❯ pandoc -f html -t gfm+definition_lists
<dl>
<dt>Term 1</dt>
<dd>
<p>Definition 1</p>
</dd>
</dl>
<p>Term 2 with <em>inline markup</em></p>
^Z
Term 1

:   Definition 1

Term 2 with *inline markup*

An example of their output: https://pkgdown.r-lib.org/llms.txt

Among the good ideas, I am thinking of:

  • Using absolute URLs for links if site-url is set? That way an LLM would not have to reconstruct the URL from base + relative link. 🤔
  • Using the same option name, llm-docs? But llms-txt seems clearer with regard to the specification website and the created file.
  • Allowing customization of [title](url): description, or just using the <meta name="description" content="$description-meta$" /> content if available?

Just some ideas. I am sure we'll get more feedback once this is tested.
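
To make the first and third ideas concrete, here is a minimal sketch of how one llms.txt index line could be built. Everything in it is an assumption for illustration: the `PageEntry` shape, the `formatLlmsTxtEntry` name, and the rule "use an absolute URL only when a site URL is given" are not taken from the PR.

```typescript
// Hypothetical page record; the field names are assumptions for this sketch.
interface PageEntry {
  title: string;
  path: string; // output-relative path, already using forward slashes
  description?: string; // e.g. taken from description-meta when set
}

// Format one index line as "- [title](url): description".
// The ": description" part is omitted when no description is available.
function formatLlmsTxtEntry(page: PageEntry, siteUrl?: string): string {
  const url = siteUrl
    ? siteUrl.replace(/\/+$/, "") + "/" + page.path.replace(/^\/+/, "")
    : page.path;
  const suffix = page.description ? `: ${page.description}` : "";
  return `- [${page.title}](${url})${suffix}`;
}

const page: PageEntry = {
  title: "Settings",
  path: "docs/settings.llms.md",
  description: "Configure the tool",
};
console.log(formatLlmsTxtEntry(page, "https://example.com/"));
// - [Settings](https://example.com/docs/settings.llms.md): Configure the tool
console.log(formatLlmsTxtEntry({ title: "Settings", path: "docs/settings.llms.md" }));
// - [Settings](docs/settings.llms.md)
```

Falling back to the relative path keeps the current behaviour for sites that do not set site-url.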

Comment on lines +601 to +606
return {
  name: `File ${llmsFile} exists`,
  verify: (_output: ExecuteOutput[]) => {
    verifyPath(llmsFile);
    return Promise.resolve();
  },

We have fileExists() if we want to refactor and avoid duplication

quarto-cli/tests/verify.ts

Lines 228 to 236 in 6c8a9b1

export const fileExists = (file: string): Verify => {
  return {
    name: `File ${file} exists`,
    verify: (_output: ExecuteOutput[]) => {
      verifyPath(file);
      return Promise.resolve();
    },
  };
};

Comment on lines +613 to +618
return {
  name: `File ${llmsFile} does not exist`,
  verify: (_output: ExecuteOutput[]) => {
    verifyNoPath(llmsFile);
    return Promise.resolve();
  },

We have pathDoNotExists() if we want to reuse it and avoid code duplication

quarto-cli/tests/verify.ts

Lines 238 to 246 in 6c8a9b1

export const pathDoNotExists = (path: string): Verify => {
  return {
    name: `path ${path} do not exists`,
    verify: (_output: ExecuteOutput[]) => {
      verifyNoPath(path);
      return Promise.resolve();
    },
  };
};
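
The reuse suggested in these two comments could look roughly like the sketch below. It is self-contained, so `Verify`, `verifyPath`, `verifyNoPath`, and the in-memory `existingPaths` set are simplified stand-ins for what tests/verify.ts provides. The names `llmsMdPath`, `ensureLlmsMdExists`, and `ensureLlmsMdDoesNotExist`, as well as the way the .llms.md path is derived from the HTML path, are assumptions, since the PR's own helper names are not visible in the excerpts.

```typescript
// Simplified stand-ins so the sketch runs on its own; in quarto-cli these
// come from tests/verify.ts and check the real file system.
interface Verify {
  name: string;
  verify: (output: unknown[]) => Promise<void>;
}
const existingPaths = new Set<string>(["_site/index.llms.md"]); // fake file system
const verifyPath = (path: string): void => {
  if (!existingPaths.has(path)) throw new Error(`Path ${path} does not exist`);
};
const verifyNoPath = (path: string): void => {
  if (existingPaths.has(path)) throw new Error(`Path ${path} exists`);
};
const fileExists = (file: string): Verify => ({
  name: `File ${file} exists`,
  verify: (_output) => {
    verifyPath(file);
    return Promise.resolve();
  },
});
const pathDoNotExists = (path: string): Verify => ({
  name: `path ${path} do not exists`,
  verify: (_output) => {
    verifyNoPath(path);
    return Promise.resolve();
  },
});

// Hypothetical llms-specific helpers: each one only computes the companion
// path and then delegates to the existing generic check.
const llmsMdPath = (htmlFile: string): string =>
  htmlFile.replace(/\.html$/, ".llms.md");
const ensureLlmsMdExists = (htmlFile: string): Verify =>
  fileExists(llmsMdPath(htmlFile));
const ensureLlmsMdDoesNotExist = (htmlFile: string): Verify =>
  pathDoNotExists(llmsMdPath(htmlFile));
```

With this shape, the llms-specific functions carry only the path computation, and the check logic stays in one place.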

Comment on lines +622 to +631
// Verify the llms.txt index file in a website output directory.
// Takes the HTML file path and looks for llms.txt in the same directory.
export const ensureLlmsTxtRegexMatches = (
  htmlFile: string,
  matchesUntyped: (string | RegExp)[],
  noMatchesUntyped?: (string | RegExp)[],
): Verify => {
  const llmsTxtPath = join(dirname(htmlFile), "llms.txt");
  return verifyFileRegexMatches(regexChecker, `Inspecting ${llmsTxtPath} for Regex matches`)(llmsTxtPath, matchesUntyped, noMatchesUntyped);
};

This verify helper is meant to be used only for index.qmd, or for another .qmd test whose output lands at the root of the output dir, right?

If we need a verify function that takes the output dir as input, it is just a matter of adding the function as a special case in smoke-all.test.ts, and you could have

export const ensureLlmsTxtRegexMatches = (
  outputDir: string,
  matchesUntyped: (string | RegExp)[],
  noMatchesUntyped?: (string | RegExp)[],
): Verify => {
  const llmsTxtPath = join(outputDir, "llms.txt");
  return verifyFileRegexMatches(regexChecker, `Inspecting ${llmsTxtPath} for Regex matches`)(llmsTxtPath, matchesUntyped, noMatchesUntyped);
};

But I guess this is just a matter of making sure ensureLlmsTxtRegexMatches() is only used with compatible source documents.

Just a thought while reviewing the new functions in verify.ts

Comment on lines +637 to +643
return {
  name: `File ${llmsTxtPath} exists`,
  verify: (_output: ExecuteOutput[]) => {
    verifyPath(llmsTxtPath);
    return Promise.resolve();
  },
};

Same - could be fileExists()

Comment on lines +650 to +656
return {
  name: `File ${llmsTxtPath} does not exist`,
  verify: (_output: ExecuteOutput[]) => {
    verifyNoPath(llmsTxtPath);
    return Promise.resolve();
  },
};

And same could be pathDoNotExists
