feat(website): add llms.txt support for LLM-friendly content #13932
✅ Snyk checks have passed. No issues have been found so far.
Add support for generating llms.txt and .llms.md files for Quarto websites, providing LLM-friendly markdown versions of HTML pages.

Features:
- New `llms-txt: true` option in website config
- Generates .llms.md companion files alongside HTML output
- Creates llms.txt index file linking to all markdown pages
- Converts HTML to clean markdown using Pandoc with Lua filter
- Handles callouts (blockquotes with bold type markers)
- Converts images to markdown syntax
- Converts internal links from .html to .llms.md
- Respects draft settings (excludes drafts from output)
- Cleans listing pages (removes empty links, category badges)
- Matches sitemap behavior for incremental builds

New files:
- src/project/types/website/website-llms.ts
- src/resources/filters/llms/llms.lua

Test coverage:
- Basic file generation
- Content conversion (callouts, code, tables, links)
- Draft handling
- Listing page cleanup

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
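To make the "Converts internal links from .html to .llms.md" item concrete, here is a rough TypeScript sketch of that kind of rewrite. It is illustrative only: the commit message says the conversion happens in a Pandoc Lua filter, and the function name `toLlmsHref` and the exact rules (skip absolute URLs, keep queries and fragments) are assumptions, not code from this PR.

```typescript
// Illustrative sketch only. The PR does this rewrite in a Lua filter;
// `toLlmsHref` and its rules are assumptions made for this example.
function toLlmsHref(href: string): string {
  // Leave absolute URLs (http:, mailto:, ...) and protocol-relative links alone.
  if (/^[a-zA-Z][a-zA-Z0-9+.-]*:/.test(href) || href.startsWith("//")) {
    return href;
  }
  // Split off an optional ?query and/or #fragment so it can be re-attached.
  const match = /^([^?#]*)(.*)$/.exec(href);
  if (!match) return href;
  const [, path, suffix] = match;
  // Only links that point at rendered HTML pages are rewritten.
  if (!path.endsWith(".html")) return href;
  return path.slice(0, -".html".length) + ".llms.md" + suffix;
}

console.log(toLlmsHref("guide/intro.html#setup"));
console.log(toLlmsHref("https://example.com/page.html"));
```

Under these assumptions, relative page links move to their markdown companions while external links, images, and same-page anchors stay untouched.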
20d8156 to b5a829e
…tests

- Add `**/*.llms.md` to projectHiddenIgnoreGlob() to prevent cascading renders of llms.txt companion files
- Fix ensureLlmsTxt* test functions to use dirname(htmlFile) instead of treating file path as directory
- Update llms-txt test files to use correct two-element array format for regex matches [matches, no_matches]
- Add render-project: true where needed for llms.txt generation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…tibility

Use pathWithForwardSlashes() to ensure paths in llms.txt use forward slashes on all platforms. Also adds changelog entry for the llms-txt feature.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
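As a small illustration of why the forward-slash normalization matters, the sketch below shows a Windows-style relative path being normalized before it is written into an index line. Both `toForwardSlashes` and `llmsTxtEntry` are hypothetical stand-ins written for this example; the real helper is pathWithForwardSlashes(), and the markdown-link entry format is an assumption about what the llms.txt index lines look like.

```typescript
// Hypothetical stand-in for the pathWithForwardSlashes() helper named above.
function toForwardSlashes(path: string): string {
  return path.replace(/\\/g, "/");
}

// Hypothetical index-line builder; the "- [title](path)" shape is an assumption.
function llmsTxtEntry(title: string, relativePath: string): string {
  return `- [${title}](${toForwardSlashes(relativePath)})`;
}

// A path as it might come out of a path join on Windows.
console.log(llmsTxtEntry("Introduction", "docs\\guide\\intro.llms.md"));
```

Without the normalization step, the same site rendered on Windows and on Linux would produce different link targets in llms.txt.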
Overall, it looks good! I think this is a great way to do it. I had a look at the pkgdown implementation to compare:
For example, definition lists are converted to bullet lists, probably because GFM does not support them
unless activated, but is this really GFM syntax? An example of their output: https://pkgdown.r-lib.org/llms.txt Among the good ideas, I am thinking of:
Just some ideas. I am sure we'll have more feedback once this is tested.
    return {
      name: `File ${llmsFile} exists`,
      verify: (_output: ExecuteOutput[]) => {
        verifyPath(llmsFile);
        return Promise.resolve();
      },
We have fileExists() if we want to refactor and avoid duplication
Lines 228 to 236 in 6c8a9b1
    export const fileExists = (file: string): Verify => {
      return {
        name: `File ${file} exists`,
        verify: (_output: ExecuteOutput[]) => {
          verifyPath(file);
          return Promise.resolve();
        },
      };
    };
    return {
      name: `File ${llmsFile} does not exist`,
      verify: (_output: ExecuteOutput[]) => {
        verifyNoPath(llmsFile);
        return Promise.resolve();
      },
We have pathDoNotExists() if we want to reuse and avoid code duplication
Lines 238 to 246 in 6c8a9b1
    export const pathDoNotExists = (path: string): Verify => {
      return {
        name: `path ${path} do not exists`,
        verify: (_output: ExecuteOutput[]) => {
          verifyNoPath(path);
          return Promise.resolve();
        },
      };
    };
    // Verify the llms.txt index file in a website output directory.
    // Takes the HTML file path and looks for llms.txt in the same directory.
    export const ensureLlmsTxtRegexMatches = (
      htmlFile: string,
      matchesUntyped: (string | RegExp)[],
      noMatchesUntyped?: (string | RegExp)[],
    ): Verify => {
      const llmsTxtPath = join(dirname(htmlFile), "llms.txt");
      return verifyFileRegexMatches(regexChecker, `Inspecting ${llmsTxtPath} for Regex matches`)(llmsTxtPath, matchesUntyped, noMatchesUntyped);
    };
This verify helper is meant to be used only for index.qmd, or another .qmd test whose output lands at the root of the output dir, right?
If we need a verify function that takes the output dir as input, it is just a matter of adding the function as special handling in smoke-all.test.ts, and you could have:

    export const ensureLlmsTxtRegexMatches = (
      outputDir: string,
      matchesUntyped: (string | RegExp)[],
      noMatchesUntyped?: (string | RegExp)[],
    ): Verify => {
      const llmsTxtPath = join(outputDir, "llms.txt");
      return verifyFileRegexMatches(regexChecker, `Inspecting ${llmsTxtPath} for Regex matches`)(llmsTxtPath, matchesUntyped, noMatchesUntyped);
    };

But I guess this is just a matter of making sure ensureLlmsTxtRegexMatches() is only used with compatible source documents.
Just a thought while reviewing the new functions in verify.ts.
    return {
      name: `File ${llmsTxtPath} exists`,
      verify: (_output: ExecuteOutput[]) => {
        verifyPath(llmsTxtPath);
        return Promise.resolve();
      },
    };
Same - could be fileExists()
    return {
      name: `File ${llmsTxtPath} does not exist`,
      verify: (_output: ExecuteOutput[]) => {
        verifyNoPath(llmsTxtPath);
        return Promise.resolve();
      },
    };
And likewise, this could be pathDoNotExists()
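For what the suggested reuse might look like, here is a self-contained sketch. The `Verify` interface, the in-memory `existingPaths` set, and the `verifyPath`/`verifyNoPath` bodies are simplified stand-ins so the example runs on its own; `ensureLlmsTxtExists` and `ensureLlmsTxtDoesNotExist` are hypothetical names, since the actual function names are not visible in the snippets above. Only `fileExists`, `pathDoNotExists`, and the `join(dirname(htmlFile), "llms.txt")` derivation come from the quoted code.

```typescript
import { posix } from "node:path";

// Simplified stand-in (assumption) for the test harness's Verify type.
interface Verify {
  name: string;
  verify: (output: unknown[]) => Promise<void>;
}

// In-memory substitute for the file system so the sketch needs no disk access;
// the real verifyPath/verifyNoPath check actual files.
const existingPaths = new Set<string>();
const verifyPath = (path: string): void => {
  if (!existingPaths.has(path)) throw new Error(`Path ${path} does not exist`);
};
const verifyNoPath = (path: string): void => {
  if (existingPaths.has(path)) throw new Error(`Path ${path} exists`);
};

// The two existing helpers, as quoted in the review comments.
const fileExists = (file: string): Verify => ({
  name: `File ${file} exists`,
  verify: (_output) => {
    verifyPath(file);
    return Promise.resolve();
  },
});
const pathDoNotExists = (path: string): Verify => ({
  name: `path ${path} do not exists`,
  verify: (_output) => {
    verifyNoPath(path);
    return Promise.resolve();
  },
});

// llms.txt sits next to the rendered HTML file, as in ensureLlmsTxtRegexMatches.
const llmsTxtFor = (htmlFile: string): string =>
  posix.join(posix.dirname(htmlFile), "llms.txt");

// Hypothetical wrappers: delegate instead of rebuilding the Verify object.
const ensureLlmsTxtExists = (htmlFile: string): Verify =>
  fileExists(llmsTxtFor(htmlFile));
const ensureLlmsTxtDoesNotExist = (htmlFile: string): Verify =>
  pathDoNotExists(llmsTxtFor(htmlFile));

console.log(ensureLlmsTxtExists("_site/index.html").name);
```

One visible side effect of delegating: the negative check would report its name in the `path … do not exists` wording of the existing helper rather than `File … does not exist`.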