
Update robots txt#244

Open
dawnkelly09 wants to merge 5 commits into main from update-robots-txt

Conversation

@dawnkelly09
Contributor

Using `Disallow: /ai/` was keeping user-request agent bots like Claude and ChatGPT from accessing the LLM Markdown files intended for their use. For now, we can remove all Disallow statements: Google doesn't index Markdown files at all, so there is no risk of search results serving the Markdown rather than the HTML version of pages to users.

Copilot AI review requested due to automatic review settings March 5, 2026 20:28

Copilot AI left a comment


Pull request overview

Updates crawling and MkDocs configuration to allow LLM user-agents to access /ai/ markdown resources while providing a way to disable LLM-related MkDocs plugins via an environment toggle.

Changes:

  • Removed the /ai/ disallow rule from robots.txt.
  • Added ENABLED_LLMS_PLUGINS-gated enablement for LLM-related MkDocs plugins.
  • Documented the new local-development toggle in README.md.
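The environment toggle described above can be sketched in mkdocs.yml using the `!ENV` tag, which MkDocs supports for reading configuration values from environment variables with a default. The plugin names below are illustrative assumptions, not the exact plugin list from this PR:

```yaml
plugins:
  # LLM-related plugins honor the ENABLED_LLMS_PLUGINS environment
  # variable; when it is unset, they default to enabled.
  - llmstxt:
      enabled: !ENV [ENABLED_LLMS_PLUGINS, true]
```

Locally, a contributor could then skip these plugins for faster builds with something like `ENABLED_LLMS_PLUGINS=false mkdocs serve`.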

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| robots.txt | Allows crawlers (including LLM user-agents) to access /ai/ paths by removing the disallow rule. |
| mkdocs.yml | Adds env-controlled `enabled` flags for LLM-related MkDocs plugins. |
| README.md | Documents how to disable git revision and LLM plugins locally for faster builds. |


@dawnkelly09 dawnkelly09 requested a review from eshaben March 5, 2026 20:34
@dawnkelly09
Contributor Author

This updated version works as follows:

  • A general allow statement permitting all bots on all parts of the site by default
  • An override statement blocking search engine indexer bots from the /ai/ directory
  • An override statement blocking LLM training crawlers site-wide
  • A sitemap declaration

This combination allows LLM user-request bots to access both the /ai/ directory and our regular web pages, blocks search engine indexers from grabbing the /ai/ files (preventing duplicate content and the wrong file format in search results), and blocks all LLM crawlers that scrape for training data from both /ai/ and the web pages.
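A minimal robots.txt following this layout might look like the sketch below. The specific bot names and the sitemap URL are illustrative assumptions, not the exact contents of the PR:

```text
# Allow all bots on all parts of the site by default
User-agent: *
Allow: /

# Override: block search engine indexers from the /ai/ directory
# to keep Markdown files out of search results
User-agent: Googlebot
Disallow: /ai/

User-agent: Bingbot
Disallow: /ai/

# Override: block LLM training crawlers site-wide
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# On-demand user-request fetchers (e.g. ChatGPT-User fetching a page
# on a user's behalf) match the general rule above and stay allowed.

Sitemap: https://example.com/sitemap.xml
```

Note that more specific user-agent groups replace, rather than combine with, the general `User-agent: *` group for a matching crawler, which is what makes the override pattern work.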


Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.





3 participants