@deborahgu deborahgu commented Dec 4, 2025

Sets the NGINX_ROBOT_RULES default to block all crawlers.

FIXES: APER-4252


Make sure that the following steps are done before merging:

  • Have a Site Reliability Engineer review the PR if you don't own all of the services impacted.
  • If you are adding any new default values that need to be overridden when this change goes live, update internal repos and add an entry to the top of the CHANGELOG.
  • Performed the appropriate testing.

the default for all configurations should be to block crawlers in the
robot file.

FIXES: APER-4252
updating changelog
Copilot AI review requested due to automatic review settings December 4, 2025 21:08

This comment was marked as spam.

# Block all crawlers by default
NGINX_ROBOT_RULES:
  - agent: "*"
    disallow: "/"
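
For illustration, here is a minimal Python sketch of what this default amounts to, assuming each rule renders to a standard `User-agent` / `Disallow` pair in robots.txt. The `render_robots_txt` and `is_allowed` helpers are hypothetical and are not part of edx/configuration, and the real nginx template may produce different output. The check itself uses only the standard library's `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Mirrors the NGINX_ROBOT_RULES default proposed in this PR.
robot_rules = [{"agent": "*", "disallow": "/"}]


def render_robots_txt(rules):
    """Hypothetical renderer: one User-agent/Disallow pair per rule.

    This is an assumption about the output format, not the real template.
    """
    lines = []
    for rule in rules:
        lines.append(f"User-agent: {rule['agent']}")
        lines.append(f"Disallow: {rule['disallow']}")
        lines.append("")  # blank line separates rule groups
    return "\n".join(lines)


def is_allowed(robots_txt, user_agent, url):
    """Return True if a compliant crawler may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


robots_txt = render_robots_txt(robot_rules)
print(robots_txt)
print(is_allowed(robots_txt, "Googlebot", "https://example.com/courses"))
```

Under that assumption, any crawler that honors robots.txt is refused every path, whereas an empty robots.txt would permit everything.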
Member


Setting a default is fine, but this is currently overridden by stage, prod, and edge ansible vars, so it won't do anything. Stage and edge already have this disallow-all rule in effect. What we would need to change is prod: https://github.com/edx/edx-internal/blob/a7701a2f1415cef320e1cb50714dda60bbbd3c51/ansible/vars/prod-edx.yml#L280

However, Robert pointed out some previous work that raises a question of whether we're ready for this step: edx/edx-arch-experiments#852 (comment)

The LMS has had a noindex meta tag in place for over a year, so we're probably fine to go ahead with a robots.txt change, but I'd want to check with SEO first, and I'm not sure what the status is of the other sites.
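
The override concern in this comment can be sketched in Python. Ansible role defaults have the lowest variable precedence, so an environment vars file that defines the same key replaces the default outright (lists are replaced, not merged, under Ansible's default settings). The `resolve` helper and the prod value below are hypothetical; the actual contents of the prod override are not shown in this thread.

```python
# Role default introduced by this PR.
ROLE_DEFAULTS = {
    "NGINX_ROBOT_RULES": [{"agent": "*", "disallow": "/"}],
}

# HYPOTHETICAL environment override, invented for illustration only.
# The real prod vars are not reproduced in this conversation.
HYPOTHETICAL_PROD_VARS = {
    "NGINX_ROBOT_RULES": [{"agent": "*", "disallow": "/hypothetical-path"}],
}


def resolve(defaults, env_vars):
    """Simplified model of precedence: env vars replace defaults per key."""
    resolved = dict(defaults)
    resolved.update(env_vars)  # whole value is replaced, lists are not merged
    return resolved


# With an override present, the new default never takes effect.
print(resolve(ROLE_DEFAULTS, HYPOTHETICAL_PROD_VARS)["NGINX_ROBOT_RULES"])

# With the override removed, the block-all default applies.
print(resolve(ROLE_DEFAULTS, {})["NGINX_ROBOT_RULES"])
```

In this simplified model, the default only takes effect for an environment once that environment's override is removed or changed.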

Member Author


> Setting a default is fine, but this is currently overridden by stage, prod, and edge ansible vars, so it won't do anything.

This default will also apply to the MFEs and IDAs, though, right? It's not just going to touch edxapp? Only a few of them have overrides. We want this rule to apply everywhere.

The plan is to make any necessary edx-internal changes after this is approved and merged.

> I'd want to check with SEO first, and I'm not sure what the status is of the other sites.

The request comes from SEO and leadership (and legal), and should apply to all sites. None of the sites we'll be whitelisting (e.g., marketing and support) are controlled in this stack.

Member


> This default will also apply to the MFEs and IDAs, though, right? It's not just going to touch edxapp?

I honestly have no idea. I don't think very many things currently use edx/configuration -- I only know about edxapp and some analytics stuff. I thought MFEs are all already on k8s, and wouldn't be affected by this.

> The plan is to make any necessary edx-internal changes after this is approved and merged.

So you'd set this default and then remove the prod (and stage and edge) overrides?

I'm fine with this merging, just wanted to check if you were aware that it wouldn't do anything by itself (and that the noindex thing might be an issue).

Member

@timmc-edx timmc-edx left a comment


Approved, although see caveats in discussion.

@deborahgu deborahgu merged commit 427380f into master Dec 8, 2025
9 checks passed
@deborahgu deborahgu deleted the dkaplan1/APER-4252_block-robots.txt-on-all-mfes,-idas,-and-edxapp branch December 8, 2025 15:02