3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,9 @@
All notable changes to this project will be documented in this file.
Add any new changes to the top (right below this line).

+- 2025-12-05
+  - Set `NGINX_ROBOT_RULES` to a default of block everything.

- 2025-09-26
  - Moved `EDXAPP_FEATURES_DEFAULT` and `EDXAPP_FEATURES_EXTRA` into top-level settings.

5 changes: 4 additions & 1 deletion playbooks/roles/nginx/defaults/main.yml
@@ -190,7 +190,10 @@ nginx_ecommerce_gunicorn_hosts:
nginx_credentails_gunicorn_hosts:
- 127.0.0.1

-NGINX_ROBOT_RULES: [ ]
+# Block all crawlers by default
+NGINX_ROBOT_RULES:
+  - agent: "*"
+    disallow: "/"
Member

Setting a default is fine, but this is currently overridden by stage, prod, and edge ansible vars, so it won't do anything. Stage and edge already have this disallow-all rule in effect. What we would need to change is prod: https://github.com/edx/edx-internal/blob/a7701a2f1415cef320e1cb50714dda60bbbd3c51/ansible/vars/prod-edx.yml#L280

However, Robert pointed out some previous work that raises a question of whether we're ready for this step: edx/edx-arch-experiments#852 (comment)

The LMS has had a noindex meta tag in place for over a year, so we're probably fine to go ahead with a robots.txt change, but I'd want to check with SEO first and I'm not sure what the status is of the other sites.
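
To illustrate the override concern raised in this comment: Ansible role defaults have the lowest variable precedence, and an environment-level value replaces the default wholesale rather than merging with it. The sketch below models that with plain dictionaries. The `prod_vars` content and the `effective_vars` helper are hypothetical; the real prod value lives in edx-internal and is not reproduced here.

```python
# Role default from playbooks/roles/nginx/defaults/main.yml (after this PR).
role_defaults = {"NGINX_ROBOT_RULES": [{"agent": "*", "disallow": "/"}]}

# Hypothetical environment override, for illustration only.
prod_vars = {"NGINX_ROBOT_RULES": [{"agent": "*", "disallow": "/hypothetical-path/"}]}


def effective_vars(defaults, overrides):
    """Return the variables a play would see: overrides replace defaults.

    Lists are replaced as a whole, not merged, which mirrors Ansible's
    default behavior for variables defined at a higher precedence.
    """
    return {**defaults, **overrides}


# With an override present, the new role default never takes effect.
with_override = effective_vars(role_defaults, prod_vars)

# With the override removed, the role default applies.
without_override = effective_vars(role_defaults, {})
```

Under this model, the block-everything default only reaches an environment once that environment's own `NGINX_ROBOT_RULES` override is removed.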

Member Author

> Setting a default is fine, but this is currently overridden by stage, prod, and edge ansible vars, so it won't do anything.

This default will also apply to the MFEs and IDAs, though, right? It's not just going to touch edxapp? Only a few of them have overrides. We want this rule to apply everywhere.

The plan is to make any necessary edx-internal changes after this is approved and merged.

> I'd want to check with SEO first and I'm not sure what the status is of the other sites.

The request comes from SEO and leadership (and legal), and should apply to all sites. None of the sites we'll be whitelisting are controlled in this stack (e.g. marketing and support).

Member

> This default will also apply to the MFEs and IDAs, though, right? It's not just going to touch edxapp?

I honestly have no idea. I don't think very many things currently use edx/configuration -- I only know about edxapp and some analytics stuff. I thought MFEs are all already on k8s, and wouldn't be affected by this.

> The plan is to make any necessary edx-internal changes after this is approved and merged.

So you'd set this default and then remove the prod (and stage and edge) overrides?

I'm fine with this merging, just wanted to check if you were aware that it wouldn't do anything by itself (and that the noindex thing might be an issue).

NGINX_EDXAPP_EMBARGO_CIDRS: []
NGINX_P3P_MESSAGE: 'CP="Open edX does not have a P3P policy."'
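
For reference, here is a minimal sketch of what the old and new defaults mean for a crawler. It assumes each rule's `agent` and `disallow` map directly to `User-agent` and `Disallow` lines; the role's actual robots.txt template may render them differently. `render_robots_txt` and `is_allowed` are hypothetical helpers, and the example URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Mirrors the new default from playbooks/roles/nginx/defaults/main.yml.
NGINX_ROBOT_RULES = [{"agent": "*", "disallow": "/"}]


def _as_list(value):
    """Accept either a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def render_robots_txt(rules):
    """Render rules into robots.txt text (illustrative, not the role's template)."""
    lines = []
    for rule in rules:
        lines.extend(f"User-agent: {agent}" for agent in _as_list(rule["agent"]))
        lines.extend(f"Disallow: {path}" for path in _as_list(rule["disallow"]))
        lines.append("")  # a blank line separates rule groups
    return "\n".join(lines)


def is_allowed(robots_txt, user_agent, url):
    """Check a URL against robots.txt text using the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


robots_txt = render_robots_txt(NGINX_ROBOT_RULES)
blocked = not is_allowed(robots_txt, "Googlebot", "https://example.com/courses")

# The previous default, an empty list, renders no rules and allows everything.
old_default_allowed = is_allowed(render_robots_txt([]), "Googlebot", "https://example.com/courses")
```

Note that robots.txt only governs crawling by well-behaved bots; it is separate from the `noindex` meta tag mentioned in the review thread.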