chore: block robots.txt on all mfes, idas, and edxapp #300
Conversation
The default for all configurations should be to block crawlers in the robots.txt file. FIXES: APER-4252
updating changelog
    # Block all crawlers by default
    NGINX_ROBOT_RULES:
      - agent: "*"
        disallow: "/"
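For reference, a rule like this would typically render into a robots.txt along these lines (the exact output depends on the nginx role's template):

    User-agent: *
    Disallow: /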
Setting a default is fine, but this is currently overridden by stage, prod, and edge ansible vars, so it won't do anything. Stage and edge already have this disallow-all rule in effect. What we would need to change is prod: https://github.com/edx/edx-internal/blob/a7701a2f1415cef320e1cb50714dda60bbbd3c51/ansible/vars/prod-edx.yml#L280
However, Robert pointed out some previous work that raises a question of whether we're ready for this step: edx/edx-arch-experiments#852 (comment)
The LMS has had a noindex meta tag in place for over a year, so we're probably fine to go ahead with a robots.txt change, but I'd want to check with SEO first, and I'm not sure what the status is of the other sites.
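(For context, the noindex directive mentioned here is the standard robots meta tag, roughly of the form below; the exact markup in the LMS may differ.)

    <meta name="robots" content="noindex">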
Setting a default is fine, but this is currently overridden by stage, prod, and edge ansible vars, so it won't do anything.
This default will also apply to the MFEs and IDAs, though, right? It's not just going to touch edxapp? Only a few of them have overrides. We want this rule to apply everywhere.
The plan is to make any necessary edx-internal changes after this is approved and merged.
I'd want to check with SEO first, and I'm not sure what the status is of the other sites.
The request comes from SEO and leadership (and legal), and should apply to all sites. None of the sites we'll be whitelisting are controlled in this stack (e.g. marketing and support).
This default will also apply to the MFEs and IDAs, though, right? It's not just going to touch edxapp?
I honestly have no idea. I don't think very many things currently use edx/configuration -- I only know about edxapp and some analytics stuff. I thought MFEs are all already on k8s, and wouldn't be affected by this.
The plan is to make any necessary edx-internal changes after this is approved and merged.
So you'd set this default and then remove the prod (and stage and edge) overrides?
I'm fine with this merging, just wanted to check if you were aware that it wouldn't do anything by itself (and that the noindex thing might be an issue).
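For illustration only (a hypothetical sketch, not the actual edx-internal contents), aligning prod would mean either removing its NGINX_ROBOT_RULES override or redefining it to match the new disallow-all default:

    # ansible/vars/prod-edx.yml (hypothetical)
    NGINX_ROBOT_RULES:
      - agent: "*"
        disallow: "/"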
timmc-edx left a comment
Approved, although see caveats in discussion.
Setting NGINX_ROBOT_RULES defaults to block all crawlers. FIXES: APER-4252
Make sure that the following steps are done before merging: