@vedmaka vedmaka commented Oct 23, 2025

This pull request introduces the following improvements:

  1. Allows configuring the list of protected special pages via $wgCrawlerProtectedSpecialPages
  2. Allows halting incoming requests more quickly via the $wgCrawlerProtectionDenyFast toggle (off by default)
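
As a rough illustration of how a wiki might set the two options described above, here is a minimal LocalSettings.php sketch. The variable names come from the PR description; the values shown are assumptions chosen for illustration, not recommended settings.

```php
<?php
// LocalSettings.php (sketch). Values are illustrative assumptions.

// Special pages to protect from crawlers.
$wgCrawlerProtectedSpecialPages = [
	'recentchangeslinked',
	'whatlinkshere',
];

// Quicker halting of incoming requests; the PR description says this is off by default.
$wgCrawlerProtectionDenyFast = false;
```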

"CrawlerProtectedSpecialPages": {
	"value": [
		"recentchangeslinked",
		"whatlinkshere"
Member

Please add "mobilediff" too

|| in_array( 'Special:' . $special->getName(), $protectedSpecialPages, true )
) {
$out = $special->getContext()->getOutput();
if ( $denyFast ) {
Member

Please add unit tests that cover this branch.

// allow forgiving entries in the setting array for Special pages names
in_array( $special->getName(), $protectedSpecialPages, true )
|| in_array( $name, $protectedSpecialPages, true )
|| in_array( 'Special:' . $special->getName(), $protectedSpecialPages, true )
Member

Please refactor the magic string "Special:" into a constant at the top of the file.

$config = MediaWikiServices::getInstance()->getMainConfig();
$protectedSpecialPages = $config->get( 'CrawlerProtectedSpecialPages' );
$denyFast = $config->get( 'CrawlerProtectionDenyFast' );

Member

Rather than having multiple checks, please add a line that builds a normalized version of $protectedSpecialPages, with every entry lowercased and any leading Special: prefix stripped.

For example:

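// Assumes NS_SPECIAL_NAME is a constant defined as the lowercase prefix 'special:'.
$result = array_map(
    static function ( $p ) {
        $p = strtolower( $p );
        return strpos( $p, NS_SPECIAL_NAME ) === 0
            ? substr( $p, strlen( NS_SPECIAL_NAME ) )
            : $p;
    },
    $protectedSpecialPages
);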

if ( in_array( $name, [ 'recentchangeslinked', 'whatlinkshere' ], true ) ) {
if (
// allow forgiving entries in the setting array for Special pages names
in_array( $special->getName(), $protectedSpecialPages, true )
Member

Once the transformation is applied, these three checks become redundant. Please remove the two extra lines.
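
The normalization suggested in this review could collapse the three in_array() checks into a single lookup. The following is a minimal sketch, not the PR's code: the helper normalizeSpecialPageNames() and the constant SPECIAL_PREFIX are hypothetical names introduced for illustration, and the sketch assumes the name being tested is lowercased the same way as the configured entries.

```php
<?php
// Hypothetical constant for the prefix; lowercase because entries are
// lowercased before the prefix check.
const SPECIAL_PREFIX = 'special:';

/**
 * Lowercase each configured entry and strip a leading "special:" prefix.
 * Hypothetical helper, not part of the PR.
 *
 * @param string[] $pages
 * @return string[]
 */
function normalizeSpecialPageNames( array $pages ): array {
	return array_map(
		static function ( string $page ): string {
			$page = strtolower( $page );
			return strpos( $page, SPECIAL_PREFIX ) === 0
				? substr( $page, strlen( SPECIAL_PREFIX ) )
				: $page;
		},
		$pages
	);
}

// Entries in any of the forgiving forms collapse to one canonical form.
$protected = normalizeSpecialPageNames(
	[ 'Special:WhatLinksHere', 'recentchangeslinked', 'MobileDiff' ]
);

// A single strict lookup replaces the three separate checks.
$isProtected = in_array( strtolower( 'WhatLinksHere' ), $protected, true );
```

Normalizing once, when the setting is read, keeps the forgiving behaviour for administrators while leaving only one comparison at the call site.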

* @return void
* @suppress PhanPluginNeverReturnMethod
*/
protected function denyAccessFast() {
Member

"Deny access fast" is a subjective name which I don't think properly addresses why one might choose to use this. Naming is a hard problem to solve, so I do empathize. How about we change the 403 vs. 418 preference variable to $wgCrawlerProtectionUse418, and this function's name to denyAccessWith418()?
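
To make the proposed rename concrete, here is a small sketch under stated assumptions: it assumes the toggle does nothing more than select between HTTP 403 and HTTP 418, and the helper crawlerDenialStatus() is a hypothetical name that does not appear in the PR.

```php
<?php
/**
 * Pick the HTTP status for a denied request.
 * Hypothetical helper; $use418 stands in for the proposed
 * $wgCrawlerProtectionUse418 setting.
 */
function crawlerDenialStatus( bool $use418 ): int {
	// 418 when the toggle is on, otherwise 403 Forbidden.
	return $use418 ? 418 : 403;
}

// Assumed to stay off by default, like the toggle it would replace.
$wgCrawlerProtectionUse418 = false;
$status = crawlerDenialStatus( $wgCrawlerProtectionUse418 );
```

Keeping the status choice in a pure function like this would also make both branches straightforward to cover with unit tests.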

Member

jeffw16 commented Oct 26, 2025

Also, could we use this PR to incorporate the per-feature blocking options in #6?

Member

jeffw16 commented Nov 29, 2025

This PR has been superseded by #12

@jeffw16 jeffw16 closed this Nov 29, 2025