Special characters * and $ not matched in URI #57

@sebastian-nagel

Description

Section 2.2.3 (Special Characters) of the RFC contains two examples of path matching for paths that contain the special characters * and $. In both examples the character is percent-encoded in the allow/disallow rule but left unencoded in the URL/URI to be matched. The robots.txt parser and matcher do not follow these examples: they fail to match the percent-encoded characters in the rule against the unencoded ones in the URI. See the unit test below.

* and $ are among the reserved characters in URIs (RFC 3986, section 2.2) and therefore cannot be percent-encoded without potentially changing the semantics of the URI.

diff --git a/robots_test.cc b/robots_test.cc
index 35853de..3a37813 100644
--- a/robots_test.cc
+++ b/robots_test.cc
@@ -492,6 +492,19 @@ TEST(RobotsUnittest, ID_SpecialCharacters) {
     EXPECT_FALSE(
         IsUserAgentAllowed(robotstxt, "FooBot", "http://foo.bar/foo/quz"));
   }
+  {
+    const absl::string_view robotstxt =
+        "User-agent: FooBot\n"
+        "Disallow: /path/file-with-a-%2A.html\n"
+        "Disallow: /path/foo-%24\n"
+        "Allow: /\n";
+    EXPECT_FALSE(
+        IsUserAgentAllowed(robotstxt, "FooBot",
+                           "https://www.example.com/path/file-with-a-*.html"));
+    EXPECT_FALSE(
+        IsUserAgentAllowed(robotstxt, "FooBot",
+                           "https://www.example.com/path/foo-$"));
+  }
 }
 
 // Google-specific: "index.html" (and only that) at the end of a pattern is
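
For illustration, below is a minimal, self-contained sketch of one way a rule pattern could be tokenized so that %2A and %24 in the rule match the literal characters * and $ in the URI, as the RFC examples require, while a raw * stays a wildcard and a trailing raw $ stays an end anchor. This is not the library's actual matcher; the names Tokenize, Match, and PatternMatches are hypothetical.

// Minimal sketch, not the library's actual code: decode %2A/%24 on the
// rule side only and compare the URI path as-is, since decoding reserved
// characters in the URI could change its meaning.
#include <cassert>
#include <string>
#include <vector>

namespace {

enum class TokenType { kLiteral, kWildcard, kEndAnchor };

struct Token {
  TokenType type;
  char ch;  // Only meaningful for kLiteral.
};

// Tokenize an Allow/Disallow pattern. "%2A" and "%24" become the literal
// characters '*' and '$'; a raw '*' is a wildcard; a raw '$' at the very
// end anchors the match.
std::vector<Token> Tokenize(const std::string& pattern) {
  std::vector<Token> tokens;
  for (size_t i = 0; i < pattern.size(); ++i) {
    if (pattern.compare(i, 3, "%2A") == 0 ||
        pattern.compare(i, 3, "%2a") == 0) {
      tokens.push_back({TokenType::kLiteral, '*'});
      i += 2;
    } else if (pattern.compare(i, 3, "%24") == 0) {
      tokens.push_back({TokenType::kLiteral, '$'});
      i += 2;
    } else if (pattern[i] == '*') {
      tokens.push_back({TokenType::kWildcard, 0});
    } else if (pattern[i] == '$' && i + 1 == pattern.size()) {
      tokens.push_back({TokenType::kEndAnchor, 0});
    } else {
      tokens.push_back({TokenType::kLiteral, pattern[i]});
    }
  }
  return tokens;
}

// Simple backtracking matcher: does the token list match a prefix of
// `path` (or, with an end anchor, the whole of it)?
bool Match(const std::vector<Token>& tokens, size_t ti,
           const std::string& path, size_t pi) {
  if (ti == tokens.size()) return true;  // Pattern consumed: prefix match.
  const Token& t = tokens[ti];
  switch (t.type) {
    case TokenType::kEndAnchor:
      return pi == path.size();
    case TokenType::kWildcard:
      for (size_t k = pi; k <= path.size(); ++k)
        if (Match(tokens, ti + 1, path, k)) return true;
      return false;
    case TokenType::kLiteral:
      return pi < path.size() && path[pi] == t.ch &&
             Match(tokens, ti + 1, path, pi + 1);
  }
  return false;
}

bool PatternMatches(const std::string& pattern, const std::string& path) {
  return Match(Tokenize(pattern), 0, path, 0);
}

}  // namespace

int main() {
  // The two RFC examples from this report: the encoded rule should match
  // the unencoded URI path.
  assert(PatternMatches("/path/file-with-a-%2A.html",
                        "/path/file-with-a-*.html"));
  assert(PatternMatches("/path/foo-%24", "/path/foo-$"));
  // A raw '*' in the rule is still a wildcard, and a trailing raw '$'
  // still anchors the match.
  assert(PatternMatches("/path/*.html", "/path/file-with-a-*.html"));
  assert(!PatternMatches("/path/foo$", "/path/foo-$"));
  return 0;
}

The key design point in the sketch is that decoding happens on the rule side only, and only for %2A and %24, so the wildcard and anchor meanings of raw * and $ are preserved.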
