From 50c79ec8a1250452db412f84f2982f85b0cb4817 Mon Sep 17 00:00:00 2001 From: Yves-Marie Date: Fri, 15 Mar 2024 14:39:43 +0100 Subject: [PATCH 01/19] fix portValue typo --- spec.bs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spec.bs b/spec.bs index ad3954a..8cccdd4 100644 --- a/spec.bs +++ b/spec.bs @@ -1748,7 +1748,7 @@ To convert a modifier to a string given a [=part/modifier=] |modifier
To canonicalize a port given a string |portValue| and optionally a string |protocolValue|: - 1. If |value| is the empty string, return |value|. + 1. If |portValue| is the empty string, return |portValue|. 1. Let |dummyURL| be a new [=URL record=]. 1. If |protocolValue| was given, then set |dummyURL|'s [=url/scheme=] to |protocolValue|.

Note, we set the [=URL record=]'s [=url/scheme=] in order for the [=basic URL parser=] to recognize and normalize default port values.
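The port steps above delegate to the basic URL parser with a state override. The following TypeScript sketch approximates their observable effect. It assumes the WHATWG URL Standard's port-state behavior: read digits up to the first non-digit, reject values above 65535, and normalize a special scheme's default port away. `canonicalizePort` and `DEFAULT_PORTS` are hypothetical names, not spec or library APIs.

```typescript
// Default ports of the special schemes, per the WHATWG URL Standard.
// "file" is special but has no default port, so it is omitted.
const DEFAULT_PORTS: Record<string, number> = {
  ftp: 21,
  http: 80,
  https: 443,
  ws: 80,
  wss: 443,
};

// Hypothetical helper: a simplified model of "canonicalize a port" that mimics
// the basic URL parser's port state when a state override is given.
function canonicalizePort(portValue: string, protocolValue?: string): string {
  // Step 1: the empty string is returned unchanged.
  if (portValue === "") return portValue;

  // The basic URL parser always strips ASCII tab and newline first.
  const input = portValue.replace(/[\t\n\r]/g, "");

  // With a state override, parsing stops at the first non-digit code point;
  // having no digits at all is a failure.
  const digits = /^[0-9]*/.exec(input)![0];
  if (digits === "") throw new TypeError(`Invalid port: ${portValue}`);

  const port = Number.parseInt(digits, 10);
  if (port > 65535) throw new TypeError(`Port out of range: ${portValue}`);

  // A scheme's default port is stored as null, which serializes to "".
  // protocolValue is compared case-sensitively (assumed to be lowercase).
  if (protocolValue !== undefined && DEFAULT_PORTS[protocolValue] === port) return "";

  // Serializing the integer drops any leading zeros.
  return String(port);
}
```

Setting the scheme only matters for the default-port check. Without `protocolValue`, `"443"` stays `"443"`.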

From a8bf21b249a52351126d3308c376e9cca25e34fb Mon Sep 17 00:00:00 2001 From: Yves-Marie Date: Fri, 15 Mar 2024 14:55:53 +0100 Subject: [PATCH 02/19] add serializing for port and host properties --- spec.bs | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/spec.bs b/spec.bs index 8cccdd4..e0ef8ee 100644 --- a/spec.bs +++ b/spec.bs @@ -21,6 +21,8 @@ spec: ECMASCRIPT; urlPrefix: https://tc39.es/ecma262/ spec: URL; urlPrefix: https://url.spec.whatwg.org/ type: dfn text: serialize an integer; url: #serialize-an-integer + text: host serializer; url: #concept-host-serializer +

URL patterns

@@ -505,8 +507,8 @@ A component is a [=struct=] with the following [=struct/items=]: 1. Set |protocol| to |url|'s [=url/scheme=]. 1. Set |username| to |url|'s [=url/username=]. 1. Set |password| to |url|'s [=url/password=]. - 1. Set |hostname| to |url|'s [=url/host=] or the empty string if the value is null. - 1. Set |port| to |url|'s [=url/port=] or the empty string if the value is null. + 1. Set |hostname| to |url|'s [=url/host=], [=host serializer|serialized=], or the empty string if the value is null. + 1. Set |port| to |url|'s [=url/port=], [=serialize an integer|serialized=], or the empty string if the value is null. 1. Set |pathname| to the result of [=URL path serializing=] |url|. 1. Set |search| to |url|'s [=url/query=] or the empty string if the value is null. 1. Set |hash| to |url|'s [=url/fragment=] or the empty string if the value is null. @@ -1725,7 +1727,7 @@ To convert a modifier to a string given a [=part/modifier=] |modifier 1. Let |dummyURL| be a new [=URL record=]. 1. Let |parseResult| be the result of running the [=basic URL parser=] given |value| with |dummyURL| as [=basic URL parser/url=] and [=hostname state=] as [=basic URL parser/state override=]. 1. If |parseResult| is failure, then throw a {{TypeError}}. - 1. Return |dummyURL|'s [=url/host=]. + 1. Return |dummyURL|'s [=url/host=], [=host serializer|serialized=], or empty string if it is null.
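This patch adds explicit serialization because a URL record's host is not always a string. An IPv4 host is a 32-bit integer and an IPv6 host is a list of eight 16-bit pieces. The port is also an integer. The following sketch models the WHATWG host serializer under those assumptions. The `Host` type and the function names are hypothetical and chosen for illustration only.

```typescript
// Hypothetical model of the host kinds the WHATWG URL Standard distinguishes.
type Host =
  | { kind: "ipv4"; address: number } // 32-bit unsigned integer
  | { kind: "ipv6"; pieces: number[] } // eight 16-bit pieces
  | { kind: "domain" | "opaque"; value: string }
  | { kind: "empty" };

// Index of the first longest run of two or more zero pieces, or null.
function findCompressIndex(pieces: number[]): number | null {
  let best: number | null = null;
  let bestLength = 1; // a run must be longer than one piece to be compressed
  let i = 0;
  while (i < pieces.length) {
    if (pieces[i] !== 0) { i++; continue; }
    let j = i;
    while (j < pieces.length && pieces[j] === 0) j++;
    if (j - i > bestLength) { best = i; bestLength = j - i; }
    i = j;
  }
  return best;
}

function serializeIPv6(pieces: number[]): string {
  const compress = findCompressIndex(pieces);
  let output = "";
  let ignore0 = false;
  for (let i = 0; i < 8; i++) {
    if (ignore0 && pieces[i] === 0) continue; // still inside the compressed run
    ignore0 = false;
    if (compress === i) {
      output += i === 0 ? "::" : ":";
      ignore0 = true;
      continue;
    }
    output += pieces[i].toString(16); // shortest lowercase hexadecimal
    if (i !== 7) output += ":";
  }
  return output;
}

function serializeHost(host: Host): string {
  switch (host.kind) {
    case "ipv4": {
      const n = host.address;
      return [n >>> 24, (n >>> 16) & 255, (n >>> 8) & 255, n & 255].join(".");
    }
    case "ipv6":
      return `[${serializeIPv6(host.pieces)}]`;
    case "domain":
    case "opaque":
      return host.value;
    case "empty":
      return "";
  }
}

// The serialized host, or the empty string when the host is null.
function hostnameString(host: Host | null): string {
  return host === null ? "" : serializeHost(host);
}
```

`hostnameString` mirrors the "serialized, or the empty string if null" wording of the patched steps.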
From c46e9811a61546bfc0fe5991bd3082a085be8282 Mon Sep 17 00:00:00 2001 From: Yves-Marie Date: Fri, 15 Mar 2024 21:31:29 +0100 Subject: [PATCH 03/19] newline cleanup --- spec.bs | 1 - 1 file changed, 1 deletion(-) diff --git a/spec.bs b/spec.bs index e0ef8ee..93bfbea 100644 --- a/spec.bs +++ b/spec.bs @@ -22,7 +22,6 @@ spec: URL; urlPrefix: https://url.spec.whatwg.org/ type: dfn text: serialize an integer; url: #serialize-an-integer text: host serializer; url: #concept-host-serializer -

URL patterns

From 147b26a151accebb0c27571e3ba3cf3fcead8ebb Mon Sep 17 00:00:00 2001 From: Jeremy Roman Date: Thu, 29 Aug 2024 10:46:45 -0400 Subject: [PATCH 04/19] Editorial: Mark IDs referred to by IETF documents as required (#233) Per https://whatwg.org/working-mode#anchors this is being done to ensure anchors to these concepts continue to work into the future. Resolves #231. --- spec.bs | 1 + 1 file changed, 1 insertion(+) diff --git a/spec.bs b/spec.bs index 4fc708a..c5f07cd 100644 --- a/spec.bs +++ b/spec.bs @@ -7,6 +7,7 @@ Text Macro: LATESTRD 2024-03 Abstract: The URL Pattern Standard provides a web platform primitive for matching URLs based on a convenient pattern syntax. Indent: 2 Markup Shorthands: markdown yes +Required IDs: url-pattern,url-pattern-create,url-pattern-match,url-pattern-has-regexp-groups
@@ -124,6 +126,10 @@ It can be constructed using a string for each component, or from a shorthand str * `https://nx.shop.example/products/01?speed=5#reviews` * `https://shop.example/products/chair#reviews` + + This is a more complicated pattern which includes: + * [=part/modifier/optional=] parts marked with `?` (braces are needed to make it unambiguous exactly what is optional), and + * a [=part/type/regexp=] part named "`id`" which uses a regular expression to define what sorts of substrings match (the parentheses are required to mark it as a regular expression, and are not part of the regexp itself).
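The following TypeScript compiler is deliberately tiny. It illustrates how a parenthesized regular expression replaces the default segment matching in pathname patterns such as "`/blog/:title`". It is a hypothetical sketch, not the standard's tokenizer and parser. It supports only three constructs:

* fixed text
* `:name` groups, which default to `[^/]+?` (one or more code points up to the next `/`)
* `:name(regexp)` groups

It does not support modifiers, wildcards, braces, or anonymous groups.

```typescript
// Hypothetical, simplified pathname-pattern compiler (not the spec algorithm).
function compilePathnamePattern(pattern: string): RegExp {
  let source = "^";
  let i = 0;
  while (i < pattern.length) {
    const c = pattern[i];
    if (c === ":") {
      // Read the group name (ASCII identifier characters only in this sketch).
      let j = i + 1;
      while (j < pattern.length && /[A-Za-z0-9_$]/.test(pattern[j])) j++;
      const name = pattern.slice(i + 1, j);
      if (name === "") throw new TypeError(`Missing group name at index ${i}`);
      let body = "[^/]+?"; // default: up to the next "/" separator
      if (pattern[j] === "(") {
        // Read a parenthesized regexp, tracking nesting and backslash escapes.
        let depth = 1;
        let k = j + 1;
        while (k < pattern.length && depth > 0) {
          if (pattern[k] === "\\") k++; // skip the escaped character
          else if (pattern[k] === "(") depth++;
          else if (pattern[k] === ")") depth--;
          k++;
        }
        if (depth !== 0) throw new TypeError("Unbalanced regexp group");
        // The enclosing parentheses are not part of the regexp itself.
        body = pattern.slice(j + 1, k - 1);
        j = k;
      }
      source += `(?<${name}>${body})`;
      i = j;
    } else {
      // Fixed text: escape regexp syntax characters.
      source += c.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");
      i++;
    }
  }
  return new RegExp(source + "$", "u");
}
```

As with the "`id`" part described above, the parentheses delimit the regular expression but are not copied into it.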
@@ -168,6 +174,8 @@ It can be constructed using a string for each component, or from a shorthand str * `https://discussion.example/forum/admin/` * `http://discussion.example:8080/admin/update?id=1` + + This pattern demonstrates how pathnames are resolved relative to a base URL, in a similar way to relative URLs.

The {{URLPattern}} class

@@ -951,7 +959,7 @@ It can be [=parse a pattern string|parsed=] to produce a [=/part list=] which de
Pattern strings can contain capture groups, which by default match the shortest possible string, up to a component-specific separator (`/` in the pathname, `.` in the hostname). For example, the pathname pattern "`/blog/:title`" will match "`/blog/hello-world`" but not "`/blog/2012/02`". - A regular expression can also be used instead, so the pathname pattern "`/blog/:year(\\d+)/:month(\\d+)`" will match "`/blog/2012/02`". + A regular expression enclosed in parentheses can also be used instead, so the pathname pattern "`/blog/:year(\\d+)/:month(\\d+)`" will match "`/blog/2012/02`". A group can also be made optional, or repeated, by using a modifier. For example, the pathname pattern "`/products/:id?`" will match both "`/products`" and "`/products/2`" (but not "`/products/`"). In the pathname specifically, groups automatically require a leading `/`; to avoid this, the group can be explicitly delimited, as in the pathname pattern "`/products/{:id}?`". From 90ac4a944edb50378e16227ad3b7952b933a560f Mon Sep 17 00:00:00 2001 From: Shunya Shishido Date: Fri, 27 Sep 2024 09:36:21 -0700 Subject: [PATCH 06/19] Meta: link Simplified Chinese translation (#238) Closes #235. --- spec.bs | 1 + 1 file changed, 1 insertion(+) diff --git a/spec.bs b/spec.bs index 4e1e740..8a7ee40 100644 --- a/spec.bs +++ b/spec.bs @@ -7,6 +7,7 @@ Text Macro: LATESTRD 2024-03 Abstract: The URL Pattern Standard provides a web platform primitive for matching URLs based on a convenient pattern syntax. Indent: 2 Markup Shorthands: markdown yes +Translation: zh-Hans https://htmlspecs.com/urlpattern/ Required IDs: url-pattern,url-pattern-create,url-pattern-match,url-pattern-has-regexp-groups From bd98b707646f0eef7732f2e0358ccabd562d869b Mon Sep 17 00:00:00 2001 From: Jeremy Roman Date: Fri, 27 Sep 2024 13:49:06 -0700 Subject: [PATCH 07/19] Explain how HTTP header fields integrate with URL patterns Per discussion on #182 some text explaining this would be useful.
This mostly consists of advice since the useful algorithms are already exposed. --- spec.bs | 33 ++++++++++++++++++++++++++++----- 1 file changed, 28 insertions(+), 5 deletions(-) diff --git a/spec.bs b/spec.bs index 8a7ee40..bf65012 100644 --- a/spec.bs +++ b/spec.bs @@ -25,6 +25,11 @@ spec: URL; urlPrefix: https://url.spec.whatwg.org/ type: dfn text: serialize an integer; url: #serialize-an-integer text: host serializer; url: #concept-host-serializer +spec: RFC8941; urlPrefix: https://httpwg.org/specs/rfc8941.html + type: dfn + text: structured header; url: top + for: structured header + text: string; url: string

URL patterns

@@ -33,7 +38,7 @@ spec: URL; urlPrefix: https://url.spec.whatwg.org/ A [=URL pattern=] consists of several [=components=], each of which represents a [=/pattern string|pattern=] which could be matched against the corresponding component of a [=/URL=]. -It can be constructed using a string for each component, or from a shorthand string. It can optionally be resolved relative to a base URL. +It can be constructed using a string for each component, or from a [[#constructor-string-parsing|shorthand string]]. It can optionally be resolved relative to a base URL.

The shorthand "`https://example.com/:category/*`" corresponds to the following components: @@ -1993,7 +1998,7 @@ typedef (USVString or URLPatternInit or URLPattern) URLPatternCompatible; JavaScript APIs should accept all of: * a {{URLPattern}} object * a dictionary-like object which specifies the components required to construct a pattern -* a string (in the constructor string syntax) +* a string (in the [[#constructor-string-parsing|constructor string syntax]]) To accomplish this, specifications should accept {{URLPatternCompatible}} as an argument to an [=operation=] or [=dictionary member=], and process it using the following algorithm, using the appropriate [=environment settings object=]'s [=environment settings object/API base URL=] or equivalent. @@ -2028,7 +2033,7 @@ This allows authors to concisely specify most patterns, and use the JavaScript APIs and accept both: * an object which specifies the components required to construct a pattern -* a string (in the constructor string syntax) +* a string (in the [[#constructor-string-parsing|constructor string syntax]]) If a specification has an Infra value (e.g., after using [=parse a JSON string to an Infra value=]), use the following algorithm, using the appropriate base URL (by default, the URL of the JSON resource). [[INFRA]] @@ -2036,7 +2041,7 @@ If a specification has an Infra value (e.g., after using [=parse a JSON string t To build a [=URL pattern=] from an Infra value |rawPattern| given [=/URL=] |baseURL|, perform the following steps. 1. Let |serializedBaseURL| be the [=URL serializer|serialization=] of |baseURL|. - 1. If |rawPattern| is a [=string=], then: + 1. If |rawPattern| is a [=/string=], then: 1. Return the result of [=creating=] a URL pattern given |rawPattern|, |serializedBaseURL|, and an empty [=map=].

It might become necessary in the future to plumb non-empty options here.
@@ -2044,7 +2049,7 @@ If a specification has an Infra value (e.g., after using [=parse a JSON string t 1. Otherwise, if |rawPattern| is a [=map=], then: 1. Let |init| be «[ "{{URLPatternInit/baseURL}}" → |serializedBaseURL| ]», representing a dictionary of type {{URLPatternInit}}. 1. [=map/For each=] |key| → |value| of |rawPattern|: - 1. If |key| is not the identifier of a dictionary member of {{URLPatternInit}} or one of its inherited dictionaries, |value| is not a [=string=], or the member's type is not declared to be {{USVString}}, then return null. + 1. If |key| is not the identifier of a dictionary member of {{URLPatternInit}} or one of its inherited dictionaries, |value| is not a [=/string=], or the member's type is not declared to be {{USVString}}, then return null.
This will need to be updated if {{URLPatternInit}} gains members of other types.
A future version of this specification might also have a less strict mode, if that proves useful to other specifications.
@@ -2059,6 +2064,24 @@ If a specification has an Infra value (e.g., after using [=parse a JSON string t Specifications may wish to leave room in their formats to accept options for {{URLPatternOptions}}, override the base URL, or similar, since it is not possible to construct a {{URLPattern}} object directly in this case, unlike in a JavaScript API. For example, Speculation Rules accepts a "`relative_to`" key which can be used to switch to using the [=document base URL=] instead of the JSON resource's URL. [[SPECULATION-RULES]] +

Integrating with HTTP header fields

+ +HTTP headers which include URL patterns should accept a string in the [[#constructor-string-parsing|constructor string syntax]], likely as part of a structured field [[RFC8941]]. + +
No known header accepts the dictionary syntax for URL patterns. If that changes, this specification will be updated to define it, likely by processing [[RFC8941]] inner lists.
+ +Specifications for HTTP headers should operate on [=URL patterns=] (e.g., using the [=URL pattern/match=] algorithm) rather than {{URLPattern}} objects (which imply the existence of a JavaScript [=ECMAScript/realm=]). + +
+ To build a [=URL pattern=] from an HTTP structured field value |rawPattern| given [=/URL=] |baseURL|: + + 1. Let |serializedBaseURL| be the [=URL serializer|serialization=] of |baseURL|. + 1. [=Assert=]: |rawPattern| is a [=structured header/string=]. + 1. Return the result of [=creating=] a URL pattern given |rawPattern|, |serializedBaseURL|, and an empty [=map=]. +
+ +
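A header-defining specification would typically receive the pattern as a structured field String item. Structured fields are defined in RFC 8941, which has since been obsoleted by RFC 9651. The following hypothetical TypeScript helpers illustrate the parsing that happens before the algorithm above runs. They parse a field value that consists of a single String item, rejecting parameters and other item types for simplicity. They then return the two inputs that the algorithm hands to "create a URL pattern": the pattern string and the serialized base URL. `URL.prototype.href` stands in for the URL serializer. None of these names come from any specification.

```typescript
// Hypothetical helper: parse a field value holding a single String item.
// A simplified version of the structured-field String parsing steps.
function parseStructuredString(fieldValue: string): string {
  // Leading and trailing spaces around a field value are discarded.
  const input = fieldValue.replace(/^ +| +$/g, "");
  if (input[0] !== '"') throw new SyntaxError("Expected a String item");
  let output = "";
  let i = 1;
  while (i < input.length) {
    const ch = input[i++];
    if (ch === "\\") {
      if (i >= input.length) throw new SyntaxError("Dangling escape");
      const next = input[i++];
      // Only \" and \\ are valid escapes.
      if (next !== '"' && next !== "\\") throw new SyntaxError("Invalid escape");
      output += next;
    } else if (ch === '"') {
      // This sketch rejects anything after the closing quote, including parameters.
      if (i !== input.length) throw new SyntaxError("Unexpected trailing data");
      return output;
    } else {
      const code = ch.charCodeAt(0);
      if (code < 0x20 || code > 0x7e) throw new SyntaxError("Non-printable or non-ASCII character");
      output += ch;
    }
  }
  throw new SyntaxError("Unterminated String");
}

// Hypothetical wrapper: the inputs "create a URL pattern" would receive.
function patternCreationInputs(fieldValue: string, baseURL: URL): [string, string] {
  const serializedBaseURL = baseURL.href; // the URL serializer's output
  const rawPattern = parseStructuredString(fieldValue);
  return [rawPattern, serializedBaseURL];
}
```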
Specifications might consider accepting only patterns which do not [=URL pattern/has regexp groups|have regexp groups=] if evaluating the pattern, since the performance of such patterns might be more reliable, and may not require a [[ECMA-262]] regular expression implementation, which may have security, code size, or other implications for implementations. On the other hand, JavaScript APIs run in environments where such an implementation is readily available.
+

Acknowledgments

The editors would like to thank From 60be5b7a6c5a1cf271f715e90cc44e10c163e1cd Mon Sep 17 00:00:00 2001 From: Yagiz Nizipli Date: Tue, 7 Jan 2025 16:38:08 -0500 Subject: [PATCH 08/19] Correct condition for opaque paths in base URL This PR fixes pathname processing for inputs that have opaque pathnames. --- review-drafts/2024-03.bs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/review-drafts/2024-03.bs b/review-drafts/2024-03.bs index 6eaadeb..d4bcb9a 100644 --- a/review-drafts/2024-03.bs +++ b/review-drafts/2024-03.bs @@ -1868,7 +1868,7 @@ To convert a modifier to a string given a [=part/modifier=] |modifier 1. If the following are all true:
  • |baseURL| is not null;
  • -
  • |baseURL| has an [=url/opaque path=]; and
  • +
  • |baseURL| does not have an [=url/opaque path=]; and
  • the result of running [=is an absolute pathname=] given |result|["{{URLPatternInit/pathname}}"] and |type| is false,

then: From d4b660ceaa3d2b24d12cbeba8448e66e96c00041 Mon Sep 17 00:00:00 2001 From: Jeremy Roman Date: Tue, 7 Jan 2025 16:50:41 -0500 Subject: [PATCH 09/19] Editorial: Fix broken references to make the build work again --- spec.bs | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/spec.bs b/spec.bs index bf65012..efb6e9b 100644 --- a/spec.bs +++ b/spec.bs @@ -25,7 +25,7 @@ spec: URL; urlPrefix: https://url.spec.whatwg.org/ type: dfn text: serialize an integer; url: #serialize-an-integer text: host serializer; url: #concept-host-serializer -spec: RFC8941; urlPrefix: https://httpwg.org/specs/rfc8941.html +spec: RFC9651; urlPrefix: https://httpwg.org/specs/rfc9651.html type: dfn text: structured header; url: top for: structured header @@ -2027,7 +2027,7 @@ To accomplish this, specifications should accept {{URLPatternCompatible}} as an 1. Return the result of [=creating=] a URL pattern given |input|, the [=URL serializer|serialization=] of |baseURL|, and an empty [=map=].

-This allows authors to concisely specify most patterns, and use the constructor to access uncommon options if necessary. The implicit use of the base URL is similar to, and consistent with, HTML's [=parse a URL=] algorithm. [[HTML]] +This allows authors to concisely specify most patterns, and use the constructor to access uncommon options if necessary. The implicit use of the base URL is similar to, and consistent with, HTML's [=parse a URL=] algorithm. [[HTML]]

Integrating with JSON data formats

@@ -2066,9 +2066,9 @@ Specifications may wish to leave room in their formats to accept options for {{U

Integrating with HTTP header fields

-HTTP headers which include URL patterns should accept a string in the [[#constructor-string-parsing|constructor string syntax]], likely as part of a structured field [[RFC8941]]. +HTTP headers which include URL patterns should accept a string in the [[#constructor-string-parsing|constructor string syntax]], likely as part of a structured field [[RFC9651]]. -
No known header accepts the dictionary syntax for URL patterns. If that changes, this specification will be updated to define it, likely by processing [[RFC8941]] inner lists.
+
No known header accepts the dictionary syntax for URL patterns. If that changes, this specification will be updated to define it, likely by processing [[RFC9651]] inner lists.
Specifications for HTTP headers should operate on [=URL patterns=] (e.g., using the [=URL pattern/match=] algorithm) rather than {{URLPattern}} objects (which imply the existence of a JavaScript [=ECMAScript/realm=]). From 1fa3d21f5e1332599f002d4f1e8df0cf4921640e Mon Sep 17 00:00:00 2001 From: Yagiz Nizipli Date: Tue, 7 Jan 2025 16:54:03 -0500 Subject: [PATCH 10/19] Correct condition for opaque paths in base URL This PR fixes pathname processing for inputs that have opaque pathnames. (Amended by editor to apply to the correct draft.) --- review-drafts/2024-03.bs | 2 +- spec.bs | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/review-drafts/2024-03.bs b/review-drafts/2024-03.bs index d4bcb9a..6eaadeb 100644 --- a/review-drafts/2024-03.bs +++ b/review-drafts/2024-03.bs @@ -1868,7 +1868,7 @@ To convert a modifier to a string given a [=part/modifier=] |modifier 1. If the following are all true:
  • |baseURL| is not null;
  • -
  • |baseURL| does not have an [=url/opaque path=]; and
  • +
  • |baseURL| has an [=url/opaque path=]; and
  • the result of running [=is an absolute pathname=] given |result|["{{URLPatternInit/pathname}}"] and |type| is false,

then: diff --git a/spec.bs b/spec.bs index efb6e9b..ed5e405 100644 --- a/spec.bs +++ b/spec.bs @@ -1883,7 +1883,7 @@ To convert a modifier to a string given a [=part/modifier=] |modifier 1. If the following are all true:

  • |baseURL| is not null;
  • -
  • |baseURL| has an [=url/opaque path=]; and
  • +
  • |baseURL| does not have an [=url/opaque path=]; and
  • the result of running [=is an absolute pathname=] given |result|["{{URLPatternInit/pathname}}"] and |type| is false,

then: From 20ca299e843e608cbdbf70459c570f7df85d0121 Mon Sep 17 00:00:00 2001 From: Jeremy Roman Date: Tue, 7 Jan 2025 17:12:28 -0500 Subject: [PATCH 11/19] Meta: Force a build/deploy Reverting the review draft change caused a commit that the deploy script doesn't like. This one doesn't change that file so should succeed. From 2e38014692b5e714eb835c5f7ce0100d5b6dcd76 Mon Sep 17 00:00:00 2001 From: Shunya Shishido Date: Thu, 9 Jan 2025 10:08:08 +0900 Subject: [PATCH 12/19] Editorial: Fix RFC2119 keyword warnings (#247) --- spec.bs | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/spec.bs b/spec.bs index ed5e405..17c03e9 100644 --- a/spec.bs +++ b/spec.bs @@ -134,8 +134,8 @@ It can be constructed using a string for each component, or from a [[#constructo This is a more complicated pattern which includes: - * [=part/modifier/optional=] parts marked with `?` (braces are needed to make it unambiguous exactly what is optional), and - * a [=part/type/regexp=] part named "`id`" which uses a regular expression to define what sorts of substrings match (the parentheses are required to mark it as a regular expression, and are not part of the regexp itself). + * optional parts marked with `?` (braces are needed to make it unambiguous exactly what is optional), and + * a [=part/type/regexp=] part named "`id`" which uses a regular expression to define what sorts of substrings match (the parentheses are necessary to mark it as a regular expression, and are not part of the regexp itself).
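The "Correct condition for opaque paths in base URL" patches earlier in this series fix a condition that gates relative pathname resolution. After the fix, a relative pathname is resolved only against a hierarchical (non-opaque) base path. The following TypeScript sketch is a hypothetical approximation of that step using the global `URL` class. Because URL records are not exposed to JavaScript, it detects an opaque path from the serialization. It also omits the escaping that the standard applies to the base path when `type` is "`pattern`". The function names are illustrative.

```typescript
// A URL has an opaque path (e.g. "mailto:" or "data:" URLs) when its
// serialization does not continue with "/" right after "scheme:".
function hasOpaquePath(url: URL): boolean {
  return !url.href.slice(url.protocol.length).startsWith("/");
}

// "is an absolute pathname", as I read the URL Pattern Standard.
function isAbsolutePathname(input: string, type: "pattern" | "url"): boolean {
  if (input === "") return false;
  if (input[0] === "/") return true;
  if (type === "url") return false;
  if (input.length < 2) return false;
  // In pattern syntax, an escaped "\/" or a group opening "{/" is also absolute.
  return (input[0] === "\\" || input[0] === "{") && input[1] === "/";
}

// Resolve a relative pathname against the base path's last "/" (simplified).
function resolvePathname(pathname: string, baseURL: URL | null, type: "pattern" | "url"): string {
  if (baseURL !== null && !hasOpaquePath(baseURL) && !isAbsolutePathname(pathname, type)) {
    const basePath = baseURL.pathname;
    const slashIndex = basePath.lastIndexOf("/");
    if (slashIndex !== -1) return basePath.slice(0, slashIndex + 1) + pathname;
  }
  return pathname;
}
```

Under the pre-fix condition the behavior was inverted. A base such as `data:text/plain,hi` would have been used for resolution, and ordinary hierarchical bases would have been skipped.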

@@ -2080,7 +2080,7 @@ Specifications for HTTP headers should operate on [=URL patterns=] (e.g., using 1. Return the result of [=creating=] a URL pattern given |rawPattern|, |serializedBaseURL|, and an empty [=map=].
-
Specifications might consider accepting only patterns which do not [=URL pattern/has regexp groups|have regexp groups=] if evaluating the pattern, since the performance of such patterns might be more reliable, and may not require a [[ECMA-262]] regular expression implementation, which may have security, code size, or other implications for implementations. On the other hand, JavaScript APIs run in environments where such an implementation is readily available.
+
Specifications might consider accepting only patterns which do not [=URL pattern/has regexp groups|have regexp groups=] if evaluating the pattern, since the performance of such patterns might be more reliable, and might not require a [[ECMA-262]] regular expression implementation, which might have security, code size, or other implications for implementations. On the other hand, JavaScript APIs run in environments where such an implementation is readily available.

Acknowledgments

From 78036b00c0155a6289af98fb60fae16a6717cefd Mon Sep 17 00:00:00 2001 From: Jeremy Roman Date: Thu, 9 Jan 2025 10:39:08 -0500 Subject: [PATCH 13/19] Use the basic URL parser when parsing URLs Use the basic URL parser when parsing URLs Blob handling is not required, since the resulting URL is not stored in any way that would make the blob handling visible. This is consistent with what the URL constructor and similar uses do. This change should not be observable. Partially addresses #242. --- spec.bs | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/spec.bs b/spec.bs index 17c03e9..876f065 100644 --- a/spec.bs +++ b/spec.bs @@ -513,10 +513,10 @@ A component is a [=struct=] with the following [=struct/items=]: 1. If |input| is a {{USVString}}: 1. Let |baseURL| be null. 1. If |baseURLString| was given, then: - 1. Set |baseURL| to the result of [=URL parser|parsing=] |baseURLString|. + 1. Set |baseURL| to the result of running the [=basic URL parser=] on |baseURLString|. 1. If |baseURL| is failure, return null. 1. [=list/Append=] |baseURLString| to |inputs|. - 1. Set |url| to the result of [=URL parser|parsing=] |input| given |baseURL|. + 1. Set |url| to the result of running the [=basic URL parser=] on |input| with |baseURL|. 1. If |url| is failure, return null. 1. [=Assert=]: |url| is a [=/URL=]. 1. Set |protocol| to |url|'s [=url/scheme=]. @@ -1852,7 +1852,7 @@ To convert a modifier to a string given a [=part/modifier=] |modifier Username and password are also never inherited from a base URL when constructing a {{URLPattern}}. (They are, however, inherited from the base URL when parsing a URL supplied as an argument to {{URLPattern/test()}} or {{URLPattern/exec()}}.) - 1. Set |baseURL| to the result of [=URL parser|parsing=] |init|["{{URLPatternInit/baseURL}}"]. + 1. Set |baseURL| to the result of running the [=basic URL parser=] on |init|["{{URLPatternInit/baseURL}}"]. 1. If |baseURL| is failure, then throw a {{TypeError}}. 1. 
If |init|["{{URLPatternInit/protocol}}"] does not [=map/exist=], then set |result|["{{URLPatternInit/protocol}}"] to the result of [=processing a base URL string=] given |baseURL|'s [=url/scheme=] and |type|. 1. If |type| is not "`pattern`" and |init| [=map/contains=] none of "{{URLPatternInit/protocol}}", "{{URLPatternInit/hostname}}", "{{URLPatternInit/port}}" and "{{URLPatternInit/username}}", then set |result|["{{URLPatternInit/username}}"] to the result of [=processing a base URL string=] given |baseURL|'s [=url/username=] and |type|. From c934c6b6a2aac376c5d59105caf6f44b873e68d1 Mon Sep 17 00:00:00 2001 From: Jeremy Roman Date: Thu, 9 Jan 2025 10:46:39 -0500 Subject: [PATCH 14/19] Editorial: Correct "set" to "let" Editorial: Correct "set" to "let" These uses introduce the variable, so must be "let". --- spec.bs | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/spec.bs b/spec.bs index 876f065..b210a5d 100644 --- a/spec.bs +++ b/spec.bs @@ -1324,12 +1324,12 @@ To parse a pattern string given a [=/pattern string=] |input|, [=/opt 1. If |open token| is not null: - 1. Set |prefix| be the result of running [=consume text=] given |parser|. + 1. Let |prefix| be the result of running [=consume text=] given |parser|. 1. Set |name token| to the result of running [=try to consume a token=] given |parser| and "`name`". 1. Set |regexp or wildcard token| to the result of running [=try to consume a regexp or wildcard token=] given |parser| and |name token|. 1. Let |suffix| be the result of running [=consume text=] given |parser|. 1. Run [=consume a required token=] given |parser| and "`close`". - 1. Set |modifier token| to the result of running [=try to consume a modifier token=] given |parser|. + 1. Let |modifier token| be the result of running [=try to consume a modifier token=] given |parser|. 1. Run [=add a part=] given |parser|, |prefix|, |name token|, |regexp or wildcard token|, |suffix|, and |modifier token|. 1. [=Continue=]. 1. 
Run [=maybe add a part from the pending fixed value=] given |parser|. From cc87ea92225079191869cb39cd6a6c5c6c25a6e6 Mon Sep 17 00:00:00 2001 From: Jeremy Roman Date: Thu, 9 Jan 2025 10:47:11 -0500 Subject: [PATCH 15/19] Serialize base URL's host when a string is required If the URL's host is an IP address, the string representation is required. This corrects/clarifies existing behavior. --- spec.bs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spec.bs b/spec.bs index b210a5d..ff04d35 100644 --- a/spec.bs +++ b/spec.bs @@ -1858,7 +1858,7 @@ To convert a modifier to a string given a [=part/modifier=] |modifier 1. If |type| is not "`pattern`" and |init| [=map/contains=] none of "{{URLPatternInit/protocol}}", "{{URLPatternInit/hostname}}", "{{URLPatternInit/port}}" and "{{URLPatternInit/username}}", then set |result|["{{URLPatternInit/username}}"] to the result of [=processing a base URL string=] given |baseURL|'s [=url/username=] and |type|. 1. If |type| is not "`pattern`" and |init| [=map/contains=] none of "{{URLPatternInit/protocol}}", "{{URLPatternInit/hostname}}", "{{URLPatternInit/port}}", "{{URLPatternInit/username}}" and "{{URLPatternInit/password}}", then set |result|["{{URLPatternInit/password}}"] to the result of [=processing a base URL string=] given |baseURL|'s [=url/password=] and |type|. 1. If |init| [=map/contains=] neither "{{URLPatternInit/protocol}}" nor "{{URLPatternInit/hostname}}", then: - 1. Let |baseHost| be |baseURL|'s [=url/host=]. + 1. Let |baseHost| be the [=host serializer|serialization=] of |baseURL|'s [=url/host=]. 1. If |baseHost| is null, then set |baseHost| to the empty string. 1. Set |result|["{{URLPatternInit/hostname}}"] to the result of [=processing a base URL string=] given |baseHost| and |type|. 1. 
If |init| [=map/contains=] none of "{{URLPatternInit/protocol}}", "{{URLPatternInit/hostname}}", and "{{URLPatternInit/port}}", then: From 1d3ab52dfd8afe4fc73723d34026b2fd47a343ce Mon Sep 17 00:00:00 2001 From: Jeremy Roman Date: Wed, 15 Jan 2025 11:16:18 -0500 Subject: [PATCH 16/19] Correct null handling when computing base URL host string --- spec.bs | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/spec.bs b/spec.bs index ff04d35..ef3b36e 100644 --- a/spec.bs +++ b/spec.bs @@ -1858,8 +1858,8 @@ To convert a modifier to a string given a [=part/modifier=] |modifier 1. If |type| is not "`pattern`" and |init| [=map/contains=] none of "{{URLPatternInit/protocol}}", "{{URLPatternInit/hostname}}", "{{URLPatternInit/port}}" and "{{URLPatternInit/username}}", then set |result|["{{URLPatternInit/username}}"] to the result of [=processing a base URL string=] given |baseURL|'s [=url/username=] and |type|. 1. If |type| is not "`pattern`" and |init| [=map/contains=] none of "{{URLPatternInit/protocol}}", "{{URLPatternInit/hostname}}", "{{URLPatternInit/port}}", "{{URLPatternInit/username}}" and "{{URLPatternInit/password}}", then set |result|["{{URLPatternInit/password}}"] to the result of [=processing a base URL string=] given |baseURL|'s [=url/password=] and |type|. 1. If |init| [=map/contains=] neither "{{URLPatternInit/protocol}}" nor "{{URLPatternInit/hostname}}", then: - 1. Let |baseHost| be the [=host serializer|serialization=] of |baseURL|'s [=url/host=]. - 1. If |baseHost| is null, then set |baseHost| to the empty string. + 1. Let |baseHost| be the empty string. + 1. If |baseURL|'s [=url/host=] is not null, then set |baseHost| to its [=host serializer|serialization=]. 1. Set |result|["{{URLPatternInit/hostname}}"] to the result of [=processing a base URL string=] given |baseHost| and |type|. 1. If |init| [=map/contains=] none of "{{URLPatternInit/protocol}}", "{{URLPatternInit/hostname}}", and "{{URLPatternInit/port}}", then: 1. 
If |baseURL|'s [=url/port=] is null, then set |result|["{{URLPatternInit/port}}"] to the empty string. From 1c2d99f580537ff3a35a9215337518186ca22721 Mon Sep 17 00:00:00 2001 From: Jeremy Roman Date: Thu, 16 Jan 2025 10:50:47 -0500 Subject: [PATCH 17/19] Use WHATWG Infra ASCII code point definition --- spec.bs | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-) diff --git a/spec.bs b/spec.bs index ef3b36e..36769c5 100644 --- a/spec.bs +++ b/spec.bs @@ -1076,7 +1076,7 @@ A [=tokenizer=] has an associated code point, a Unicode 1. Let |error| be false. 1. While |regexp position| is less than |tokenizer|'s [=tokenizer/input=]'s [=string/code point length=]: 1. Run [=seek and get the next code point=] given |tokenizer| and |regexp position|. - 1. If the result of running [=is ASCII=] given |tokenizer|'s [=tokenizer/code point=] is false: + 1. If |tokenizer|'s [=tokenizer/code point=] is not an [=ASCII code point=]: 1. Run [=process a tokenizing error=] given |tokenizer|, |regexp start|, and |tokenizer|'s [=tokenizer/index=]. 1. Set |error| to true. 1. [=Break=]. @@ -1090,7 +1090,7 @@ A [=tokenizer=] has an associated code point, a Unicode 1. Set |error| to true. 1. [=Break=] 1. Run [=get the next code point=] given |tokenizer|. - 1. If the result of running [=is ASCII=] given |tokenizer|'s [=tokenizer/code point=] is false: + 1. If |tokenizer|'s [=tokenizer/code point=] is not an [=ASCII code point=]: 1. Run [=process a tokenizing error=] given |tokenizer|, |regexp start|, and |tokenizer|'s [=tokenizer/index=]. 1. Set |error| to true. 1. [=Break=]. @@ -1183,13 +1183,6 @@ A [=tokenizer=] has an associated code point, a Unicode 1. Otherwise return the result of checking if |code point| is contained in the [=IdentifierPart=] set of code points. -
- To determine if a Unicode |code point| is ASCII: - - 1. If |code point| is between U+0000 and U+007F inclusive, then return true. - 1. Otherwise return false. -
-
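Patch 17 replaces the local "is ASCII" definition with Infra's "ASCII code point", the range U+0000 to U+007F inclusive, so the two definitions are equivalent. The following is a minimal TypeScript sketch of the check. It also includes a hypothetical helper that scans a regexp body by code point, as the tokenizer's loop does, and reports the code point index of the first non-ASCII code point.

```typescript
// Infra's "ASCII code point": U+0000 NULL to U+007F DELETE, inclusive.
function isASCIICodePoint(codePoint: number): boolean {
  return codePoint >= 0x00 && codePoint <= 0x7f;
}

// Hypothetical helper: code point index of the first non-ASCII code point
// in a regexp body, or -1 if every code point is ASCII.
function firstNonASCIIIndex(regexpSource: string): number {
  let index = 0;
  for (const ch of regexpSource) { // iterates by code point, not UTF-16 unit
    if (!isASCIICodePoint(ch.codePointAt(0)!)) return index;
    index++;
  }
  return -1;
}
```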

Parts

A part list is a [=list=] of zero or more [=parts=]. From 9dae7928469ff29226333dd6f9d1a876bad43165 Mon Sep 17 00:00:00 2001 From: Yves-Marie Date: Wed, 29 Jan 2025 20:58:29 +0100 Subject: [PATCH 18/19] markup fixes --- spec.bs | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/spec.bs b/spec.bs index 36769c5..6dc69b2 100644 --- a/spec.bs +++ b/spec.bs @@ -1433,7 +1433,7 @@ To add a part given a [=pattern parser=] |parser|, a string |prefix|, 1. Set |type| to "`full-wildcard`". 1. Set |regexp value| to the empty string. 1. Let |name| be the empty string. -

Next, we determine the [=part=] [=part/name=]. This can be explicitly provided by a "`name`" [=token=] or be automatically assigned. +

Next, we determine the [=part=] [=part/name=]. This can be explicitly provided by a "`name`" [=token=] or be automatically assigned.

1. If |name token| is not null, then set |name| to |name token|'s [=token/value=]. 1. Otherwise if |regexp or wildcard token| is not null: 1. Set |name| to |parser|'s [=pattern parser/next numeric name=], [=serialize an integer|serialized=]. @@ -1779,7 +1779,7 @@ To convert a modifier to a string given a [=part/modifier=] |modifier 1. Append |value| to the end of |modified value|. 1. Let |dummyURL| be a new [=URL record=]. - 1. Let |parseResult| be the result of running [=basic URL parser=] given |modified value| with |dummyURL| as [=basic URL parser/url=] and [=path start state=] as [=basic URL parser/state override=]. + 1. Let |parseResult| be the result of running [=basic URL parser=] given |modified value| with |dummyURL| as [=basic URL parser/url=] and [=path start state=] as [=basic URL parser/state override=]. 1. If |parseResult| is failure, then throw a {{TypeError}}. 1. Let |result| be the result of [=URL path serializing=] |dummyURL|. 1. If |leading slash| is false, then set |result| to the [=code point substring to the end of the string|code point substring=] from 2 to the end of the string within |result|. From d0a4f62f22529f54288b37881108b21b4f453c77 Mon Sep 17 00:00:00 2001 From: Yves-Marie Date: Thu, 30 Jan 2025 10:28:13 +0100 Subject: [PATCH 19/19] Add proper canonicalization of domain names The current spec only handles host canonicalization for IP addresses and opaque hostnames, but if a URL has a special scheme, its host should be normalized as a domain name (IDNA processing). Cf. #220 --- spec.bs | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/spec.bs b/spec.bs index 6dc69b2..45399e8 100644 --- a/spec.bs +++ b/spec.bs @@ -472,6 +472,7 @@ A component is a [=struct=] with the following [=struct/items=]: 1. 
Set |urlPattern|'s [=URL pattern/username component=] to the result of [=compiling a component=] given |processedInit|["{{URLPatternInit/username}}"], [=canonicalize a username=], and [=default options=]. 1. Set |urlPattern|'s [=URL pattern/password component=] to the result of [=compiling a component=] given |processedInit|["{{URLPatternInit/password}}"], [=canonicalize a password=], and [=default options=]. 1. If the result running [=hostname pattern is an IPv6 address=] given |processedInit|["{{URLPatternInit/hostname}}"] is true, then set |urlPattern|'s [=URL pattern/hostname component=] to the result of [=compiling a component=] given |processedInit|["{{URLPatternInit/hostname}}"], [=canonicalize an IPv6 hostname=], and [=hostname options=]. + 1. Otherwise, if the result of running [=protocol component matches a special scheme=] given |urlPattern|'s [=URL pattern/protocol component=] is true, or |urlPattern|'s [=URL pattern/protocol component=]'s [=component/pattern string=] is "`*`", then set |urlPattern|'s [=URL pattern/hostname component=] to the result of [=compiling a component=] given |processedInit|["{{URLPatternInit/hostname}}"], [=canonicalize a domain name=], and [=hostname options=]. 1. Otherwise, set |urlPattern|'s [=URL pattern/hostname component=] to the result of [=compiling a component=] given |processedInit|["{{URLPatternInit/hostname}}"], [=canonicalize a hostname=], and [=hostname options=]. 1. Set |urlPattern|'s [=URL pattern/port component=] to the result of [=compiling a component=] given |processedInit|["{{URLPatternInit/port}}"], [=canonicalize a port=], and [=default options=]. 1. Let |compileOptions| be a copy of the [=default options=] with the [=options/ignore case=] property set to |options|["{{URLPatternOptions/ignoreCase}}"]. @@ -1729,15 +1730,23 @@ To convert a modifier to a string given a [=part/modifier=] |modifier
- To canonicalize a hostname given a string |value|: + To canonicalize a hostname given a string |value| and optionally a string |protocolValue|: 1. If |value| is the empty string, return |value|. 1. Let |dummyURL| be a new [=URL record=]. + 1. If |protocolValue| was given, then set |dummyURL|'s [=url/scheme=] to |protocolValue|. +

We set the [=URL record=]'s [=url/scheme=] in order for the [=basic URL parser=] to recognize and normalize non-opaque hostname values.

1. Let |parseResult| be the result of running the [=basic URL parser=] given |value| with |dummyURL| as [=basic URL parser/url=] and [=hostname state=] as [=basic URL parser/state override=]. 1. If |parseResult| is failure, then throw a {{TypeError}}. 1. Return |dummyURL|'s [=url/host=], [=host serializer|serialized=], or empty string if it is null.
+
+ To canonicalize a domain name given a string |value|: + + 1. Return the result of running [=canonicalize a hostname=] given |value| and "`https`". +
+
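To illustrate how the optional |protocolValue| changes the outcome, here is a minimal Python sketch of [=canonicalize a hostname=] and [=canonicalize a domain name=]. It is a hypothetical illustration, not the spec's algorithm: it assumes Python's built-in `idna` codec (IDNA 2003) as a stand-in for the URL Standard's domain to ASCII (UTS #46) processing, and it does not model the basic URL parser's delimiter handling, percent-decoding, or IPv4/IPv6 parsing. All function names are made up for this example.

```python
from typing import Optional

# Hypothetical sketch, not spec text. Assumptions:
#  - Python's "idna" codec (IDNA 2003) stands in for the URL Standard's
#    domain to ASCII (UTS #46), so some inputs (e.g. "ß") differ.
#  - Delimiters, percent-decoding, and IPv4/IPv6 parsing are not modeled.

SPECIAL_SCHEMES = {"ftp", "file", "http", "https", "ws", "wss"}

# Forbidden host code points, as listed in the URL Standard.
FORBIDDEN_HOST = set("\x00\t\n\r #/:<>?@[\\]^|")
# Forbidden domain code points add the C0 controls, "%", and U+007F.
FORBIDDEN_DOMAIN = FORBIDDEN_HOST | {chr(c) for c in range(0x20)} | {"%", "\x7f"}


def percent_encode_c0(value: str) -> str:
    """UTF-8 percent-encode C0 controls and code points above U+007E."""
    out = []
    for ch in value:
        if ord(ch) < 0x20 or ord(ch) > 0x7E:
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)


def canonicalize_hostname(value: str, protocol_value: Optional[str] = None) -> str:
    if value == "":
        return value
    if protocol_value in SPECIAL_SCHEMES:
        # Special scheme: parse the host as a domain (IDNA, then lowercase).
        try:
            result = value.encode("idna").decode("ascii").lower()
        except UnicodeError as exc:
            raise TypeError(f"invalid domain: {value!r}") from exc
        if any(ch in FORBIDDEN_DOMAIN for ch in result):
            raise TypeError(f"invalid domain: {value!r}")
        return result
    # No scheme or a non-special scheme: parse the host as an opaque host.
    if any(ch in FORBIDDEN_HOST for ch in value):
        raise TypeError(f"invalid opaque host: {value!r}")
    return percent_encode_c0(value)


def canonicalize_domain_name(value: str) -> str:
    # Same delegation as the spec text: canonicalize a hostname with "https".
    return canonicalize_hostname(value, "https")


print(canonicalize_domain_name("EXAMPLE.com"))  # example.com
print(canonicalize_hostname("EXAMPLE.com"))     # EXAMPLE.com
print(canonicalize_hostname("münchen", "foo"))  # m%C3%BCnchen
```

Under these assumptions, the domain path lowercases and punycode-encodes, while the opaque-host path keeps ASCII case and only percent-encodes non-ASCII code points. That case difference is the gap this patch closes for special schemes.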
To canonicalize an IPv6 hostname given a string |value|: @@ -1869,7 +1878,7 @@ To convert a modifier to a string given a [=part/modifier=] |modifier 1. If |init|["{{URLPatternInit/protocol}}"] [=map/exists=], then set |result|["{{URLPatternInit/protocol}}"] to the result of [=process protocol for init=] given |init|["{{URLPatternInit/protocol}}"] and |type|. 1. If |init|["{{URLPatternInit/username}}"] [=map/exists=], then set |result|["{{URLPatternInit/username}}"] to the result of [=process username for init=] given |init|["{{URLPatternInit/username}}"] and |type|. 1. If |init|["{{URLPatternInit/password}}"] [=map/exists=], then set |result|["{{URLPatternInit/password}}"] to the result of [=process password for init=] given |init|["{{URLPatternInit/password}}"] and |type|. - 1. If |init|["{{URLPatternInit/hostname}}"] [=map/exists=], then set |result|["{{URLPatternInit/hostname}}"] to the result of [=process hostname for init=] given |init|["{{URLPatternInit/hostname}}"] and |type|. + 1. If |init|["{{URLPatternInit/hostname}}"] [=map/exists=], then set |result|["{{URLPatternInit/hostname}}"] to the result of [=process hostname for init=] given |init|["{{URLPatternInit/hostname}}"], |result|["{{URLPatternInit/protocol}}"], and |type|. 1. If |init|["{{URLPatternInit/port}}"] [=map/exists=], then set |result|["{{URLPatternInit/port}}"] to the result of [=process port for init=] given |init|["{{URLPatternInit/port}}"], |result|["{{URLPatternInit/protocol}}"], and |type|. 1. If |init|["{{URLPatternInit/pathname}}"] [=map/exists=]: 1. Set |result|["{{URLPatternInit/pathname}}"] to |init|["{{URLPatternInit/pathname}}"]. @@ -1935,10 +1944,12 @@ To convert a modifier to a string given a [=part/modifier=] |modifier
- To process hostname for init given a string |value| and a string |type|: + To process hostname for init given a string |hostnameValue|, a string |protocolValue|, and a string |type|: - 1. If |type| is "`pattern`" then return |value|. - 1. Return the result of running [=canonicalize a hostname=] given |value|. + 1. If |type| is "`pattern`" then return |hostnameValue|. + 1. If |protocolValue| is a [=special scheme=] or the empty string, then return the result of running [=canonicalize a domain name=] given |hostnameValue|. +

If |protocolValue| is the empty string, then no value was provided for {{URLPatternInit/protocol}} in the constructor dictionary. Normally we do not special-case empty string dictionary values, but here we treat the empty string as a [=special scheme=] so that the hostname defaults to the most common canonicalization.

+ 1. Return the result of running [=canonicalize a hostname=] given |hostnameValue|.
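The protocol-dependent choice in [=process hostname for init=], and the matching step added to the constructor, can be sketched in Python as follows. This is a hypothetical illustration under several assumptions: the two canonicalizers are crude stand-ins (Python's IDNA 2003 codec plus lowercasing, versus returning the value unchanged, which matches opaque-host parsing only for plain ASCII input), the IPv6 branch is omitted, and the compiled protocol component is approximated by a Python regular expression matched with `re.fullmatch` rather than the spec's JavaScript RegExp.

```python
import re

# Hypothetical sketch, not spec text. The canonicalizers are crude stand-ins.
SPECIAL_SCHEMES = ("ftp", "file", "http", "https", "ws", "wss")


def canonicalize_domain_name(value: str) -> str:
    # Stand-in for domain canonicalization: IDNA 2003 codec, then lowercase.
    return value.encode("idna").decode("ascii").lower() if value else value


def canonicalize_hostname(value: str) -> str:
    # Stand-in for opaque-host canonicalization: ASCII case is preserved.
    return value


def process_hostname_for_init(hostname_value: str, protocol_value: str, type_: str) -> str:
    # `type_` avoids shadowing Python's built-in `type`.
    if type_ == "pattern":
        return hostname_value
    # An empty protocol means none was provided; treat it like a special scheme.
    if protocol_value in SPECIAL_SCHEMES or protocol_value == "":
        return canonicalize_domain_name(hostname_value)
    return canonicalize_hostname(hostname_value)


def hostname_canonicalizer_for_pattern(protocol_regex: str, protocol_pattern: str):
    # Constructor step: use domain canonicalization if the protocol component
    # matches any special scheme or its pattern string is "*".
    matches_special = any(re.fullmatch(protocol_regex, s) for s in SPECIAL_SCHEMES)
    if matches_special or protocol_pattern == "*":
        return canonicalize_domain_name
    return canonicalize_hostname


print(process_hostname_for_init("EXAMPLE.com", "", "url"))     # example.com
print(process_hostname_for_init("EXAMPLE.com", "foo", "url"))  # EXAMPLE.com
```

With these stand-ins, an omitted protocol yields `"example.com"` because it falls back to domain canonicalization, whereas a non-special protocol such as `"foo"` leaves the hostname as `"EXAMPLE.com"`.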
@@ -2113,8 +2124,9 @@ Ralph Chelala, Sangwhan Moon, Sayan Pal, Victor Costan, -Yoshisato Yanagisawa, and -Youenn Fablet +Yoshisato Yanagisawa, +Youenn Fablet, and +Yves-Marie K. Rinquin for their contributors to this specification. Special thanks to Blake Embrey and the other [pillarjs/path-to-regexp](https://github.com/pillarjs/path-to-regexp) [contributors](https://github.com/pillarjs/path-to-regexp/graphs/contributors) for building an excellent open source library that so many have found useful.