-
Notifications
You must be signed in to change notification settings - Fork 3.2k
HTML API: Auto-escape JavaScript and JSON script tag contents when necessary #10635
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HTML API: Auto-escape JavaScript and JSON script tag contents when necessary #10635
Conversation
Test using WordPress PlaygroundThe changes in this pull request can previewed and tested using a WordPress Playground instance. WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser. Some things to be aware of
For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation. |
Co-authored-by: Weston Ruter <westonruter@gmail.com>
|
@sirreal I updated the in the process I accidentally pushed out the build-change-revert that I’ve been using to test and develop. since I can’t force-push to your branch we’ll just have to remember to
|
18b3ca2 to
3f1ba32
Compare
| ); | ||
|
|
||
| $processor->set_modifiable_text( "\n{$importmap}\n" ); | ||
| $decoded_importmap = json_decode( $processor->get_modifiable_text(), true ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be handles post-merge or now, but do we really want to assert identical serialization?
with the check below for equality of the input and decoded values, why do we care if the escaping was different?
on the other hand, if we are wanting to assert specific kinds of escapes, maybe they get dedicated unit tests with clear assertions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's sort it out in follow-up work.
Are you referring to the assertEqualHTML below on the HTML serialization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no I’m talking about JSON-serialization. I don’t really care if they are different but decode identically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what exactly you're referring to. If you help me to understand, I'm happy to discuss and consider changes.
dmsnell
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mon-u-mental effort @sirreal.
I left a comment and think that the tests/phpunit/tests/html-api/wpHtmlTagProcessorModifiableText.php file is a bit…distracted…but it’s not a blocker IMO. Would be nice to follow-up with that but some of the other existing HTML API tests are also a bit chaotic for practical reasons.
This is so exciting I can hardly stand it. Let’s get it in and testing at large!
| * @param string $sourcecode Raw contents intended to be serialized into an HTML SCRIPT element. | ||
| * @return string Escaped form of input contents which will not lead to premature closing of the containing SCRIPT element. | ||
| */ | ||
| public static function escape_javascript_script_contents( string $sourcecode ): string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was made public in bfa60cf.
@westonruter asked about this new public method in #10639 (comment), and it was something I had planned to review.
I'm going to make it private for now and we can consider opening it up in follow-up work.
| @@ -0,0 +1,33 @@ | |||
| // https://html.spec.whatwg.org/multipage/parsing.html#script-data-state | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had simplified this graph to produce the ASCII chart that's including in this PR. I used an online graphviz ASCII converter and it rejects some of the syntax in this version. I don't know whether it's a problem with the converter I used or if this ASCII output mode for graphviz does not support some of the features used here.
I want to make sure this is able to produce an ASCII version of the chart. We can revisit in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nothing should reject the syntax. do you know which parts it rejects? in unsupported output formats we should hope that it simply ignores what it can’t support.
perhaps we can find another converter. graphviz has a native one.
When setting JavaScript or JSON script tag content, automatically escape sequences like `<script>` and `</script>`. This renders the content safe for HTML. The semantics of any JSON and virtually any JavaScript are preserved. Script type detection follows the HTML standard for identifying JavaScript and JSON script tags. Other script types continue to reject potentially dangerous content. Developed in #10635. Props jonsurrell, dmsnell, westonruter. Fixes #64419. See #63851, #51159. git-svn-id: https://develop.svn.wordpress.org/trunk@61477 602fd350-edb4-49c9-b593-d223f7449a82
When setting JavaScript or JSON script tag content, automatically escape sequences like `<script>` and `</script>`. This renders the content safe for HTML. The semantics of any JSON and virtually any JavaScript are preserved. Script type detection follows the HTML standard for identifying JavaScript and JSON script tags. Other script types continue to reject potentially dangerous content. Developed in WordPress/wordpress-develop#10635. Props jonsurrell, dmsnell, westonruter. Fixes #64419. See #63851, #51159. Built from https://develop.svn.wordpress.org/trunk@61477 git-svn-id: http://core.svn.wordpress.org/trunk@60789 1a063a9b-81f0-0310-95a4-ce76da25c4cd
When setting JavaScript or JSON script tag content, automatically escape sequences like `<script>` and `</script>`. This renders the content safe for HTML. The semantics of any JSON and virtually any JavaScript are preserved. Script type detection follows the HTML standard for identifying JavaScript and JSON script tags. Other script types continue to reject potentially dangerous content. Developed in WordPress/wordpress-develop#10635. Props jonsurrell, dmsnell, westonruter. Fixes #64419. See #63851, #51159. Built from https://develop.svn.wordpress.org/trunk@61477 git-svn-id: https://core.svn.wordpress.org/trunk@60789 1a063a9b-81f0-0310-95a4-ce76da25c4cd


The HTML API currently rejects script tag contents that may be dangerous. This is a proposal to detect JavaScript and JSON script tags and automatically escape contents when necessary.
<scriptor</script(case-insensitive) is found.In JSON
The JavaScript escaping strategy is also applicable to JSON and produces more minimal, readable, and HTML-safe output. The JavaScript escaping is applied to JSON content as well.
Outdated description
<is replaced with\u003C. This eliminates the problematic strings and aligns with the approach described in #63851 and applied in r60681.This is proposed as a simple character replacement with
strtr. This should be highly performant. A less invasive replacement could be done to only replace<in<scriptor</scriptwhere it's really necessary. This would preserve more of the JSON string, but likely at the cost of performance. It would require either a regular expression with case-insensitive matching (see JavaScript example).In JavaScript
<scriptand</script(followed by a necessary tag termination character\t\n\r\f/>) thesis replaced with its Unicode escape. This should remain valid in all contexts where the text can appear and maintain identical behavior in all except a few edge cases (see ticket or quoted section below for full explanation and caveats).From the ticket:
Trac ticket: https://core.trac.wordpress.org/ticket/64419
This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.