Conversation
|
The rendered spec for this PR is available as a single page at https://tc39.es/ecma262/pr/3714 and as multiple pages at https://tc39.es/ecma262/pr/3714/multipage . |
| 1. Let _internalSlotsList_ be « [[IsRawJSON]] ». | ||
| 1. Let _obj_ be OrdinaryObjectCreate(*null*, _internalSlotsList_). |
There was a problem hiding this comment.
Any reason for not inlining _internalSlotsList_, like we do in most OrdinaryObjectCreate calls where we pass extra slots?
There was a problem hiding this comment.
We've got examples of both, and I personally like having the name as an explicit hint of semantics for an otherwise opaque list of slot references, but am willing to inline it if that's the consensus preference among editors.
There was a problem hiding this comment.
I would inline it, but no strong preference.
There was a problem hiding this comment.
I'm also in favor of inlining it, the few instances where we don't do that (e.g. OrdinaryFunctionCreate, CreateIteratorFromClosure) are a bit different.
…the source text of primitive values tc39/ecma262#3714
d7925aa to
477c794
Compare
| <emu-clause id="sec-createjsonparserecord" type="abstract operation"> | ||
| <h1> | ||
| CreateJSONParseRecord ( | ||
| _parseNode_: a Parse Node, |
There was a problem hiding this comment.
This isn't just any Parse Node, it's one of a handful of kinds of nodes. Let's list them explicitly here.
There was a problem hiding this comment.
JSON.parse calls CreateJSONParseRecord with the Parse Node returned from ParseJSON, which (due to arbitrary decisions about the definition of that algorithm) is a |Script|. And for similar arbitrary decisions about how the grammar is structured, recursive calls provide an |AssignmentExpression|. Neither of those have any obvious connection with JSON parsing, and baking them in risks introducing a spec bug if some future refactoring changes those arbitrary decisions (e.g., if ParseJSON is updated to use goal symbol |ParenthesizedExpression| or |Expression| or |AssignmentExpression|). So I'd rather keep CreateJSONParseRecord as-is and rely upon ShallowestContainedJSONValue to ensure that consistency.
spec.html
Outdated
| 1. If _candidate_ is an instance of _type_, then | ||
| 1. NOTE: In the JSON grammar, a <code>number</code> token may represent a negative value. In ECMAScript, negation is represented as a unary operation. | ||
| 1. If _type_ is |UnaryExpression|, then | ||
| 1. Set _unaryExpression_ to _candidate_. |
There was a problem hiding this comment.
Why are we using this alias instead of just returning _candidate_ directly? In fact, _types_ need not even contain NumericLiteral at all. Or am I mistaken?
There was a problem hiding this comment.
All literal nonterminals appear deeply wrapped inside a |UnaryExpression| (including non-primitive |ArrayLiteral| and |ObjectLiteral|), but only |NumericLiteral|s might need to be associated with a terminal symbol from that level (the leading - for negating an unsigned number). So while you're not exactly mistaken, a refactoring of ShallowestContainedJSONValue to return |UnaryExpression| upon encountering one would result in it only returning such nodes and distributing the discovery of the actually relevant literal nodes elsewhere, which I think would be a much worse way to specify things.
Consider the full AST for JSON.parse("[-1]"), bolding each |UnaryExpression|:
Details
([-1]);- Script
- ScriptBody
- StatementList
- StatementListItem
- Statement
- ExpressionStatement
- Expression
;
([-1])- AssignmentExpression
- ConditionalExpression
- ShortCircuitExpression
- LogicalORExpression
- LogicalANDExpression
- BitwiseORExpression
- BitwiseXORExpression
- BitwiseANDExpression
- EqualityExpression
- RelationalExpression
- ShiftExpression
- AdditiveExpression
- MultiplicativeExpression
- ExponentiationExpression
- UnaryExpression
- UpdateExpression
- LeftHandSideExpression
- NewExpression
- MemberExpression
- PrimaryExpression
- CoverParenthesizedExpressionAndArrowParameterList → ParenthesizedExpression
(Expression)
[-1]- AssignmentExpression
- ConditionalExpression
- ShortCircuitExpression
- LogicalORExpression
- LogicalANDExpression
- BitwiseORExpression
- BitwiseXORExpression
- BitwiseANDExpression
- EqualityExpression
- RelationalExpression
- ShiftExpression
- AdditiveExpression
- MultiplicativeExpression
- ExponentiationExpression
- UnaryExpression
- UpdateExpression
- LeftHandSideExpression
- NewExpression
- MemberExpression
- PrimaryExpression
- ArrayLiteral
[ElementList]
-1ElisionoptAssignmentExpression- ConditionalExpression
- ShortCircuitExpression
- LogicalORExpression
- LogicalANDExpression
- BitwiseORExpression
- BitwiseXORExpression
- BitwiseANDExpression
- EqualityExpression
- RelationalExpression
- ShiftExpression
- AdditiveExpression
- MultiplicativeExpression
- ExponentiationExpression
- UnaryExpression
-UnaryExpression
1- UpdateExpression
- LeftHandSideExpression
- NewExpression
- MemberExpression
- PrimaryExpression
- Literal
- NumericLiteral
In fact, that exercise highlighted a mistake on my part in ShallowestContainedJSONValue that I just pushed a fix for (in |UnaryExpression| → - |UnaryExpression|, we must keep holding the outer Parse Node rather than the inner one).
spec.html
Outdated
| 1. Assert: The first code unit of _jsonString_ is an ASCII lowercase letter code unit (0x0061 through 0x007A, inclusive), an ASCII digit code unit (0x0030 through 0x0039, inclusive), 0x0022 (QUOTATION MARK), or 0x002D (HYPHEN-MINUS). | ||
| 1. Assert: The last code unit of _jsonString_ is an ASCII lowercase letter code unit (0x0061 through 0x007A, inclusive), an ASCII digit code unit (0x0030 through 0x0039, inclusive), or 0x0022 (QUOTATION MARK). |
There was a problem hiding this comment.
Why does the reader care about this?
There was a problem hiding this comment.
It's relevant to the constraint on jsonString that it must be the encoding of a JSON primitive.
There was a problem hiding this comment.
Okay it's true but why does a reader care? What does this buy, for example, an implementer? Are they potentially unaware of this invariant and it would permit a valuable optimisation?
There was a problem hiding this comment.
Okay it's true but why does a reader care? What does this buy, for example, an implementer? Are they potentially unaware of this invariant and it would permit a valuable optimisation?
It's actually the result of intentional step ordering that suggests such an optimization. The algorithm would be simpler with checks for validity rather than a combination of checks for invalidity and assertions of validity, e.g.
- Let jsonString be ? ToString(text).
- Let parseResult be ? ParseJSON(jsonString).
- Assert: jsonString is not the empty String.
- If the first code unit of jsonString is not either an ASCII lowercase letter code unit (0x0061 through 0x007A, inclusive), an ASCII digit code unit (0x0030 through 0x0039, inclusive), 0x0022 (QUOTATION MARK), or 0x002D (HYPHEN-MINUS), throw a SyntaxError exception.
- If the last code unit of jsonString is not either an ASCII lowercase letter code unit (0x0061 through 0x007A, inclusive), an ASCII digit code unit (0x0030 through 0x0039, inclusive), or 0x0022 (QUOTATION MARK), throw a SyntaxError exception.
- Let value be parseResult.[[Value]].
- Assert: value is either a String, a Number, a Boolean, or null.
But I didn't take that approach because it's more efficient to skip ParseJSON when either the first or last character indicates the need for a SyntaxError without parsing.
There was a problem hiding this comment.
Yeah I'm just still not convinced that, even though this invariant is guaranteed by the earlier steps and control flow, we need to alert the reader of that at this point. What does the reader do with that information? We don't even have any further steps that look at the contents of jsonString.
There was a problem hiding this comment.
What does the reader do with that information? We don't even have any further steps that look at the contents of
jsonString.
If the reader is implementing this algorithm, they decide whether or not to include the assertion as a safeguard of spec invariants (e.g., against accidentally permitting a primitive value to have leading or trailing non-ASCII whitespace). Otherwise, they digest it and move on, just like with the preceding assertion that value is a String, a Number, a Boolean, or null.
Alternatively, these assertions could be converted into checks and moved ahead of ParseJSON:
- Let jsonString be ? ToString(text).
- If jsonString is the empty String, throw a SyntaxError exception.
- If the first code unit of jsonString is not either an ASCII lowercase letter code unit (0x0061 through 0x007A, inclusive), an ASCII digit code unit (0x0030 through 0x0039, inclusive), 0x0022 (QUOTATION MARK), or 0x002D (HYPHEN-MINUS), throw a SyntaxError exception.
- If the last code unit of jsonString is not either an ASCII lowercase letter code unit (0x0061 through 0x007A, inclusive), an ASCII digit code unit (0x0030 through 0x0039, inclusive), or 0x0022 (QUOTATION MARK), throw a SyntaxError exception.
- Let parseResult be ? ParseJSON(jsonString).
- Let value be parseResult.[[Value]].
- Assert: value is either a String, a Number, a Boolean, or null.
spec.html
Outdated
| <emu-alg> | ||
| 1. Let _elements_ be a new empty List. | ||
| 1. If |ElementList| is present, set _elements_ to the list-concatenation of _elements_ and ArrayLiteralContentNodes of |ElementList|. | ||
| 1. If |Elision| is present, append |Elision| to _elements_. |
There was a problem hiding this comment.
Elision can't be present in JSON source text. Can't we just assert it's not present instead?
edit: We're already assuming we have whole-world knowledge about the calling patterns of these AOs because in CreateJSONParseRecord we do a ! Get that would possibly fail on a read-through to Array.prototype if the input had allowed Elisions. If we weren't assuming whole-world, we'd have to guard that step on whether _contentNodes_[_I_] was an Elision.
There was a problem hiding this comment.
Whole-world knowledge is appropriate in CreateJSONParseRecord where we're in the JSON section and confined to parsing the JSON grammar. But ArrayLiteralContentNodes is a general syntax-directed operation, and potentially useful outside of that context (where keeping track of elisions would be important). And even beyond that, the current approach falls out very naturally from simple mapping of the productions. Unless you want to make an argument that it's harmful and/or confusing, I much prefer the generality.
There was a problem hiding this comment.
I think its presence implies that it's possible to hit, yeah. I'd prefer we use our whole-world knowledge to eliminate the case, as the implementers will want to do.
jmdyck
left a comment
There was a problem hiding this comment.
(In retrospect, I probably should have filed a PR against the source branch. Ah well.)
|
+1 all of @jmdyck's review comments. |
| ParseJSON ( | ||
| _text_: a String, | ||
| ): either a normal completion containing an ECMAScript language value or a throw completion | ||
| ): either a normal completion containing a Record with fields [[ParseNode]] (a Parse Node) and [[Value]] (an ECMAScript language value), or a throw completion |
There was a problem hiding this comment.
This Record type should maybe be given a name (JSON Parse Result Record?) and tabular declaration-of-fields.
(Side-note for editors: Editorial-Conventions should probably say when it's okay to use anonymous Record schemas.)
It's a bit odd that ParseJSON returns a Record, and JSON Parse Record exists, and yet those are not the same thing. Maybe the latter could be given a name that more precisely conveys its usage.
There was a problem hiding this comment.
JSON Parse Result Record?
Or (more obviously) ParseJSON Result Record.
There was a problem hiding this comment.
ParseJSON is simple and non-recursive in terms of abstract operation calls, and there are only three calls to it, precisely one of which cares about the resulting [[ParseNode]] (and that one sends it to CreateJSONParseRecord for recursively creating a JSON Parse Record tree). Given that, and the simplicity of a { [[ParseNode]], [[Value]] } Record, I'd prefer to not name it.
There was a problem hiding this comment.
I think it's fine to use an anonymous record in this case, but I also agree we should try to define a heuristic for that in the editorial conventions. The exercise alone might help us clarify our own thoughts and result in some changes to the status quo.
| CreateJSONParseRecord ( | ||
| _parseNode_: a Parse Node, | ||
| _key_: a property name, | ||
| _val_: an ECMAScript language value, |
There was a problem hiding this comment.
Not a fan of the shortened name val (and any name shortening that doesn't actively improve readability). I'm not sure if consistency with InternalizeJSONProperty is a good enough reason to do this.
spec.html
Outdated
| 1. Let _parseResult_ be ? ParseJSON(_jsonString_). | ||
| 1. Let _value_ be _parseResult_.[[Value]]. | ||
| 1. Assert: _value_ is either a String, a Number, a Boolean, or *null*. | ||
| 1. Assert: The first code unit of _jsonString_ is either an ASCII lowercase letter code unit (0x0061 through 0x007A, inclusive), an ASCII digit code unit (0x0030 through 0x0039, inclusive), 0x0022 (QUOTATION MARK), or 0x002D (HYPHEN-MINUS). |
There was a problem hiding this comment.
We should probably have dfns for these in The String Type instead of stating the code unit range inline, fine as a follow up though.
…pecific JSONArrayLiteralContentNodes Ref tc39#3714 (comment)
92505af to
bf21490
Compare
|
Addressed #3714 (comment) from @michaelficarra by consolidating |
…pecific JSONArrayLiteralContentNodes Ref tc39#3714 (comment)
bf21490 to
c40c893
Compare
| ArrayLiteral : | ||
| `[` Elision? `]` | ||
| `[` ElementList `]` | ||
| `[` ElementList `,` Elision? `]` | ||
| </emu-grammar> | ||
| <emu-alg> | ||
| 1. Assert: |Elision| is not present. | ||
| 1. If |ElementList| is not present, return a new empty List. |
There was a problem hiding this comment.
We don't even need to mention Elision.
| ArrayLiteral : | |
| `[` Elision? `]` | |
| `[` ElementList `]` | |
| `[` ElementList `,` Elision? `]` | |
| </emu-grammar> | |
| <emu-alg> | |
| 1. Assert: |Elision| is not present. | |
| 1. If |ElementList| is not present, return a new empty List. | |
| ArrayLiteral : | |
| `[` `]` | |
| `[` ElementList `]` | |
| </emu-grammar> | |
| <emu-alg> | |
| 1. If |ElementList| is not present, return a new empty List. |
Then we can just split these two cases.
| ArrayLiteral : | |
| `[` Elision? `]` | |
| `[` ElementList `]` | |
| `[` ElementList `,` Elision? `]` | |
| </emu-grammar> | |
| <emu-alg> | |
| 1. Assert: |Elision| is not present. | |
| 1. If |ElementList| is not present, return a new empty List. | |
| ArrayLiteral : `[` `]` | |
| </emu-grammar> | |
| <emu-alg> | |
| 1. Return a new empty List. | |
| </emu-alg> | |
| <emu-grammar> | |
| ArrayLiteral : `[` ElementList `]` | |
| </emu-grammar> | |
| <emu-alg> |
There was a problem hiding this comment.
ArrayLiteral: `[` `]` does not appear as a standalone production; it is only inferred from ArrayLiteral: `[` Elision? `]`. Readers can figure that out, and can also draw their own conclusions about the |ArrayLiteral| productions not even mentioned, but I'd rather not get into the habit of placing such unnecessary cognitive burden upon them. And further, asserting lack of |Elision| seems helpful for emphasizing the restricted use of this operation. I'd like to keep this section as-is.
There was a problem hiding this comment.
Yeah I could see it both ways. Let's see if the other editors have an opinion about it.
There was a problem hiding this comment.
I don't have a strong preference, but 5.1.5.3 Optional Symbols specifically notes that Nonterminalopt is a shorthand for productions both with and without the Nonterminal, so ArrayLiteral: `[` `]` actually is a standalone production by that logic.
There was a problem hiding this comment.
I used that qualifier deliberately. ArrayLiteral: `[` `]` is definitely a production, but it is not "standalone" because it only exists in derivation from ArrayLiteral: `[` Elision? `]`. And this matters, because it means a simple search for ArrayLiteral: `[` `]` will come up empty.
There was a problem hiding this comment.
I prefer either the current approach of having Elision and asserting that it's not present, or explicitly splitting it into ArrayLiteral : [ ] and ArrayLiteral : [ Elision ] with "Assert: this never happens" in the second.
There was a problem hiding this comment.
Fwiw I got one of the LLMs to check for cases where the SDO omits the _opt element, and this is the result: https://gist.github.com/nicolo-ribaudo/883ea6e3b6e237ccbde2bf58e13dd8b6
The cases it mentions are correct, but I didn't manually verify that they are complete:
- some of them are for SV/TV/TRV, where we don't use the common emu-grammar+emu-alg pattern for defining the SDO
- the case where the "optional" is present is handled implicitly by the recursive SDO application on the single child nonterminal
- the other are cases where we handle both branches (present/missing) explicitly as separate productions
There was a problem hiding this comment.
Fwiw I got one of the LLMs to check for cases where the SDO omits the
_optelement, ...The cases it mentions are correct, but I didn't manually verify that they are complete
They aren't. I count 102 cases spread over 34 SDOs.
spec.html
Outdated
| </emu-clause> | ||
|
|
||
| <emu-clause id="sec-syntax-directed-operations-contents"> | ||
| <h1>Contents</h1> |
| ArrayLiteral : | ||
| `[` Elision? `]` | ||
| `[` ElementList `]` | ||
| `[` ElementList `,` Elision? `]` | ||
| </emu-grammar> | ||
| <emu-alg> | ||
| 1. Assert: |Elision| is not present. | ||
| 1. If |ElementList| is not present, return a new empty List. |
There was a problem hiding this comment.
I prefer either the current approach of having Elision and asserting that it's not present, or explicitly splitting it into ArrayLiteral : [ ] and ArrayLiteral : [ Elision ] with "Assert: this never happens" in the second.
…the source text of primitive values (tc39#3714)
99a6880 to
1393daf
Compare
1393daf to
8f2f58a
Compare
…the source text of primitive values (tc39#3714)
8f2f58a to
c7305c4
Compare
…the source text of primitive values (tc39#3714)
c7305c4 to
e63c83d
Compare
This implements JSON.parse source text access, currently at stage 3. Tests have already been merged, but are being further extended by tc39/test262#4682 .