-
-
Notifications
You must be signed in to change notification settings - Fork 65
Fixing parsing for FormBox #545
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
TiagoCavalcante
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
mathics/builtin/makeboxes.py
Outdated
|
|
||
| # TODO: Convert operators to appropriate representations e.g. 'Plus' to '+' | ||
| """ | ||
| >> \\(a + b\\) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been looking at this and notice a couple of small formatting changes. If this docstring were marked as a raw string, e.g. r""" ... """ we would need all of the ugly and confusing double backslashes \\ everywhere..
On line 333:
<dd>is a low-level formatting primitive that converts $expr$
to box form, without evaluating it.
has a hard line break which the current homegrown formatter doesn't ignore. And it would be nice to add an extra space between and end </dt> and the next starting <dd>.
These aren't strictly about this PR, but I happen to notice since you happen to be in the area.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. Also, I was thinking about to patching the homegrown formatter to handle these line breaks, and then avoiding that flake8 complains about long lines. But one step at the time...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in f6f3d1f
mathics/core/parser/parser.py
Outdated
| children = [] | ||
| self.box_depth += 1 | ||
| self.bracket_depth += 1 | ||
| # If this does not happend, it would be because |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
happend -> happen
I am not sure though that I understand. Is the following the same?:
If this construct appears inside a FormBox then the token tag will not be LeftRowBox ?
Note that the test itself is not token.tag != "FormBox" but token.tag == "LeftRowBox".
So this implies there could be things other than a FormBox that can occur too. OtherscriptBox? Span?
What are the merits of using an == LeftRowBox expression vs != FormBox ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I try to explain the problem in the comments, but it seems it is not easy. Here we go again: the problem is that
\( a b \` c + d \)
should be parsed as
FormBox[RowBox[{"c", "+", "d"}, RowBox[{"a", "b"}]]
Then, to build the expression, we need to know all the tokens before \` , to build the "Format" expression, and then collect the following elements in the expression as the first argument. With the current parsing algorithm, I cannot make to b_FormBox what was the previous tokens.
The alternative I found without changing the algorithm was to hack just p_RawLeftAssociation. If a \` token is found, then call again p_RawLeftAssociation but with a different tag. In this case, self.box_deep and self.bracket_deep does not change their values, unless a nested Raw(Left|Right)Association pair is found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I could see, the unique case with this special behavior was FormBox. If there were more tags with a similar behavior, where to parsing the next token we need to look at several previous ones and reinterpret them,
I would consider rework the parser.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # it was called when a `FormBox` was found. | ||
| if token.tag == "LeftRowBox": | ||
| self.box_depth += 1 | ||
| self.bracket_depth += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every place there is some depth that increases by one, there should be another place that decreases the depth by one.
Where is that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the tat is not LeftRowBox then we never reach the decrement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are saying something more or less like what the code says. Why is the decrement not reached. What is the depth used for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are saying something more or less like what the code says. Why is the decrement not reached. What is the depth used for?
The deep was used to handle nested expressions like \( a b \( InputForm \` \(OutputForm \` c d\) e \) \)
When a \` is found, then the previous tokens are collected, and evaluated as a Symbol, a Box expression or StandardForm if there is no token before \`. The result of this step is stored in a variable, to be used as the second argument of FormBox.
Then, a new call to the method is done, with FormBox as a tag. In the new call, the next elements are processed until the bracket and box deeps reach 0. Then, these elements are collected into a String or a RowBox. Then, the instance where \` was found takes the format and the other collected elements, builds a FormBox[], and returns it (before reaching the line where the decrements happen).
I tried to find a way to put the increment and the decrement together, but it just was more involved than this approach: I should have allowed the increment of the deep variables, and then doing the decrement before returning the FormBox. I didn't think that it was clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope the new comment helps to understand the logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the description. Something about this seems a little to complicated and that there should be something that follows the specification in a little more direct way:
- In InputForm and StandardForm, (form∖`input∖) yields FormBox[input,form].
- ∖(∖`input∖) yields FormBox[input,RawForm].
I would like to think about and reflect on this some more. I think we can make this cleaner, clearer and simpler.
What would be ideal is if we could do a cleanup step first - no changes to behavior yet.
Then add the list of tests that are wrong and that we want to fix.
And after this, then come up with a way to implement this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- ∖(∖`input∖) yields FormBox[input,RawForm].
Actually, in WMA,
In[1]:= \(\`input\)
Out[1]= FormBox[input, StandardForm]
I would like to think about and reflect on this some more. I think we can make this cleaner, clearer and simpler.
I am sure it is possible, just that after trying for a while, I didn't find a better way.
What would be ideal is if we could do a cleanup step first - no changes to behavior yet. Then add the list of tests that are wrong and that we want to fix.
What would be a cleanup step? The list of (simple) tests to check are the tests that I added in test/core/parser/test_parser.py.
And after this, then come up with a way to implement this.
OK, in that case, I am going to put this as a draft for a while.
test/core/parser/test_parser.py
Outdated
| self.check("\\( \\` b \\)", 'FormBox["b", StandardForm]') | ||
| self.check("\\( a \\` b \\)", 'FormBox["b", a]') | ||
| self.check("\\( a \\` \\)", 'FormBox["", a]') | ||
| self.check("\\( a \\` b + c \\)", 'FormBox[RowBox[{"b", "+", "c"}], a]') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A small thing but I would appreciate it if you' use raw strings here as well. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just followed the convention used in this file. But I agree, it would be clearer with raw strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in f6f3d1f
| box2 = self.parse_box(q + 1) | ||
| return Node("FractionBox", box1, box2) | ||
|
|
||
| def b_FormBox(self, box1, token, p): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because with the current logic, to implement the behavior of FormBox is not possible using this mechanism, so in my proposal, this method is never used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is what I mean about the myopic kind of thing. FormBox needs to worry about precedence and the b_ routines are where this kind of thing is done. So if you are defeating that, then maybe the thinking around this is flawed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was aware about the possible issue with the precedence. So I checked it against WMA, and it also seems to neglect the operator precedence of \`, so it gives the maximum precedence to it. On the other hand, the tokens that follows \` are processed using all the precedence rules.
What makes me think that this is the right approach (or at least a better approach than the current one) is that it does not break the existing tests, and it just makes a difference if a \` token is found inside a LeftRowBracket/RightRowBracket expression.
|
I am having trepidations about this, and worry that a myopic approach like this, if it can work may lead to a mess, if it can be done. It may be like trying to solve a Rubik's cube face by face instead of understanding the natural subgroups imposed by turns of the cube and reducing those instead. |
mathics/core/parser/parser.py
Outdated
| if is_symbol_name(fmt_name): | ||
| fmt = Symbol(fmt_name) | ||
| else: | ||
| fmt = Node("Removed", Symbol("$$Failure")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rocky,you are right: it should be
fmt = Node("Removed", String("$$Failure"))
to be compatible with WMA.
For example:
In WMA, if `FormBox` receives wrong arguments (a Number as a format for instance) the interpreter produces
In[1]:= (3 ` + b)
Out[1]= FormBox[RowBox[{+, b}], Removed[$$Failure]]
In[2]:= %//FullForm
Out[2]//FullForm= FormBox[RowBox[List["+", "b"]], Removed["$$Failure"]]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in f6f3d1f
Indeed, I have a similar feeling. This is why I tried to implement this in the most locally possible way. A broader approach would require to share with the |
The changes to All changes to Thanks for your patience and understanding. |
|
@rocky, this is the PR that I remember. |
I looked at this again, and I my opinion and view of this has not changed. I am not convinced there is a parsing problem but rather a problem somewhere outside of parsing. |
This PR starts to fix the issues enumerated (right after) the docstring of
MakeBoxes(mathics-core/mathics/builtin/makeboxes.py
Line 374 in a90d0ad
In this first round, the behavior of
FormBoxis fixed in a way to make it closer to the WL expected behavior:\(a_(String|Real|Integer) ` b_ \)is parsed asFormBox[ToBoxes[b], Removed[$$Failure]](notice that the argument ofRemovedis aSymbolnot aString).\(a_Symbol \` b_ \)is parsed asFormBox[ToBoxes[b], a]\(a_ \` b_ \)is parsed asFormBox[b, ToBoxes[a]]Tests were adjusted accordingly.