Skip to content

Conversation

@mmatera
Copy link
Contributor

@mmatera mmatera commented Sep 8, 2022

This PR starts to fix the issues enumerated (right after) the docstring of MakeBoxes (

# TODO: Convert operators to appropriate representations e.g. 'Plus' to '+'
)

In this first round, the behavior of FormBox is fixed in a way to make it closer to the WL expected behavior:

  • \(a_(String|Real|Integer) ` b_ \) is parsed as FormBox[ToBoxes[b], Removed[$$Failure]] (notice that the argument of Removed is a Symbol not a String).
  • \(a_Symbol \` b_ \) is parsed as FormBox[ToBoxes[b], a]
  • Otherwise, \(a_ \` b_ \) is parsed as FormBox[b, ToBoxes[a]]

Tests were adjusted accordingly.

Copy link
Contributor

@TiagoCavalcante TiagoCavalcante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


# TODO: Convert operators to appropriate representations e.g. 'Plus' to '+'
"""
>> \\(a + b\\)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been looking at this and notice a couple of small formatting changes. If this docstring were marked as a raw string, e.g. r""" ... """ we would need all of the ugly and confusing double backslashes \\ everywhere..

On line 333:

      <dd>is a low-level formatting primitive that converts $expr$
        to box form, without evaluating it.

has a hard line break which the current homegrown formatter doesn't ignore. And it would be nice to add an extra space between and end </dt> and the next starting <dd>.

These aren't strictly about this PR, but I happen to notice since you happen to be in the area.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Also, I was thinking about to patching the homegrown formatter to handle these line breaks, and then avoiding that flake8 complains about long lines. But one step at the time...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f6f3d1f

children = []
self.box_depth += 1
self.bracket_depth += 1
# If this does not happend, it would be because
Copy link
Member

@rocky rocky Sep 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

happend -> happen

I am not sure though that I understand. Is the following the same?:

If this construct appears inside a FormBox then the token tag will not be LeftRowBox ?
Note that the test itself is not token.tag != "FormBox" but token.tag == "LeftRowBox".

So this implies there could be things other than a FormBox that can occur too. OtherscriptBox? Span?

What are the merits of using an == LeftRowBox expression vs != FormBox ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try to explain the problem in the comments, but it seems it is not easy. Here we go again: the problem is that
\( a b \` c + d \)
should be parsed as
FormBox[RowBox[{"c", "+", "d"}, RowBox[{"a", "b"}]]

Then, to build the expression, we need to know all the tokens before \` , to build the "Format" expression, and then collect the following elements in the expression as the first argument. With the current parsing algorithm, I cannot make to b_FormBox what was the previous tokens.
The alternative I found without changing the algorithm was to hack just p_RawLeftAssociation. If a \` token is found, then call again p_RawLeftAssociation but with a different tag. In this case, self.box_deep and self.bracket_deep does not change their values, unless a nested Raw(Left|Right)Association pair is found.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I could see, the unique case with this special behavior was FormBox. If there were more tags with a similar behavior, where to parsing the next token we need to look at several previous ones and reinterpret them,
I would consider rework the parser.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to make this more explicit in an extended comment in
f6f3d1f

Also, I changed the comparison as @rocky suggested.

# it was called when a `FormBox` was found.
if token.tag == "LeftRowBox":
self.box_depth += 1
self.bracket_depth += 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every place there is some depth that increases by one, there should be another place that decreases the depth by one.

Where is that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the tat is not LeftRowBox then we never reach the decrement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are saying something more or less like what the code says. Why is the decrement not reached. What is the depth used for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are saying something more or less like what the code says. Why is the decrement not reached. What is the depth used for?

The deep was used to handle nested expressions like \( a b \( InputForm \` \(OutputForm \` c d\) e \) \)

When a \` is found, then the previous tokens are collected, and evaluated as a Symbol, a Box expression or StandardForm if there is no token before \`. The result of this step is stored in a variable, to be used as the second argument of FormBox.
Then, a new call to the method is done, with FormBox as a tag. In the new call, the next elements are processed until the bracket and box deeps reach 0. Then, these elements are collected into a String or a RowBox. Then, the instance where \` was found takes the format and the other collected elements, builds a FormBox[], and returns it (before reaching the line where the decrements happen).

I tried to find a way to put the increment and the decrement together, but it just was more involved than this approach: I should have allowed the increment of the deep variables, and then doing the decrement before returning the FormBox. I didn't think that it was clearer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope the new comment helps to understand the logic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the description. Something about this seems a little to complicated and that there should be something that follows the specification in a little more direct way:

I would like to think about and reflect on this some more. I think we can make this cleaner, clearer and simpler.

What would be ideal is if we could do a cleanup step first - no changes to behavior yet.
Then add the list of tests that are wrong and that we want to fix.

And after this, then come up with a way to implement this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • ∖(∖`input∖) yields FormBox[input,RawForm].

Actually, in WMA,

In[1]:= \(\`input\)                                                             

Out[1]= FormBox[input, StandardForm]

I would like to think about and reflect on this some more. I think we can make this cleaner, clearer and simpler.

I am sure it is possible, just that after trying for a while, I didn't find a better way.

What would be ideal is if we could do a cleanup step first - no changes to behavior yet. Then add the list of tests that are wrong and that we want to fix.

What would be a cleanup step? The list of (simple) tests to check are the tests that I added in test/core/parser/test_parser.py.

And after this, then come up with a way to implement this.

OK, in that case, I am going to put this as a draft for a while.

self.check("\\( \\` b \\)", 'FormBox["b", StandardForm]')
self.check("\\( a \\` b \\)", 'FormBox["b", a]')
self.check("\\( a \\` \\)", 'FormBox["", a]')
self.check("\\( a \\` b + c \\)", 'FormBox[RowBox[{"b", "+", "c"}], a]')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small thing but I would appreciate it if you' use raw strings here as well. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just followed the convention used in this file. But I agree, it would be clearer with raw strings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in f6f3d1f

@rocky
Copy link
Member

rocky commented Sep 11, 2022

I am not seeing this Removed[$$Failure] thing described above. I see a $Failed Symbol though. Similarly, I am not seeing a function called Removed although there is a function called Remove.

Please educate me here.

box2 = self.parse_box(q + 1)
return Node("FractionBox", box1, box2)

def b_FormBox(self, box1, token, p):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because with the current logic, to implement the behavior of FormBox is not possible using this mechanism, so in my proposal, this method is never used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is what I mean about the myopic kind of thing. FormBox needs to worry about precedence and the b_ routines are where this kind of thing is done. So if you are defeating that, then maybe the thinking around this is flawed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was aware about the possible issue with the precedence. So I checked it against WMA, and it also seems to neglect the operator precedence of \`, so it gives the maximum precedence to it. On the other hand, the tokens that follows \` are processed using all the precedence rules.

What makes me think that this is the right approach (or at least a better approach than the current one) is that it does not break the existing tests, and it just makes a difference if a \` token is found inside a LeftRowBracket/RightRowBracket expression.

@rocky
Copy link
Member

rocky commented Sep 11, 2022

I am having trepidations about this, and worry that a myopic approach like this, if it can work may lead to a mess, if it can be done. It may be like trying to solve a Rubik's cube face by face instead of understanding the natural subgroups imposed by turns of the cube and reducing those instead.

if is_symbol_name(fmt_name):
fmt = Symbol(fmt_name)
else:
fmt = Node("Removed", Symbol("$$Failure"))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rocky,you are right: it should be

fmt = Node("Removed", String("$$Failure"))

to be compatible with WMA.

For example:

In WMA, if `FormBox` receives wrong arguments (a Number as a format for instance) the interpreter produces

In[1]:= (3 ` + b)

Out[1]= FormBox[RowBox[{+, b}], Removed[$$Failure]]

In[2]:= %//FullForm

Out[2]//FullForm= FormBox[RowBox[List["+", "b"]], Removed["$$Failure"]]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f6f3d1f

@mmatera
Copy link
Contributor Author

mmatera commented Sep 11, 2022

I am having trepidations about this, and worry that a myopic approach like this, if it can work may lead to a mess, if it can be done. It may be like trying to solve a Rubik's cube face by face instead of understanding the natural subgroups imposed by turns of the cube and reducing those instead.

Indeed, I have a similar feeling. This is why I tried to implement this in the most locally possible way. A broader approach would require to share with the p_* methods all the previous tokens, or some other more elaborated mechanism.

@mmatera mmatera marked this pull request as draft September 11, 2022 23:45
@rocky
Copy link
Member

rocky commented Sep 12, 2022

What would be a cleanup step? The list of (simple) tests to check are the tests that I added in test/core/parser/test_parser.py.

The changes to mathics/builtin/makeboxes.py, test/core/parser/test_parser.py with the tests that are currently broken marked with pytest.skip.markif and the change to CHANGES.rst that is a mistake from a prior commit, not the line that mentions adding a new feature.

All changes to mathics/core/parser/parser.py I would not include but keep here in the branch. This is the code that should be isolated, understood better and possibly reorganized.

Thanks for your patience and understanding.

@mmatera
Copy link
Contributor Author

mmatera commented Apr 2, 2023

@rocky, this is the PR that I remember.

@rocky
Copy link
Member

rocky commented Apr 2, 2023

@rocky, this is the PR that I remember.

I looked at this again, and I my opinion and view of this has not changed. I am not convinced there is a parsing problem but rather a problem somewhere outside of parsing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants