Fixing parsing for FormBox #545

mmatera · 2022-09-08T11:23:02Z

This PR starts to fix the issues enumerated right after the docstring of MakeBoxes (mathics-core/mathics/builtin/makeboxes.py, line 374 in a90d0ad):

# TODO: Convert operators to appropriate representations e.g. 'Plus' to '+'

In this first round, the behavior of FormBox is fixed in a way to make it closer to the WL expected behavior:

\( a_(String|Real|Integer) \` b_ \) is parsed as FormBox[ToBoxes[b], Removed[$$Failure]] (notice that the argument of Removed is a Symbol, not a String).
\( a_Symbol \` b_ \) is parsed as FormBox[ToBoxes[b], a].
Otherwise, \( a_ \` b_ \) is parsed as FormBox[b, ToBoxes[a]].

Tests were adjusted accordingly.


TiagoCavalcante

LGTM

mathics/core/parser/parser.py

rocky · 2022-09-11T12:50:59Z

mathics/builtin/makeboxes.py

-
-    # TODO: Convert operators to appropriate representations e.g. 'Plus' to '+'
-    """
    >> \\(a + b\\)


I have been looking at this and noticed a couple of small formatting issues. If this docstring were marked as a raw string, e.g. r""" ... """, we would not need all of the ugly and confusing double backslashes \\ everywhere.

On line 333:

<dd>is a low-level formatting primitive that converts $expr$ to box form, without evaluating it.

has a hard line break which the current homegrown formatter doesn't ignore. It would also be nice to add an extra space between an ending </dt> and the next starting <dd>.

These aren't strictly about this PR, but I happened to notice them since you happen to be in the area.

Indeed. Also, I was thinking about patching the homegrown formatter to handle these line breaks, which would also keep flake8 from complaining about long lines. But one step at a time...

Fixed in f6f3d1f

rocky · 2022-09-11T12:52:16Z

mathics/core/parser/parser.py

        children = []
-        self.box_depth += 1
-        self.bracket_depth += 1
+        # If this does not happend, it would be because


happend -> happen

I am not sure though that I understand. Is the following the same?:

If this construct appears inside a FormBox then the token tag will not be LeftRowBox ?
Note that the test itself is not token.tag != "FormBox" but token.tag == "LeftRowBox".

So this implies there could be things other than a FormBox that can occur too. OtherscriptBox? Span?

What are the merits of using an == LeftRowBox expression vs != FormBox ?

I tried to explain the problem in the comments, but it seems it is not easy. Here we go again: the problem is that
\( a b \` c + d \)
should be parsed as
FormBox[RowBox[{"c", "+", "d"}], RowBox[{"a", "b"}]]

So, to build the expression, we need to know all the tokens before \` in order to build the "Format" expression, and then collect the following elements as the first argument. With the current parsing algorithm, I cannot tell b_FormBox what the previous tokens were.
The alternative I found without changing the algorithm was to hack just p_RawLeftAssociation. If a \` token is found, p_RawLeftAssociation is called again, but with a different tag. In this case, self.box_depth and self.bracket_depth do not change their values, unless a nested Raw(Left|Right)Association pair is found.

As far as I could see, FormBox is the only case with this special behavior. If there were more tags with similar behavior, where parsing the next token requires looking at several previous ones and reinterpreting them,
I would consider reworking the parser.

I tried to make this more explicit in an extended comment in
f6f3d1f

Also, I changed the comparison as @rocky suggested.

rocky · 2022-09-11T13:03:54Z

mathics/core/parser/parser.py

+        # it was called when a `FormBox` was found.
+        if token.tag == "LeftRowBox":
+            self.box_depth += 1
+            self.bracket_depth += 1


Every place there is some depth that increases by one, there should be another place that decreases the depth by one.

Where is that?

If the tag is not LeftRowBox then we never reach the decrement.

You are saying something more or less like what the code says. Why is the decrement not reached? What is the depth used for?

You are saying something more or less like what the code says. Why is the decrement not reached? What is the depth used for?

The depth is used to handle nested expressions like \( a b \( InputForm \` \(OutputForm \` c d\) e \) \)

When a \` is found, the previous tokens are collected and evaluated as a Symbol, as a Box expression, or as StandardForm if there is no token before \`. The result of this step is stored in a variable, to be used as the second argument of FormBox.
Then the method is called again, with FormBox as the tag. In the new call, the following elements are processed until the bracket and box depths reach 0, and are then collected into a String or a RowBox. Finally, the call in which \` was found takes the format and the collected elements, builds a FormBox[], and returns it (before reaching the line where the decrements happen).

I tried to find a way to put the increment and the decrement together, but it was more involved than this approach: I would have had to allow the increment of the depth variables and then do the decrement before returning the FormBox. I didn't think that was clearer.

I hope the new comment helps in understanding the logic.
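
To make the steps described above easier to follow, here is a minimal, self-contained sketch of the same idea: collect what precedes \` as the format, then collect what follows as the content. It is not the code in this PR and does not use the Mathics parser classes. The names tokenize-style TOKEN_RE, box_of, format_of, parse_group and parse are all hypothetical, boxes are plain strings, recursion replaces the depth counters, and Python's identifier rule stands in for the real symbol-name check.

```python
import re

# Toy tokenizer: \( and \) delimit a box, \` separates format from content,
# and any other run of non-space, non-backslash characters is a plain token.
TOKEN_RE = re.compile(r"\\\(|\\\)|\\`|[^\s\\]+")


def box_of(items):
    """Wrap collected items: empty string, the single item, or a RowBox."""
    if not items:
        return '""'
    if len(items) == 1:
        return items[0]
    return "RowBox[{" + ", ".join(items) + "}]"


def format_of(items):
    """Turn the items seen before the separator into the FormBox format."""
    if not items:
        return "StandardForm"
    if len(items) == 1 and items[0].startswith('"'):
        name = items[0].strip('"')
        # Simplification: Python's identifier rule stands in for WL symbol names.
        return name if name.isidentifier() else 'Removed["$$Failure"]'
    return box_of(items)


def parse_group(tokens, pos):
    """Parse the tokens after an opening delimiter; return (box, next_pos)."""
    items = []  # items collected so far in this group
    fmt = None  # set once the separator has been seen
    while pos < len(tokens):
        tok = tokens[pos]
        pos += 1
        if tok == r"\(":
            inner, pos = parse_group(tokens, pos)
            items.append(inner)
        elif tok == r"\)":
            body = box_of(items)
            return (body if fmt is None else f"FormBox[{body}, {fmt}]"), pos
        elif tok == r"\`" and fmt is None:
            # Everything collected so far becomes the format; start over
            # collecting the content that follows.
            fmt = format_of(items)
            items = []
        else:
            items.append(f'"{tok}"')
    raise SyntaxError("missing closing delimiter")


def parse(source):
    """Parse one delimited box expression and return it as a string."""
    tokens = TOKEN_RE.findall(source)
    if not tokens or tokens[0] != r"\(":
        raise SyntaxError("input must start with an opening delimiter")
    box, pos = parse_group(tokens, 1)
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after the closing delimiter")
    return box


print(parse(r"\( a \` b + c \)"))  # FormBox[RowBox[{"b", "+", "c"}], a]
```

In this sketch the format is decided at the moment \` is seen, which is why no lookback into earlier tokens is needed; whether that carries over to the precedence-driven b_* methods of the real parser is exactly the open question in this thread.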

Thanks for the description. Something about this seems a little too complicated, and there should be something that follows the specification in a more direct way:

In InputForm and StandardForm, ∖(form∖`input∖) yields FormBox[input,form].

∖(∖`input∖) yields FormBox[input,RawForm].

I would like to think about and reflect on this some more. I think we can make this cleaner, clearer and simpler.

What would be ideal is if we could do a cleanup step first - no changes to behavior yet.
Then add the list of tests that are wrong and that we want to fix.

And after this, then come up with a way to implement this.

∖(∖`input∖) yields FormBox[input,RawForm].

Actually, in WMA,

In[1]:= \(\`input\)
Out[1]= FormBox[input, StandardForm]

I would like to think about and reflect on this some more. I think we can make this cleaner, clearer and simpler.

I am sure it is possible, just that after trying for a while, I didn't find a better way.

What would be ideal is if we could do a cleanup step first - no changes to behavior yet. Then add the list of tests that are wrong and that we want to fix.

What would be a cleanup step? The list of (simple) tests to check are the tests that I added in test/core/parser/test_parser.py.

And after this, then come up with a way to implement this.

OK, in that case, I am going to put this as a draft for a while.

rocky · 2022-09-11T13:11:37Z

test/core/parser/test_parser.py

        self.check("\\( \\` b \\)", 'FormBox["b", StandardForm]')
        self.check("\\( a \\` b \\)", 'FormBox["b", a]')
        self.check("\\( a \\` \\)", 'FormBox["", a]')
+        self.check("\\( a \\` b + c \\)", 'FormBox[RowBox[{"b", "+", "c"}], a]')


A small thing, but I would appreciate it if you'd use raw strings here as well. Thanks.

I just followed the convention used in this file. But I agree, it would be clearer with raw strings.

Done in f6f3d1f
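
For reference, a small sketch of what the raw-string request amounts to, using one of the test inputs quoted above. The variable names are only illustrative.

```python
# The same parser input written two ways. In the escaped form every
# backslash is doubled; in the raw form the text reads as it is typed.
escaped = "\\( a \\` b + c \\)"
raw = r"\( a \` b + c \)"

# Both literals denote the same string, so switching a test to the raw
# form changes its readability, not its behavior.
print(escaped == raw)
```

One caveat of raw strings in Python is that they cannot end in a single backslash, which does not affect these inputs because they end in a closing parenthesis.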

rocky · 2022-09-11T13:30:39Z

I am not seeing this Removed[$$Failure] thing described above. I see a $Failed Symbol though. Similarly, I am not seeing a function called Removed although there is a function called Remove.

Please educate me here.

rocky · 2022-09-11T13:47:54Z

mathics/core/parser/parser.py

        box2 = self.parse_box(q + 1)
        return Node("FractionBox", box1, box2)

-    def b_FormBox(self, box1, token, p):


Why was this removed?

Because with the current logic, it is not possible to implement the behavior of FormBox using this mechanism, so in my proposal this method is never used.

Yes, this is what I mean about the myopic kind of thing. FormBox needs to worry about precedence and the b_ routines are where this kind of thing is done. So if you are defeating that, then maybe the thinking around this is flawed.

Yes, I was aware of the possible issue with precedence. So I checked it against WMA, and it also seems to ignore the operator precedence of \`, giving it the maximum precedence. On the other hand, the tokens that follow \` are processed using all the precedence rules.

What makes me think that this is the right approach (or at least a better approach than the current one) is that it does not break the existing tests, and it only makes a difference if a \` token is found inside a LeftRowBracket/RightRowBracket expression.

rocky · 2022-09-11T13:53:02Z

I am having trepidations about this, and worry that a myopic approach like this, if it can work may lead to a mess, if it can be done. It may be like trying to solve a Rubik's cube face by face instead of understanding the natural subgroups imposed by turns of the cube and reducing those instead.

mmatera · 2022-09-11T14:52:51Z

mathics/core/parser/parser.py

+                    if is_symbol_name(fmt_name):
+                        fmt = Symbol(fmt_name)
+                    else:
+                        fmt = Node("Removed", Symbol("$$Failure"))


@rocky, you are right: it should be

fmt = Node("Removed", String("$$Failure"))

to be compatible with WMA.

For example:

In WMA, if `FormBox` receives wrong arguments (a Number as a format for instance) the interpreter produces

In[1]:= \(3 \` + b\)

Out[1]= FormBox[RowBox[{+, b}], Removed[$$Failure]]

In[2]:= %//FullForm

Out[2]//FullForm= FormBox[RowBox[List["+", "b"]], Removed["$$Failure"]]

Fixed in f6f3d1f
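
The Symbol-versus-String confusion is easy to reproduce in a toy model: the two variants print identically in the default output and differ only in FullForm, as in the WMA session above. The classes below are hypothetical stand-ins written for this illustration, not the Mathics Symbol, String, or Node classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    name: str

    def output_form(self):
        return self.name

    def full_form(self):
        return self.name


@dataclass(frozen=True)
class String:
    value: str

    def output_form(self):
        # Default output hides the quotes, so a string looks like a symbol.
        return self.value

    def full_form(self):
        return '"' + self.value + '"'


@dataclass(frozen=True)
class Node:
    head: str
    args: tuple

    def output_form(self):
        return self.head + "[" + ", ".join(a.output_form() for a in self.args) + "]"

    def full_form(self):
        return self.head + "[" + ", ".join(a.full_form() for a in self.args) + "]"


with_symbol = Node("Removed", (Symbol("$$Failure"),))
with_string = Node("Removed", (String("$$Failure"),))

print(with_symbol.output_form(), with_string.output_form())
print(with_symbol.full_form(), with_string.full_form())
```

This is why a comparison against the default output alone cannot tell the two apart, and why the FullForm line is the one to check.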

mmatera · 2022-09-11T15:17:39Z

I am having trepidations about this, and worry that a myopic approach like this, if it can work may lead to a mess, if it can be done. It may be like trying to solve a Rubik's cube face by face instead of understanding the natural subgroups imposed by turns of the cube and reducing those instead.

Indeed, I have a similar feeling. This is why I tried to implement this in the most local way possible. A broader approach would require sharing all the previous tokens with the p_* methods, or some other more elaborate mechanism.


rocky · 2022-09-12T00:29:11Z

What would be a cleanup step? The list of (simple) tests to check are the tests that I added in test/core/parser/test_parser.py.

A cleanup step would include the changes to mathics/builtin/makeboxes.py; test/core/parser/test_parser.py, with the tests that are currently broken marked with pytest.mark.skipif; and the change to CHANGES.rst that fixes a mistake from a prior commit, but not the line that mentions adding a new feature.

All changes to mathics/core/parser/parser.py I would not include but keep here in the branch. This is the code that should be isolated, understood better and possibly reorganized.

Thanks for your patience and understanding.
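
A rough sketch of what "tests that are currently broken, marked as skipped" could look like. To keep it runnable with the standard library alone, it uses unittest.skipIf as a stand-in for the pytest mark named above. The flag FORMBOX_PARSING_FIXED, the parse stub, and the test names are all hypothetical.

```python
import unittest

# Hypothetical flag: flip to True once the FormBox parsing fix lands.
FORMBOX_PARSING_FIXED = False


def parse(source):
    """Hypothetical stand-in for the real parser; it knows only one input."""
    known = {r"\( a \` b \)": 'FormBox["b", a]'}
    return known.get(source)


class FormBoxParsing(unittest.TestCase):
    def test_simple_form(self):
        # Already works, so it stays active during the cleanup step.
        self.assertEqual(parse(r"\( a \` b \)"), 'FormBox["b", a]')

    @unittest.skipIf(not FORMBOX_PARSING_FIXED, "FormBox with a RowBox body not parsed yet")
    def test_row_box_body(self):
        # Documents the wanted behavior without failing the suite today.
        self.assertEqual(
            parse(r"\( a \` b + c \)"),
            'FormBox[RowBox[{"b", "+", "c"}], a]',
        )


def run_suite():
    """Run the two tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FormBoxParsing)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Keeping the expected values in the skipped tests means the list of wanted fixes lives next to the passing tests, and each skip can be removed as the corresponding behavior is implemented.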

mmatera · 2023-04-02T16:57:12Z

@rocky, this is the PR that I remember.

rocky · 2023-04-02T17:10:37Z

@rocky, this is the PR that I remember.

I looked at this again, and my opinion and view of this have not changed. I am not convinced there is a parsing problem but rather a problem somewhere outside of parsing.

mmatera added 3 commits September 8, 2022 07:36

fix parsing of FormBox inside a RowBox.

64c9295

improving format handling in FormBox. Removed["$$Failure"]->Removed[$…

3d83960

…$Failure]

removing trailing comment

f15fe6c

TiagoCavalcante approved these changes Sep 8, 2022


Tiagos's comment

73926ff

rocky reviewed Sep 11, 2022


mmatera commented Sep 11, 2022


mmatera added 3 commits September 11, 2022 12:47

Merge remote-tracking branch 'origin/improving_makeboxes_compat' into…

f3920f0

… HEAD

go over Rocky's comments

f6f3d1f

CHANGES.rst

9a2680e

mmatera marked this pull request as draft September 11, 2022 23:45

Merge branch 'master' into improving_makeboxes_compat

9485d9d

mmatera added 10 commits September 17, 2022 17:41

moving pending doctests in MakeBoxes to pytests

0cfd7d0

improving tests and mark what is working now

ca7dd9b

xfalied optiona

78fabca

merge

65cfbac

reformulate tests

8fcf8de

adding test to reveal the internals of MakeBoxes

72bd2b3

fix and improve MakeBoxes test

89d74ee

merge

f77fae3

Merge branch 'more_tests_for_makeboxes' into improving_makeboxes_compat

dd57ebe

Merge branch 'master' into improving_makeboxes_compat

a67a6af
