
Inconsistency between Marabou and maraboupy on ACAS property phi10 #882

@GirardR1006

Description


We came across what seems to be an inconsistency between Marabou and maraboupy on ACAS property $\phi_{10}$.

Setting

We use the CAISAR platform and its specification of ACAS_Xu. CAISAR generates proof obligations for multiple provers, including the plain Marabou binary and maraboupy.

CAISAR uses a thin wrapper to run maraboupy, available here.

We use the network_4_5.

Both Marabou and maraboupy are at version 2.0.0.

The problem

We noted that Marabou returns UNSAT while maraboupy returns SAT on the specification. We tracked the issue down to one particular subproblem, attached below.
The output logs of maraboupy are also attached.

why_maraboupy-P10_1.txt

why_maraboupy-P10_1_output.txt

maraboupy finds a counterexample. However, running the provided counterexample through onnxruntime yields a completely different output. Below is the script used to run the check (Python 3.12, onnxruntime 1.22.0):

import onnxruntime as ort
import onnx
import numpy as np


"""
Counterexample provided by maraboupy

Input assignment:
    x0 = 0.474420
    x1 = 0.151443
    x2 = -0.500000
    x3 = 0.316102
    x4 = 0.375000

Output:
    y0 = -0.019086
    y1 = -0.019086
    y2 = 0.018132
    y3 = -0.018459
    y4 = 0.018360
"""

onnx_path = "/path/to/the/ACASXU_4_5.onnx"
onnx_model = onnx.load(onnx_path)
onnx.checker.check_model(onnx_model)
ort_sess = ort.InferenceSession(onnx_path)

# counterexample provided by maraboupy
x = np.asarray([[[[ 0.474420, 0.151443, -0.500000, 0.316102, 0.375000 ]]]],dtype=np.float32)

outputs = ort_sess.run(None, {'input': x})
predicted = outputs[0]
print(f'Predicted: "{predicted}"')

# returns different output than maraboupy
# Predicted: [-0.0206515 -0.0190316  0.0181379 -0.0184323  0.018364 ]

The differences between the output values vary widely in magnitude, which hints at a floating-point precision issue.
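To make the mismatch concrete, here is a short sketch comparing the two output vectors reported above (the maraboupy assignment from the attached log versus the onnxruntime prediction); the values are copied verbatim from this report:

```python
import numpy as np

# Output assignment reported by maraboupy (from the attached log)
y_marabou = np.array([-0.019086, -0.019086, 0.018132, -0.018459, 0.018360])

# Output computed by onnxruntime on the same input assignment
y_ort = np.array([-0.0206515, -0.0190316, 0.0181379, -0.0184323, 0.018364])

# Per-output absolute deviation between the two runs
diff = np.abs(y_marabou - y_ort)
print(diff)
print(f"max deviation: {diff.max():.2e}")  # y0 deviates by ~1.6e-3, two orders above the others
```

The deviation on y0 dominates the others by roughly two orders of magnitude, which is consistent with the property check flipping between SAT and UNSAT depending on which output values are trusted.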

As CAISAR sets default options for some provers, we thought the discrepancy might come from there, but disabling the CAISAR options and relying on the prover defaults produces the same problem.

Possible cause

Within the logs, we noted the following messages, which may hint at some precision degradation. If so, we wonder why it happens with maraboupy and not with plain Marabou.

High degradation found!
Performing precision restoration. Degradation before: 0.100004741846422. After: 0.000000000000351
Performing precision restoration. Degradation before: 0.000000000000351. After: 0.000000000000351
