-
Notifications
You must be signed in to change notification settings - Fork 106
Description
We came across what seems to be an inconsistency between Marabou and maraboupy on ACAS property
Setting
We use the CAISAR platform and their specification of ACAS_Xu. CAISAR generates proof obligations for multiple provers, including the plain Marabou binary and maraboupy.
CAISAR uses a thin wrapper to run maraboupy, available here.
We use the network_4_5.
Both Marabou and maraboupy are at version 2.0.0
The problem
We noted that Marabou returns UNSAT while maraboupy returns SAT on the specification. We tracked down the issue to one particular subproblem, attached.
The output logs of maraboupy are also attached.
why_maraboupy-P10_1_output.txt
maraboupy finds a counterexample. However, running the provided counterexample with the onnxruntime provides a completly different output. Below is the script to run the check (python 3.12, onnxruntime 1.22.0)
import onnxruntime as ort
import onnx
import numpy as np
"""
Counterexample provided by maraboupy
Input assignment:
x0 = 0.474420
x1 = 0.151443
x2 = -0.500000
x3 = 0.316102
x4 = 0.375000
Output:
y0 = -0.019086
y1 = -0.019086
y2 = 0.018132
y3 = -0.018459
y4 = 0.018360
"""
onnx_path = "/path/to/the/ACASXU_4_5.onnx"
onnx_model = onnx.load(onnx_path)
onnx.checker.check_model(onnx_model)
ort_sess = ort.InferenceSession(onnx_path)
# counter example provided by maraboupy
x = np.asarray([[[[ 0.474420, 0.151443, -0.500000, 0.316102, 0.375000 ]]]],dtype=np.float32)
outputs = ort_sess.run(None, {'input': x})
predicted = outputs[0]
print(f'Predicted: "{predicted}"')
# returns different output than maraboupy
# Predicted: [-0.0206515 -0.0190316 0.0181379 -0.0184323 0.018364 ]The difference between the various output values hints at a floating point error.
As CAISAR have default options for some provers, we thought it might come from there, but with disabling CAISAR options and relying on the prover default, we have the same problem.
Possible cause
Within the logs, we noted the following mention which may hints that there is some performance degradation. In which case, we wonder why it happens with maraboupy and not with plain Marabou.
High degradation found!
Performing precision restoration. Degradation before: 0.100004741846422. After: 0.000000000000351
Performing precision restoration. Degradation before: 0.000000000000351. After: 0.000000000000351