
Inconsistency between Marabou and maraboupy on ACAS property phi10 #882

@GirardR1006

Description


We came across what seems to be an inconsistency between Marabou and maraboupy on ACAS property $\phi_{10}$.

Setting

We use the CAISAR platform and its specification of ACAS_Xu. CAISAR generates proof obligations for multiple provers, including the plain Marabou binary and maraboupy.

CAISAR uses a thin wrapper to run maraboupy, available here.

We use the network_4_5.

Both Marabou and maraboupy are at version 2.0.0.

The problem

We noted that Marabou returns UNSAT while maraboupy returns SAT on the specification. We tracked the issue down to one particular subproblem, attached below.
The output logs of maraboupy are also attached.

why_maraboupy-P10_1.txt

why_maraboupy-P10_1_output.txt

maraboupy finds a counterexample. However, running the provided counterexample through onnxruntime yields a completely different output. Below is the script used to run the check (Python 3.12, onnxruntime 1.22.0):

import onnxruntime as ort
import onnx
import numpy as np


"""
Counterexample provided by maraboupy

Input assignment:
    x0 = 0.474420
    x1 = 0.151443
    x2 = -0.500000
    x3 = 0.316102
    x4 = 0.375000

Output:
    y0 = -0.019086
    y1 = -0.019086
    y2 = 0.018132
    y3 = -0.018459
    y4 = 0.018360
"""

onnx_path = "/path/to/the/ACASXU_4_5.onnx"
onnx_model = onnx.load(onnx_path)
onnx.checker.check_model(onnx_model)
ort_sess = ort.InferenceSession(onnx_path)

# counterexample provided by maraboupy
x = np.asarray([[[[ 0.474420, 0.151443, -0.500000, 0.316102, 0.375000 ]]]],dtype=np.float32)

outputs = ort_sess.run(None, {'input': x})
predicted = outputs[0]
print(f'Predicted: "{predicted}"')

# returns different output than maraboupy
# Predicted: [-0.0206515 -0.0190316  0.0181379 -0.0184323  0.018364 ]

The differences between the output values vary widely in magnitude, which hints at a floating-point precision issue.
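To make the mismatch concrete, here is a short sketch comparing the two output vectors reported above (the maraboupy assignment from the attached log versus the onnxruntime prediction); the values are copied verbatim from this report:

```python
import numpy as np

# Output assignment reported by maraboupy (from the attached log)
y_marabou = np.array([-0.019086, -0.019086, 0.018132, -0.018459, 0.018360])

# Output computed by onnxruntime on the same input assignment
y_ort = np.array([-0.0206515, -0.0190316, 0.0181379, -0.0184323, 0.018364])

# Per-output absolute deviation between the two runs
diff = np.abs(y_marabou - y_ort)
print(diff)
print(f"max deviation: {diff.max():.2e}")  # y0 deviates by ~1.6e-3, two orders above the others
```

The deviation on y0 dominates the others by roughly two orders of magnitude, which is consistent with the property check flipping between SAT and UNSAT depending on which output values are trusted.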

As CAISAR sets default options for some provers, we thought the discrepancy might come from there, but disabling the CAISAR options and relying on the prover defaults produces the same problem.

Possible cause

Within the logs, we noted the following messages, which may hint at some precision degradation. If so, we wonder why it happens with maraboupy and not with plain Marabou.

High degradation found!
Performing precision restoration. Degradation before: 0.100004741846422. After: 0.000000000000351
Performing precision restoration. Degradation before: 0.000000000000351. After: 0.000000000000351
