Strict validation is applied inconsistently with restrictions #261

@liamhuber

Consider the case where we have the following very demanding node (RequiresBar) interacting with a variety of inputs:

  • completely untyped data from the parent workflow
  • completely untyped data from a sibling
  • type-hinted only data from a sibling
  • everything it demands from a sibling

E.g.

import rdflib
from semantikon import ontology as onto
from semantikon import workflow
from semantikon.metadata import u

EX = rdflib.Namespace("http://example.org/")

def RequiresBar(
    x: u(
        str,
        uri=EX.Foo,
        restrictions=(
            (rdflib.OWL.onProperty, EX.hasBar),
            (rdflib.OWL.someValuesFrom, EX.bar),
        ),
    ),
) -> u(str, derived_from="inputs.bar"):
    return x

def NoHints(x):
    return x

def JustType(x: str) -> str:
    return x

def Everything(x: str) -> u(
    str, uri=EX.Foo, triples=(EX.hasBar, EX.bar)
):
    return x

def wf(inp):
    gets_macro_inp = RequiresBar(inp)
    no_hints = NoHints(inp)
    gets_nothing = RequiresBar(no_hints)
    just_type = JustType(inp)
    gets_type = RequiresBar(just_type)
    everything = Everything(inp)
    gets_everything = RequiresBar(everything)
    return gets_macro_inp, gets_nothing, gets_type, gets_everything

wf_dict = workflow.get_workflow_dict(wf)
graph = onto.get_knowledge_graph(wf_dict)

On validating this graph strictly,

strict_val = onto.validate_values(graph, strict_typing=True)
strict_val

We see that we get "missing_triples" complaints (which pertain predominantly to the restrictions=) and "incompatible_connections" complaints (which pertain predominantly to the uri= analysis, although the dynamically generated restriction types appear there as well):

{'missing_triples': [(rdflib.term.URIRef('wf.RequiresBar_0.inputs.x'),
   rdflib.term.URIRef('http://example.org/hasBar'),
   rdflib.term.URIRef('http://example.org/bar')),
  (rdflib.term.URIRef('wf.RequiresBar_2.inputs.x'),
   rdflib.term.URIRef('http://example.org/hasBar'),
   rdflib.term.URIRef('http://example.org/bar')),
  (rdflib.term.URIRef('wf.RequiresBar_1.inputs.x'),
   rdflib.term.URIRef('http://example.org/hasBar'),
   rdflib.term.URIRef('http://example.org/bar'))],
 'incompatible_connections': [(rdflib.term.URIRef('wf.RequiresBar_0.inputs.x'),
   rdflib.term.URIRef('wf.inputs.inp'),
   [rdflib.term.URIRef('http://example.org/Foo'),
    rdflib.term.URIRef('Na193e4f39adf4bd1b193a07dbd28bade'),
    rdflib.term.URIRef('Nec3ab949314748129cfec1f5f4cc780c'),
    rdflib.term.URIRef('N30fa5c846c50433c8c6823d096e7fc00'),
    rdflib.term.URIRef('N129a751657ba4e30b57bde25bff2f4fc')],
   []),
  (rdflib.term.URIRef('wf.RequiresBar_1.inputs.x'),
   rdflib.term.URIRef('wf.NoHints_0.outputs.x'),
   [rdflib.term.URIRef('http://example.org/Foo'),
    rdflib.term.URIRef('Nec3ab949314748129cfec1f5f4cc780c'),
    rdflib.term.URIRef('N30fa5c846c50433c8c6823d096e7fc00'),
    rdflib.term.URIRef('Na193e4f39adf4bd1b193a07dbd28bade'),
    rdflib.term.URIRef('N129a751657ba4e30b57bde25bff2f4fc')],
   []),
  (rdflib.term.URIRef('wf.RequiresBar_2.inputs.x'),
   rdflib.term.URIRef('wf.JustType_0.outputs.x'),
   [rdflib.term.URIRef('http://example.org/Foo'),
    rdflib.term.URIRef('N129a751657ba4e30b57bde25bff2f4fc'),
    rdflib.term.URIRef('Nec3ab949314748129cfec1f5f4cc780c'),
    rdflib.term.URIRef('N30fa5c846c50433c8c6823d096e7fc00'),
    rdflib.term.URIRef('Na193e4f39adf4bd1b193a07dbd28bade')],
   [])],
 'distinct_units': {}}

Quite sensibly for the strict case, each RequiresBar except for the one getting Everything it needs complains on both fronts.
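
That observation can be checked mechanically rather than by eye. Below is a minimal sketch, assuming the report has the shape printed above, where the first element of every entry is the complaining input. `complaining_inputs` is a hypothetical helper, not part of semantikon, and the sample report is an abbreviated stand-in that uses plain strings where the real report holds rdflib.URIRef objects (which are str subclasses).

```python
def complaining_inputs(report):
    """Map each list-valued report key to the set of inputs it complains about."""
    return {
        key: {str(entry[0]) for entry in entries}
        for key, entries in report.items()
        if isinstance(entries, list)  # skips dict-valued keys such as 'distinct_units'
    }


HAS_BAR = "http://example.org/hasBar"
BAR = "http://example.org/bar"
FOO = "http://example.org/Foo"

# Abbreviated stand-in for the strict report (restriction blank nodes omitted)
sample_strict = {
    "missing_triples": [
        ("wf.RequiresBar_0.inputs.x", HAS_BAR, BAR),
        ("wf.RequiresBar_2.inputs.x", HAS_BAR, BAR),
        ("wf.RequiresBar_1.inputs.x", HAS_BAR, BAR),
    ],
    "incompatible_connections": [
        ("wf.RequiresBar_0.inputs.x", "wf.inputs.inp", [FOO], []),
        ("wf.RequiresBar_1.inputs.x", "wf.NoHints_0.outputs.x", [FOO], []),
        ("wf.RequiresBar_2.inputs.x", "wf.JustType_0.outputs.x", [FOO], []),
    ],
    "distinct_units": {},
}

summary = complaining_inputs(sample_strict)
same_inputs = summary["missing_triples"] == summary["incompatible_connections"]
print(sorted(summary["missing_triples"]), same_inputs)
```

With the real report, `complaining_inputs(strict_val)` should work the same way, since only the first element of each entry is read.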

However, when we turn off strict validation (the default case):

val = onto.validate_values(graph, strict_typing=False)
val

We silence the "incompatible_connections" but have no power over the "missing_triples"!

{'missing_triples': [(rdflib.term.URIRef('wf.RequiresBar_0.inputs.x'),
   rdflib.term.URIRef('http://example.org/hasBar'),
   rdflib.term.URIRef('http://example.org/bar')),
  (rdflib.term.URIRef('wf.RequiresBar_2.inputs.x'),
   rdflib.term.URIRef('http://example.org/hasBar'),
   rdflib.term.URIRef('http://example.org/bar')),
  (rdflib.term.URIRef('wf.RequiresBar_1.inputs.x'),
   rdflib.term.URIRef('http://example.org/hasBar'),
   rdflib.term.URIRef('http://example.org/bar'))],
 'incompatible_connections': [],
 'distinct_units': {}}

I would classify this as a bug -- IMO the "strictness" should impact "missing_triples" and "incompatible_connections" the same way when it comes to edges that exist but carry untyped information. I expect this is doable, but we currently handle the missing triples with a SPARQL query, and my semantikon.ontology and SPARQL skills are not good enough for me to already have a solution in mind.
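
As a stopgap on the caller's side, the strict report could be post-filtered so that both complaint types relax together. This is only a sketch of the desired behaviour, not a fix for the SPARQL query. It assumes each "incompatible_connections" entry is laid out as (target, source, required types, provided types), as in the output above, and that an empty final list means the source carries no semantic type information. `untyped_targets` and `relax_missing_triples` are hypothetical names, and the `wf.Other_0` entry is invented to show a complaint that should survive.

```python
def untyped_targets(strict_report):
    """Inputs whose upstream source provides no semantic types at all."""
    return {
        target
        for target, _source, _required, provided
        in strict_report["incompatible_connections"]
        if not provided
    }


def relax_missing_triples(strict_report):
    """Return a copy of the report without complaints caused by untyped edges."""
    skip = untyped_targets(strict_report)
    relaxed = dict(strict_report)
    relaxed["missing_triples"] = [
        triple for triple in strict_report["missing_triples"]
        if triple[0] not in skip
    ]
    # Assumption: typed-but-incompatible connections should still be reported
    relaxed["incompatible_connections"] = [
        conn for conn in strict_report["incompatible_connections"] if conn[3]
    ]
    return relaxed


HAS_BAR = "http://example.org/hasBar"
BAR = "http://example.org/bar"
FOO = "http://example.org/Foo"

sample_strict = {
    "missing_triples": [
        ("wf.RequiresBar_0.inputs.x", HAS_BAR, BAR),
        ("wf.RequiresBar_1.inputs.x", HAS_BAR, BAR),
        ("wf.Other_0.inputs.y", HAS_BAR, BAR),  # hypothetical: typed source, triple missing
    ],
    "incompatible_connections": [
        ("wf.RequiresBar_0.inputs.x", "wf.inputs.inp", [FOO], []),
        ("wf.RequiresBar_1.inputs.x", "wf.NoHints_0.outputs.x", [FOO], []),
    ],
    "distinct_units": {},
}

relaxed = relax_missing_triples(sample_strict)
print(relaxed["missing_triples"])
```

In the example from this issue, the call would be `relax_missing_triples(onto.validate_values(graph, strict_typing=True))`. The obvious cost is that strict validation has to run first to discover which edges are untyped.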

This problem really wrecks me in pyiron_workflow when I try to implement suggestion menus: I am stuck with failed validations that don't pertain to the creation of a particular connection, and I have no way to relax the strictness of these untyped workflow input -> child node input edges when the child has restrictions=.

In general, I am finding the restrictions to be the source of most of my headaches. I'll keep thinking about more fundamental reworks, but as discussed in #252 I'm rather stuck. This is a tough problem and I'm not casting stones here, I'm just optimistic that we can reach even farther than we currently grasp.
