Add trapi set interpretation support #77
This seems to be a misunderstanding of set interpretation. Set interpretation is on a per-node basis, where different nodes can have different set interpretations. ALL and MANY essentially mark a query node as not defining result uniqueness, where ALL has the added caveat that every member ID on the query node must be fulfilled, or else no result can be made.
Right now, it appears that the code requires all nodes to "agree" on both their set interpretation and their expected member IDs. Neither is a valid requirement; nodes may set their individual set interpretations arbitrarily, and different nodes may have different member IDs.
For a basic example, consider the following query, where letters represent QNodeIDs and numbers represent unique IDs on that node:
A(1,2,3; BATCH) --> B(4,5,6; ALL)
A is BATCH, so any one of the IDs 1, 2, or 3 will fulfill it. B is ALL, so a valid result MUST include every member ID: 4, 5, and 6. Each ID on A must therefore have an edge to every ID on B in order to be included in the final output. Imagine Retriever gets the following knowledge, which will show up as results passed to evaluate_set_interpretation() (a letter and number together represent a binding of the knowledge node to the query node):
A1 -> B4; A1 -> B5; A1 -> B6
A2 -> B4; A2 -> B5
A3 -> B4; A3 -> B5; A3 -> B6
evaluate_set_interpretation() should prune the two results depending on A2, as we don't have a sufficient set to match all of B. We end up with two results, formatted using the placeholder set nodes with nodenorm UUIDs defined in the TRAPI spec.
A1 -> B(4,5,6)
A3 -> B(4,5,6)
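The pruning step for this example can be sketched roughly as follows. This is only an illustration of the ALL semantics, not the actual evaluate_set_interpretation() code: the function name prune_all_node, the flat (A ID, B ID) tuple format, and the dict output are all assumptions made for the sketch. It reads A3 as reaching B4, B5, and B6, which is what the stated outcome implies.

```python
from collections import defaultdict

# Hypothetical flattened bindings: (ID bound to A, ID bound to B).
edges = [
    (1, 4), (1, 5), (1, 6),
    (2, 4), (2, 5),          # A2 never reaches B6
    (3, 4), (3, 5), (3, 6),
]


def prune_all_node(edges, required_members):
    """Keep only A IDs whose B bindings cover every required member ID."""
    found = defaultdict(set)
    for subject, obj in edges:
        found[subject].add(obj)

    required = set(required_members)
    return {
        subject: sorted(objects & required)
        for subject, objects in found.items()
        if required <= objects  # ALL: every member must be present
    }


results = prune_all_node(edges, required_members=[4, 5, 6])
print(results)  # {1: [4, 5, 6], 3: [4, 5, 6]}
```

A2 is dropped because its bindings cover only 4 and 5. Building the placeholder set node and its UUID is left out of the sketch.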
In the general case of a multi-interpretation n-node query, I believe it's a safe approach to attempt to evaluate nodes marked ALL for set assembly and pruning, and then evaluate nodes marked MANY to further group results as applicable.
Okay, that makes more sense. I think MANY needs more fleshing out in the TRAPI specification, as it isn't at all clear what we want from it at the moment. ALL makes more sense now, but after we implement this I think it would be good to add more detail to the specification, since in my opinion it isn't clear to an outsider how it works. I'll go ahead and take a stab at implementing this.
Adds support for the set_interpretation field within the Query instance. More information can be found in the TRAPI specification. Modifies our lookup behavior, with the core implementation found within src/retriever/utils/trapi.py. Tests can be found under tests/set_interpretation.