Skip to content

Conversation

@ctrl-schaff
Copy link

Adds support for the set_interpretation field within the Query instance. More information can be found within the TRAPI specification here. Modifies our lookup behavior, with the core implementation found within src/retriever/utils/trapi.py. Tests can be found under tests/set_interpretation

@ctrl-schaff ctrl-schaff self-assigned this Dec 8, 2025
@ctrl-schaff ctrl-schaff added the enhancement New feature or request label Dec 8, 2025
@sentry
Copy link

sentry bot commented Dec 8, 2025

Codecov Report

❌ Patch coverage is 93.63057% with 10 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/retriever/utils/trapi.py 94.83% 5 Missing and 3 partials ⚠️
src/retriever/lookup/lookup.py 0.00% 2 Missing ⚠️
Files with missing lines Coverage Δ
src/retriever/lookup/lookup.py 0.00% <0.00%> (ø)
src/retriever/utils/trapi.py 94.09% <94.83%> (+0.68%) ⬆️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@tokebe tokebe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a misunderstanding of set interpretation. Set interpretation is on a per-node basis, where different nodes can have different set interpretations. ALL and MANY essentially mark a query node as not defining result uniqueness, where ALL has the added caveat that every member ID on the query node must be fulfilled, or else no result can be made.

Right now, it appears that the code requires all nodes "agree" on both their set interpretation, and on expected member IDs. Neither of these are a valid requirement; nodes may arbitrarily set their individual set interpretations, and different nodes may have different member IDs.

For a basic example, consider the following query, where letters represent QNodeIDs and numbers represent unique IDs on that node:

A(1,2,3; BATCH) --> B(4,5,6; ALL)

A is BATCH, so any ID 1, 2, or 3 will fulfill it. B is ALL, so a valid result MUST have every member id 4, 5, 6. Every ID on A must find an edge relating it to all IDs on B in order to be included in the final output. Imagine Retriever gets the following knowledge, which will show up as results passed to evaluate_set_interpretation() (letters and numbers together represent a binding of the knowledge node to the query node):

A1 -> B4; A1 -> B5; A1 -> B6
A2 -> B4; A2 -> B5
A3 -> B4; A3 -> B6; A3 -> B6

evaluate_set_interpretation() should prune the two results depending on A2, as we don't have a sufficient set to match all of B. We end up with two results, formatted using the placeholder set nodes with nodenorm UUIDs defined in the TRAPI spec.

A1 -> B(4,5,6) 
A3 -> B(4,5,6)

In the general case of a multi-interpretation n-node query, I believe it's a safe approach to attempt to evaluate nodes marked ALL for set assembly and pruning, and then evaluate nodes marked MANY to further group results as applicable.

@ctrl-schaff
Copy link
Author

This seems to be a misunderstanding of set interpretation. Set interpretation is on a per-node basis, where different nodes can have different set interpretations. ALL and MANY essentially mark a query node as not defining result uniqueness, where ALL has the added caveat that every member ID on the query node must be fulfilled, or else no result can be made.

Right now, it appears that the code requires all nodes "agree" on both their set interpretation, and on expected member IDs. Neither of these are a valid requirement; nodes may arbitrarily set their individual set interpretations, and different nodes may have different member IDs.

For a basic example, consider the following query, where letters represent QNodeIDs and numbers represent unique IDs on that node:

A(1,2,3; BATCH) --> B(4,5,6; ALL)

A is BATCH, so any ID 1, 2, or 3 will fulfill it. B is ALL, so a valid result MUST have every member id 4, 5, 6. Every ID on A must find an edge relating it to all IDs on B in order to be included in the final output. Imagine Retriever gets the following knowledge, which will show up as results passed to evaluate_set_interpretation() (letters and numbers together represent a binding of the knowledge node to the query node):

A1 -> B4; A1 -> B5; A1 -> B6
A2 -> B4; A2 -> B5
A3 -> B4; A3 -> B6; A3 -> B6

evaluate_set_interpretation() should prune the two results depending on A2, as we don't have a sufficient set to match all of B. We end up with two results, formatted using the placeholder set nodes with nodenorm UUIDs defined in the TRAPI spec.

A1 -> B(4,5,6) 
A3 -> B(4,5,6)

In the general case of a multi-interpretation n-node query, I believe it's a safe approach to attempt to evaluate nodes marked ALL for set assembly and pruning, and then evaluate nodes marked MANY to further group results as applicable.

Okay that makes more sense. I think MANY needs more flushing out in the TRAPI specification as it isn't clear at all what we want from it at the moment. ALL makes more sense now, but after we implement this I think it would be good to add more detail to this in the specification as it isn't clear to an outsider how it works in my opinion. I'll go ahead and take a stab at implementing this

@tokebe
Copy link
Contributor

tokebe commented Dec 9, 2025

MANY essentially only attempts to maximally group results, but doesn't prune. So in the above example, if B were MANY instead, we'd end up with 3 results:

A1 -> B(4,5,6) 
A2 -> B(4,5)
A3 -> B(4,5,6)

@ctrl-schaff ctrl-schaff marked this pull request as ready for review January 5, 2026 15:27
@ctrl-schaff ctrl-schaff requested a review from tokebe January 7, 2026 16:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants