Hi! I discovered extract-cbd-shape this evening and tried it out. It looks very promising, though it’s slightly different from my use case.
I’ve built something in the same space: Resource Fetcher. I’m interested in adding SHACL reverse property path support to Resource Fetcher, which led me to test extract-cbd-shape as a possible replacement.
From my understanding, the two modules differ in their approach. Is it correct that with extract-cbd-shape, you first over-fetch (potentially a large amount) into a store, and then use the module to extract the exact triples for a subject from that store? If so, the main difference is that Resource Fetcher performs iterative fetching via SPARQL.
In Resource Fetcher, I start with a SPARQL ?s ?p ?o construct query to get predicates for further fetching—either for blank node objects or as determined by a passThroughCallback. For blank node objects, it creates "trails" (state indicating further fetching is needed), and then executes the next query, going one layer deeper, following all blank nodes that haven’t yet ended with a non-blank node. This continues until all trails are resolved.
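To make the trail idea concrete, here is a minimal in-memory sketch of the layer-by-layer resolution described above. It is not Resource Fetcher's actual code: the names `Term`, `Quad`, `Trail` and `resolveTrails` are made up for illustration, the data lives in a plain array instead of behind SPARQL, and the `passThroughCallback` case is left out. The `maxDepth` guard is also my own assumption, added so that blank-node cycles cannot loop forever.

```typescript
// Hypothetical types for illustration only; a real implementation would use RDF/JS terms.
type Term = { kind: "iri" | "blank" | "literal"; value: string };
type Quad = { subject: Term; predicate: Term; object: Term };
// A trail is the chain of quads followed so far from the requested subject.
type Trail = Quad[];

const termKey = (t: Term): string => `${t.kind}:${t.value}`;
const quadKey = (q: Quad): string =>
  [q.subject, q.predicate, q.object].map(termKey).join(" ");

function resolveTrails(data: Quad[], subject: Term, maxDepth = 10): Quad[] {
  const saved: Record<string, Quad> = {};
  // Layer 1: one trail per quad whose subject is the requested resource.
  let open: Trail[] = data
    .filter((q) => termKey(q.subject) === termKey(subject))
    .map((q) => [q]);

  for (let depth = 1; open.length > 0 && depth <= maxDepth; depth++) {
    const next: Trail[] = [];
    for (const trail of open) {
      const end = trail[trail.length - 1].object;
      if (end.kind !== "blank") {
        // The trail ends with a non-blank node: save all of its quads.
        for (const q of trail) saved[quadKey(q)] = q;
        continue;
      }
      // The trail ends with a blank node: extend it one layer deeper.
      // A blank node without outgoing quads simply drops the trail here.
      for (const q of data) {
        if (termKey(q.subject) === termKey(end)) next.push(trail.concat([q]));
      }
    }
    open = next;
  }
  return Object.keys(saved).map((k) => saved[k]);
}

// Usage with made-up data: one literal property and one blank-node address.
const iri = (value: string): Term => ({ kind: "iri", value });
const lit = (value: string): Term => ({ kind: "literal", value });
const allan = iri("https://example.org/allanDoyle");
const addressNode: Term = { kind: "blank", value: "b0" };
const data: Quad[] = [
  { subject: allan, predicate: iri("https://schema.org/givenName"), object: lit("Allan") },
  { subject: allan, predicate: iri("https://schema.org/address"), object: addressNode },
  { subject: addressNode, predicate: iri("https://schema.org/streetAddress"), object: lit("Main St 1") },
  { subject: iri("https://example.org/other"), predicate: iri("https://schema.org/givenName"), object: lit("Other") },
];
const result = resolveTrails(data, allan);
console.log(result.length); // 3
```

In the real module each pass of the outer loop corresponds to one SPARQL query, so the number of round trips grows with the blank-node depth rather than with the size of the graph.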
Here’s an example query:
SELECT * WHERE { GRAPH ?g {
  {
    VALUES (?subject ?depth1_predicate) {
      ( <https://example.org/allanDoyle> <https://schema.org/givenName> )
      ( <https://example.org/allanDoyle> <https://schema.org/additionalName> )
      ( <https://example.org/allanDoyle> <https://schema.org/familyName> )
      ( <https://example.org/allanDoyle> <https://schema.org/gender> )
      ( <https://example.org/allanDoyle> <https://schema.org/birthDate> )
      ( <https://example.org/allanDoyle> <https://schema.org/deathDate> )
      ( <https://example.org/allanDoyle> <https://schema.org/address> )
    }
    ?subject ?depth1_predicate ?depth1_object .
    OPTIONAL {
      ?depth1_object ?depth2_predicate ?depth2_object .
    }
  }
}}
And a consecutive query going one level further:
SELECT * WHERE { GRAPH ?g {
  {
    VALUES (?subject ?depth1_predicate ?depth2_predicate) {
      ( <https://example.org/allanDoyle> <https://schema.org/address> <https://schema.org/streetAddress> )
      ( <https://example.org/allanDoyle> <https://schema.org/address> <https://schema.org/postalCode> )
      ( <https://example.org/allanDoyle> <https://schema.org/address> <https://schema.org/addressLocality> )
    }
    ?subject ?depth1_predicate ?depth1_object .
    ?depth1_object ?depth2_predicate ?depth2_object .
    OPTIONAL {
      ?depth2_object ?depth3_predicate ?depth3_object .
    }
  }
}}
All quads in a trail that ends with a non-blank node are saved; other trails will be processed in the query for the next level.
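The two example queries follow a regular pattern: one VALUES column per predicate depth, one triple pattern per depth, and an OPTIONAL that looks one level ahead. As a rough sketch of how such a query could be generated (the helper `buildLayerQuery` is hypothetical and not part of Resource Fetcher's API, and it assumes the IRIs are trusted, since it does no escaping):

```typescript
// Hypothetical helper: builds a query shaped like the examples above.
// `paths` holds one predicate IRI path per VALUES row; all paths must share the same length.
function buildLayerQuery(subject: string, paths: string[][]): string {
  if (paths.length === 0) throw new Error("at least one predicate path is required");
  const depth = paths[0].length;
  if (depth === 0 || paths.some((p) => p.length !== depth)) {
    throw new Error("all predicate paths must have the same non-zero length");
  }

  // ?subject ?depth1_predicate ... ?depthN_predicate
  const vars: string[] = ["?subject"];
  for (let i = 1; i <= depth; i++) vars.push(`?depth${i}_predicate`);

  // One VALUES row per path. No IRI escaping is done here (assumes trusted input).
  const rows = paths.map(
    (p) => `( ${[subject].concat(p).map((i) => `<${i}>`).join(" ")} )`
  );

  // One triple pattern per depth, chained through the previous object.
  const patterns: string[] = [];
  for (let i = 1; i <= depth; i++) {
    const s = i === 1 ? "?subject" : `?depth${i - 1}_object`;
    patterns.push(`${s} ?depth${i}_predicate ?depth${i}_object .`);
  }

  // Look one level ahead so the next layer's predicates can be discovered.
  const n = depth + 1;
  const optional = `OPTIONAL { ?depth${depth}_object ?depth${n}_predicate ?depth${n}_object . }`;

  return [
    "SELECT * WHERE { GRAPH ?g {",
    `  VALUES (${vars.join(" ")}) {`,
    rows.map((r) => `    ${r}`).join("\n"),
    "  }",
    patterns.map((p) => `  ${p}`).join("\n"),
    `  ${optional}`,
    "}}",
  ].join("\n");
}

// Usage: regenerate the shape of the second example query.
const layer2Query = buildLayerQuery("https://example.org/allanDoyle", [
  ["https://schema.org/address", "https://schema.org/streetAddress"],
  ["https://schema.org/address", "https://schema.org/postalCode"],
  ["https://schema.org/address", "https://schema.org/addressLocality"],
]);
console.log(layer2Query);
```

The sketch drops the extra inner `{ ... }` group from the examples, which does not change the result of the query.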
Because the interface requires specifying sources and an engine (Comunica), queries can be executed over HTTP as well as against an RDF/JS store.
Questions:
- Are there plans to support iterative fetching of quads, as done in Resource Fetcher?
- Are there plans to support fetching over SPARQL endpoints, especially where named graphs are very large and over-fetching is impractical?
- I would love to learn more about the work and research behind this module. It might give me ideas for extending Resource Fetcher if its goals differ from those of extract-cbd-shape.
Very nice work on this project!