Description
Halyard is a powerful distributed triplestore that instantly answers the majority of SPARQL queries. However, it is weak in some complex operations (such as ORDER BY and GROUP BY), and implementing custom code that goes beyond SPARQL is complicated.
SANSA Stack (and similar Spark-based SPARQL frameworks) seems to be complementary to Halyard. It is powerful in ordering and aggregation and makes it easy to integrate custom transformation logic into the pipeline. However, it is slow in ad-hoc SPARQL queries and cannot serve as a SPARQL endpoint.
The idea is to provide a hybrid solution in which SANSA Stack (or any other Spark framework) can directly use Halyard data and the Halyard query engine as a (distributed) source of RDF data for further processing.
- A minimal implementation would provide a Halyard library for Spark, so that SANSA Stack can consume Halyard data directly (reading the RDF data straight from HBase) and call the Halyard SPARQL query engine (consuming the results of Halyard SPARQL graph queries locally and directly).
- An integrated solution would include Halyard as a service provider in the SANSA SPARQL query engine, so that hybrid access to Halyard data would be available inside SANSA SPARQL as a federated service provider.
- An optimal solution would also include transparent integration of Halyard SPARQL parallelization (similar to the halyard:forkAndFilterBy function used in Halyard BulkExport), so that the Spark engine could manage Halyard parallelization directly, transparently to the user.
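To illustrate the kind of parallelization meant by the fork-and-filter approach, here is a minimal Python sketch. It assumes that every fork evaluates the same query and keeps only the solutions whose bound value hashes to that fork's index, so the forks together return each solution exactly once. The names `fork_and_filter_by`, `run_fork`, `run_parallel`, and the sample `SOLUTIONS` are hypothetical, and CRC32 is an assumed stable hash; this is not Halyard's actual implementation.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the solutions of one SPARQL query: each dict is a
# binding set mapping a variable name to an RDF term rendered as a string.
SOLUTIONS = [
    {"s": "<http://example.org/a>", "o": '"1"'},
    {"s": "<http://example.org/b>", "o": '"2"'},
    {"s": "<http://example.org/c>", "o": '"3"'},
    {"s": "<http://example.org/d>", "o": '"4"'},
    {"s": "<http://example.org/e>", "o": '"5"'},
]


def fork_and_filter_by(forks: int, fork_index: int, *values: str) -> bool:
    """Return True only in the fork whose index matches the hash of the values.

    CRC32 is used as a stable hash (an assumption; Halyard may hash differently).
    """
    key = "\x00".join(values).encode("utf-8")
    return zlib.crc32(key) % forks == fork_index


def run_fork(forks: int, fork_index: int, var: str) -> list:
    # Each fork "evaluates" the same query and applies the partitioning filter.
    return [sol for sol in SOLUTIONS if fork_and_filter_by(forks, fork_index, sol[var])]


def run_parallel(forks: int, var: str) -> list:
    # Run all forks concurrently; pool.map returns results in fork order.
    with ThreadPoolExecutor(max_workers=forks) as pool:
        return list(pool.map(lambda i: run_fork(forks, i, var), range(forks)))


if __name__ == "__main__":
    for i, part in enumerate(run_parallel(3, "s")):
        print(i, [sol["s"] for sol in part])
```

In a Spark integration, each fork index could hypothetically become one Spark task, letting Spark rather than the user decide how many forks to run while the union of the partitions still equals the full query result.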
This is an idea for a potential synergy between Halyard and SANSA Stack that seems worth testing.