Imagine a jQuery-style autocompletion widget without hardcoded data options, built using Linked Data. This project contains a proof of concept of how the source data for such an application can be fragmented and hosted.
- Create a docker volume: `docker volume create fragments_volume`
  - Optional: if a different volume name was chosen, update the volume mappings of both services in `docker-compose.yml`.
- Gather all input data sources into one directory
- In `docker-compose.yml`, replace `/data/dumps/path` on line 7 with the chosen directory. This directory will be mounted as `/input` in the `files` container.
- Create `files/config.json`, using `files/example_config.json` as a template (a sketch of such a file is shown after this list).
  - `maxFileHandles` is the maximum number of file handles the fragmenter may have open at once; 1024 is a common limit set by operating systems.
  - `outDir` can remain unchanged; it is a mounted volume determined by `docker-compose.yml`.
  - `domain` is used as the root URI on which every fragment's identifier is based, so it is technically not just the domain but also the protocol, the base path, ...
  - `tasks` is a list of all datasets, and how they should be processed:
    - `input` is the path to the file, which should be in the `/input` directory as determined by `docker-compose.yml`.
    - `name` will become part of each fragment's path, to keep the fragmented datasets separate.
    - `properties` is a list of all predicates (URIs) to fragment this dataset on.
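For illustration, a `config.json` built from the fields above might look like this. The field names come from the description above; the concrete values (input file, domain, predicate URI, output path) are assumptions, not values copied from `example_config.json`:

```json
{
  "maxFileHandles": 1000,
  "outDir": "/output",
  "domain": "http://localhost/",
  "tasks": [
    {
      "input": "/input/dataset.nt",
      "name": "dataset",
      "properties": ["http://www.w3.org/2000/01/rdf-schema#label"]
    }
  ]
}
```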
Running `docker-compose build; docker-compose up` will then fragment all the given datasets, and serve them on `localhost:80`.

Running `docker-compose up server` will skip fragmenting the data (again), and will only serve the existing data fragments.
- `files/`
  - `Dockerfile`: creates a runnable docker container by compiling the Java sources and copying the `config.json` file
  - `example_config.json`: template to create a `config.json` file from
  - `src/`: Java sources of the fragmenter
- `server/`
  - `Dockerfile`: copies the local `nginx.conf` into the default nginx container
  - `nginx.conf`: enables gzip compression, CORS, and caching headers (sketched below)
- `docker-compose.yml`: ensures that the `files` container writes to the content root of the `server` container
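The actual `server/nginx.conf` is not reproduced here, but the three features it enables map onto directives like the following; the MIME types, cache lifetime, and paths are illustrative assumptions:

```nginx
server {
    listen 80;
    root /usr/share/nginx/html;

    # gzip compression for the text-based RDF fragments
    gzip on;
    gzip_types text/turtle application/trig application/n-triples;

    # CORS, so the autocompletion widget can fetch fragments cross-origin
    add_header Access-Control-Allow-Origin "*";

    # caching headers: fragments are static once written
    add_header Cache-Control "public, max-age=3600";
}
```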
Input data is processed in 3 steps:

- The data is parsed as a stream using Apache Jena's `StreamRDF`; each triple or quad is processed separately.
- Discovered literals are processed to obtain a set of fragment files to pipe the triples to (see the sketch after this list):
  - The literal is normalized to NFKD; filtered to just the Letters (L), Numbers (N) and Separator (Z) Unicode classes; and then lowercased.
  - The normalized literal is tokenized by whitespace.
  - Prefixes are extracted from each token.
  - A writable `StreamRDF` is returned for each prefix, and the triple/quad is written to them.
- Once all triples are processed, hypermedia links are added to the fragments.
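As a non-authoritative sketch of steps 1 and 2 (the real implementation lives in `files/src/`; the class name and input path here are made up for illustration), the streaming parse and prefix extraction could look like this with Apache Jena:

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

import org.apache.jena.graph.Triple;
import org.apache.jena.riot.RDFParser;
import org.apache.jena.riot.system.StreamRDFBase;

public class FragmenterSketch {

    public static void main(String[] args) {
        // Step 1: Jena parses the dump as a stream and pushes every triple
        // into this sink as soon as it is read; nothing is kept in memory.
        RDFParser.source("/input/dataset.nt").parse(new StreamRDFBase() {
            @Override
            public void triple(Triple triple) {
                if (triple.getObject().isLiteral()) {
                    String literal = triple.getObject().getLiteralLexicalForm();
                    // Step 2: each prefix identifies a fragment; the real
                    // fragmenter writes the triple to a StreamRDF per prefix.
                    for (String prefix : prefixes(literal)) {
                        System.out.println(prefix + " <- " + triple);
                    }
                }
            }
        });
    }

    // Step 2, normalization: NFKD-decompose, keep only the Letter (L),
    // Number (N) and Separator (Z) Unicode categories, then lowercase.
    static String normalize(String literal) {
        String decomposed = Normalizer.normalize(literal, Normalizer.Form.NFKD);
        StringBuilder kept = new StringBuilder(decomposed.length());
        decomposed.codePoints().forEach(cp -> {
            switch (Character.getType(cp)) {
                case Character.UPPERCASE_LETTER:
                case Character.LOWERCASE_LETTER:
                case Character.TITLECASE_LETTER:
                case Character.MODIFIER_LETTER:
                case Character.OTHER_LETTER:
                case Character.DECIMAL_DIGIT_NUMBER:
                case Character.LETTER_NUMBER:
                case Character.OTHER_NUMBER:
                case Character.SPACE_SEPARATOR:
                case Character.LINE_SEPARATOR:
                case Character.PARAGRAPH_SEPARATOR:
                    kept.appendCodePoint(cp);
                    break;
                default:
                    break; // drop combining marks, punctuation, symbols, ...
            }
        });
        return kept.toString().toLowerCase();
    }

    // Step 2, tokenization + prefixes: split on whitespace and emit every
    // prefix of every token ("cafe" -> "c", "ca", "caf", "cafe").
    static List<String> prefixes(String literal) {
        List<String> result = new ArrayList<>();
        for (String token : normalize(literal).trim().split("\\s+")) {
            for (int end = 1; end <= token.length(); end++) {
                result.add(token.substring(0, end));
            }
        }
        return result;
    }
}
```

The routing of each prefix to its own writable `StreamRDF`, and the hypermedia links added in step 3, are left out of this sketch.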