This is the repository of LEMMING, an ExaMple MImickiNg graph Generator, and SimplexKG, A Simplex Approach to Synthetic Knowledge Graph Generation (Link to be added).
LEMMING contains Synthetic Knowledge Graph Generators based on instance data.
- Java Development Kit (JDK): Version 17 or later.
- Apache Maven: Version 3.6.
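To check the JDK requirement before building, a small shell helper along these lines may be useful. It is only a sketch: the function names `java_major` and `meets_jdk_requirement` are hypothetical and not part of this repository, and it assumes the version string follows either the legacy `1.x` scheme or the modern `N.x.y` scheme.

```shell
# Extract the major version from a Java version string
# such as "17.0.2", "21-ea" or the legacy "1.8.0_292".
java_major() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;                # legacy scheme: 1.8.0_292 -> 8
        *)   echo "$1" | cut -d. -f1 | cut -d- -f1 ;;  # modern scheme: 17.0.2 -> 17
    esac
}

# Succeed only when the given version string satisfies the JDK 17+ requirement.
meets_jdk_requirement() {
    [ "$(java_major "$1")" -ge 17 ]
}
```

The version string itself can usually be taken from the first line of `java -version`, for example with `java -version 2>&1 | sed -n '1s/.*"\(.*\)".*/\1/p'`, and then passed to `meets_jdk_requirement`.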
You can either use the pre-built JAR file provided in this repository, or build it yourself using:
mvn clean package
The JAR file will be located in the target directory after the build process is complete.
mvn clean package
java -jar target/lemming.jar single-graph -ds test -dp src/test/resources/snippet_linkedgeo.nt -nv 10
LEMMING currently supports two graph generation processes.
The first, as presented in the LEMMING paper, requires a versioned dataset as input.
java -jar lemming.jar graph -ds <dataset> -nv <num_vertices> -thrs <threads> -c <class_selection> -v <vertex_selection>
Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| -ds | True | NA | Dataset {dbp, pg, swdf, lgeo, geology} |
| -dp | True | NA | Dataset path. Only required in single-graph mode and when the dataset is not present in application.properties. |
| -nv | True | NA | Desired number of vertices in the generated graph (number of vertices of the target graph) |
| -thrs | False | 1 | Number of threads |
| -s | False | System.currentTimeMillis() | Seed for results reproduction. |
| -m | False | Binary | Generation type {Binary, Simplex, Baseline} |
| -c | False | UCS | Type of class selector {UCS, BCS, CCS} |
| -v | False | UCS | Type of vertex selector {UIS, BIS} |
| -sp | False | UCS | Only used in Simplex mode. Simplex property sampling scheme, either biased or uniform. {BP, UP} |
| -sc | False | UCS | Only used in Simplex mode. Simplex class sampling scheme, either biased or uniform. {BC, UC} |
| -sc | False | UCS | Only used for baseline generators. Barabási–Albert and Watts–Strogatz. {BA, WS} |
| -op | False | 0 | Number of optimization iterations |
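As an illustration of how the parameters combine, the following sketch assembles a command line for the versioned-dataset mode. The helper name `lemming_graph_cmd` is hypothetical, and it only prints the command rather than running it. It uses the `-ds`, `-nv`, `-thrs` and `-s` options from the table and leaves all other options at their defaults.

```shell
# Print (do not run) a LEMMING command for the versioned-dataset mode.
# Usage: lemming_graph_cmd <dataset> <num_vertices> [threads] [seed]
lemming_graph_cmd() {
    dataset=$1
    num_vertices=$2
    threads=${3:-1}   # -thrs defaults to 1 according to the table
    seed=${4:-}       # empty: LEMMING falls back to the current time
    cmd="java -jar lemming.jar graph -ds $dataset -nv $num_vertices -thrs $threads"
    if [ -n "$seed" ]; then
        cmd="$cmd -s $seed"   # pass a fixed seed to reproduce a run
    fi
    echo "$cmd"
}
```

For example, `lemming_graph_cmd swdf 1000 4 42` prints a command that uses four threads and the fixed seed 42. Passing the same seed again is what the `-s` option is intended for when results need to be reproduced.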
The second, single-graph mode, requires only one graph version as input and therefore skips the preprocessing and optimization stages.
java -jar lemming.jar single-graph -ds <dataset> -dp <dataset-path> -nv <num_vertices> -thrs <threads> -c <class_selection> -v <vertex_selection>
LEMMING includes a preprocessing stage in which invariant arithmetic expressions are learned for a given dataset. This stage runs by default if LEMMING does not find the path to the preprocessed data, and the learned expressions are saved in value_store.val. You can also run it explicitly using:
java -jar lemming.jar store -ds <dataset>
Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| -ds | True | NA | Dataset {dbp, pg, swdf, lgeo, geology} |
| -dp | True | NA | Dataset path. Only required when the dataset is not present in application.properties. |
| --min-fitness | False | 100000.0 | Minimum Fitness |
| --max-iterations | False | 50 | Maximum number of iterations |
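The following sketch shows one way to decide whether the explicit preprocessing run is needed. It rests on several assumptions: that `value_store.val` sits in the working directory unless another path is given, that the iteration option is spelled `--max-iterations`, and that the helper name `lemming_store_cmd` is free to use (it is hypothetical and not part of this repository). The command is printed rather than executed.

```shell
# Print the explicit preprocessing command only when the value store is missing.
# Usage: lemming_store_cmd <dataset> [store_file]
lemming_store_cmd() {
    dataset=$1
    store_file=${2:-value_store.val}   # assumed location; adjust to your setup
    if [ -f "$store_file" ]; then
        echo "# $store_file found, preprocessing can be skipped"
    else
        # Default values from the table, spelled out so they are easy to change.
        echo "java -jar lemming.jar store -ds $dataset --min-fitness 100000.0 --max-iterations 50"
    fi
}
```

Calling `lemming_store_cmd swdf` in a directory without a value store prints the `store` command with the default fitness and iteration settings.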
Internally, Lemming uses the Grph library.
For testing, we use the email-Eu-core network published by Stanford University. It has been transformed into a simple RDF file.
The Lemming logo has been created by TortugaAttack.
Download the datasets with:
wget https://files.dice-research.org/projects/Lemming/datasets.tar.gz && tar -xzf datasets.tar.gz --remove-files
Generate the graphs for all generator types for all datasets:
bash generate_graphs.sh swdf 32
bash generate_graphs.sh lgeo 32
bash generate_graphs.sh geology 32
The triple store benchmark was run with IGUANA on the Tentris, Virtuoso, Apache Jena Fuseki, GraphDB and Blazegraph triple stores. Run the benchmark for each of the generated graphs and for the target graph. Note that the target graph in this step should be the pre-processed one (after materialization). We have prepared scripts that manage the lifecycle of the triple stores, upload the graphs to the triple store and start IGUANA:
bash run_all.sh /home/lemming/generated_graphs/
You can find the original LEMMING files here and the SimplexKG files here.
@inproceedings{roeder2021lemming,
author = {R{\"o}der, Michael and Nguyen, Pham Thuy Sy and Conrads, Felix and da Silva, Ana Alexandra Morim and Ngomo, Axel-Cyrille Ngonga},
booktitle = {Proceedings of the 15th IEEE International Conference on Semantic Computing (ICSC)},
doi = {10.1109/ICSC50631.2021.00015},
pages = {62-69},
publisher = {IEEE Computer Society},
title = {LEMMING -- Example-based Mimicking of Knowledge Graphs},
url = {https://doi.org/10.1109/ICSC50631.2021.00015},
year = 2021
}
