
dice-group/Lemming



LEMMING: Example Mimicking Knowledge Graph Generators

This is the repository of LEMMING, an ExaMple MImickiNg graph Generator, and of SimplexKG, A Simplex Approach to Synthetic Knowledge Graph Generation (link to be added).

LEMMING contains synthetic knowledge graph generators that are based on instance data.

Prerequisites and Project Build

Prerequisites

  • Java Development Kit (JDK): Version 17 or later.

  • Apache Maven: Version 3.6.

Building the project

You can either use the pre-built JAR file provided in this repository, or build it yourself using:

mvn clean package

The JAR file will be located in the target directory after the build process is complete.

Sample run

mvn clean package
java -jar target/lemming.jar single-graph -ds test -dp src/test/resources/snippet_linkedgeo.nt -nv 10

How to run

LEMMING currently supports two graph generation modes.

1. Versioned graph generation

The first mode, as presented in the LEMMING paper, requires a versioned dataset (several versions of the same graph) as input.

java -jar lemming.jar graph -ds <dataset> -nv <num_vertices> -thrs <threads> -c <class_selection> -v <vertex_selection>

Parameters

| Parameter | Required | Default | Description |
|---|---|---|---|
| -ds | True | NA | Dataset {dbp, pg, swdf, lgeo, geology} |
| -dp | Conditional | NA | Dataset path. Only required in single-graph mode and when the dataset is not present in application.properties. |
| -nv | True | NA | Desired number of vertices in the generated graph (the number of vertices of the target graph) |
| -thrs | False | 1 | Number of threads |
| -s | False | System.currentTimeMillis() | Seed for reproducing results |
| -m | False | Binary | Generation type {Binary, Simplex, Baseline} |
| -c | False | UCS | Type of class selector {UCS, BCS, CCS} |
| -v | False | UCS | Type of vertex selector {UIS, BIS} |
| -sp | False | UCS | Only used in Simplex mode. Simplex property sampling scheme, either biased or uniform {BP, UP} |
| -sc | False | UCS | Only used in Simplex mode. Simplex class sampling scheme, either biased or uniform {BC, UC} |
| -sc | False | UCS | Only used for baseline generators: Barabási–Albert or Watts–Strogatz {BA, WS} |
| -op | False | 0 | Number of optimization iterations |
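The selector options distinguish uniform from biased sampling (UCS/BCS for classes; the same distinction appears to apply to UIS/BIS for vertices and to the Simplex schemes UP/BP and UC/BC). As a rough illustration only, here is a minimal Python sketch of the difference, assuming that "biased" means sampling proportionally to how often a class occurs in the input graph. The function names and class frequencies are hypothetical and not taken from LEMMING's code.

```python
import random
from collections import Counter


def uniform_choice(classes, rng):
    """Pick each distinct class with equal probability (assumed UCS-like behaviour)."""
    return rng.choice(sorted(classes))


def biased_choice(class_counts, rng):
    """Pick a class with probability proportional to its frequency (assumed BCS-like behaviour)."""
    names = sorted(class_counts)
    weights = [class_counts[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical class frequencies observed in an input graph.
counts = Counter({"Person": 80, "Place": 15, "Event": 5})
rng = random.Random(42)  # fixed seed so runs are reproducible

uniform_sample = Counter(uniform_choice(counts, rng) for _ in range(10_000))
biased_sample = Counter(biased_choice(counts, rng) for _ in range(10_000))
print("uniform:", dict(uniform_sample))
print("biased: ", dict(biased_sample))
```

Under this assumption, biased selection reproduces the skewed class distribution of the input graph, while uniform selection flattens it.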

2. Single-version graph generation

This mode requires only one graph version as input and, as a result, skips the preprocessing and optimization stages.

java -jar lemming.jar single-graph -ds <dataset> -dp <dataset-path> -nv <num_vertices> -thrs <threads> -c <class_selection> -v <vertex_selection>
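When running many configurations, the invocation can be scripted. The following minimal Python sketch assembles a single-graph command from the flags documented in the table above, using the same dataset as the sample run plus a fixed seed. The helper names and the choice to omit unset optional flags (so that LEMMING falls back to its own defaults) are illustrative assumptions, not part of LEMMING.

```python
import shlex
import subprocess


def single_graph_command(jar, dataset, dataset_path, num_vertices,
                         threads=None, class_selector=None,
                         vertex_selector=None, seed=None):
    """Build a `single-graph` command line from the documented flags.

    Optional flags left as None are omitted, so LEMMING uses its own defaults.
    """
    cmd = ["java", "-jar", jar, "single-graph",
           "-ds", dataset, "-dp", dataset_path, "-nv", str(num_vertices)]
    optional = {"-thrs": threads, "-c": class_selector,
                "-v": vertex_selector, "-s": seed}
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


def run(cmd):
    """Run the command; raises subprocess.CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)


# Same dataset as the sample run, plus a fixed seed (-s) for reproducibility.
cmd = single_graph_command("target/lemming.jar", "test",
                           "src/test/resources/snippet_linkedgeo.nt", 10, seed=42)
print(shlex.join(cmd))
```

Calling `run(cmd)` actually executes LEMMING, which requires Java and a built JAR; the sketch itself only prints the command.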

Preprocessing stage

LEMMING includes a preprocessing stage in which invariant arithmetic expressions are learned for a given dataset. This stage runs automatically if LEMMING cannot find the preprocessed data. The learned expressions are saved in value_store.val. You can also run this stage explicitly using:

java -jar lemming.jar store -ds <dataset>

Parameters

| Parameter | Required | Default | Description |
|---|---|---|---|
| -ds | True | NA | Dataset {dbp, pg, swdf, lgeo, geology} |
| -dp | Conditional | NA | Dataset path. Only required when the dataset is not present in application.properties. |
| --min-fitness | False | 100000.0 | Minimum fitness |
| --max-iterations | False | 50 | Maximum number of iterations |
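To make the idea of an "invariant arithmetic expression" more concrete, here is a minimal Python sketch that is not LEMMING's implementation. It evaluates a few hand-written candidate expressions over per-version graph metrics, ranks them by how little their value varies across versions, and scores a candidate graph by how far it deviates from the expected values. The metric names, numbers, candidate expressions, and the use of the coefficient of variation as a score are all assumptions for illustration; LEMMING learns its expressions itself and uses its own fitness measure.

```python
from statistics import mean, pstdev

# Hypothetical metrics for three versions of the same input graph.
versions = [
    {"vertices": 1000, "edges": 5000, "max_degree": 120},
    {"vertices": 1200, "edges": 6100, "max_degree": 150},
    {"vertices": 1500, "edges": 7400, "max_degree": 185},
]

# Hand-written candidate expressions over the metrics (illustrative only).
candidates = {
    "edges / vertices": lambda m: m["edges"] / m["vertices"],
    "max_degree / vertices": lambda m: m["max_degree"] / m["vertices"],
    "edges - vertices": lambda m: m["edges"] - m["vertices"],
}


def variation(expr, graphs):
    """Coefficient of variation of an expression across graph versions.

    Values close to 0 mean the expression is (almost) invariant.
    """
    values = [expr(g) for g in graphs]
    avg = mean(values)
    return pstdev(values) / abs(avg) if avg else float("inf")


def error(graph_metrics, names):
    """Sum of relative deviations of a graph from each expression's mean value."""
    total = 0.0
    for name in names:
        expected = mean(candidates[name](v) for v in versions)
        total += abs(candidates[name](graph_metrics) - expected) / abs(expected)
    return total


ranking = sorted(candidates, key=lambda name: variation(candidates[name], versions))
top = ranking[:2]  # keep the most invariant expressions

good = {"vertices": 2000, "edges": 10000, "max_degree": 240}
bad = {"vertices": 2000, "edges": 4000, "max_degree": 400}
print(ranking)
print(error(good, top), error(bad, top))
```

With these made-up numbers, the edges-per-vertex ratio is the most stable expression, and a graph that keeps that ratio scores a lower error than one that does not.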

Used data and software

Internally, LEMMING uses the Grph library.

For testing, we use the email-Eu-core network published by Stanford University. It has been transformed into a simple RDF file.
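The email-Eu-core network is distributed as a plain edge list, assumed here to contain one whitespace-separated "source target" pair of node IDs per line. The following minimal Python sketch shows one way such an edge list could be turned into N-Triples. The base URI and the predicate are hypothetical placeholders, not the ones used in the repository's test file.

```python
# Hypothetical URI scheme; the repository's test file may use different URIs.
BASE = "http://example.org/email-eu-core/"
PREDICATE = "<http://example.org/email-eu-core/sentEmailTo>"


def edges_to_ntriples(lines):
    """Convert whitespace-separated 'source target' lines into N-Triples statements."""
    triples = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        source, target = line.split()[:2]
        triples.append(f"<{BASE}node{source}> {PREDICATE} <{BASE}node{target}> .")
    return triples


sample = ["0 1", "2 3", "", "# comment", "2 1"]
for triple in edges_to_ntriples(sample):
    print(triple)
```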

The LEMMING logo was created by TortugaAttack.

Reproducing experiments

Download the datasets with:

wget https://files.dice-research.org/projects/Lemming/datasets.tar.gz && tar -xzf datasets.tar.gz --remove-files

Generate graphs with every generator type for each dataset:

bash generate_graphs.sh swdf 32
bash generate_graphs.sh lgeo 32
bash generate_graphs.sh geology 32

The triple store benchmark was run with IGUANA on the Tentris, Virtuoso, Apache Jena Fuseki, GraphDB and Blazegraph triple stores. The benchmark should be run for each generated graph and for the target graph. Note that the target graph in this step should be the preprocessed one (i.e., after materialization). We provide scripts that manage the lifecycle of the triple stores, upload the graphs to the triple store, and start IGUANA:

bash run_all.sh /home/lemming/generated_graphs/

Files

You can find the original LEMMING files here and the SimplexKG files here.

How to cite

@inproceedings{roeder2021lemming,
  author = {R{\"o}der, Michael and Nguyen, Pham Thuy Sy and Conrads, Felix and da Silva, Ana Alexandra Morim and Ngomo, Axel-Cyrille Ngonga},
  booktitle = {Proceedings of the 15th IEEE International Conference on Semantic Computing (ICSC)},
  doi = {10.1109/ICSC50631.2021.00015},
  pages = {62-69},
  publisher = {IEEE Computer Society},
  title = {LEMMING -- Example-based Mimicking of Knowledge Graphs},
  url = {https://doi.org/10.1109/ICSC50631.2021.00015},
  year = 2021
}
