
Properties file

Simon Ott edited this page Dec 13, 2020 · 17 revisions

The properties file should have the following format:

{KEY} = {VALUE}
{KEY} = {VALUE}
...
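
As a minimal illustration of this format, the following sketch reads `KEY = VALUE` lines into a dictionary. The helper name `parse_properties` and the way blank lines are skipped are assumptions made for this example; the application's own parser may behave differently.

```python
def parse_properties(text):
    """Parse 'KEY = VALUE' lines into a dict (illustrative helper, not part of the tool)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines (assumed behaviour)
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"Expected 'KEY = VALUE', got: {line!r}")
        props[key.strip()] = value.strip()
    return props


# Example content using keys documented on this page.
example = """
PATH_TRAINING = train.txt
PATH_RULES = rules.txt
TOP_K_OUTPUT = 10
WORKER_THREADS = -1
"""
props = parse_properties(example)
```

All values are kept as strings here; converting them to integers or paths is left to the caller.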

General properties

These properties are used by every action and should be set for each of them.

| Key | Value type | Description | Default |
| --- | --- | --- | --- |
| PATH_TRAINING | Valid path (file) | Path to training file (absolute or relative) | train.txt |
| PATH_TEST | Valid path (file) | Path to test file (absolute or relative) | test.txt |
| PATH_VALID | Valid path (file) | Path to validation file (absolute or relative) | valid.txt |
| PATH_RULES | Valid path (file) | Path to rule set file (absolute or relative) | rules.txt |
| DISCRIMINATION_BOUND | Integer | Discriminates against rules whose result sets contain more elements than this bound (also used to limit memory consumption). 0 means no limit. | 4000 |
| UNSEEN_NEGATIVE_EXAMPLES | Integer | Number of negative examples that are assumed to exist although they have not been observed. The higher the value, the more rules with high coverage are favoured. | 5 |
| REFLEXIV_TOKEN | String | Token used for substitution of reflexive rules (used if the rule set was trained with REWRITE_REFLEXIV = TRUE) | me_myself_i |
| TOP_K_OUTPUT | Integer | Number of top-ranked results kept in the output after filtering | 10 |
| WORKER_THREADS | Integer | Number of threads used for computation (-1 means all threads are used) | -1 |
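
The effect of UNSEEN_NEGATIVE_EXAMPLES can be sketched with a small calculation. The sketch assumes a common smoothing scheme in which the value is added to the denominator of a rule's confidence; this formula and the name `smoothed_confidence` are assumptions for illustration and are not stated on this page.

```python
def smoothed_confidence(correct, predicted, unseen_negative_examples=5):
    """Confidence with assumed smoothing: correct / (predicted + unseen negatives)."""
    return correct / (predicted + unseen_negative_examples)


# A rule with tiny coverage that was always right so far ...
small = smoothed_confidence(correct=2, predicted=2)
# ... versus a rule with large coverage that was right in 80% of cases.
large = smoothed_confidence(correct=80, predicted=100)

# Raising the parameter penalises the low-coverage rule much more strongly.
small_strict = smoothed_confidence(2, 2, unseen_negative_examples=20)
large_strict = smoothed_confidence(80, 100, unseen_negative_examples=20)
```

Under this assumption the high-coverage rule ends up ranked above the low-coverage one even though its raw precision is lower, and the gap widens as the parameter grows.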

ACTION calcjacc

Calculates, for every relation, the similarity matrix (Jaccard index) between its rules. These matrices are used for aggregating with non-redundant noisy-or. The Jaccard index is estimated using MinHash.

Output: Binary files storing the Jaccard indices between rules for each relation.

| Key | Value type | Description | Default |
| --- | --- | --- | --- |
| General properties (see table above) | | | |
| PATH_JACCARD | Valid path (directory) | Path to the directory used for storing the binary similarity matrix files | jaccard |
| RESOLUTION | Integer | Sets the accuracy of the Jaccard estimation: the number of hash functions used in MinHash (e.g. RESOLUTION = 200 means 200 hash functions and a maximum Jaccard resolution of 1/200) | 200 |
| SEED | Integer | Seed for generating the hash functions used in MinHash | 0 |
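
The relationship between RESOLUTION, SEED and the estimated Jaccard index can be sketched as follows. This is a generic MinHash illustration, not the tool's implementation: the linear hash family, the prime modulus and all function names are assumptions chosen for the example.

```python
import random

PRIME = 2_147_483_647  # 2^31 - 1; item ids below must be smaller than this


def make_hash_params(resolution, seed):
    """Draw `resolution` hash functions h(x) = (a*x + b) mod PRIME from a seeded RNG."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(resolution)]


def minhash_signature(items, params):
    """For each hash function keep the minimum hash value over the set."""
    return [min((a * x + b) % PRIME for x in items) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; always a multiple of 1/len(sig)."""
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


params = make_hash_params(resolution=200, seed=0)
rule_a = set(range(0, 1000))     # hypothetical result set of one rule
rule_b = set(range(500, 1500))   # hypothetical result set of another rule
estimate = estimate_jaccard(minhash_signature(rule_a, params),
                            minhash_signature(rule_b, params))
exact = len(rule_a & rule_b) / len(rule_a | rule_b)
```

Because the estimate is the share of matching signature positions, it can only take values k/RESOLUTION, which is why a higher RESOLUTION gives a finer (and more expensive) estimate. Using the same SEED reproduces the same hash functions and therefore the same matrices.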

ACTION learnnrnoisy

Learns the optimal thresholds for clustering the rules on similarity. There are two possible search strategies: grid search and random search.

Requires calculation of similarity matrices (calcjacc).
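
To illustrate why clustering matters for aggregation, the sketch below contrasts plain noisy-or, max aggregation and a non-redundant variant. It assumes that rules in the same cluster are treated as redundant (only the highest confidence in a cluster counts) and that clusters are then combined with noisy-or. This reading, the function names and the confidence values are assumptions for illustration and are not taken from the tool's source.

```python
from math import prod


def noisy_or(confidences):
    """Noisy-or: 1 minus the product of (1 - confidence)."""
    return 1 - prod(1 - c for c in confidences)


def max_aggregation(confidences):
    """Max aggregation: only the strongest rule counts."""
    return max(confidences)


def non_redundant_noisy_or(clusters):
    """Assumed scheme: max within each cluster, noisy-or across clusters."""
    return noisy_or(max(cluster) for cluster in clusters)


# Confidences of rules that predicted the same candidate, grouped by cluster.
clusters = [[0.8, 0.75], [0.5]]
all_rules = [c for cluster in clusters for c in cluster]

score_noisy = noisy_or(all_rules)              # treats all rules as independent
score_max = max_aggregation(all_rules)         # ignores all but the best rule
score_nr = non_redundant_noisy_or(clusters)    # in between the two
```

With a very low similarity threshold all rules fall into one cluster and the result equals max aggregation; with a very high threshold every rule is its own cluster and the result equals plain noisy-or.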

| Key | Value type | Description | Default |
| --- | --- | --- | --- |
| General properties (see table above) | | | |
| PATH_JACCARD | Valid path (directory) | Path to the directory containing the binary similarity matrix files | jaccard |
| PATH_CLUSTER | Valid path (file) | Path to file used for storing clustering results | cluster.txt |
| BUFFER_SIZE | Integer | Buffer size, counted in 4-byte integers, used to limit the memory consumed by buffering previously inferred rules. Should only be set if running out of memory (e.g. 2500000000 corresponds to ~10 GB). | Maximum unsigned long long |
| STRATEGY | [grid\|random] | Sets the search strategy to be used for finding the optimal clustering | grid |
| ITERATIONS | Integer | Number of iterations used in the random search strategy | 10000 |
| RESOLUTION | Integer | Determines the smallest possible change of the threshold (1/RESOLUTION). In grid search this is the number of iterations; in random search it limits the search space. | 200 |
| SEED | Integer | Seed for the sampling of thresholds used in the random search strategy | 0 |
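
The interplay of STRATEGY, ITERATIONS, RESOLUTION and SEED can be sketched for a single threshold as follows. The sketch assumes that candidate thresholds are the values k/RESOLUTION for k = 1..RESOLUTION; that range, the function names and the placeholder objective `toy_score` are assumptions, since this page does not specify how candidates are scored.

```python
import random


def grid_thresholds(resolution):
    """Grid search: try every multiple of 1/resolution once (resolution iterations)."""
    return [k / resolution for k in range(1, resolution + 1)]


def random_thresholds(resolution, iterations, seed):
    """Random search: sample `iterations` thresholds from the same discrete grid."""
    rng = random.Random(seed)
    return [rng.randint(1, resolution) / resolution for _ in range(iterations)]


def best_threshold(candidates, score):
    """Return the candidate with the highest score."""
    return max(candidates, key=score)


def toy_score(threshold):
    # Placeholder objective that peaks at 0.35; the real objective would be
    # an evaluation metric computed on the validation set.
    return -abs(threshold - 0.35)


grid_best = best_threshold(grid_thresholds(200), toy_score)
random_best = best_threshold(random_thresholds(200, iterations=10000, seed=0), toy_score)
```

For a single threshold the grid is cheap to enumerate; random search becomes interesting when several thresholds have to be chosen jointly and the full grid is too large to visit.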

ACTION applynrnoisy

| Key | Value type | Description | Default |
| --- | --- | --- | --- |
| General properties (see table above) | | | |
| PATH_CLUSTER | Valid path (file) | Path to file containing clustering results | cluster.txt |
| PATH_OUTPUT | Valid path (file) | Path to file used for storing predictions | predictions.txt |

ACTION applynoisyonly | applymaxonly

| Key | Value type | Description | Default |
| --- | --- | --- | --- |
| General properties (see table above) | | | |
| PATH_OUTPUT | Valid path (file) | Path to file used for storing predictions | predictions.txt |

Trial properties

| Key | Value type | Description | Default |
| --- | --- | --- | --- |
| TRIAL | [0\|1] | If set to 1, rules are only applied to a representative sample of all test triples. The sample size is calculated from CONFIDENCE_LEVEL and MARGIN_OF_ERROR. | 0 |
| PATH_TEST_SAMPLE | Valid path | Path to the file holding the test triples of the sample (used for evaluation). Can be absolute or relative to the application. The file is created; if it already exists, it is overwritten. | "test_sample.txt" (relative to exe) |
| CONFIDENCE_LEVEL | [80\|85\|90\|95\|99] | Confidence level of the evaluation results | 95 |
| MARGIN_OF_ERROR | Integer (percent) | Margin of error (+/-) of the evaluation results | 5 |
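
How a sample size can follow from CONFIDENCE_LEVEL and MARGIN_OF_ERROR is sketched below. The page does not state which formula the tool uses; the sketch assumes the standard sample-size formula for proportions with a finite population correction and a worst-case proportion of 0.5. The function name and the z-score table are part of that assumption.

```python
import math

# Two-sided z-scores for the confidence levels accepted by CONFIDENCE_LEVEL.
Z_SCORES = {80: 1.282, 85: 1.440, 90: 1.645, 95: 1.960, 99: 2.576}


def sample_size(population, confidence_level=95, margin_of_error=5, p=0.5):
    """Assumed formula: n0 = z^2 * p * (1 - p) / e^2, corrected for a finite population."""
    z = Z_SCORES[confidence_level]
    e = margin_of_error / 100            # MARGIN_OF_ERROR is given in percent
    n0 = z * z * p * (1 - p) / (e * e)   # sample size for an infinite population
    n = n0 / (1 + (n0 - 1) / population) # finite population correction
    return math.ceil(n)


# Hypothetical test set with 20,000 triples and the default settings.
n_default = sample_size(population=20000)
# A stricter setting needs a larger sample.
n_strict = sample_size(population=20000, confidence_level=99, margin_of_error=2)
```

Under these assumptions the default settings need only a few hundred test triples even for large test sets, which is what makes a trial run much faster than a full evaluation.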
