Properties file

The properties file should have the following format:

{KEY} = {VALUE}
{KEY} = {VALUE}
...
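
As an illustration of the format above, here is a minimal sketch of a reader for such a file. It is not the tool's own parser: the function name `parse_properties` is hypothetical, and the handling of whitespace around `=` and of lines without `=` is an assumption.

```python
def parse_properties(text):
    """Parse '{KEY} = {VALUE}' lines into a dict of strings."""
    properties = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumption: blank lines and lines without '=' are skipped.
        if not line or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '='.
        key, value = line.split("=", 1)
        properties[key.strip()] = value.strip()
    return properties


example = """
PATH_TRAINING = train.txt
TOP_K_OUTPUT = 10
"""
config = parse_properties(example)
```

All values are kept as strings here; converting `TOP_K_OUTPUT` to an integer would be a separate step.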

General properties

These properties are used by every action and should be set for each of them.

Key	Value type	Description	Default
PATH_TRAINING	Valid path (file)	Path to training file (absolute or relative)	train.txt
PATH_TEST	Valid path (file)	Path to test file (absolute or relative)	test.txt
PATH_VALID	Valid path (file)	Path to validation file (absolute or relative)	valid.txt
PATH_RULES	Valid path (file)	Path to rule set file (absolute or relative)	rules.txt
DISCRIMINATION_BOUND	Integer	Discriminates against rules whose result sets contain more elements than this bound. (Also used to limit memory consumption.) 0 means no limit.	4000
UNSEEN_NEGATIVE_EXAMPLES	Integer	The number of negative examples that are assumed to exist although they have not been observed. The higher this number, the more strongly rules with high coverage are favoured.	5
REFLEXIV_TOKEN	String	Token substituted in reflexive rules. (Used if the rule set was trained with REWRITE_REFLEXIV = TRUE)	me_myself_i
TOP_K_OUTPUT	Integer	The number of top-ranked results (k) kept in the output after filtering.	10
WORKER_THREADS	Integer	Number of threads used for computation. (-1 means all available threads are used)	-1
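
To make the effect of UNSEEN_NEGATIVE_EXAMPLES concrete, the sketch below assumes that the value is added to the denominator of a rule's confidence. This formula is an assumption, not taken from this page, and the function and parameter names are hypothetical.

```python
def adjusted_confidence(correct, predicted, unseen_negative_examples=5):
    """Confidence with a constant added to the denominator (assumed formula)."""
    return correct / (predicted + unseen_negative_examples)


# Two rules that are always correct on the training data:
low_coverage = adjusted_confidence(correct=10, predicted=10)
high_coverage = adjusted_confidence(correct=100, predicted=100)
```

Under this assumption the rule with more predictions is penalised less, which matches the description that rules with high coverage are favoured as the number grows.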

ACTION calcjacc

Calculates, for every relation, the similarity matrix (Jaccard index) between rules that is used when aggregating with non-redundant noisy-or. The Jaccard index is estimated using MinHash. Output: binary files storing the Jaccard indices between rules for each relation.

Key	Value type	Description	Default
		General properties (see table above)
PATH_JACCARD	Valid path (directory)	Path to the directory used for storing the binary similarity matrix files	jaccard
RESOLUTION	Integer	Sets the accuracy of the Jaccard estimation: the number of hash functions used in MinHash (e.g. RESOLUTION = 200 --> 200 hash functions --> finest resolution of the Jaccard estimate is 1/200)	200
SEED	Integer	Seed for generating hash functions used in MinHash	0
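
The relation between RESOLUTION, SEED and the estimate can be sketched as follows. This is a generic MinHash illustration, not the tool's implementation: the function names, the choice of linear hash functions modulo a prime, and the representation of a rule's result set as a non-empty set of non-negative integers smaller than the prime are all assumptions.

```python
import random

_PRIME = (1 << 61) - 1  # modulus for the assumed hash family


def make_hash_functions(resolution, seed):
    """Draw `resolution` hash functions h(x) = (a*x + b) mod p from a seed."""
    rng = random.Random(seed)
    return [(rng.randrange(1, _PRIME), rng.randrange(0, _PRIME))
            for _ in range(resolution)]


def minhash_signature(items, hash_functions):
    """Smallest hash value of a non-empty set under each hash function."""
    return [min((a * x + b) % _PRIME for x in items)
            for a, b in hash_functions]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of hash functions on which both signatures agree."""
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


hashes = make_hash_functions(resolution=200, seed=0)
rule_a = minhash_signature({1, 2, 3, 4}, hashes)
rule_b = minhash_signature({3, 4, 5, 6}, hashes)
estimate = estimate_jaccard(rule_a, rule_b)
```

With 200 hash functions the estimate is always a multiple of 1/200, which is why a higher RESOLUTION gives a finer estimate.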

ACTION learnnrnoisy

Learns the optimal thresholds for clustering the rules on similarity. There are two possible search strategies: grid search and random search.

Requires that the similarity matrices have been calculated beforehand (ACTION calcjacc).

Key	Value type	Description	Default
		General properties (see table above)
PATH_JACCARD	Valid path (directory)	Path to the directory containing the binary similarity matrix files	jaccard
PATH_CLUSTER	Valid path (file)	Path to file used for storing clustering results	cluster.txt
BUFFER_SIZE	Integer	Buffer size (counted in 4-byte integers) used to limit the memory consumed by buffering previously inferred rules. Should only be set if running out of memory. (2500000000 integers x 4 bytes --> ~10 GB)	Maximum unsigned long long
STRATEGY	[grid|random]	Sets the search strategy used to find the optimal clustering	grid
ITERATIONS	Integer	Number of iterations used in the random search strategy	10000
RESOLUTION	Integer	Determines the smallest possible change of the threshold (1/RESOLUTION). (Number of iterations in the grid search strategy; limits the search space in the random search strategy)	200
SEED	Integer	Seed for the sampling of thresholds used in random search strategy	0
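
How STRATEGY, ITERATIONS, RESOLUTION and SEED interact can be sketched as below. The sketch only generates candidate values for a single threshold; the function names are hypothetical, and whether the tool includes the end points 0 and 1 is an assumption.

```python
import random


def grid_thresholds(resolution):
    """One candidate per grid step: 1/resolution, 2/resolution, ..., 1."""
    return [i / resolution for i in range(1, resolution + 1)]


def random_thresholds(resolution, iterations, seed):
    """Sample `iterations` candidates from the same 1/resolution grid."""
    rng = random.Random(seed)
    return [rng.randint(1, resolution) / resolution
            for _ in range(iterations)]


grid = grid_thresholds(resolution=200)
sampled = random_thresholds(resolution=200, iterations=10000, seed=0)
```

In both strategies every candidate is a multiple of 1/RESOLUTION; the grid visits each value once, while the random strategy draws ITERATIONS values reproducibly from SEED.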

ACTION applynrnoisy

Key	Value type	Description	Default
		General properties (see table above)
PATH_CLUSTER	Valid path (file)	Path to file containing clustering results	cluster.txt
PATH_OUTPUT	Valid path (file)	Path to file used for storing predictions	predictions.txt

ACTION applynoisyonly | applymaxonly

Key	Value type	Description	Default
		General properties (see table above)
PATH_OUTPUT	Valid path (file)	Path to file used for storing predictions	predictions.txt
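
The aggregation schemes named by the apply actions can be sketched as follows. Maximum and noisy-or are the standard definitions; the non-redundant variant shown here (maximum within a cluster, noisy-or across clusters) is an assumption about how the clustering results are used, and all function names are hypothetical.

```python
import math


def max_aggregation(confidences):
    """Score a candidate by its single most confident rule."""
    return max(confidences)


def noisy_or(confidences):
    """Combine rule confidences as 1 - product of (1 - c)."""
    return 1.0 - math.prod(1.0 - c for c in confidences)


def non_redundant_noisy_or(clusters):
    """Assumed scheme: maximum within each cluster, noisy-or across clusters."""
    return noisy_or([max_aggregation(cluster) for cluster in clusters])


# Three rules predict the same candidate; the first two are in one cluster.
score_max = max_aggregation([0.5, 0.5, 0.5])
score_noisy_or = noisy_or([0.5, 0.5, 0.5])
score_non_redundant = non_redundant_noisy_or([[0.5, 0.5], [0.5]])
```

In this example the clustered variant lies between the other two scores, because the two similar rules are counted only once.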

Trial properties

Key	Value type	Description	Default
TRIAL	[0|1]	If set to 1, rules are applied only to a representative sample of all test triples. The sample size is calculated from CONFIDENCE_LEVEL and MARGIN_OF_ERROR	0
PATH_TEST_SAMPLE	Valid path	Path to the test triples of the sample (used for evaluation). Can be absolute or relative to the application. The file is created; if it already exists, it is overwritten	"test_sample.txt" (relative to exe)
CONFIDENCE_LEVEL	[80|85|90|95|99]	Confidence level of the evaluation results, in percent	95
MARGIN_OF_ERROR	Integer (Percent)	Margin of error (+-) of the evaluation results	5
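
The page does not state how the sample size is derived from CONFIDENCE_LEVEL and MARGIN_OF_ERROR. A common choice, shown here purely as an assumption, is Cochran's formula with a finite population correction and a worst-case proportion of 0.5. The function name `sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence_level=95, margin_of_error=5):
    """Sample size for a population; both settings are given in percent."""
    # Two-sided z-score for the confidence level (about 1.96 for 95).
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence_level / 100.0) / 2.0)
    e = margin_of_error / 100.0
    p = 0.5  # worst-case proportion, maximises the required sample
    n0 = z * z * p * (1.0 - p) / (e * e)
    # Finite population correction for a test set of limited size.
    return math.ceil(n0 / (1.0 + (n0 - 1.0) / population))


small_test_set = sample_size(1000)     # 278 with the default settings
large_test_set = sample_size(10 ** 9)  # 385 with the default settings
```

Under these assumptions the sample size levels off for large test sets, so a trial run on a very large test file would evaluate only a few hundred triples.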
