Properties file
Simon Ott edited this page Dec 13, 2020 · 17 revisions
The properties file should have the following format:
{KEY} = {VALUE}
{KEY} = {VALUE}
...
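As a rough illustration of this format, the following Python sketch reads such a file into a dictionary. The helper name `parse_properties` and the decision to skip blank or malformed lines are assumptions made for this example and are not part of the tool; the keys in the sample are taken from the tables on this page.

```python
def parse_properties(text):
    """Parse 'KEY = VALUE' lines into a dict (hypothetical helper, not part of the tool)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # assumption: blank or malformed lines are skipped
        # Split on the first '=' only, so values may themselves contain '='.
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


example = """
PATH_TRAINING = train.txt
PATH_RULES = rules.txt
TOP_K_OUTPUT = 10
WORKER_THREADS = -1
"""

props = parse_properties(example)
print(props["TOP_K_OUTPUT"])
```

All values are kept as strings here; converting them to integers or paths would be a separate step.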
- General properties
- ACTION calcjacc
- ACTION learnnrnoisy
- ACTION applynrnoisy
- ACTION applynoisyonly | applymaxonly
- Trial properties
General properties

These properties are used by, and should be set for, every action.
| Key | Value type | Description | Default |
|---|---|---|---|
| PATH_TRAINING | Valid path (file) | Path to training file (absolute or relative) | train.txt |
| PATH_TEST | Valid path (file) | Path to test file (absolute or relative) | test.txt |
| PATH_VALID | Valid path (file) | Path to validation file (absolute or relative) | valid.txt |
| PATH_RULES | Valid path (file) | Path to rule set file (absolute or relative) | rules.txt |
| DISCRIMINATION_BOUND | Integer | Discriminates against rules whose result sets contain more elements than this bound. (Also used for limiting memory consumption.) 0 means no limit. | 4000 |
| UNSEEN_NEGATIVE_EXAMPLES | Integer | Number of negative examples that are assumed to exist but have not been observed. The higher this number, the more rules with high coverage are favoured. | 5 |
| REFLEXIV_TOKEN | String | Token used for substitution of reflexive rules. (Used if the rule set was trained with REWRITE_REFLEXIV = TRUE) | me_myself_i |
| TOP_K_OUTPUT | Integer | Number of top-ranked results (k) kept in the output after filtering. | 10 |
| WORKER_THREADS | Integer | Number of threads used for computation. (-1 means all available threads are used) | -1 |
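To see why a larger UNSEEN_NEGATIVE_EXAMPLES favours rules with high coverage, consider the sketch below. It assumes the adjusted confidence of a rule is its number of correct predictions divided by the sum of all its predictions and UNSEEN_NEGATIVE_EXAMPLES. This is a common smoothing form, but the exact formula used by the tool is not stated on this page, and the function name `adjusted_confidence` is hypothetical.

```python
def adjusted_confidence(correct, predicted, unseen_negative_examples=5):
    """Smoothed rule confidence (assumed form, not taken from the tool's source)."""
    return correct / (predicted + unseen_negative_examples)


# Two rules with the same raw confidence (0.8) but very different coverage.
small_rule = adjusted_confidence(correct=4, predicted=5)      # 4 / (5 + 5)
large_rule = adjusted_confidence(correct=400, predicted=500)  # 400 / (500 + 5)
print(small_rule, large_rule)
```

Under this assumption the fixed penalty barely affects the rule with 500 predictions, while it halves the confidence of the rule with only 5, so raising the value widens the gap in favour of the high-coverage rule.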
ACTION calcjacc

Calculates the similarity matrices (Jaccard index) of all relations; these are used for aggregating with non-redundant noisy-or. The Jaccard index is estimated using MinHash. Output: binary files storing the Jaccard indices between rules for each relation.
| Key | Value type | Description | Default |
|---|---|---|---|
| General properties (see table above) | | | |
| PATH_JACCARD | Valid path (directory) | Path to the directory used for storing the binary similarity matrix files | jaccard |
| RESOLUTION | Integer | Sets the accuracy of the Jaccard estimation: the number of hash functions used in MinHash (e.g. RESOLUTION = 200 means 200 hash functions, so the finest Jaccard resolution is 1/200) | 200 |
| SEED | Integer | Seed for generating hash functions used in MinHash | 0 |
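The roles of RESOLUTION and SEED can be illustrated with a minimal MinHash sketch in Python. This is not the tool's implementation: the linear hash family, the prime modulus, and the function names `minhash_signature` and `estimate_jaccard` are assumptions chosen for brevity. Each set is summarised by one minimum per hash function, and the Jaccard index is estimated as the fraction of positions where two signatures agree, which is why the estimate is always a multiple of 1/RESOLUTION.

```python
import random


def minhash_signature(items, resolution=200, seed=0):
    """Return `resolution` min-hash values for a non-empty set of integer ids."""
    prime = 2_147_483_647  # 2^31 - 1, assumed larger than any id in the sets
    rng = random.Random(seed)
    # One (a, b) pair per hash function h(x) = (a * x + b) mod prime.
    params = [(rng.randrange(1, prime), rng.randrange(0, prime))
              for _ in range(resolution)]
    return [min((a * x + b) % prime for x in items) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which both sets agree."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


# Toy result sets of two rules; the exact Jaccard index is 50 / 150.
rule_a = set(range(0, 100))
rule_b = set(range(50, 150))
sig_a = minhash_signature(rule_a)  # both signatures must share seed and resolution
sig_b = minhash_signature(rule_b)
print(estimate_jaccard(sig_a, sig_b))
```

Using the same seed for every set matters: signatures built from different hash functions are not comparable.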
ACTION learnnrnoisy

Learns the optimal thresholds for clustering the rules by similarity. There are two possible search strategies: grid search and random search.
Requires the similarity matrices to have been calculated first (calcjacc).
| Key | Value type | Description | Default |
|---|---|---|---|
| General properties (see table above) | | | |
| PATH_JACCARD | Valid path (directory) | Path to the directory containing the binary similarity matrix files | jaccard |
| PATH_CLUSTER | Valid path (file) | Path to file used for storing clustering results | cluster.txt |
| BUFFER_SIZE | Integer | Buffer size, counted in 4-byte integers, used to limit the memory consumed by buffering previously inferred rules. Should only be set if running out of memory. (2500000000 corresponds to ~10 GB) | Maximum unsigned long long |
| STRATEGY | [grid\|random] | Sets the search strategy used for finding the optimal clustering | grid |
| ITERATIONS | Integer | Number of iterations used in the random search strategy | 10000 |
| RESOLUTION | Integer | Determines the smallest possible change of the threshold (1/RESOLUTION). This is the number of iterations in the grid search strategy and limits the search space in the random search strategy. | 200 |
| SEED | Integer | Seed for the sampling of thresholds used in random search strategy | 0 |
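The interplay of STRATEGY, ITERATIONS, RESOLUTION and SEED can be sketched as follows for a single threshold. This is an illustration under assumptions, not the tool's code: the function names are hypothetical, the grid is assumed to consist of the values k/RESOLUTION for k = 1..RESOLUTION, and `toy_score` is a placeholder for whatever objective the tool actually optimises.

```python
import random


def grid_candidates(resolution=200):
    """Thresholds k / resolution for k = 1..resolution (endpoints are an assumption)."""
    return [k / resolution for k in range(1, resolution + 1)]


def random_candidates(iterations=10000, resolution=200, seed=0):
    """Sample `iterations` thresholds from the same 1/resolution grid."""
    rng = random.Random(seed)
    return [rng.randint(1, resolution) / resolution for _ in range(iterations)]


def best_threshold(candidates, score):
    """Return the candidate threshold with the highest score."""
    return max(candidates, key=score)


def toy_score(threshold):
    """Placeholder objective that peaks at 0.3; the real objective is the tool's own."""
    return -(threshold - 0.3) ** 2


print(best_threshold(grid_candidates(), toy_score))
print(best_threshold(random_candidates(iterations=50), toy_score))
```

With one threshold the grid is small enough to enumerate. Random search becomes attractive when several thresholds are tuned jointly and the full grid grows too large to visit.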
ACTION applynrnoisy

| Key | Value type | Description | Default |
|---|---|---|---|
| General properties (see table above) | | | |
| PATH_CLUSTER | Valid path (file) | Path to file containing clustering results | cluster.txt |
| PATH_OUTPUT | Valid path (file) | Path to file used for storing predictions | predictions.txt |
ACTION applynoisyonly | applymaxonly

| Key | Value type | Description | Default |
|---|---|---|---|
| General properties (see table above) | | | |
| PATH_OUTPUT | Valid path (file) | Path to file used for storing predictions | predictions.txt |
Trial properties

| Key | Value type | Description | Default |
|---|---|---|---|
| TRIAL | [0\|1] | If set to 1, rules are only applied to a representative sample of all test triples. The sample size is calculated from CONFIDENCE_LEVEL and MARGIN_OF_ERROR. | 0 |
| PATH_TEST_SAMPLE | Valid path | Path to the file holding the sampled test triples (used for evaluation); can be absolute or relative to the application. The file is created, or overwritten if it already exists. | test_sample.txt (relative to the executable) |
| CONFIDENCE_LEVEL | [80\|85\|90\|95\|99] | Confidence level of the evaluation results | 95 |
| MARGIN_OF_ERROR | Integer (Percent) | Margin of error (±) of the evaluation results | 5 |
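How CONFIDENCE_LEVEL and MARGIN_OF_ERROR could translate into a sample size is sketched below. The page does not state which formula the tool uses; this example assumes the standard Cochran formula with finite population correction and the most conservative proportion p = 0.5. The function name `sample_size` and the rounded z-scores are part of that assumption.

```python
import math

# Two-sided z-scores (rounded) for the confidence levels listed in the table.
Z_SCORES = {80: 1.282, 85: 1.440, 90: 1.645, 95: 1.960, 99: 2.576}


def sample_size(population, confidence_level=95, margin_of_error=5):
    """Cochran's formula with finite population correction (assumed, see text)."""
    z = Z_SCORES[confidence_level]
    e = margin_of_error / 100.0  # percent to fraction
    p = 0.5                      # most conservative proportion
    n0 = (z ** 2) * p * (1 - p) / (e ** 2)   # sample size for an infinite population
    n = n0 / (1 + (n0 - 1) / population)     # correction for a finite test set
    return min(population, math.ceil(n))


# Number of test triples to sample from a test set of 10000 triples.
print(sample_size(10000))
```

Under these assumptions the required sample grows only slowly with the size of the test set, which is what makes a trial run much cheaper than a full evaluation on large data sets.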