This repository contains the HPM tools for
- Automaton and SubCSL synthesis
- Trage generation
- Data import from Nibe ZIP format
- Trace selection and merging
- Learning
- Validation.
The project can be used via Gradle:
./gradlew build
./gradlew run --args "--help" # or similarAdditionally, executable bundles can be built:
./gradlew distZip # ZIP with dependencies
./gradlew jpackage # Installer with JAR, needs platform build tools
./gradlew singularityImage # Singularity image, needs singularity and Linux platformStyle is enforced via Spotless and wrapped inside Gradle's testing machinery:
./gradlew spotlessApply # Format all files, should be run before committing or full building
./gradlew check # Run tests and style checkerWhen properly installed via the artifact created by jpackage or inside the singularity container,
the tool can be invoked as heat-pump-mining [ARGS]. Note that when running via Gradle or out of an
IDE, only the arguments can be passed (i.e., heat-pump-mining is not part of the normal
parameters). The tools are documented in alphabetical order.
Note that for trace databases, compression is automatically determined by the file name extension:
data.json.zst is automatically (de-)coompressed using the Zstandard algorithm, data.json.gz uses
Gzip, data.json is uncompressed, etc.
This is a tool for validation that compares the hitting times predicted by an automaton to those in an actual trace and reports the resulting delta. The trace is split into prefix-suffix pairs with increasing prefix size. The prefix is then used to compute the current location in the automaton using Viterbi's algorithm. Then, a prediction for hitting certain events is made and compared to the data in the actual traces.
Usage:
heat-pump-mining compute-hitting-times <options>Options:
--automaton automaton.dotthe automaton to use for predictions.--input traces.json.zstthe trace database.--output results.csvresults file, defaults to standard output.--initprepend the initial state's output to the trace, required when used for learning.--parallelenable multithreaded computation.--events event_1 event_2 …the target events for hitting time prediction.--samples nthe number of prefix-suffix splits to perform.
A validation tool that computes the revision score for a given trace database, i.e., the agreement of the traces and the automaton under IOalergia's stochastic tests. This can operate in a pre-trace mode, in which a revision score is computed for each trace in the database and then averaged, and a global mode, in which a score is computed for all traces at once.
Usage:
heat-pump-mining compute-revision-score <options>Options:
--automaton automaton.dotthe automaton to measure.--input traces.json.zstthe trace database.--output results.csvresults file, defaults to standard output.--initprepend the initial state's output to the trace, required when used for learning.- one of
--single(per-trace scoring),--single-parallel(multithreaded per-trace scoring), and--global(global scoring). --frequency-weight nscore bias between frequencies and times, defaults to 0.5.
Converts the change point detector's format (Nibe ZIP) to the JSON format used by all other tools in this software.
Usage:
heat-pump-mining convert-format <options>Options:
--input data.zipinput file.--output data.json.zstconverted file.
This evaluation tool measures the agreement of a given set of SubCSL formulas and a trace database using the approach for evaluating IOalergia.
Usage:
heat-pump-mining evaluate-sub-csl <options>Options:
--sub-csl formulas.txtformula file, each on a separate line.--traces traces.json.zsttrace database to use.--output results.txtresults file, defaults to standard output.
This tool uses a naive random generation approach to create a family of automata. The algorithm ensures that for each output, one state exists, and that the resulting automaton is deterministic.
Usage:
heat-pump-mining generate-automata <options>Options:
--output automaton.dottarget file pattern; each generated automaton is suffixed with-n.--automata nnumber of automata to generate, defaults to one.--alphabet o_1 o_2 …the output alphabet to use.--min-size n --max-size mdefines the range of states to generate.--min-exit-time n --max-exit-time mdefines the range of exit times to generate.--seed ndefines the generation seed, defaults to zero.
This tool randomly generates a set of SubCSL formulas. Each formula contains a temporal operator and propositional components of confugurable depth.
Usage:
heat-pump-mining generate-sub-csl <options>Options:
--output formulas.txttarget file.--formulas nnumber of formulas to generate, defaults to one.--alphabet o_1 o_2 …the output alphabet to use.--leaf-probability nthe probability that in each generation step for a propositional sub-formula, a literal is created. Lower values yield more complex formulas.--min-duration n --max-duration mdefines the range of acceptable durations for the temporal operator to generate.--seed ndefines the generation seed, defaults to zero.
Generate random traces from a given automaton.
Usage:
heat-pump-mining generate-traces <options>Options:
--automaton automaton.dotthe automaton to generate traces from.--output traces.json.zstthe trace database to generate.--traces nthe number of traces to generate, defaults to one.--mean-length nthe mean length of generated traces, defaults to 100.--length-stddev nstandard deviation of lengths, defaults to zero.- either
--normal-distributionor--exponential-distributionto select the distribution of exit times to assume. - one of
--nanoseconds,--microseconds,--milliseconds,--seconds,--minutes,--hours, or--daysto determine the precision used in exit time generation. This should roughly match the scale of the automaton's exit times. --seed ndefines the generation seed, defaults to zero.
Run the RTIOAlergia algorithm on a given trace database to obtain a matching automaton. If the input traces do not have the same starting event, a canonical start event is selected and all traces with a different event are ignored. Alternatively, an additional init event can be prepended to every trace. Note that this must be accounted for when using the automaton!
Usage:
heat-pump-mining learn <options>Options:
--input traces.json.zstthe trace database to learn from.--output automaton.dotlearned automaton.--initor--init valueprepend an initial event to every trace (default:$init). This avoids ignoring of traces.--blue-state-order orderone ofCANONICAL_ORDER,LEX_ORDER,FIFO_ORDER, andLIFO_ORDER; determines the order in which blue states are processed. Defaults to lexicographic.--parallelmultithreaded checking of merge eligibility.--nondeterministicsacrifice determinism for faster merging (not recommended).--frequency-epsilon ethe merge eligibility bound for frequency similarity, higher is stricter. Defaults to 0.05.--frequency-epsilon-decay zthe merge eligibility bound decay for frequency similarity. Defaults to one (disabled).--timing-epsilon ethe merge eligibility bound for timing similarity, higher is stricter. Defaults to 0.05.--timing-epsilon-decay zthe merge eligibility bound decay for timing similarity. Defaults to one (disabled).--tail-length kdisable stochastic tests in a certain recursive merge depth. Disabled by default.--consider-mergedperform stochastic tests on the merged automaton, not the PTA's data. Stochastically unsound, but helpful for sparse traces.
This tool pretty-prints an automaton into a shorted, more huma-readable expression. The new automaton cannot be used by any tool in this software, but can be rendered using Graphviz. The automaton only retains probabilities and average exit times, but no individual measurements. Additionally, transitions can be color-graded based on their probabilities.
Usage:
heat-pump-mining render-automaton <options>Options:
--input automaton.dotthe input automaton.--output neat.dotthe rendered automaton.--full-color-min ithe probability above which a transition is rendered in black. Transitions below this are rendered in grayscale proportional to their likeliness. Defaults to zero.--render-min ithe minimum probability to render a transition at all. Defaults to zero.
Selects a subset of events from the inputs. This requires a key event and a list of events. It then uses the change points detected for the key event to segment the trace resulting from the combination of all selected events. The change points for non-key events are ignored.
Usage:
heat-pump-mining select-and-merge <options>Options:
--input data.json.zstthe input database from acreat-formatinvocation.--output traces.json.zstthe merged trace database.--events e_1 e_2 …the events to include.--split-event ethe key event.- one of
--use-only-first(only use the 0th trace, creating one long trace),--use-all-overlapping(use all traces as is, yielding redundant data), anduse-all-non-overlapping(segment the sub-traces, yielding many non-overlapping traces; recommended default).