Cleaner HON Build code with faster computation #11
Hi all!
After doing some tests, it seems to me that there is a cleaner way of defining the HON network (or rather its edge set) from a set of rules.
Let S = s1s2s3... be a rule (i.e. a key of Rules) and t a symbol in the input sequence such that Rules[S][t] > 0.
In the HON, this corresponds to a directed edge S -> S't, where S't is the concatenation of S' and t,
and S' = s_i s_{i+1}... (for some i ≥ 1, so possibly S itself) is the longest suffix of S such that S't is an existing rule.
For example, if S = abc and Rules[abc][t] > 0, then there will be an edge abc -> bct if 'abct' was not detected as a rule while 'bct' was.
If no extension of 't' was detected as a rule, we simply have abc -> t (since the longest suffix S' is then the empty string).
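A minimal sketch of the edge definition above, assuming Rules is a dict that maps each source (a tuple of symbols) to a dict of {next symbol: weight}. The names `build_edges` and `target_node`, and this data layout, are hypothetical and not taken from "BuildNetwork.py":

```python
def target_node(source, t, sources):
    """Return S't: the longest suffix S' of `source` (S itself first,
    then shorter suffixes) such that S' + (t,) is a rule source.
    Falls back to the first-order node (t,) when no extension exists."""
    for i in range(len(source)):
        candidate = source[i:] + (t,)
        if candidate in sources:
            return candidate
    return (t,)  # empty suffix: plain edge S -> t


def build_edges(rules):
    """Build the HON edge set directly from the rules, with no rewiring.
    Hypothetical layout: rules[S][t] is the weight of rule S -> t."""
    sources = set(rules)  # every rule source is a HON node
    edges = {}
    for source, targets in rules.items():
        for t, weight in targets.items():
            if weight <= 0:
                continue  # only rules with Rules[S][t] > 0 yield edges
            edges.setdefault(source, {})[target_node(source, t, sources)] = weight
    return edges


# The example from the text: 'bct' is a rule, 'abct' is not,
# and no extension of 'u' is a rule.
rules = {
    ("a", "b", "c"): {"t": 2, "u": 1},
    ("b", "c", "t"): {"x": 1},
    ("t",): {"x": 3},
}
edges = build_edges(rules)
print(edges[("a", "b", "c")])  # {('b', 'c', 't'): 2, ('u',): 1}
```

With this approach each (S, t) pair costs at most |S| + 1 set lookups, and every edge is placed at its final target in one pass, which is consistent with the idea of avoiding a separate rewiring step.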
Given this definition, we can rewrite the code of "BuildNetwork.py" without any rewiring of edges. On real datasets, I got the same HON with the proposed changes. Moreover, my method seems to be faster when there are a lot of edges to add (or, in the current version, a lot of rewiring to do).
I hope I didn't miss anything. I tested both approaches on different real-world sequences and, IMO, the new definition of the edge set is comprehensive, but still...