Skip to content

An automaton made of STE's which is used to determine how closely a given pattern string (p), matches an input string within a given Levenshtein edit distance (d).

License

Notifications You must be signed in to change notification settings

jeffudall/Leven

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

107 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Levenshtein

A Levenshtein automoton contains a pattern string (p) and a Levenshtein edit distance (d). Any input string that matches pattern string within the edit distance will report a match. A pattern string is used to populate the Levenshtein automaton STE's with characters. The edit distance is the number of edits - insert, substitute, or delete - that are allowed in any matchine string. Levenshtein automata are used to find closely matching strings, for instance a spell checker could use a Levenshtein automata to look for cloesly matching words for a misspelled word.

Example:

Let's say the given string pattern is s="wahoo" and the Levenshtein edit distance is d=2. A Levenshtein automaton with this pattern and edit distance would look like Figure 1 below. The first number in each state's name is the column number, from zero to the number of characters; for "wahoo" this would be 0-5. The second number is number of edits, from 0 to the edit distance; for d=2 it would be 0-2.

state_wahoo_d2

Figure 1 - Levenshtein automaton with string pattern s="wahoo" and edit distance of d=2.[1]
(Starting/always active state is green and reporting states have purple outlines.)

Starting at the first state, (0.0) we feed in the first character of an input string. If the first character is "w" this would move the state to the right, to (1.0), indicating one character match step and zero edits. If it was a blank space this would move to the state right above, to (0.1), indicating it zero successful character steps and one deletion edit. If it was any other character it would move above and to the right, to (1.1), indicating one character step with one insertion or substitution edit. It would then continue on to examine the second character at the active state, either (1.0), (0.1), or (1.1), and move to the next state depending on if it gets a match(right), deletion(up), or insert/substitution(up and right).

If it reaches any of the states to the far right (any 5.X states) this indicates the input string (p) matches the given pattern string within the given edit distance (d). However, if it reaches the top level of the automaton (any of the X.2 states) and encounters a futher edit this will terminate it's movement through the Levenshtein automata, thus preventing a positive match.

Automata Processor

In order to make a Levenshtein automaton using Micron's Automata Processor (AP) we must convert the state machine to an automaton using state transition elements (STE's), which are the single automaton resource units of the AP.

STE's only output a logical yes/no match for the character they are looking for. This means that every output arrow will turn on the attached STE if it finds a match. This also means that extra STE's must be used to simulate the behavior of the automaton. These extra STE's, ( * ), will match on any input and connecting them as shown below in Figure 2 allow us to construct a Levenstein automaton using STE's that has the same behavior as the one above using states in Figure 1.

state_wahoo_d2_AP

Figure 2 - Levenshtein automaton built with STE's for the Automata Processor(AP)
(Starting/always active STE's are green and reporting STE's are purple.)

Levenshtein Program

The leven program will accept a string or file of strings as input patterns and create Levenshtein automata with an edit distance chosen by the user. It can also create automatons with random patterns of DNA characters or alpha-numeric characters with a pattern width and edit distance chosen by the user. It will then create an automata network markup language (ANML) file, which is an XML file using the ANML syntax created by Micron.

(See README file in the Code folder for more information on input syntax and program behavior.)

The leven program is dependant on VASim by Jack Wadden from University of Virginia. You will need to download VASim and place it in one folder up when compiling main.cpp (or edit the make file).

References

[1] T. Tracy et al., "Nondeterministic Finite Automata in Hardware - the Case of the Levenshtein Automaton" University of Verginia, Charlottesville, VA, 2016.

About

An automaton made of STE's which is used to determine how closely a given pattern string (p), matches an input string within a given Levenshtein edit distance (d).

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages