Add minimization algorithm for Acyclic FST #49
AdrienDuff wants to merge 1 commit into facebookresearch:main from
    Graph viterbiPath(const Graph& g);

    /** Minimize an Acyclic FST */
    Graph minimizeAcyclicFST(const Graph& g);
Let's just call this minimize. Ideally we would throw an error if a cycle is detected in the graph.
    //Initialization
    //a. Find all states with no outgoing arcs. (Since we are dealing with an acyclic FST, it is always possible.)
What I would do is rename the function to minimize, and throw an error if this condition is not met (e.g. if we don't find any start or accept states with 0 outgoing arcs, we can assume that the graph is cyclic). In the future we can attempt to handle this case.
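A minimal sketch of the check suggested here, assuming a hypothetical adjacency-list view of the graph (`OutArcs` and `findSinksOrThrow` are illustrative names, not part of the GTN `Graph` API). It throws when a non-empty graph has no node with zero outgoing arcs. Note that this is only a necessary condition: a graph can contain such a node and still have a cycle elsewhere.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the graph: outArcs[n] holds the destination
// node of every arc leaving node n. This is not the GTN Graph API.
using OutArcs = std::vector<std::vector<int>>;

// Returns all nodes with no outgoing arcs. Throws if a non-empty graph has
// none, since every finite acyclic graph has at least one such node.
std::vector<int> findSinksOrThrow(const OutArcs& outArcs) {
  std::vector<int> sinks;
  for (std::size_t n = 0; n < outArcs.size(); ++n) {
    if (outArcs[n].empty()) {
      sinks.push_back(static_cast<int>(n));
    }
  }
  if (!outArcs.empty() && sinks.empty()) {
    throw std::invalid_argument("minimize: graph has a cycle");
  }
  return sinks;
}
```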
    //a. Find all states with no outgoing arcs. (Since we are dealing with an acyclic FST, it is always possible.)
    //b. Split the resulting set into 4 sets according to their START and ACCEPT status.

    int nodeStartAccept = -1, nodeStartNoAccept = -1, nodeNoStartAccept = -1, nodeNoStartNoAccept = -1;
In general, we assume in GTN that graphs are "trim", meaning there are no paths which lead to dangling nodes. You are welcome to make the same assumption. Alternatively, if you find any non-accepting node that has 0 outgoing arcs, we should ignore it, since it does nothing for the set of paths the graph allows and really shouldn't be part of a "minimal" graph.
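One way the second option could look, as a rough sketch: the `Node` struct and both function names below are hypothetical stand-ins rather than GTN types. It only skips nodes that are themselves dead ends; nodes whose every path leads to a dead end would need a separate trimming pass.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical node record (not a GTN type): an accept flag plus the
// destination nodes of the outgoing arcs.
struct Node {
  bool accept;
  std::vector<int> out;
};

// A dead end has no outgoing arcs and is not accepting, so no accepted
// path can pass through it.
bool isDeadEnd(const Node& node) {
  return node.out.empty() && !node.accept;
}

// Collects the nodes with no outgoing arcs that should seed the
// minimization, ignoring dead ends.
std::vector<int> usefulSinks(const std::vector<Node>& nodes) {
  std::vector<int> sinks;
  for (std::size_t n = 0; n < nodes.size(); ++n) {
    if (nodes[n].out.empty() && !isDeadEnd(nodes[n])) {
      sinks.push_back(static_cast<int>(n));
    }
  }
  return sinks;
}
```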
    Graph minimizeAcyclicFST(const Graph& g){
      Graph graph;
      std::vector<int> oldToNew(g.numNodes(), -1); // a map between the nodes of g and the minimized graph.
      std::vector<int> oldProcessed; // store which nodes has been processed in the g graph
You should make this a vector<bool> processed (the name old is redundant). Size it to the number of nodes in the old graph and initialize every element to false. Then, to check whether a node is already processed, check processed[n].
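A small sketch of that bookkeeping, assuming node ids are dense integers in `[0, numNodes)`. The helper `markProcessed` is a hypothetical name used only for illustration; in the PR the vector would be sized with `g.numNodes()`.

```cpp
#include <vector>

// Marks every node id in `order` as processed and returns how many were
// newly marked. The lookup processed[n] is O(1), unlike searching a
// vector<int> of processed ids with std::count.
int markProcessed(std::vector<bool>& processed, const std::vector<int>& order) {
  int newlyMarked = 0;
  for (int n : order) {
    if (!processed[n]) {
      processed[n] = true;
      ++newlyMarked;
    }
  }
  return newlyMarked;
}
```

A caller would create the flags once, for example `std::vector<bool> processed(numNodes, false);`, and pass them in.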
    if (std::all_of(g.out(predNode).begin(), g.out(predNode).end(),
        [&g, &oldProcessed](int a) {return std::count(oldProcessed.begin(), oldProcessed.end(), g.dstNode(a)) > 0;})){
Change oldProcessed to processed as mentioned above and simplify this accordingly.
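With a `vector<bool>`, the predicate reduces to a direct index. The sketch below is self-contained, so it takes the destination nodes as a plain vector instead of going through `g.out(predNode)` and `g.dstNode(a)`; the function name is hypothetical. In the PR itself the lambda body would presumably become `return processed[g.dstNode(a)];`.

```cpp
#include <algorithm>
#include <vector>

// dstNodes: destination node of each arc leaving the node under test
// (a stand-in for mapping g.out(predNode) through g.dstNode).
// Returns true when every successor has already been processed.
bool allSuccessorsProcessed(const std::vector<int>& dstNodes,
                            const std::vector<bool>& processed) {
  return std::all_of(dstNodes.begin(), dstNodes.end(),
                     [&processed](int dst) { return processed[dst]; });
}
```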
    if ( std::equal(g.out(node1).begin(), g.out(node1).end(), g.out(node2).begin(), [&g, &oldToNew](int a1, int a2){
          return (g.ilabel(a1) == g.ilabel(a2) &&
                  g.olabel(a1) == g.olabel(a2) &&
                  oldToNew[g.dstNode(a1)] == oldToNew[g.dstNode(a2)]);})
    ){
      return true;
    }
nit: replace this block with return std::equal(...
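A sketch of the direct-return form, using a hypothetical `Arc` struct in place of GTN arc ids so the example compiles on its own. It uses the four-iterator overload of `std::equal` (C++14), which also compares the range lengths; the three-iterator form requires the second range to be at least as long as the first. Arcs are compared position by position, so this assumes both nodes list their arcs in a consistent order.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical arc record (not a GTN type).
struct Arc {
  int ilabel;
  int olabel;
  int dst;
};

// Two nodes have equivalent outgoing arcs when the arcs match pairwise on
// labels and their destinations were merged into the same new node.
bool sameOutArcs(const std::vector<Arc>& arcs1,
                 const std::vector<Arc>& arcs2,
                 const std::vector<int>& oldToNew) {
  return std::equal(
      arcs1.begin(), arcs1.end(), arcs2.begin(), arcs2.end(),
      [&oldToNew](const Arc& a1, const Arc& a2) {
        return a1.ilabel == a2.ilabel && a1.olabel == a2.olabel &&
            oldToNew[a1.dst] == oldToNew[a2.dst];
      });
}
```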
The implementation follows the Bottom-Up Algorithm from http://www.cs.jhu.edu/~hajic/courses/cs226/alg.html
Tests are coming soon.
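For readers unfamiliar with the approach, here is a rough, self-contained sketch of one common bottom-up scheme: process nodes only after all their successors, and merge nodes whose start/accept flags and outgoing arcs (after remapping destinations) are identical. It is not the PR's code and not necessarily the exact procedure on the linked page. `Arc`, `Node`, `Signature` and `minimizeSketch` are hypothetical names, arc weights are ignored, and the repeated scan is quadratic in the worst case.

```cpp
#include <algorithm>
#include <map>
#include <stdexcept>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical graph types (not the GTN API).
struct Arc { int ilabel; int olabel; int dst; };
struct Node { bool start; bool accept; std::vector<Arc> out; };

// A node's signature: start flag, accept flag, and its sorted outgoing
// arcs with destinations replaced by their merged ids.
using Signature =
    std::tuple<bool, bool, std::vector<std::tuple<int, int, int>>>;

// Returns oldToNew: for each input node, the id of the merged node it maps
// to. Throws if some node can never be processed, which implies a cycle.
std::vector<int> minimizeSketch(const std::vector<Node>& nodes) {
  const int n = static_cast<int>(nodes.size());
  std::vector<int> oldToNew(n, -1);
  std::vector<bool> processed(n, false);
  std::map<Signature, int> classes;
  int remaining = n;
  while (remaining > 0) {
    bool progress = false;
    for (int u = 0; u < n; ++u) {
      if (processed[u]) continue;
      // A node is ready once every successor already has a merged id.
      const bool ready = std::all_of(
          nodes[u].out.begin(), nodes[u].out.end(),
          [&processed](const Arc& a) { return processed[a.dst]; });
      if (!ready) continue;
      std::vector<std::tuple<int, int, int>> arcs;
      for (const Arc& a : nodes[u].out) {
        arcs.emplace_back(a.ilabel, a.olabel, oldToNew[a.dst]);
      }
      std::sort(arcs.begin(), arcs.end());
      Signature sig(nodes[u].start, nodes[u].accept, std::move(arcs));
      // Reuse the id of an identical signature, or allocate the next id.
      const int nextId = static_cast<int>(classes.size());
      oldToNew[u] = classes.emplace(std::move(sig), nextId).first->second;
      processed[u] = true;
      --remaining;
      progress = true;
    }
    if (!progress) {
      throw std::invalid_argument("minimize: graph has a cycle");
    }
  }
  return oldToNew;
}
```

Building the minimized graph from `oldToNew` is then a matter of creating one node per distinct id and copying the arcs of one representative per id.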