Match subnets in O(n^3) via linear_sum_assignment. #795
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
To solve subnets, instead of brute-forcing the n! combinations, use scipy.optimize.linear_sum_assignment (commonly known as the Hungarian/Munkres algorithm, although scipy actually uses a different algorithm) which provides a solution in O(n^3). This is not an original idea; see e.g. #450 and
https://imagej.net/imagej-wiki-static/TrackMate_Algorithms.html#Solving_LAP
Locally, this method solves the "slow" example in adaptive-search.ipynb (
tracks_regular = trackpy.link_df(cg, 0.75)) in ~50ms instead of 1min. linear_sum_assignment ("lsa") was extensively benchmarked in scipy PR#12541 (which was most about adding a sparse variant, but one can just look at the performance of lsa), which shows sub-second runtimes for thousands of inputs. This is also why this PR fully skips the use of MAX_SUB_NET (at most, one may consider setting an alternate value in the thousands for this strategy...).(In fact, it may perhaps(?) be possible to get even better performance via by completely dropping the subnet paradigm and just passing the whole distance matrix (sparsified by removing edges greater than the search range) to scipy.sparse.csgraph.min_weight_full_bipartite_matching, but 1) one needs to figure out how to handle unmatched points (maybe by adding "dummy" points with a high but finite link cost), and 2) this would require more massive code surgery anyways. Edit: Maybe this doesn't actually work if we want to maintain the current approach of having too-long edges within a subnet be set to search_range**2.)
The docs will need to be updated, but this PR is just to discuss the implementation.