Add parallelization functions to the package #29
Full parallelization was added using the joblib library as well as a helper function to handle the nested path lists.
modified: README.rst
modified: fastdtw/__init__.py
modified: fastdtw/_fastdtw.pyx
modified: fastdtw/fastdtw.py
The files have been updated to incorporate the respective changes within the _fastdtw.pyx and fastdtw.py files.
modified: fastdtw/__init__.py
modified: fastdtw/_fastdtw.cpp
@lvermue Thank you for the PR!
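@lvermue Would simply writing something like the following be insufficient?
import itertools
from fastdtw import fastdtw
from joblib import Parallel, delayed
import numpy as np
# 100 random integer time series, 100 time points each
X = np.random.randint(1, 40, size=(100, 100))
# fastdtw returns (distance, path); evaluate every ordered pair (i, j) in parallel
results = Parallel(n_jobs=-1)(delayed(fastdtw)(X[i], X[j]) for i, j in itertools.product(range(100), repeat=2))
# keep only the distances and arrange them as a 100x100 matrix
distance_mat = np.array([r[0] for r in results]).reshape(100, 100)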
@slaypni There are two main aspects to this:
Previously, the method assumed that DTW is symmetric and built symmetric distance matrices by copying the upper-triangle distances to the lower triangle. The method now computes the lower triangle explicitly by evaluating each pair in the reverse order as well.
modified: fastdtw/_fastdtw.cpp
modified: fastdtw/_fastdtw.pyx
modified: fastdtw/fastdtw.py
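The difference between the two strategies in this commit message can be illustrated by the index pairs each one has to evaluate. This is only a sketch using the standard library; the variable names are illustrative and not taken from the PR's code.

```python
from itertools import combinations, permutations

n = 4  # number of time series (illustrative value)

# Upper triangle only: each unordered pair once, result copied to (j, i).
# This is valid only if the distance function is symmetric.
upper_pairs = list(combinations(range(n), 2))

# Both triangles: every ordered off-diagonal pair is evaluated separately,
# so (i, j) and (j, i) may yield different values.
ordered_pairs = list(permutations(range(n), 2))

print(len(upper_pairs), len(ordered_pairs))
```

Evaluating both triangles doubles the number of distance computations, which is the cost of not assuming symmetry.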
@lvermue As you mentioned, the simple script could reduce the execution time by half by replacing [...]. So I think the proposed version is good for computing a distance matrix, but I would also prefer some changes to its code structure. Glancing at the diff, I noticed repeated code patterns that seem redundant, so it would be nicer to consolidate them. Also, computing a distance matrix is a bit out of scope for this package, but it would be nice to have a convenient function for it. So I would like to have the function under [...]. Taking those points into account, I prefer something like the following:
from functools import partial
from fastdtw import fastdtw
from fastdtw.util import distmat
dists, paths = distmat(partial(fastdtw, radius=3), X)
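A rough idea of what such a helper might look like is sketched below. Everything here is an assumption: `distmat` is only proposed in the comment above and its real signature is not shown, `toy_dtw` is a small exact-DTW stand-in for `fastdtw` so the sketch has no third-party dependencies, and `concurrent.futures.ThreadPoolExecutor` is swapped in for joblib (threads will not speed up pure-Python work; a process-based backend such as joblib would be the realistic choice).

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from itertools import product


def toy_dtw(x, y, radius=1):
    """Stand-in for fastdtw: exact DTW with absolute-difference cost.

    `radius` only mirrors the fastdtw keyword and is ignored here.
    Returns (distance, path), the same shape of result fastdtw returns.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j]: minimal accumulated cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(x[i - 1] - y[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover one optimal warping path
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


def distmat(fn, X, n_workers=4):
    """Hypothetical helper: apply fn to every ordered pair of rows in X.

    fn must return (distance, path). Both triangles are computed explicitly,
    so no symmetry of fn is assumed.
    """
    n = len(X)
    pairs = list(product(range(n), repeat=2))
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(lambda ij: fn(X[ij[0]], X[ij[1]]), pairs))
    dists = [[0.0] * n for _ in range(n)]
    paths = [[None] * n for _ in range(n)]
    for (i, j), (d, p) in zip(pairs, results):
        dists[i][j] = d
        paths[i][j] = p
    return dists, paths


X = [[1, 2, 3, 4], [2, 3, 4, 5], [1, 1, 1, 1]]
dists, paths = distmat(partial(toy_dtw, radius=3), X)
```

Passing the distance function as an argument keeps the helper independent of fastdtw itself, which fits the point that distance matrices are somewhat outside the package's core scope.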
Full parallelization was added to the package using the joblib library.
Distances for N×M matrices, i.e. N time series with M time points each, can now be computed in parallel.
Time series of different lengths can be embedded in one matrix by padding the missing time points with np.nan values.
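The padding idea can be sketched as follows. This is a minimal illustration under assumptions: `pad_with_nan` and `strip_nan` are hypothetical helper names, `math.nan` stands in for `np.nan` (both are ordinary float NaN values), and stripping the padding before a distance computation is one plausible way to handle it, not necessarily how the PR does it internally.

```python
import math


def pad_with_nan(series_list):
    """Pad every series with NaN so that all rows have the same length."""
    longest = max(len(s) for s in series_list)
    return [list(s) + [math.nan] * (longest - len(s)) for s in series_list]


def strip_nan(row):
    """Drop the NaN padding again to recover the original series."""
    return [v for v in row if not math.isnan(v)]


series = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
padded = pad_with_nan(series)               # rectangular: 3 rows of length 3
restored = [strip_nan(r) for r in padded]   # original ragged series
```

Note that this round trip only works if the real data contains no NaN values of its own.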
The changes were tested on a machine with 20 cores, leading to the following results:
Single core
Parallel
Examples of how to use the new functions were added to the README.rst file and to the docstrings of the respective functions.