Hey Gio/Sten,
Thanks for writing this; it explains a lot that the four lines of maths in the SPIN paper did not.
However, I noticed that many, if not all, of the matrix operations in the code produce new matrices that are only used once. That is wasteful, especially when the operations are simple enough to be memory-bound (multiplication, addition). The effect also grows with matrix size, which I expect matters for the new dataset.
For example, the code is full of lines such as:
tmpmat = dist_matrix[indexes,:]
tmpmat = tmpmat[:,indexes]
This is at lines 130-131, inside an inner loop. Per the documentation:
Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view).
Warning
The definition of advanced indexing means that x[(1,2,3),] is fundamentally different than x[(1,2,3)]. The latter is equivalent to x[1,2,3] which will trigger basic selection while the former will trigger advanced indexing. Be sure to understand why this occurs.
Also recognize that x[[1,2,3]] will trigger advanced indexing, whereas x[[1,2,slice(None)]] will trigger basic slicing.
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing
Since I'm not entirely sure what those lines do, I'm not yet sure how to rewrite them, but I am sure they can be rewritten to avoid at least one of the intermediate copies and reduce memory allocation/garbage collection.
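As a sketch (assuming indexes really does select the same rows and columns, as in the snippet above), numpy.ix_ combines the row and column selection into a single indexing step, so only one copy is allocated instead of two:

```python
import numpy as np

# toy stand-ins for the real dist_matrix and indexes (shapes assumed)
dist_matrix = np.arange(36.0).reshape(6, 6)
indexes = np.array([0, 2, 5])

# current code: two advanced-indexing steps, two copies
tmpmat = dist_matrix[indexes, :]
tmpmat = tmpmat[:, indexes]

# np.ix_ builds an open mesh so rows and columns are picked in one step,
# allocating only the final sub-matrix
tmpmat_ix = dist_matrix[np.ix_(indexes, indexes)]

assert np.array_equal(tmpmat, tmpmat_ix)
```

Note that fancy indexing can never return a view, so one copy is the floor here; np.ix_ just removes the intermediate one.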
Another example is on line 97:
mismatch_score = dot(dist_matrix, weights_mat)
Per the numpy docs this allocates a new matrix on every call. If we pre-allocate mismatch_score and reuse it (assuming the dimensions are constant, and they appear to be) we can pass it as the output argument:
dot(dist_matrix, weights_mat, out=mismatch_score)
(out must match the result's shape and dtype and be C-contiguous, or numpy will raise an error.)
This would save the repeated memory allocation (and eventual garbage collection); since this function is also called inside an inner loop and the comment above it complains that it gets quite slow, this might have quite a big effect.
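A minimal sketch of the idea (the sizes and the loop here are stand-ins, not the real ones):

```python
import numpy as np

rng = np.random.default_rng(0)
dist_matrix = rng.random((200, 200))
weights_mat = rng.random((200, 200))

# pre-allocate once, outside the loop; out must match the result's
# shape and dtype and be C-contiguous
mismatch_score = np.empty((200, 200))

for _ in range(10):  # stand-in for the inner loop
    np.dot(dist_matrix, weights_mat, out=mismatch_score)
    # ... use mismatch_score here ...

assert np.allclose(mismatch_score, np.dot(dist_matrix, weights_mat))
```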
One last example from lines 62-66:
sqd = (arange(1,mat_size+1)[newaxis,:] - arange(1,mat_size+1)[:,newaxis])**2
#make the distance relative to the mat_size
norm_sqd = sqd/wid
#evaluate a normal pdf
weights_mat = exp(-norm_sqd/mat_size)
I haven't tested this, but based on the documentation I think it can be rewritten to:
idx = arange(1.0, mat_size+1)   # float dtype, so the in-place ops below don't truncate
sqd = idx[newaxis,:] - idx[:,newaxis]
# in-place square (note (a-b)**2 == (b-a)**2, so the order doesn't matter)
sqd **= 2
# multiplication is much faster than division, at the cost of negligible
# rounding error; the unary minus is folded into the factor
sqd *= -1.0/(wid*mat_size)
#make the distance relative to the mat_size and evaluate a normal pdf
weights_mat = exp(sqd, sqd)   # in-place exp, reusing sqd's buffer
This would cut the number of allocated matrices from six to two. The biggest downside is that the new code is less readable, so I'd add comments explaining the unoptimised operations.
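A quick sanity check of the rewrite against the original formulation (mat_size and wid are made-up values here, since I don't have the real ones to hand):

```python
import numpy as np

mat_size, wid = 50, 3.0  # made-up values for illustration

# original formulation: several temporaries
sqd0 = (np.arange(1, mat_size+1)[np.newaxis, :]
        - np.arange(1, mat_size+1)[:, np.newaxis])**2
weights_orig = np.exp(-(sqd0 / wid) / mat_size)

# in-place rewrite
idx = np.arange(1.0, mat_size+1)        # float dtype so in-place ops don't truncate
sqd = idx[np.newaxis, :] - idx[:, np.newaxis]
sqd **= 2                               # in-place square
sqd *= -1.0/(wid*mat_size)              # fold the minus into the factor
weights_new = np.exp(sqd, sqd)          # in-place exp, reusing sqd's buffer

assert np.allclose(weights_orig, weights_new)
```

The second positional argument to np.exp is the output buffer, and the function returns it, so weights_new and sqd end up being the same array.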
If you think it's a good use of my time, I'd like to make an "in-place operations" branch and try to optimise all of the low-hanging fruit here. I suspect it could speed the code up quite a bit, and even if it turns out not to make much of a difference, it will help me get a feel for the BackSPIN algorithm.