-
Notifications
You must be signed in to change notification settings - Fork 2
Major build cleanup. #10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
I am happy with where this is, I think the only thing I could pessimistically think of is a test (with fixed seed) that equates the result with a known quantity. |
1450688 to
18b22ea
Compare
|
I moved the consistency test to the start of the commit chain and remade it after disabling Then I turned |
4dd81ff to
4184349
Compare
Turning off -ffast-math is because otherwise any little compile flags change will change the output. Also remove unwanted library_dirs and include_dirs.
Fix the Makefile dependencies so things get rebuilt when they should be.
Add a `test` target.
Only produce one .so file, which knows how to run both 2d and 3d.
Break out main and the I/O methods into a separate cpp file, which is
not used for the python extension module.
Do not create pointless objects. The methods are all static. TSNE is
now a namespace, not a class, and the exported `run` method is wrapped
in `extern "C"` to make linkage easier.
Minor cleanup of includes / using statements.
Because I just couldn't help myself, remove a couple places where
new/delete were being used for no good reason.
Get rid of a few places where NDIMS (the build time constant) and a
run-time variable, which are supposed to have the same value, were both
being used in a method. This means flowing the template a bit further
up the tree. There's lots more places this could be done - this only
addresses the cases where not doing so could lead to errors if the
template variable was inconsistent with the run-time variable.
Kill off the last pointless vestiges of the cblas dependency.
Use hidden visibility where appropriate - only export the run() method.
This cuts the size of the .so and allows the compiler to inline more
aggressively, especially since we've also turned on LTO.
Together, all of these changes result in a build with a single 901k
extension module, instead of two 1.9MB extension modules.
Set -std=c++14 in the build. This is required by some of the cleanups
because of the use of the `nullptr` keyword. Older gcc will not default
to c++11 or higher, so will fail to build nullptr.
And c++17 doesn't work because
```
include/python2.7/stringobject.h:175:5: error:
ISO C++17 does not allow 'register' storage class specifier [-Wregister]
register Py_ssize_t *len /* pointer to length variable or NULL
^~~~~~~~~
```
Update travis build to Xenial container. This updates gcc to 5.4, which has
the required c++ standards support.
Also flow down templating to a few more places, and use arrays instead of vectors where possible.
Allocate an array of children, rather than having an array of child pointers. Cleanup useless methods, constness.
This decreases the in-memory size of the tree, which significantly increases the speed since it's easier for the CPU to fit it in cache. There's more that can be done with this - in particular, the boundary. But that's more complicated.
Rather than downloading an entire miniconda distribution, just `pip install` the requisite packages. Should speed up the build significantly. Also, use clang compiler only, because otherwise we fail the numerical stability test (which is ok, generally, but annoying). Use lld to link, because the llvm LTO plugin for ld.gold doesn't work on travis. Turn of stability test on CI. Minor compiler optimization differences can destroy repeatability. Getting it to not do that is too complicated.
4184349 to
8c86710
Compare
70d6d93 to
6f1065d
Compare
1. Don't copy the data. Just keep a reference. 2. Don't store the dimension in every datapoint. That's ridiculously inefficient. 3. Don't store the index in every datapoint. Just compute it as needed from pointer arithmetic.
This allows for more sse optimizations in a few cases.
Makes the datapoints smaller and allows vectorization in initialization to work better.
In addition to being more clear, this allows the compiler to recognize that pos_f is not aliased anywhere else so it can vectorize the calculation better.
Not only was that an important slowdown, but it also impacted repeatability across libm implementations.
|
On a somewhat larger dataset, which took around a minute on marsoc, running on hydra we see which is roughly consistent with the improvement I expected to see. |
Fix the Makefile dependencies so things get rebuilt when they should be.
Only produce one .so file, which knows how to run both 2d and 3d.
Break out main and the I/O methods into a separate cpp file, which is
not used for the python extension module.
Do not create pointless objects. The methods are all static. TSNE is
now a namespace, not a class, and the exported
runmethod is wrappedin
extern "C"to make linkage easier.Minor cleanup of includes / using statements.
Because I just couldn't help myself, remove a couple places where
new/delete were being used for no good reason.
Get rid of a few places where NDIMS (the build time constant) and a
run-time variable, which are supposed to have the same value, were both
being used in a method. This means flowing the template a bit further
up the tree. There's lots more places this could be done - this only
addresses the cases where not doing so could lead to errors if the
template variable was inconsistent with the run-time variable.
Kill off the last pointless vestiges of the cblas dependency.
Use hidden visibility where appropriate - only export the run() method.
This cuts the size of the .so and allows the compiler to inline more
aggressively, especially since we've also turned on LTO.
Together, all of these changes result in a build with a single 901k
extension module, instead of two 1.9MB extension modules.