Explicitly template and instantiate d_ValueFill function #437
When compiling NativeAcceleration.so, I noticed some linker errors related to the (templated) d_ValueFill function.
These are undefined symbols. The reason is that the function isn't being instantiated anywhere.
However, that got me curious about why this isn't a problem in the release builds. They ship with a whole host of instantiations of the same templated function!
After way too much time debugging this, it turns out that the "release" builds, as available on Conda, are not actually optimized builds. When compiling with -DCMAKE_BUILD_TYPE=Release, compiler optimizations get switched on, and I guess somewhere some of those implementations get optimized away, as they don't get instantiated anywhere explicitly in the code. When -DCMAKE_BUILD_TYPE is not defined, no compiler optimizations are used and it works. This matches what I observe in the scripts here. Another possible explanation is that current builds are made with GCC 9. Starting with GCC 10, link-time template instantiation (-frepo) was removed.
Anyway, long story short: this PR adds explicit template instantiations to the NativeAcceleration library, allowing one to compile and link it with compiler optimizations turned on. To stick a little closer to best practices and aid my debugging, I made the types explicit while I was at it, too. Compiles fine with GCC 10+ now!
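
For anyone unfamiliar with the technique, here is a minimal sketch of what explicit template instantiation looks like. Note that `value_fill`, its signature, and the `make_filled` helper are hypothetical stand-ins for illustration only; they are not the actual d_ValueFill code from this PR, and the set of types is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for d_ValueFill (the real signature is not shown here).
// In a real project this declaration would typically live in a header.
template <typename T>
void value_fill(T* data, std::size_t count, T value);

// The definition would typically live in a single .cpp file.
template <typename T>
void value_fill(T* data, std::size_t count, T value) {
    assert(data != nullptr || count == 0);  // precondition check
    for (std::size_t i = 0; i < count; ++i) {
        data[i] = value;
    }
}

// Explicit instantiation definitions: these force the compiler to emit
// a symbol for each listed type in this translation unit, regardless of
// optimization level or inlining decisions made elsewhere.
// The chosen types are assumptions for this sketch.
template void value_fill<float>(float*, std::size_t, float);
template void value_fill<double>(double*, std::size_t, double);
template void value_fill<int>(int*, std::size_t, int);

// Hypothetical caller that spells out the template argument explicitly
// instead of relying on deduction.
std::vector<double> make_filled(std::size_t count, double value) {
    std::vector<double> out(count);
    value_fill<double>(out.data(), out.size(), value);
    return out;
}
```

Other translation units that only see the declaration can then link against these instantiations, since the symbols are guaranteed to exist in the object file.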