@Markus92 Markus92 commented Nov 5, 2025

When compiling NativeAcceleration.so, I noticed some linker errors related to the (templated) d_ValueFill function:

$ nm -C libNativeAcceleration.so | grep d_ValueFill
                 U void gtom::d_ValueFill<float2>(float2*, unsigned long, float2)
                 U void gtom::d_ValueFill<float>(float*, unsigned long, float)

As you can see, these are undefined symbols. The reason is that they aren't instantiated anywhere.

However, that got me wondering why this isn't a problem in the release builds. They ship with a whole host of instantiations of the same templated function!

$ nm -C libNativeAcceleration.so | grep d_ValueFill
00000000001ba1ea W void gtom::d_ValueFill<int2>(int2*, unsigned long, int2)
00000000001ba2b9 W void gtom::d_ValueFill<int3>(int3*, unsigned long, int3)
00000000001b9d06 W void gtom::d_ValueFill<float2>(float2*, unsigned long, float2)
00000000001ba53c W void gtom::d_ValueFill<gtom::tfloat2>(gtom::tfloat2*, unsigned long, gtom::tfloat2)
00000000001ba611 W void gtom::d_ValueFill<gtom::tfloat3>(gtom::tfloat3*, unsigned long, gtom::tfloat3)
00000000001ba6f7 W void gtom::d_ValueFill<gtom::tfloat4>(gtom::tfloat4*, unsigned long, gtom::tfloat4)
00000000001ba46c W void gtom::d_ValueFill<bool>(bool*, unsigned long, bool)
00000000001b9ddb W void gtom::d_ValueFill<char>(char*, unsigned long, char)
00000000001b9c31 W void gtom::d_ValueFill<double>(double*, unsigned long, double)
00000000001b9b5e W void gtom::d_ValueFill<float>(float*, unsigned long, float)
00000000001b9eab W void gtom::d_ValueFill<unsigned char>(unsigned char*, unsigned long, unsigned char)
00000000001ba11d W void gtom::d_ValueFill<int>(int*, unsigned long, int)
00000000001ba39f W void gtom::d_ValueFill<unsigned int>(unsigned int*, unsigned long, unsigned int)
00000000001b9f7b W void gtom::d_ValueFill<short>(short*, unsigned long, short)
00000000001ba04c W void gtom::d_ValueFill<unsigned short>(unsigned short*, unsigned long, unsigned short)

After way too much time debugging this, it turns out that the "release" builds, as available on Conda, are not actually optimized builds. When compiling with -DCMAKE_BUILD_TYPE=Release, compiler optimizations get switched on, and I guess that somewhere some of those implementations get optimized away, since they aren't instantiated explicitly anywhere in the code. When CMAKE_BUILD_TYPE is not defined, no compiler optimizations are used and it works.
This matches what I observe in the scripts here. Another possible explanation is that the current builds are made with GCC 9: starting with GCC 10, link-time template instantiation (the -frepo option) was removed.

Anyway, long story short: this PR adds explicit template instantiations to the NativeAcceleration library, allowing one to compile and link it with compiler optimizations turned on. To stick a little closer to best practices and to aid my debugging, I also made the template types explicit at the call sites while I was at it. Compiles fine with GCC 10+ now!
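
For readers unfamiliar with the technique, here is a minimal sketch of explicit template instantiation. It is plain host C++ rather than CUDA, and the `demo` namespace, the `ValueFill` name, and the `fill_works` helper are hypothetical stand-ins, not the actual code from this PR. The `template void ...;` lines ask the compiler to emit those specializations in this translation unit even if nothing here calls them, so other object files can link against them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace demo {

// Hypothetical host-side stand-in for a templated fill routine.
// In a real project the declaration would live in a header and this
// definition in a single source file.
template <class T>
void ValueFill(T* data, std::size_t elements, T value) {
    assert(data != nullptr || elements == 0);
    for (std::size_t i = 0; i < elements; ++i)
        data[i] = value;
}

// Explicit instantiation definitions: the compiler emits these
// specializations in this translation unit whether or not they are
// used here, and regardless of optimization level.
template void ValueFill<float>(float*, std::size_t, float);
template void ValueFill<double>(double*, std::size_t, double);
template void ValueFill<int>(int*, std::size_t, int);

}  // namespace demo

// Small helper that exercises one of the instantiations.
inline bool fill_works() {
    std::vector<float> v(4, 0.0f);
    demo::ValueFill<float>(v.data(), v.size(), 2.5f);
    for (float x : v)
        if (x != 2.5f) return false;
    return true;
}
```

With this pattern, running `nm -C` on the resulting object file should list the three specializations as defined symbols rather than as `U` entries.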

@dtegunov
Contributor

dtegunov commented Nov 6, 2025

This is very helpful, thank you! Is explicit instantiation in every call still necessary once we explicitly instantiate it with all necessary types in Memory.cu?

@Markus92
Author

Markus92 commented Nov 6, 2025

Not sure if it's necessary, but after messing around with it for a while I was just happy this worked. You could definitely try removing it, see if it compiles/links fine with -O3. Good chance it'll do the job :)
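
As a rough illustration of what the explicit types at the call sites buy independently of instantiation, consider the sketch below. It assumes a function shaped like the one in this thread, where `T` appears in both the pointer and the value parameter. The names are hypothetical and the code is host C++, not the library's CUDA source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in with the same parameter shape as the fill routine
// discussed above: the type T appears in both the pointer and the value.
template <class T>
void ValueFill(T* data, std::size_t elements, T value) {
    assert(data != nullptr || elements == 0);
    for (std::size_t i = 0; i < elements; ++i)
        data[i] = value;
}

inline float fill_with_deduction() {
    std::vector<float> v(3);
    // T is deduced as float from both arguments, so no <float> is needed.
    ValueFill(v.data(), v.size(), 1.0f);
    return v[0];
}

inline float fill_with_explicit_type() {
    std::vector<float> v(3);
    // The literal 1 is an int. Without <float>, deduction would see
    // T = float (pointer) and T = int (value) and fail to compile;
    // naming the type lets the int convert to float instead.
    ValueFill<float>(v.data(), v.size(), 1);
    return v[0];
}
```

Both calls resolve to the same `ValueFill<float>` specialization, so whether a call site names the type does not change which symbol the linker needs. It only matters where the argument types would otherwise disagree.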
