Description
The tapered guide currently has each thread follow a single neutron through its whole journey, so when a neutron is lost early in the instrument, its thread does no useful work for the rest of the computation.
Could the computation be split by component, so that later components consume fewer threads (or other resources) in proportion to the fewer neutrons they have to handle? Would that improve efficiency? And how would the lost neutrons be filtered out between components?
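To make the "filter out the others" question concrete, here is a minimal CPU-side sketch of one possible answer: compact the neutron array after each component so the next stage only iterates over survivors. This is not the project's code. `Neutron`, `Component`, `compact` and `run_pipeline` are hypothetical names invented for illustration, and the inner loop stands in for what would be a kernel launch sized to the survivor count. On the GPU the compaction step could presumably be done with something like Thrust's `thrust::copy_if` or `thrust::remove_if`, though whether the extra pass pays for itself would need measuring.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical neutron state; the fields are illustrative only.
struct Neutron {
    int id;      // index of the neutron in the original batch
    bool alive;  // set to false once a component loses the neutron
};

// Hypothetical component: mutates a neutron and may mark it as lost.
using Component = std::function<void(Neutron&)>;

// Drop lost neutrons, keeping survivors in their original order.
// Returns the number of survivors.
std::size_t compact(std::vector<Neutron>& neutrons) {
    auto first_lost = std::remove_if(
        neutrons.begin(), neutrons.end(),
        [](const Neutron& n) { return !n.alive; });
    neutrons.erase(first_lost, neutrons.end());
    return neutrons.size();
}

// Run the components in order, compacting between stages.
// Returns how many neutrons each stage had to process, which is the
// number of threads that stage would need on a GPU.
std::vector<std::size_t> run_pipeline(
    std::vector<Neutron>& neutrons,
    const std::vector<Component>& components) {
    std::vector<std::size_t> work_per_stage;
    for (const auto& component : components) {
        assert(static_cast<bool>(component));  // must be callable
        work_per_stage.push_back(neutrons.size());
        for (auto& n : neutrons) {
            component(n);  // stand-in for one thread's work
        }
        compact(neutrons);
    }
    return work_per_stage;
}
```

Calling `run_pipeline` with a batch of neutrons and a list of components returns the per-stage work counts, so the shrinkage from one component to the next can be compared against the cost of the extra compaction pass.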
@ckendrick points out that https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__EXECUTION.html#group__CUDART__EXECUTION_1g504b94170f83285c71031be6d5d15f73 may be helpful.
Also, what other approaches to improving efficiency might there be? For instance, could a multi-GPU deployment be considered, in which each GPU handles a different part of the instrument?