Open
Description
Our PedanticCudaSolver works fine, but it should be fast as well. If we implement the following ideas, GoldSolver could be completely replaced with PedanticCudaSolver.
- If `IResultFormatter` does not want to receive information about trajectories, there is no need to download these values from the GPU, so we get a free speed-up.
- If it does want them, we can apply pipeline processing with two asynchronous streams: compute and download. The first stream computes virtual trajectories while the second one downloads and processes them. After the synchronization stage, they swap places.
- Multiple host-device transfers should be replaced with a single call of `cudaMemcpy()` for one large block. This will reduce excessive overhead, but the overall speed-up is expected to be relatively small.
- If some virtual meteorite has burnt up or collided, the related data is still copied to the host. Using atomics, we can re-enumerate the meteorites and reduce the size of the memory block holding their partial trajectories.
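
The last two ideas could be combined roughly as in the sketch below: a kernel re-enumerates the meteorites that are still alive with `atomicAdd`, and the host then downloads only the dense prefix with a single `cudaMemcpy()`. The `Sample` struct, the `alive` flag array, and the names `compactAlive` and `downloadAlive` are hypothetical and only serve the illustration; the real data layout of `PedanticCudaSolver` is not shown in this issue. Error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical per-meteorite record; the solver's real layout may differ.
struct Sample { float x, y, z; };

// One thread per meteorite. A live meteorite claims the next free slot
// with atomicAdd and writes its sample there, so the output is dense.
__global__ void compactAlive(const Sample* in, const int* alive, int n,
                             Sample* out, int* outCount)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && alive[i]) {
        int slot = atomicAdd(outCount, 1);  // new index of this meteorite
        out[slot] = in[i];
    }
}

// Compacts on the device, then downloads only the live samples with one
// device-to-host copy. The order of the returned samples is unspecified.
std::vector<Sample> downloadAlive(const std::vector<Sample>& samples,
                                  const std::vector<int>& alive)
{
    int n = static_cast<int>(samples.size());
    if (n == 0) return {};

    Sample* dIn = nullptr;
    Sample* dOut = nullptr;
    int* dAlive = nullptr;
    int* dCount = nullptr;
    cudaMalloc(&dIn, n * sizeof(Sample));
    cudaMalloc(&dOut, n * sizeof(Sample));
    cudaMalloc(&dAlive, n * sizeof(int));
    cudaMalloc(&dCount, sizeof(int));

    cudaMemcpy(dIn, samples.data(), n * sizeof(Sample), cudaMemcpyHostToDevice);
    cudaMemcpy(dAlive, alive.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dCount, 0, sizeof(int));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    compactAlive<<<blocks, threads>>>(dIn, dAlive, n, dOut, dCount);

    // This blocking copy on the default stream also waits for the kernel.
    int count = 0;
    cudaMemcpy(&count, dCount, sizeof(int), cudaMemcpyDeviceToHost);

    // Single transfer of the compacted block instead of n small ones.
    std::vector<Sample> result(count);
    if (count > 0) {
        cudaMemcpy(result.data(), dOut, count * sizeof(Sample),
                   cudaMemcpyDeviceToHost);
    }

    cudaFree(dIn);
    cudaFree(dOut);
    cudaFree(dAlive);
    cudaFree(dCount);
    return result;
}
```

Because slots are handed out in whatever order threads reach `atomicAdd`, the compacted order is not deterministic. If the formatter needs to know which meteorite a sample belongs to, the original index would have to be stored alongside it.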