Make CudaSolver fast again

Our `PedanticCudaSolver` works fine, but it should be fast as well. If we implement the following ideas, `GoldSolver` could be completely replaced with `PedanticCudaSolver`.

* If `IResultFormatter` does not need trajectory information, there is no need to download these values from the GPU at all, so we get a free speed-up.
* If it does need them, we can pipeline the work with two asynchronous streams: compute and download. The first stream computes virtual trajectories while the second one downloads and processes the previous batch. After the synchronization stage, the two buffers swap roles.
* Multiple host-device transfers should be replaced with a single `cudaMemcpy()` call for one large block. This will reduce per-call overhead, but the overall speed-up is expected to be relatively small.
* If a virtual meteorite has burnt up or collided, its data is still copied to the host. Using atomics, we can re-enumerate the surviving meteorites and reduce the size of the memory block that holds their partial trajectories.
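The two-stream pipeline could look roughly like the sketch below. It is not the solver's actual code: `computeChunk`, `runPipelined`, and the flat `float` buffers are hypothetical stand-ins for the real trajectory step and data layout, and error checking is omitted for brevity. Two device buffers and two pinned host buffers alternate, so the download of chunk `c - 1` overlaps the computation of chunk `c`.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical stand-in for the real trajectory step:
// fills the buffer with values derived from the chunk index.
__global__ void computeChunk(float* buf, int n, int chunk)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = chunk * 1000.0f + i;
}

// Computes `chunks` chunks of `n` floats (n > 0) and returns them in order.
std::vector<float> runPipelined(int chunks, int n)
{
    const size_t bytes = n * sizeof(float);
    cudaStream_t compute, download;
    cudaStreamCreate(&compute);
    cudaStreamCreate(&download);

    float* dBuf[2];
    float* hBuf[2];
    cudaEvent_t computed[2], downloaded[2];
    for (int b = 0; b < 2; ++b) {
        cudaMalloc(&dBuf[b], bytes);
        cudaMallocHost(&hBuf[b], bytes);  // pinned memory, needed for truly async copies
        cudaEventCreate(&computed[b]);
        cudaEventCreate(&downloaded[b]);
    }

    std::vector<float> result;
    result.reserve(static_cast<size_t>(chunks) * n);
    const int threads = 256, blocks = (n + threads - 1) / threads;

    for (int c = 0; c <= chunks; ++c) {
        if (c < chunks) {
            int b = c & 1;
            // Do not overwrite buffer b until its previous contents were downloaded.
            cudaStreamWaitEvent(compute, downloaded[b], 0);
            computeChunk<<<blocks, threads, 0, compute>>>(dBuf[b], n, c);
            cudaEventRecord(computed[b], compute);
        }
        if (c > 0) {
            int p = (c - 1) & 1;  // buffer holding the previous chunk
            cudaStreamWaitEvent(download, computed[p], 0);
            cudaMemcpyAsync(hBuf[p], dBuf[p], bytes, cudaMemcpyDeviceToHost, download);
            cudaEventRecord(downloaded[p], download);
            cudaEventSynchronize(downloaded[p]);  // chunk c keeps computing meanwhile
            result.insert(result.end(), hBuf[p], hBuf[p] + n);  // host-side processing
        }
    }

    for (int b = 0; b < 2; ++b) {
        cudaFree(dBuf[b]);
        cudaFreeHost(hBuf[b]);
        cudaEventDestroy(computed[b]);
        cudaEventDestroy(downloaded[b]);
    }
    cudaStreamDestroy(compute);
    cudaStreamDestroy(download);
    return result;
}
```

The host blocks only on the download event, so the kernel for the next chunk is already running while the previous chunk is copied and processed.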
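The last two ideas (one large transfer, and skipping dead meteorites) can be combined, as in this minimal sketch. The `Sample` struct, the `alive` flag array, and the names `compactAlive` and `downloadAlive` are assumptions made for illustration, since the issue does not show the solver's data layout. Each live thread reserves an output slot with `atomicAdd`, so the survivors end up contiguous and only `count` elements are copied.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical per-meteorite record; `id` keeps the original index,
// because compaction re-enumerates the survivors.
struct Sample {
    int   id;
    float x, y, z;
};

// Packs the samples of live meteorites contiguously into `out`.
__global__ void compactAlive(const float3* pos, const unsigned char* alive, int n,
                             Sample* out, int* outCount)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n || !alive[i]) return;
    int slot = atomicAdd(outCount, 1);  // reserve the next free output slot
    out[slot] = Sample{i, pos[i].x, pos[i].y, pos[i].z};
}

// Downloads only the live meteorites. dPos and dAlive are device pointers.
std::vector<Sample> downloadAlive(const float3* dPos, const unsigned char* dAlive, int n)
{
    if (n <= 0) return {};
    Sample* dOut = nullptr;
    int* dCount = nullptr;
    cudaMalloc(&dOut, n * sizeof(Sample));
    cudaMalloc(&dCount, sizeof(int));
    cudaMemset(dCount, 0, sizeof(int));

    const int threads = 256, blocks = (n + threads - 1) / threads;
    compactAlive<<<blocks, threads>>>(dPos, dAlive, n, dOut, dCount);

    // A small copy for the counter, then one block copy for all survivors.
    // cudaMemcpy on the default stream waits for the kernel to finish.
    int count = 0;
    cudaMemcpy(&count, dCount, sizeof(int), cudaMemcpyDeviceToHost);
    std::vector<Sample> host(count);
    if (count > 0)
        cudaMemcpy(host.data(), dOut, count * sizeof(Sample), cudaMemcpyDeviceToHost);

    cudaFree(dOut);
    cudaFree(dCount);
    return host;
}
```

Slot order depends on thread scheduling and is not deterministic, which is why each record carries its original `id`. Error checking is omitted for brevity.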
