Make CudaSolver fast again #4

@m-krivov

Our PedanticCudaSolver works fine, but it should be fast as well. If we implement the following ideas, PedanticCudaSolver could completely replace GoldSolver.

  • If IResultFormatter does not want to receive trajectory information, there is no need to download these values from the GPU, so we get a free speed-up.
  • If it does want trajectories, we can pipeline the work with two asynchronous streams, compute and download. The first stream computes virtual trajectories, the second one downloads and processes them. After the synchronization stage, they swap places.
  • Multiple host-device transfers should be replaced with a single cudaMemcpy() call for one large block. This will reduce the per-transfer overhead, but the overall speed-up is expected to be relatively small.
  • If a virtual meteorite has burnt up or collided, its data is still copied to the host. Using atomics, we can re-enumerate the remaining meteorites and reduce the size of the memory block that holds their partial trajectories.
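
To make the last two points more concrete, here is a minimal sketch of compaction through an atomic counter followed by one cudaMemcpy() of the packed block. It is not taken from the solver: the `Sample` struct, the `alive` flags and the `compactAlive` kernel are hypothetical names chosen for illustration, and the real trajectory layout may look quite different.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical per-meteorite record; the solver's real layout is not shown in this issue.
struct Sample {
    int   id;        // original meteorite index
    float altitude;  // placeholder payload
};

#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "CUDA error: %s\n",                       \
                         cudaGetErrorString(err_));                        \
            std::exit(1);                                                  \
        }                                                                  \
    } while (0)

// One thread per meteorite. A live meteorite reserves a slot in the packed
// output with atomicAdd, so the output holds only live entries.
// The order of entries in the packed block is not deterministic.
__global__ void compactAlive(const Sample* in, const unsigned char* alive,
                             int n, Sample* out, int* count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && alive[i]) {
        int slot = atomicAdd(count, 1);  // returns the value before the add
        out[slot] = in[i];
    }
}

int main()
{
    const int n = 1024;
    std::vector<Sample> hIn(n);
    std::vector<unsigned char> hAlive(n);
    int expected = 0;
    for (int i = 0; i < n; ++i) {
        hIn[i] = {i, 1000.0f - i};
        hAlive[i] = (i % 3 != 0);  // assumption: every third meteorite is gone
        expected += hAlive[i];
    }

    Sample* dIn = nullptr;
    Sample* dOut = nullptr;
    unsigned char* dAlive = nullptr;
    int* dCount = nullptr;
    CUDA_CHECK(cudaMalloc(&dIn, n * sizeof(Sample)));
    CUDA_CHECK(cudaMalloc(&dOut, n * sizeof(Sample)));
    CUDA_CHECK(cudaMalloc(&dAlive, n * sizeof(unsigned char)));
    CUDA_CHECK(cudaMalloc(&dCount, sizeof(int)));
    CUDA_CHECK(cudaMemcpy(dIn, hIn.data(), n * sizeof(Sample),
                          cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(dAlive, hAlive.data(), n * sizeof(unsigned char),
                          cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemset(dCount, 0, sizeof(int)));

    const int threads = 256;
    compactAlive<<<(n + threads - 1) / threads, threads>>>(dIn, dAlive, n,
                                                           dOut, dCount);
    CUDA_CHECK(cudaGetLastError());

    // First fetch the number of live entries, then download the packed block
    // with a single cudaMemcpy() instead of one transfer per meteorite.
    int count = 0;
    CUDA_CHECK(cudaMemcpy(&count, dCount, sizeof(int), cudaMemcpyDeviceToHost));
    std::vector<Sample> hOut(count);
    CUDA_CHECK(cudaMemcpy(hOut.data(), dOut, count * sizeof(Sample),
                          cudaMemcpyDeviceToHost));

    bool ok = (count == expected);
    for (const Sample& s : hOut) ok = ok && (s.id % 3 != 0);
    std::printf("downloaded %d of %d records, check %s\n",
                count, n, ok ? "passed" : "FAILED");

    cudaFree(dIn); cudaFree(dOut); cudaFree(dAlive); cudaFree(dCount);
    return ok ? 0 : 1;
}
```

Since atomicAdd hands out slots in arbitrary order, each packed record carries its original `id` so the host can map it back to the right meteorite. A prefix-sum based compaction would keep the order stable, at the cost of an extra pass.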
