
Incorrect results with asynchronous partitioning on CUDA devices and StarPU 1.4 #37

@grisuthedragon

Description


It seems that among the updates introduced between 1.3 and 1.4, the asynchronous partitioning got broken. Basically, we have code of the form

starpu_data_partition_plan(...);

/* execute tasks on the partitioned dataset */

starpu_data_partition_clean(...);

We leave the partition submit / unsubmit to the StarPU runtime. The kernels required to compute the tasks are available as both CPU and CUDA implementations. We observed the following cases.
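For readers unfamiliar with this pattern, a minimal sketch of what such code typically looks like is below. This is my reconstruction, not the reporter's actual code (which they cannot share): the matrix handle, the block filter, the part count, and the codelet are all assumptions; only the `starpu_data_partition_plan` / `starpu_data_partition_clean` calls and the reliance on runtime-driven submit/unsubmit come from the report.

```c
/* Hedged sketch of the asynchronous partitioning pattern described above.
 * Assumes a registered matrix handle A and some codelet cl (e.g. a GEMM
 * codelet with CPU and CUDA implementations); both are placeholders. */
#include <starpu.h>

#define NPARTS 4  /* assumed part count, for illustration only */

void run_on_partitions(starpu_data_handle_t A, struct starpu_codelet *cl)
{
    starpu_data_handle_t children[NPARTS];
    struct starpu_data_filter f = {
        .filter_func = starpu_matrix_filter_block,
        .nchildren   = NPARTS,
    };

    /* Plan the partitioning; no data movement happens yet. */
    starpu_data_partition_plan(A, &f, children);

    /* Submit tasks directly on the children. With a planned partition,
     * StarPU inserts the partition/unpartition transfers itself -- this
     * is the "submit / unsubmit left to the StarPU runtime" part. */
    for (int i = 0; i < NPARTS; i++)
        starpu_task_insert(cl, STARPU_RW, children[i], 0);

    starpu_task_wait_for_all();

    /* Dispose of the partitioning plan once it is no longer needed. */
    starpu_data_partition_clean(A, NPARTS, children);
}
```

The reported symptom is that this pattern yields correct results under StarPU 1.3.11 but wrong results under 1.4.4 as soon as a CUDA device is involved.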

StarPU 1.3.11 / CUDA 11.8 / GCC 12

  • CPU only: everything correct
  • 64 CPU cores + 1 CUDA device: everything correct
  • 64 CPU cores + 4 CUDA devices: everything correct

StarPU 1.3.11 / CUDA 12.2 / GCC 12

  • CPU only: everything correct
  • 64 CPU cores + 1 CUDA device: everything correct
  • 64 CPU cores + 4 CUDA devices: everything correct

StarPU 1.4.4 / CUDA 12.2 / GCC 12

  • CPU only: everything correct
  • 64 CPU cores + 1 CUDA device: wrong results
  • 64 CPU cores + 4 CUDA devices: wrong results

The tasks are only GEMM operations from cuBLAS or MKL.
Due to ongoing research, I cannot share the code and have not had time to build an MWE so far. But in general it seems to have something in common with https://gitlab.inria.fr/starpu/starpu/-/issues/43.

Metadata

Labels: question (further information is requested)
