Open
Labels: question (Further information is requested)
Description
It seems that the changes introduced between StarPU 1.3 and 1.4 broke asynchronous partitioning. Basically, our code looks like this:
starpu_data_partition_plan(...);
/* execute tasks on the partitioned dataset */
starpu_data_partition_clean(...);
We leave submitting and unsubmitting the partitioning to the StarPU runtime. The kernels required for computing the tasks are available as CPU and CUDA implementations. We observed the following behavior:
StarPU 1.3.11 / CUDA 11.8 / GCC 12
- CPU only: everything correct
- 64 CPU cores + 1 CUDA device: everything correct
- 64 CPU cores + 4 CUDA devices: everything correct

StarPU 1.3.11 / CUDA 12.2 / GCC 12
- CPU only: everything correct
- 64 CPU cores + 1 CUDA device: everything correct
- 64 CPU cores + 4 CUDA devices: everything correct

StarPU 1.4.4 / CUDA 12.2 / GCC 12
- CPU only: everything correct
- 64 CPU cores + 1 CUDA device: wrong results
- 64 CPU cores + 4 CUDA devices: wrong results
The tasks consist only of GEMM operations, using cuBLAS on the GPU or MKL on the CPU.
Due to ongoing research, I cannot share the code, and I have not had time to build an MWE so far. In general, though, the problem seems to be related to https://gitlab.inria.fr/starpu/starpu/-/issues/43.