
Incorrect results with asynchronous partitioning on CUDA devices and StarPU 1.4 #37

@grisuthedragon

Description


It seems that among the updates introduced between 1.3 and 1.4, the asynchronous partitioning got broken. Basically, we have code of the form

starpu_data_partition_plan(...);

/* execute tasks on the partitioned dataset */

starpu_data_partition_clean(...);

We leave the partition submit / unsubmit to the StarPU runtime. The kernels required to compute the tasks are available as both CPU and CUDA implementations. We observed the following cases.
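For readers unfamiliar with this pattern, a minimal sketch of what such code typically looks like is below. This is my reconstruction, not the reporter's actual code (which they cannot share): the matrix handle, the block filter, the part count, and the codelet are all assumptions; only the `starpu_data_partition_plan` / `starpu_data_partition_clean` calls and the reliance on runtime-driven submit/unsubmit come from the report.

```c
/* Hedged sketch of the asynchronous partitioning pattern described above.
 * Assumes a registered matrix handle A and some codelet cl (e.g. a GEMM
 * codelet with CPU and CUDA implementations); both are placeholders. */
#include <starpu.h>

#define NPARTS 4  /* assumed part count, for illustration only */

void run_on_partitions(starpu_data_handle_t A, struct starpu_codelet *cl)
{
    starpu_data_handle_t children[NPARTS];
    struct starpu_data_filter f = {
        .filter_func = starpu_matrix_filter_block,
        .nchildren   = NPARTS,
    };

    /* Plan the partitioning; no data movement happens yet. */
    starpu_data_partition_plan(A, &f, children);

    /* Submit tasks directly on the children. With a planned partition,
     * StarPU inserts the partition/unpartition transfers itself -- this
     * is the "submit / unsubmit left to the StarPU runtime" part. */
    for (int i = 0; i < NPARTS; i++)
        starpu_task_insert(cl, STARPU_RW, children[i], 0);

    starpu_task_wait_for_all();

    /* Dispose of the partitioning plan once it is no longer needed. */
    starpu_data_partition_clean(A, NPARTS, children);
}
```

The reported symptom is that this pattern yields correct results under StarPU 1.3.11 but wrong results under 1.4.4 as soon as a CUDA device is involved.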

StarPU 1.3.11 / CUDA 11.8 / GCC 12

  • CPU only: everything correct
  • 64 CPU cores + 1 CUDA device: everything correct
  • 64 CPU cores + 4 CUDA devices: everything correct

StarPU 1.3.11 / CUDA 12.2 / GCC 12

  • CPU only: everything correct
  • 64 CPU cores + 1 CUDA device: everything correct
  • 64 CPU cores + 4 CUDA devices: everything correct

StarPU 1.4.4 / CUDA 12.2 / GCC 12

  • CPU only: everything correct
  • 64 CPU cores + 1 CUDA device: wrong results
  • 64 CPU cores + 4 CUDA devices: wrong results

The tasks are only GEMM operations from cuBLAS or MKL.
Due to ongoing research, I cannot share the code and have not had time to build an MWE so far. But in general it seems to have something in common with https://gitlab.inria.fr/starpu/starpu/-/issues/43.

Metadata

Labels: question (further information is requested)
