Open
Description
Hi!
I have just added starpu_data_invalidate_submit to my code and, of course, made some mistakes along the way. StarPU reported some of them by asserting that the data being read was not initialized, but others went unreported. I found out that setting reduction methods on a handle with starpu_data_set_reduction_methods makes it immune to this assert, provided the data remains on the same device and its access mode is STARPU_R.
If the data is on a GPU and the reduction methods only support CPUs, the following error is printed:
/trinity/home/al.mikhalev/Install/starpu-1.3-a100/lib/libstarpu-1.3.so.10(_starpu_redux_init_data_replicate+0x208)[0x155554ecd3d8]
/trinity/home/al.mikhalev/Install/starpu-1.3-a100/lib/libstarpu-1.3.so.10(_starpu_fetch_task_input_tail+0x210)[0x155554eb3ac0]
/trinity/home/al.mikhalev/Install/starpu-1.3-a100/lib/libstarpu-1.3.so.10(_starpu_cuda_driver_run_once+0x22e)[0x155554f1b72e]
/trinity/home/al.mikhalev/Install/starpu-1.3-a100/lib/libstarpu-1.3.so.10(_starpu_cuda_worker+0x95)[0x155554f1c155]
/lib64/libpthread.so.0(+0x7ea5)[0x155554bfaea5]
/lib64/libc.so.6(clone+0x6d)[0x155554403b0d]
a.out: ../../src/datawizard/reduction.c:92: _starpu_redux_init_data_replicate: Assertion `0 && "init_func"' failed.
Aborted
However, if the reduction methods are supported on the device where the data is allocated, the program is not stopped and no error is raised, and the result of the computation silently becomes wrong (undefined behavior).
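The outcomes described above can be summarized as a small decision table. The sketch below only restates the observations in this report and is not StarPU code: `observed_outcome`, `check_observed_outcomes`, and the enum names are hypothetical, and the first branch assumes that a handle without reduction methods always triggers the uninitialized-data assert.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcomes of a STARPU_R access to an invalidated handle, as observed in this
 * report. These names are illustrative and do not exist in StarPU. */
enum read_outcome {
    OUTCOME_UNINITIALIZED_ASSERT, /* StarPU reports that the data is not initialized */
    OUTCOME_INIT_FUNC_ASSERT,     /* Assertion `0 && "init_func"' in reduction.c */
    OUTCOME_SILENT_WRONG_RESULT   /* the task runs on undefined contents */
};

/* Hypothetical helper: maps the two conditions from the report to the outcome. */
enum read_outcome observed_outcome(bool has_reduction_methods,
                                   bool redux_supported_on_device)
{
    if (!has_reduction_methods)
        return OUTCOME_UNINITIALIZED_ASSERT; /* assumption: always reported */
    if (!redux_supported_on_device)
        return OUTCOME_INIT_FUNC_ASSERT;
    return OUTCOME_SILENT_WRONG_RESULT;
}

/* The three rows of the table, written as checks. */
void check_observed_outcomes(void)
{
    assert(observed_outcome(false, true) == OUTCOME_UNINITIALIZED_ASSERT);
    assert(observed_outcome(true, false) == OUTCOME_INIT_FUNC_ASSERT);
    assert(observed_outcome(true, true) == OUTCOME_SILENT_WRONG_RESULT);
}
```

Only the last row is the problematic one: it is the case where a misuse goes unnoticed.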
Here is a simple program to reproduce the issue:
#include <starpu.h>
#include <stdio.h>

/* Initializes x; submitted with STARPU_W. */
void set_func(void *buffers[], void *cl_args)
{
    float *x = (float *)STARPU_VARIABLE_GET_PTR(buffers[0]);
    *x = 0;
    printf("set_func\n");
}

/* Reads x; submitted with STARPU_R. The value itself is not inspected here. */
void use_func(void *buffers[], void *cl_args)
{
    float *x = (float *)STARPU_VARIABLE_GET_PTR(buffers[0]);
    (void)x;
    printf("use_func\n");
}

/* Reduction init method: sets the neutral element. */
void clear_func(void *buffers[], void *cl_args)
{
    float *x = (float *)STARPU_VARIABLE_GET_PTR(buffers[0]);
    *x = 0.0;
}

/* Reduction method: accumulates the second buffer into the first. */
void acc_func(void *buffers[], void *cl_args)
{
    float *x = (float *)STARPU_VARIABLE_GET_PTR(buffers[0]);
    const float *y = (const float *)STARPU_VARIABLE_GET_PTR(buffers[1]);
    *x += *y;
}

struct starpu_codelet set_codelet =
{
    .cpu_funcs = {set_func},
    .modes = {STARPU_W},
    .nbuffers = 1
};

struct starpu_codelet use_codelet =
{
    .cpu_funcs = {use_func},
    .modes = {STARPU_R},
    .nbuffers = 1
};

struct starpu_codelet clear_codelet =
{
    .cpu_funcs = {clear_func},
    .modes = {STARPU_W},
    .nbuffers = 1
};

struct starpu_codelet acc_codelet =
{
    .cpu_funcs = {acc_func},
    .modes = {STARPU_RW, STARPU_R},
    .nbuffers = 2
};

int main(int argc, char **argv)
{
    float x;
    starpu_data_handle_t x_handle;
    starpu_init(NULL);
    starpu_variable_data_register(&x_handle, STARPU_MAIN_RAM, (uintptr_t)&x,
            sizeof(x));
    starpu_data_set_reduction_methods(x_handle, &acc_codelet, &clear_codelet);
    starpu_task_insert(&set_codelet, STARPU_W, x_handle, 0); // Init X
    starpu_data_invalidate_submit(x_handle); // Invalidate X
    //starpu_task_insert(&set_codelet, STARPU_W, x_handle, 0); // Init X is ignored
    starpu_task_insert(&use_codelet, STARPU_R, x_handle, 0); // Read X right after invalidation without an error
    starpu_task_wait_for_all();
    starpu_data_unregister(x_handle);
    starpu_shutdown();
    printf("x=%f\n", x);
    return 0;
}
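A possible application-side guard, independent of StarPU, is to track per handle whether its contents are valid in submission order and to check every read against that flag. The sketch below is a minimal illustration under that assumption: `tracked_data`, `tracked_access`, `tracked_invalidate`, and `TRACKED_CHECK` are hypothetical names, the `handle` field stands in for a `starpu_data_handle_t`, and the approach relies on tasks being submitted in the order in which they are meant to access the data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical application-side wrapper; not part of StarPU. */
struct tracked_data {
    void *handle;     /* would hold a starpu_data_handle_t in real code */
    bool initialized; /* true between a write and the next invalidation */
};

enum access_mode { ACCESS_R, ACCESS_W, ACCESS_RW };

/* Returns true if the access is legal in submission order.
 * A pure write makes the contents valid; R and RW need valid contents. */
bool tracked_access(struct tracked_data *d, enum access_mode mode)
{
    if (mode == ACCESS_W) {
        d->initialized = true;
        return true;
    }
    return d->initialized;
}

/* To be called next to starpu_data_invalidate_submit. */
void tracked_invalidate(struct tracked_data *d)
{
    d->initialized = false;
}

/* Aborts in debug builds when a task would read invalid contents. */
#define TRACKED_CHECK(d, mode) assert(tracked_access((d), (mode)))
```

In the reproducer above, calling `TRACKED_CHECK(&x_tracked, ACCESS_R)` before submitting `use_codelet` would abort right after the invalidation, whether or not reduction methods are set on the handle.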