Data invalidation and deinitialization do not signal about read dependency in a task afterwards in certain cases #39

@Muxas

Hi!

I have just added starpu_data_invalidate_submit to my code and, of course, made some mistakes along the way. StarPU reported some of them, signaling that a task was about to read data that is not initialized, but others went unreported. I found out that calling starpu_data_set_reduction_methods on a handle makes it immune to this assertion if the data remains on the same device and its access mode is STARPU_R.

If the data is on a GPU and the reduction methods only support CPUs, the following error is printed:

/trinity/home/al.mikhalev/Install/starpu-1.3-a100/lib/libstarpu-1.3.so.10(_starpu_redux_init_data_replicate+0x208)[0x155554ecd3d8]
/trinity/home/al.mikhalev/Install/starpu-1.3-a100/lib/libstarpu-1.3.so.10(_starpu_fetch_task_input_tail+0x210)[0x155554eb3ac0]
/trinity/home/al.mikhalev/Install/starpu-1.3-a100/lib/libstarpu-1.3.so.10(_starpu_cuda_driver_run_once+0x22e)[0x155554f1b72e]
/trinity/home/al.mikhalev/Install/starpu-1.3-a100/lib/libstarpu-1.3.so.10(_starpu_cuda_worker+0x95)[0x155554f1c155]
/lib64/libpthread.so.0(+0x7ea5)[0x155554bfaea5]
/lib64/libc.so.6(clone+0x6d)[0x155554403b0d]
a.out: ../../src/datawizard/reduction.c:92: _starpu_redux_init_data_replicate: Assertion `0 && "init_func"' failed.
Aborted

However, if the reduction methods are supported on the device where the data is allocated, the program does not stop, no error is raised, and the result of the computation is silently wrong (undefined behavior).

Here is a simple program that reproduces the issue:

#include <starpu.h>
#include <stdio.h>

void set_func(void *buffers[], void *cl_args)
{
    float *x = (float *)STARPU_VARIABLE_GET_PTR(buffers[0]);
    *x = 0;
    printf("set_func\n");
}

void use_func(void *buffers[], void *cl_args)
{
    float *x = (float *)STARPU_VARIABLE_GET_PTR(buffers[0]);
    float one = 1.0;
    printf("use_func\n");
}

void clear_func(void *buffers[], void *cl_args)
{
    float *x = (float *)STARPU_VARIABLE_GET_PTR(buffers[0]);
    *x = 0.0;
}

void acc_func(void *buffers[], void *cl_args)
{
    float *x = (float *)STARPU_VARIABLE_GET_PTR(buffers[0]);
    // The value to accumulate is the second buffer (STARPU_R), not the first
    const float *y = (const float *)STARPU_VARIABLE_GET_PTR(buffers[1]);
    *x += *y;
}

struct starpu_codelet set_codelet =
{
    .cpu_funcs = {set_func},
    .modes = {STARPU_W},
    .nbuffers = 1
};

struct starpu_codelet use_codelet =
{
    .cpu_funcs = {use_func},
    .modes = {STARPU_R},
    .nbuffers = 1
};

struct starpu_codelet clear_codelet =
{
    .cpu_funcs = {clear_func},
    .modes = {STARPU_W},
    .nbuffers = 1
};

struct starpu_codelet acc_codelet =
{
    .cpu_funcs = {acc_func},
    .modes = {STARPU_RW, STARPU_R},
    .nbuffers = 2
};

int main(int argc, char **argv)
{
    float x;
    starpu_data_handle_t x_handle;
    if (starpu_init(NULL) != 0) // Bail out if StarPU cannot start
        return 1;
    starpu_variable_data_register(&x_handle, STARPU_MAIN_RAM, (uintptr_t)&x,
            sizeof(x));
    starpu_data_set_reduction_methods(x_handle, &acc_codelet, &clear_codelet);
    starpu_task_insert(&set_codelet, STARPU_W, x_handle, 0); // Init X
    starpu_data_invalidate_submit(x_handle); // Invalidate X
    //starpu_task_insert(&set_codelet, STARPU_W, x_handle, 0); // Init X is ignored
    starpu_task_insert(&use_codelet, STARPU_R, x_handle, 0); // Read X right after invalidation without an error
    starpu_task_wait_for_all();
    starpu_data_unregister(x_handle);
    starpu_shutdown();
    printf("x=%f\n", x);
    return 0;
}
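
Until the missing check is sorted out on the StarPU side, the same condition can be tracked in the application. Below is a minimal sketch of such bookkeeping, not a StarPU feature: `guarded_handle`, `guard_init`, `guard_invalidate`, `guard_access` and the `MODE_*` constants are hypothetical names made up for this illustration, and the sketch deliberately does not include `starpu.h` so that it compiles on its own. It assumes one tracker per data handle, updated in submission order from a single thread, and it does not model STARPU_REDUX accesses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical application-side tracker; none of these names exist in StarPU. */
enum access_mode { MODE_R, MODE_W, MODE_RW };

struct guarded_handle
{
    const char *name; /* label used in diagnostics */
    bool valid;       /* true if the last submitted operation left the data initialized */
};

/* Pass initially_valid = true when the handle was registered on top of a
 * buffer that already holds meaningful data, false otherwise. */
static void guard_init(struct guarded_handle *g, const char *name,
        bool initially_valid)
{
    assert(g != NULL);
    g->name = name;
    g->valid = initially_valid;
}

/* Call next to starpu_data_invalidate_submit(). */
static void guard_invalidate(struct guarded_handle *g)
{
    assert(g != NULL);
    g->valid = false;
}

/* Call right before submitting a task that accesses the data.
 * Returns 0 if the access is fine, -1 if it would read invalidated data. */
static int guard_access(struct guarded_handle *g, enum access_mode mode)
{
    assert(g != NULL);
    if (mode == MODE_W)
    {
        g->valid = true; /* a pure write re-initializes the data */
        return 0;
    }
    if (!g->valid)
    {
        fprintf(stderr, "%s: read access submitted after invalidation\n",
                g->name);
        return -1;
    }
    return 0; /* R or RW on valid data keeps it valid */
}
```

Applied to the submission order of the reproducer above (write, invalidate, read), `guard_access` returns 0 for the write and -1 for the read, which is the diagnostic I expected StarPU to give here.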
