
frep-loop: PHI problem with reductions #12

@huettern


For parallel reductions in which all accumulators are initialized to the same value, the PHI nodes are merged into the instructions. The reduction is then performed on the register holding the initial value instead of on a new register per accumulator.

  for(unsigned i=0; i < MATSIZE; ++i) {
    #pragma unroll 1
    for(unsigned j=0; j < MATSIZE; j+=UNROLL) {
      register double acc[UNROLL];
      
      #pragma unroll
      for (int u = 0; u < UNROLL; ++u) acc[u] = 0.0; // <- All is fine if this is e.g. c[i*MATSIZE+j+u];
      
      #pragma frep infer
      for(unsigned k=0; k < MATSIZE; ++k) {
        #pragma unroll
        for (int u = 0; u < UNROLL; ++u)
        {
          acc[u] += __builtin_ssr_pop(0)*__builtin_ssr_pop(1);
        }
      }
      
      #pragma unroll
      for (int u = 0; u < UNROLL; ++u) c[i*MATSIZE+j+u] = acc[u];
    }
  }

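With 0.0 as the initial value, this produces incorrect assembly. Every fmadd.d takes ft3, the register holding the shared initial value, as its addend instead of its own accumulator: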

fmadd.d	ft5, ft1, ft0, ft3
fmadd.d	ft6, ft1, ft0, ft3
fmadd.d	ft7, ft1, ft0, ft3
# ...

With individual initial values (c[i*MATSIZE+j+u]), the problem disappears and each accumulator is its own addend:

fmadd.d	ft3, ft1, ft0, ft3
fmadd.d	ft4, ft1, ft0, ft4
fmadd.d	ft5, ft1, ft0, ft5
fmadd.d	ft6, ft1, ft0, ft6

Labels: bug (Something isn't working)
