@therault
Owner

Use the fact that the plan splits each matrix into regular squares to replace memorization of the plan with inline computation.
The loops that compute the large quantities (the number of GEMMs in a phase and the list of GEMMs in a phase) cost dim^3 iterations in the worst case; to be exact, they cost the number of GEMMs in each phase.
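
As an illustration of what computing a phase inline can look like, here is a minimal sketch. It is not the code of this PR: `TiledShape`, `Gemm`, `num_phases`, `gemms_in_phase`, the linear phase numbering, and the sparsity predicates are all hypothetical names and assumptions made for the example. It assumes a phase is a cube of side `dim` in the (i, j, k) tile index space, so nothing has to be stored per phase.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical shape of a tiled product C(mt x nt) += A(mt x kt) * B(kt x nt).
// Assumption: a phase is a cube of side `dim` in the (i, j, k) tile index space.
struct TiledShape {
  std::size_t mt, nt, kt;  // number of tiles along m, n and k
  std::size_t dim;         // side of a phase cube, in tiles
};

struct Gemm {
  std::size_t i, j, k;  // C(i,j) += A(i,k) * B(k,j)
};

inline std::size_t ceil_div(std::size_t a, std::size_t b) { return (a + b - 1) / b; }

// Number of phases, derived from the metadata alone.
inline std::size_t num_phases(const TiledShape& s) {
  assert(s.dim > 0);
  return ceil_div(s.mt, s.dim) * ceil_div(s.nt, s.dim) * ceil_div(s.kt, s.dim);
}

// List the GEMMs of phase `p` on demand instead of reading them from a stored plan.
// `nonzero_a(i, k)` and `nonzero_b(k, j)` are hypothetical sparsity tests.
// The loops visit at most dim^3 index triples.
template <typename PredA, typename PredB>
std::vector<Gemm> gemms_in_phase(const TiledShape& s, std::size_t p,
                                 PredA nonzero_a, PredB nonzero_b) {
  assert(p < num_phases(s));
  const std::size_t pn = ceil_div(s.nt, s.dim);
  const std::size_t pk = ceil_div(s.kt, s.dim);
  // Decode the linear phase index, assuming p = (bi * pn + bj) * pk + bk.
  const std::size_t bk = p % pk;
  const std::size_t bj = (p / pk) % pn;
  const std::size_t bi = p / (pk * pn);

  std::vector<Gemm> out;
  for (std::size_t i = bi * s.dim; i < std::min((bi + 1) * s.dim, s.mt); ++i)
    for (std::size_t j = bj * s.dim; j < std::min((bj + 1) * s.dim, s.nt); ++j)
      for (std::size_t k = bk * s.dim; k < std::min((bk + 1) * s.dim, s.kt); ++k)
        if (nonzero_a(i, k) && nonzero_b(k, j)) out.push_back(Gemm{i, j, k});
  return out;
}
```

With this layout, any task can call `gemms_in_phase(shape, p, ...)` when it needs the list, at a cost bounded by dim^3 per call, instead of looking it up in a plan built in the constructor.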

This is not ideal yet. Previously, we built the communication plan at the same time as the computational plan. We can now skip building the computational plan, but we still need to build the communication plan: each task needs to know exactly which other tasks it passes data to, and because tasks are named by plan index, the communication tasks need to remember which communication phase is connected to which computation phase.

The stored communication plan is much smaller, though, and its objects don't need to be sorted or ordered.
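
To make the remaining bookkeeping concrete, here is a minimal sketch of an unordered record that links communication phases to the computation phases they feed. The `CommPlan` class and its `connect` / `consumers` methods are hypothetical and only illustrate the idea; the actual data structure in this PR may differ.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical communication plan: for each communication phase, remember
// which computation phases consume the data it sends.
// The container is unordered on purpose: lookups are by key, so no sorting is needed.
class CommPlan {
 public:
  // Record that `comm_phase` sends data used by `comp_phase`.
  void connect(std::size_t comm_phase, std::size_t comp_phase) {
    links_[comm_phase].push_back(comp_phase);
  }

  // Computation phases fed by `comm_phase`; empty if none were recorded.
  const std::vector<std::size_t>& consumers(std::size_t comm_phase) const {
    static const std::vector<std::size_t> none;
    auto it = links_.find(comm_phase);
    return it == links_.end() ? none : it->second;
  }

  // Number of communication phases with at least one recorded consumer.
  std::size_t size() const { return links_.size(); }

 private:
  std::unordered_map<std::size_t, std::vector<std::size_t>> links_;
};
```

Only the links between phases are stored here; the GEMMs of each phase would still be recomputed from the matrix metadata when needed.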

Hopefully this already reduces the time spent building the plan significantly. I am still working on removing the plan building altogether.

therault and others added 11 commits October 13, 2021 16:19
…GEMMs run are computed from the matrix metadata and the dimension of the strategy, no need for memorizing those at construction time
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
SPMM: inline the local_gemm callback and fix some compiler issues
…the more efficient algorithm to build the communication plan; display the time spent in the constructor, and compute the flops with this time
3 participants