In the multivariate normal family, computing the sufficient statistics requires masking the upper-triangular elements of the matrix. We can do one of three things:
- Suppress the warning
- Break the matrix multiplication for the second moment matrix (D x D) into D vectorized matrix multiplications of increasing length. It would be interesting to know whether this is faster or slower than computing the full product and masking it.
- Implement this cool DHAMM method in TensorFlow: https://software.intel.com/en-us/articles/a-matrix-multiplication-routine-that-updates-only-the-upper-or-lower-triangular-part-of-the
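
A rough NumPy sketch of the second option, to make the comparison concrete. It is not the project's code: the function names are hypothetical, and it assumes the samples are stacked in an N x D array `x`, that the second moment is `x.T @ x`, and that the lower triangle is the part we keep.

```python
import numpy as np


def second_moment_masked(x):
    """Baseline: full D x D product, then zero the strictly upper triangle."""
    return np.tril(x.T @ x)


def second_moment_by_rows(x):
    """Fill only the lower triangle using D products of increasing length."""
    _, d = x.shape
    out = np.zeros((d, d), dtype=x.dtype)
    for i in range(d):
        # Row i only needs columns 0..i, so the product has length i + 1.
        out[i, : i + 1] = x[:, i] @ x[:, : i + 1]
    return out
```

Both functions should agree up to floating point error, so timing them against each other (for example with `timeit`) for a few values of D would answer the faster-or-slower question for a given backend.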
This is low priority.