Some mathematical work remains to make the Rice approximation part of dirichlet_grad() numerically stable in single precision. It is currently stable only in double precision, and consumer-grade GPUs are very slow at double-precision math.
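
As a generic illustration of why a formula can be fine in double precision yet break in single precision, here is a minimal sketch of catastrophic cancellation. It is not the Rice approximation or any code from dirichlet_grad(); the helper names to_float32 and one_plus_x_minus_one are hypothetical, and single precision is emulated on the CPU by rounding each intermediate result with the standard struct module.

```python
import struct


def to_float32(value):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def one_plus_x_minus_one(x, single_precision):
    """Compute (1 + x) - 1, optionally rounding every step to float32.

    Hypothetical helper for illustration only; mathematically the result is x.
    """
    if not single_precision:
        return (1.0 + x) - 1.0
    x32 = to_float32(x)
    total = to_float32(1.0 + x32)   # the sum is rounded to float32 here
    return to_float32(total - 1.0)  # the subtraction cannot recover lost bits


x = 1e-8
double_result = one_plus_x_minus_one(x, single_precision=False)
single_result = one_plus_x_minus_one(x, single_precision=True)

print("double:", double_result)  # close to 1e-8
print("single:", single_result)  # 0.0, because 1 + 1e-8 rounds to 1 in float32
```

In double precision the small term survives the addition, while in single precision it is rounded away entirely, so the result has no correct digits. Terms of this kind usually have to be rearranged algebraically (for example with log1p/expm1-style identities or a series expansion) before a formula is usable in single precision.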