https://github.com/markkho/msdm/blob/8bb0f88a0c28006a4e9e416027a1e59e2603f0c2/msdm/algorithms/vectorizedvalueiteration.py#L1

Based on [Blanchard et al. (2020)](https://academic.oup.com/imajna/advance-article/doi/10.1093/imanum/draa038/5893596), "Accurately computing the log-sum-exp and softmax functions", we should avoid SciPy's `softmax`/`logsumexp` here.
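A minimal NumPy sketch of what a replacement could look like. It assumes finite, non-empty inputs and follows my reading of the shifted formulas in the paper: shift by the maximum `a`, sum the remaining terms `s` (excluding the max, whose term is exactly 1), then use `a + log1p(s)` for log-sum-exp and `exp(x - a) / (1 + s)` for softmax. The names `logsumexp_shifted`, `softmax_shifted` and `_shifted_terms` are hypothetical and not part of msdm or SciPy.

```python
import numpy as np


def _shifted_terms(x, axis):
    """Hypothetical shared step for both functions.

    Returns a = max(x), e = exp(x - a), and s = sum of e over every
    entry except the (first) maximum, whose term is exactly 1.
    """
    x = np.asarray(x, dtype=float)
    a = np.max(x, axis=axis, keepdims=True)
    e = np.exp(x - a)  # all exponents are <= 0, so no overflow
    k = np.expand_dims(np.argmax(x, axis=axis), axis)
    rest = e.copy()
    np.put_along_axis(rest, k, 0.0, axis=axis)  # drop the max term
    s = np.sum(rest, axis=axis, keepdims=True)
    return a, e, s


def logsumexp_shifted(x, axis=-1):
    """log(sum(exp(x))) computed as a + log1p(s). Assumes finite x."""
    a, _, s = _shifted_terms(x, axis)
    return np.squeeze(a + np.log1p(s), axis=axis)


def softmax_shifted(x, axis=-1):
    """Shifted softmax exp(x - a) / (1 + s). Assumes finite x."""
    _, e, s = _shifted_terms(x, axis)
    return e / (1.0 + s)


# Usage: inputs large enough that a naive exp(x) would overflow.
print(logsumexp_shifted([1000.0, 1000.0]))  # approx 1000 + log(2)
print(softmax_shifted([[0.0, 1.0], [1000.0, 1000.0]]))
```

Dropping the max term before summing is what lets `log1p` help: when all the other terms are tiny, `s` is small and `log1p(s)` keeps its significant digits, whereas forming `1 + s` first would round them away. Handling `-inf` entries (e.g. masked actions) would need extra care that this sketch does not attempt.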