Here's a TensorFlow implementation of the entmax mapping and loss for $\alpha=1.5$, in case anyone's interested: https://gist.github.com/justheuristic/60167e77a95221586be315ae527c3cbd It should work on TF >= 1.8, and it matches both the outputs and the gradients of the official PyTorch implementation. Thanks to lena-voita@ for the assistance.
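For anyone who wants to sanity-check outputs by hand, here is a minimal pure-Python sketch of the forward mapping only. It is not taken from the gist; it assumes the sort-based exact algorithm for $\alpha=1.5$ described in the entmax paper, works on a single 1-D list of scores, and does not cover the loss or the gradients. The name `entmax15` is used here for illustration.

```python
import math


def entmax15(scores):
    """Illustrative 1.5-entmax of a 1-D list of floats (forward pass only)."""
    # Shift by the max for numerical stability, then halve (alpha - 1 = 0.5).
    top = max(scores)
    z = [(s - top) / 2.0 for s in scores]
    z_sorted = sorted(z, reverse=True)

    # Candidate thresholds tau_k computed from the k largest entries.
    taus = []
    cum, cum_sq = 0.0, 0.0
    for k, zk in enumerate(z_sorted, start=1):
        cum += zk
        cum_sq += zk * zk
        mean = cum / k
        ss = k * (cum_sq / k - mean * mean)  # sum of squared deviations
        delta = max((1.0 - ss) / k, 0.0)     # clamp before the square root
        taus.append(mean - math.sqrt(delta))

    # Support size = number of sorted entries that stay above their threshold.
    support = sum(1 for t, zk in zip(taus, z_sorted) if t <= zk)
    tau_star = taus[support - 1]

    # Entries at or below the threshold get exactly zero probability.
    return [max(v - tau_star, 0.0) ** 2 for v in z]
```

With this sketch, `entmax15([10.0, 0.0])` puts all the mass on the first entry and gives the second exactly zero, while `entmax15([0.0, 0.0])` splits the mass roughly evenly. That sparsity is the main thing to look for when comparing against a softmax baseline.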