The survey paper mainly covers attention variants whose computational complexity is quadratic in sequence length.
However, there have been several advances in this area, from BigBird, Longformer, Sparse Transformer, and Linformer to dilated attention, all of which achieve sub-quadratic complexity, and some of which are linear.
Some investigation of these variants could be added to the Knowledge Base.
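
For reference, a minimal sketch of one such pattern is included below: sliding-window (local) attention of the kind used by Longformer, where each query attends only to a fixed-size neighborhood, reducing the cost from O(n^2) to O(n * window). The window size and toy dimensions are illustrative assumptions, not values taken from any of the cited papers.

```python
import numpy as np

def sliding_window_attention(q, k, v, window: int):
    """Local (sliding-window) attention sketch: each query attends only to
    keys within `window` positions on either side, giving O(n * window)
    cost instead of the O(n^2) cost of full attention."""
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        scores = q[i] @ k[lo:hi].T / np.sqrt(d)   # scores over the local window
        weights = np.exp(scores - scores.max())   # numerically stable softmax
        weights /= weights.sum()
        out[i] = weights @ v[lo:hi]               # weighted sum of local values
    return out

# Toy usage (hypothetical sizes): sequence length 16, head dimension 8, window 2.
rng = np.random.default_rng(0)
n, d = 16, 8
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
print(sliding_window_attention(q, k, v, window=2).shape)  # (16, 8)
```

The other variants listed above use different mechanisms (e.g. random/global blocks in BigBird, low-rank projection of keys and values in Linformer), but the common theme is restricting or compressing the attention computation to avoid the full quadratic score matrix.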