In DiT-ToCa/models.py, I noticed that the attention module is not included in the test_FLOPS calculation. However, ToCa does not skip the attention computation here either, so that cost is still incurred at inference time.
Will this cause the FLOPs ratio between running with and without ToCa to be miscalculated?
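To make the concern concrete, here is a rough back-of-the-envelope sketch. It is not the repository's code: `block_flops`, `fresh_ratio`, and the sizes used are hypothetical names and illustrative values. It assumes attention always runs on all tokens while only a fraction of tokens goes through the MLP, and it counts one multiply-add as 2 FLOPs.

```python
def block_flops(n_tokens, dim, mlp_ratio=4, fresh_ratio=1.0, count_attention=True):
    """Approximate FLOPs of one transformer block (hypothetical helper).

    fresh_ratio: assumed fraction of tokens whose MLP output is recomputed
                 (1.0 means no caching).
    count_attention: whether the attention cost is added to the total.
    """
    hidden = mlp_ratio * dim
    # MLP: two linear layers (dim -> hidden -> dim), only on recomputed tokens.
    mlp = 2 * 2 * (n_tokens * fresh_ratio) * dim * hidden

    attn = 0
    if count_attention:
        qkv_proj = 2 * n_tokens * dim * 3 * dim   # Q, K, V projections
        scores = 2 * n_tokens * n_tokens * dim    # Q @ K^T over all heads
        weighted = 2 * n_tokens * n_tokens * dim  # softmax(scores) @ V
        out_proj = 2 * n_tokens * dim * dim       # output projection
        attn = qkv_proj + scores + weighted + out_proj
    return mlp + attn


def flops_ratio(n_tokens, dim, fresh_ratio, count_attention):
    """Ratio of baseline FLOPs to cached FLOPs for one block."""
    full = block_flops(n_tokens, dim, count_attention=count_attention)
    cached = block_flops(n_tokens, dim, fresh_ratio=fresh_ratio,
                         count_attention=count_attention)
    return full / cached


# Illustrative sizes only; fresh_ratio=0.1 is an arbitrary example value.
ratio_without_attn = flops_ratio(256, 1152, fresh_ratio=0.1, count_attention=False)
ratio_with_attn = flops_ratio(256, 1152, fresh_ratio=0.1, count_attention=True)
print(f"ratio ignoring attention: {ratio_without_attn:.2f}")
print(f"ratio counting attention: {ratio_with_attn:.2f}")
```

Under these assumptions, leaving attention out of the count makes the reported ratio larger than the one obtained when the attention cost is included on both sides, which is the discrepancy I am asking about.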