About the training time

Thanks for your great work.

I am impressed by the outstanding performance of such a lightweight design.

According to the paper, the model is trained for 3000 steps on Nvidia H20 GPUs with a batch size of 48, using only ∼1% additional trainable parameters and just 2000 training pairs.

Could you please share the approximate training time? Thanks.