I wonder what benefit the V-shape brings. Assuming the batch size and the number of PP stages are the same, DualPipeV requires either twice the devices, twice the micro-batch size, or twice the number of micro-batches. The latter two options take twice the time, while the first appears to be the same as the original one?