Description
Performance of a single iteration of the m loop regresses as the no. of threads increases. This causes loop iterations in range_fn not to scale linearly with the no. of threads.
To be more concrete about what's going on: the outer loop (the m loop) runs 255 times. To parallelise it we call process_m_loop. process_m_loop recursively divides a larger range, starting with [1, 255], into smaller ranges until the size of a range is at most set_len. set_len is equal to 255 / no_of_threads and defines a suitable no. of iterations to assign to a single thread. Once a given range is no longer than set_len, the m-loop iterations corresponding to it are processed serially (a rough sketch of this splitting scheme follows below).
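For illustration only, here is a minimal sketch of that recursive split-and-join pattern; the real process_m_loop lives in the repo, and the names `process_single_m`, the `lo`/`hi` bounds, and `set_len` being passed as a parameter are assumptions made for this example.

```rust
use rayon::join;

/// Recursively split [lo, hi] until a chunk is no larger than `set_len`,
/// then run that chunk's m iterations serially on one worker thread.
fn process_m_loop(lo: usize, hi: usize, set_len: usize) {
    let len = hi - lo + 1;
    if len <= set_len {
        // Small enough: process these m iterations serially.
        for m in lo..=hi {
            process_single_m(m);
        }
    } else {
        // Otherwise split the range in half; rayon::join may run the two
        // halves on different worker threads.
        let mid = lo + len / 2;
        join(
            || process_m_loop(lo, mid - 1, set_len),
            || process_m_loop(mid, hi, set_len),
        );
    }
}

/// Stand-in for the body of a single m-loop iteration.
fn process_single_m(_m: usize) {}

fn main() {
    let no_of_threads = 8;
    // set_len defines how many iterations one thread should handle.
    process_m_loop(1, 255, 255 / no_of_threads);
}
```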
The issue is that the runtime of processing the 255 iterations does not decrease linearly with the no. of threads available. For example, if it takes x seconds with 1 thread, then with 8 threads it should ideally take x/8 seconds, but it takes a lot longer than that.
To figure out where time is being consumed as the no. of threads increases, I timed the piece of code that performs a single m-loop iteration (starting here). Ideally the time for a single iteration should stay the same irrespective of the no. of threads, but it turns out it does not. For example, with threads set to 1, on my M1 it takes 132 ms per iteration. Increasing threads to 8 raises the per-iteration time to around 250 ms on average (it varies a lot).
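The instrumentation can be as simple as wrapping the iteration body with `std::time::Instant`; this is only a sketch of how such a measurement might look, with `process_single_m` again a placeholder for the code linked above.

```rust
use std::time::Instant;

/// Time one m-loop iteration and print the elapsed wall-clock time.
fn timed_single_m(m: usize) {
    let start = Instant::now();
    process_single_m(m); // the body of one m-loop iteration
    eprintln!("m = {m}: {:?} per iteration", start.elapsed());
}

/// Stand-in for the actual per-iteration work.
fn process_single_m(_m: usize) {}
```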
I also multiplied the per-iteration time by the no. of times the loop runs (i.e. 255) and divided it by the no. of threads, for both the 1-thread and 8-thread cases. In other words, if it takes x seconds per iteration, I calculated x * 255 / (no. of threads), and it matches the total time spent in the loop part of the range function almost perfectly (with a difference of a few ms due to other minor operations). This confirms that the main cause of the issue is that single m-loop iteration performance regresses with the no. of threads. Moreover, it also confirms that the way threads are created in process_m_loop (using rayon::join) does not add any meaningful overhead.
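As a back-of-the-envelope check using the numbers above (132 ms/iteration at 1 thread, ~250 ms/iteration at 8 threads), the estimate x * 255 / threads works out roughly as follows; the exact totals will of course differ slightly from the measured wall-clock times.

```rust
fn main() {
    let iters = 255.0_f64;
    // Measured per-iteration times quoted above.
    let est_1_thread = 132.0 * iters / 1.0; // ~33_660 ms, i.e. ~33.7 s
    let est_8_threads = 250.0 * iters / 8.0; // ~7_970 ms, i.e. ~8.0 s
    // Ideal 8-thread time, if per-iteration cost stayed at 132 ms:
    let ideal_8_threads = 132.0 * iters / 8.0; // ~4_208 ms, i.e. ~4.2 s
    println!("1 thread:  ~{est_1_thread:.0} ms");
    println!("8 threads: ~{est_8_threads:.0} ms (ideal ~{ideal_8_threads:.0} ms)");
}
```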