Description
Performance of a single iteration of the m loop regresses as the no. of threads increases. This causes loop iterations in range_fn not to scale linearly with the no. of threads.
To be more concrete about what's going on: the outer loop (the m loop) runs 255 times. To parallelise it we call process_m_loop. process_m_loop recursively divides a larger range, starting with [1, 255], into smaller ranges until the size of a range is at most set_len. set_len is equal to 255 / no_of_threads and defines a suitable no. of iterations to assign to a single thread. Once a given range is no longer than set_len, the m-loop iterations corresponding to it are processed serially (a rough sketch of this splitting scheme follows below).
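For illustration only, here is a minimal sketch of that recursive split-and-join pattern; the real process_m_loop lives in the repo, and the names `process_single_m`, the `lo`/`hi` bounds, and `set_len` being passed as a parameter are assumptions made for this example.

```rust
use rayon::join;

/// Recursively split [lo, hi] until a chunk is no larger than `set_len`,
/// then run that chunk's m iterations serially on one worker thread.
fn process_m_loop(lo: usize, hi: usize, set_len: usize) {
    let len = hi - lo + 1;
    if len <= set_len {
        // Small enough: process these m iterations serially.
        for m in lo..=hi {
            process_single_m(m);
        }
    } else {
        // Otherwise split the range in half; rayon::join may run the two
        // halves on different worker threads.
        let mid = lo + len / 2;
        join(
            || process_m_loop(lo, mid - 1, set_len),
            || process_m_loop(mid, hi, set_len),
        );
    }
}

/// Stand-in for the body of a single m-loop iteration.
fn process_single_m(_m: usize) {}

fn main() {
    let no_of_threads = 8;
    // set_len defines how many iterations one thread should handle.
    process_m_loop(1, 255, 255 / no_of_threads);
}
```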
The issue is that the runtime of processing the 255 iterations does not decrease linearly with the no. of threads available. For example, if it takes x seconds with 1 thread, then with 8 threads it should ideally take x/8 seconds, but it takes a lot longer than that.
To figure out where time is being consumed as the no. of threads increases, I timed the piece of code that performs a single m-loop iteration (starting here). Ideally the time for a single iteration should stay the same irrespective of the no. of threads, but it turns out it does not. For example, with threads set to 1, on my M1 it takes 132 ms per iteration. Increasing threads to 8 raises the per-iteration time to around 250 ms on average (it varies a lot).
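The instrumentation can be as simple as wrapping the iteration body with `std::time::Instant`; this is only a sketch of how such a measurement might look, with `process_single_m` again a placeholder for the code linked above.

```rust
use std::time::Instant;

/// Time one m-loop iteration and print the elapsed wall-clock time.
fn timed_single_m(m: usize) {
    let start = Instant::now();
    process_single_m(m); // the body of one m-loop iteration
    eprintln!("m = {m}: {:?} per iteration", start.elapsed());
}

/// Stand-in for the actual per-iteration work.
fn process_single_m(_m: usize) {}
```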
I also multiplied the per-iteration time by the no. of times the loop runs (i.e. 255) and divided it by the no. of threads, for both the 1-thread and 8-thread cases. In other words, if it takes x seconds per iteration, I calculated x * 255 / (no. of threads), and it matches the total time spent in the loop part of the range function almost perfectly (with a difference of a few ms due to other minor operations). This confirms that the main cause of the issue is that single m-loop iteration performance regresses with the no. of threads. Moreover, it also confirms that the way threads are created in process_m_loop (using rayon::join) does not add any meaningful overhead.
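As a back-of-the-envelope check using the numbers above (132 ms/iteration at 1 thread, ~250 ms/iteration at 8 threads), the estimate x * 255 / threads works out roughly as follows; the exact totals will of course differ slightly from the measured wall-clock times.

```rust
fn main() {
    let iters = 255.0_f64;
    // Measured per-iteration times quoted above.
    let est_1_thread = 132.0 * iters / 1.0; // ~33_660 ms, i.e. ~33.7 s
    let est_8_threads = 250.0 * iters / 8.0; // ~7_970 ms, i.e. ~8.0 s
    // Ideal 8-thread time, if per-iteration cost stayed at 132 ms:
    let ideal_8_threads = 132.0 * iters / 8.0; // ~4_208 ms, i.e. ~4.2 s
    println!("1 thread:  ~{est_1_thread:.0} ms");
    println!("8 threads: ~{est_8_threads:.0} ms (ideal ~{ideal_8_threads:.0} ms)");
}
```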