@Kerl13
Member

@Kerl13 Kerl13 commented Feb 16, 2021

Some optimisations on the merge_{hi,lo} functions

@Kerl13
Member Author

Kerl13 commented Feb 16, 2021

Before

================================== [ Unif ] ==================================
length     stdlib    timsort   speedup    lg(n!)    stdlib   timsort   speedup
------------------------------------------------------------------------------
 30000   0.007039   0.010258    -45.7%    402909    409951    447116     -9.1%
 60000   0.014751   0.021221    -43.9%    865809    879910    955407     -8.6%
 90000   0.023624   0.031385    -32.9%   1351355   1374063   1470859     -7.0%
120000   0.031248   0.045396    -45.3%   1851608   1879818   2056774     -9.4%
150000   0.040328   0.056562    -40.3%   2362797   2406163   2595284     -7.9%
------------------------------------------------------------------------------

================================= [ 5-Runs ] =================================
length     stdlib    timsort   speedup    lg(n!)    stdlib   timsort   speedup
------------------------------------------------------------------------------
 30000   0.004255   0.002176     48.9%    402909    260971     97146     62.8%
 60000   0.008782   0.003431     60.9%    865809    552184    191963     65.2%
 90000   0.013131   0.004885     62.8%   1351355    864474    294032     66.0%
120000   0.018827   0.007167     61.9%   1851608   1179318    387267     67.2%
150000   0.023528   0.008382     64.4%   2362797   1495840    488300     67.4%
------------------------------------------------------------------------------

After

================================== [ Unif ] ==================================
length     stdlib    timsort   speedup    lg(n!)    stdlib   timsort   speedup
------------------------------------------------------------------------------
 30000   0.007027   0.009505    -35.3%    402909    409951    447116     -9.1%
 60000   0.015158   0.019277    -27.2%    865809    879910    955407     -8.6%
 90000   0.023195   0.029690    -28.0%   1351355   1374063   1470859     -7.0%
120000   0.030921   0.042278    -36.7%   1851608   1879818   2056774     -9.4%
150000   0.040280   0.053288    -32.3%   2362797   2406163   2595284     -7.9%
------------------------------------------------------------------------------

================================= [ 5-Runs ] =================================
length     stdlib    timsort   speedup    lg(n!)    stdlib   timsort   speedup
------------------------------------------------------------------------------
 30000   0.004224   0.001736     58.9%    402909    260971     97146     62.8%
 60000   0.008755   0.002680     69.4%    865809    552184    191963     65.2%
 90000   0.013104   0.004630     64.7%   1351355    864474    294032     66.0%
120000   0.018372   0.006515     64.5%   1851608   1179318    387267     67.2%
150000   0.023722   0.007637     67.8%   2362797   1495840    488300     67.4%
------------------------------------------------------------------------------
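
For readers of the tables: the column meanings are not stated in the thread, so the following is inferred from the figures. The left three columns look like running times, with speedup being the relative gain of timsort over stdlib. The right three look like comparison counts, set against lg(n!), the information-theoretic lower bound on comparisons. A minimal sketch of the speedup computation under that assumption (the function name `speedup` is hypothetical):

```ocaml
(* Assumed definition, inferred from the table values:
   relative gain of timsort over stdlib, as a percentage.
   Negative means timsort is slower (or uses more comparisons). *)
let speedup ~stdlib ~timsort =
  (stdlib -. timsort) /. stdlib *. 100.

let () =
  (* First "Before" row of the Unif table. *)
  Printf.printf "%.1f%%\n" (speedup ~stdlib:0.007039 ~timsort:0.010258)
```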

@Niols
Member

Niols commented Feb 17, 2021

Oh very nice!

@Kerl13
Member Author

Kerl13 commented Feb 17, 2021

I implemented one optimisation here, which reduces the number of memory accesses.

Before the patch, the merge_{hi,lo} functions performed 4 Array.get calls per recursive call.

  • 2 of them could easily be avoided (we were accessing every element twice in a row)
  • of the remaining 2 memory accesses, one could be reused at the next recursive call. So I changed the logic of the functions so that the caller makes the memory accesses rather than the callee, which makes it possible to "pass along" the value that will be reused to the next recursive call

→ we now perform one memory access per recursive call
→ downside: it clutters the code a bit…
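
To illustrate the "caller loads, callee receives" idea, here is a minimal sketch. It is not the code from this pull request: `merge_sketch` is a hypothetical, simplified merge of two separate sorted arrays into a destination array, whereas the real merge_{hi,lo} work on runs of a single array. The point is only that each recursive call performs a single array read, and the value that was not consumed is passed along as an argument.

```ocaml
(* Hypothetical simplified merge, for illustration only.
   Invariant of [go]: x0 = src0.(i0) and x1 = src1.(i1).
   [dest] is assumed to have room for all elements. *)
let merge_sketch cmp src0 src1 dest =
  let n0 = Array.length src0 and n1 = Array.length src1 in
  let rec go i0 x0 i1 x1 k =
    if cmp x0 x1 <= 0 then begin
      dest.(k) <- x0;
      let i0 = i0 + 1 in
      if i0 = n0 then
        (* src0 is exhausted: copy what is left of src1. *)
        Array.blit src1 i1 dest (k + 1) (n1 - i1)
      else
        (* One load for the next call; x1 is passed along unchanged. *)
        go i0 src0.(i0) i1 x1 (k + 1)
    end else begin
      dest.(k) <- x1;
      let i1 = i1 + 1 in
      if i1 = n1 then
        Array.blit src0 i0 dest (k + 1) (n0 - i0)
      else
        (* One load for the next call; x0 is passed along unchanged. *)
        go i0 x0 i1 src1.(i1) (k + 1)
    end
  in
  if n0 = 0 then Array.blit src1 0 dest 0 n1
  else if n1 = 0 then Array.blit src0 0 dest 0 n0
  else go 0 src0.(0) 0 src1.(0) 0
```

The extra arguments x0 and x1 are the clutter mentioned above: every call site has to maintain the invariant that they mirror the array contents.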

We achieve this by using a representation of the form (beginning index,
ending index) for the slices rather than (beginning index, length).
This mostly reduces the number of arithmetic operations in merge_hi,
while merge_lo remains more or less untouched.
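
As a rough illustration of why the (beginning, ending) form helps most when walking a slice from its high end, compare the two hypothetical helpers below. Neither is taken from the patch; the ending index is taken to be inclusive, which is what the assertions in the reviewed code suggest. With (beginning, length), the last element's index has to be recomputed from both values at each step, whereas with (beginning, ending) it is directly available.

```ocaml
(* Hypothetical helpers, for illustration only: both collect the slice
   into a list while walking it from the high end. *)

(* (beg, len) representation: the last element lives at beg + len - 1. *)
let rec slice_to_list_len a beg len acc =
  if len = 0 then acc
  else slice_to_list_len a beg (len - 1) (a.(beg + len - 1) :: acc)

(* (beg, end_) representation, end_ inclusive: the last element is a.(end_). *)
let rec slice_to_list_end a beg end_ acc =
  if end_ < beg then acc
  else slice_to_list_end a beg (end_ - 1) (a.(end_) :: acc)

let () =
  let a = [|1; 2; 3; 4; 5|] in
  (* Both calls describe the same slice: indices 1 to 3. *)
  assert (slice_to_list_len a 1 3 [] = slice_to_list_end a 1 3 [])
```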

This has a noticeable impact on the benchmark!

@Kerl13
Member Author

Kerl13 commented Feb 17, 2021

And with the last commit:

================================== [ Unif ] ==================================
length     stdlib    timsort   speedup    lg(n!)    stdlib   timsort   speedup
------------------------------------------------------------------------------
 30000   0.006782   0.008458    -24.7%    402909    409955    446234     -8.8%
 60000   0.014632   0.017702    -21.0%    865809    879918    956292     -8.7%
 90000   0.022037   0.027066    -22.8%   1351355   1374016   1472769     -7.2%
120000   0.030500   0.037782    -23.9%   1851608   1879819   2042806     -8.7%
150000   0.038565   0.048995    -27.0%   2362797   2406175   2590843     -7.7%
------------------------------------------------------------------------------

================================= [ 5-Runs ] =================================
length     stdlib    timsort   speedup    lg(n!)    stdlib   timsort   speedup
------------------------------------------------------------------------------
 30000   0.003933   0.001502     61.8%    402909    261693     97596     62.7%
 60000   0.008345   0.003069     63.2%    865809    553642    194825     64.8%
 90000   0.012528   0.004685     62.6%   1351355    856814    292144     65.9%
120000   0.017935   0.005648     68.5%   1851608   1169300    391936     66.5%
150000   0.022365   0.006656     70.2%   2362797   1499453    487716     67.5%
------------------------------------------------------------------------------

@Kerl13 Kerl13 marked this pull request as ready for review February 17, 2021 01:32
Member

@Niols Niols left a comment

A few comments but I approve the changes! Don't we want to update merge as well to use two offsets? More generally, where is it worthwhile to go from beg/len to beg/end?

src1 ofs1 len1
dest beg
src0 beg0 end0 x0
src1 beg1 end1 x1
Member

I think it would be worth having comments like (* x0 = src0.(beg0) *) already at this point. It's readable in the assertions below but I think it's better to have it here!

assert (end0 >= beg0);
assert (end1 >= beg1);

(* This is used to optimise the case len0 = 1 below. *)
Member

This comment does not make sense to me. I think it would deserve more text!

src1 ofs1 len1
dest end_
src0 beg0 end0 x0 (* run0 *)
src1 beg1 end1 x1 (* run1 *)
Member

Cf. the earlier comment on the same signature.

assert (x1 = src1.(end1));
assert (end0 >= beg0);
assert (end1 >= beg1);
(* This is used to optimise the case len1 = 1 below. *)
Member

Cf. the earlier comment on this line (+ missing space?)

merge_hi
cmp
dest (end_ - 1)
src0 beg0 (end0 - 1) src0.(end0 - 1)
Member

We use end0 - 1 twice here, is it worth having an intermediary value? (Same in other branch; same in merge_lo.)
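
A sketch of what the intermediary value could look like. The function `record` below is a hypothetical stand-in for merge_hi that only returns the arguments it is given, so that the snippet is self-contained. The name `end0'` is also an assumption. The native compiler may already share the repeated subexpression, so this is probably more about readability than speed.

```ocaml
(* Hypothetical stand-in for merge_hi: it just returns the arguments of
   interest, so the two call styles can be compared. *)
let record _cmp _dest dest_end _src0 beg0 end0 x0 = (dest_end, beg0, end0, x0)

(* Call shape quoted in the review: end0 - 1 is written twice. *)
let call_inline cmp dest end_ src0 beg0 end0 =
  record cmp dest (end_ - 1) src0 beg0 (end0 - 1) src0.(end0 - 1)

(* Same call with the decremented index named once. *)
let call_named cmp dest end_ src0 beg0 end0 =
  let end0' = end0 - 1 in
  record cmp dest (end_ - 1) src0 beg0 end0' src0.(end0')
```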

3 participants