Faster merge functions #6

Kerl13 · 2021-02-16T23:58:20Z

Some optimisations on the merge_{hi,lo} functions

Kerl13 · 2021-02-16T23:59:43Z

Before

================================== [ Unif ] ==================================
length     stdlib    timsort   speedup    lg(n!)    stdlib   timsort   speedup
------------------------------------------------------------------------------
 30000   0.007039   0.010258    -45.7%    402909    409951    447116     -9.1%
 60000   0.014751   0.021221    -43.9%    865809    879910    955407     -8.6%
 90000   0.023624   0.031385    -32.9%   1351355   1374063   1470859     -7.0%
120000   0.031248   0.045396    -45.3%   1851608   1879818   2056774     -9.4%
150000   0.040328   0.056562    -40.3%   2362797   2406163   2595284     -7.9%
------------------------------------------------------------------------------

================================= [ 5-Runs ] =================================
length     stdlib    timsort   speedup    lg(n!)    stdlib   timsort   speedup
------------------------------------------------------------------------------
 30000   0.004255   0.002176     48.9%    402909    260971     97146     62.8%
 60000   0.008782   0.003431     60.9%    865809    552184    191963     65.2%
 90000   0.013131   0.004885     62.8%   1351355    864474    294032     66.0%
120000   0.018827   0.007167     61.9%   1851608   1179318    387267     67.2%
150000   0.023528   0.008382     64.4%   2362797   1495840    488300     67.4%
------------------------------------------------------------------------------

After

================================== [ Unif ] ==================================
length     stdlib    timsort   speedup    lg(n!)    stdlib   timsort   speedup
------------------------------------------------------------------------------
 30000   0.007027   0.009505    -35.3%    402909    409951    447116     -9.1%
 60000   0.015158   0.019277    -27.2%    865809    879910    955407     -8.6%
 90000   0.023195   0.029690    -28.0%   1351355   1374063   1470859     -7.0%
120000   0.030921   0.042278    -36.7%   1851608   1879818   2056774     -9.4%
150000   0.040280   0.053288    -32.3%   2362797   2406163   2595284     -7.9%
------------------------------------------------------------------------------

================================= [ 5-Runs ] =================================
length     stdlib    timsort   speedup    lg(n!)    stdlib   timsort   speedup
------------------------------------------------------------------------------
 30000   0.004224   0.001736     58.9%    402909    260971     97146     62.8%
 60000   0.008755   0.002680     69.4%    865809    552184    191963     65.2%
 90000   0.013104   0.004630     64.7%   1351355    864474    294032     66.0%
120000   0.018372   0.006515     64.5%   1851608   1179318    387267     67.2%
150000   0.023722   0.007637     67.8%   2362797   1495840    488300     67.4%
------------------------------------------------------------------------------
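For readers of the tables: the speedup figures are consistent with (stdlib - timsort) / stdlib, and lg(n!) is the information-theoretic lower bound on the number of comparisons needed to sort n elements. The sketch below recomputes both, assuming the first group of columns is a running time and the second a comparison count (the tables do not say so explicitly). The names speedup and lg_factorial are made up for this illustration and are not taken from the benchmark code.

```ocaml
(* Relative gain of timsort over the stdlib sort, in percent.
   Positive means timsort needed less (time or comparisons). *)
let speedup ~stdlib ~timsort = (stdlib -. timsort) /. stdlib *. 100.

(* lg(n!) = log2 2 + log2 3 + ... + log2 n, computed as a plain sum.
   Hypothetical helper: the benchmark may compute it differently. *)
let lg_factorial n =
  let acc = ref 0. in
  for k = 2 to n do
    acc := !acc +. log (float_of_int k)
  done;
  !acc /. log 2.
```

For the first Unif row of the "Before" table, speedup ~stdlib:0.007039 ~timsort:0.010258 is about -45.7, and lg_factorial 30000 lands within one unit of the 402909 shown in the lg(n!) column.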

Niols · 2021-02-17T00:02:21Z

Oh very nice!

Kerl13 · 2021-02-17T00:05:35Z

I implemented one optimisation here, which reduces the number of memory accesses.

Before the patch, the merge_{hi,lo} functions performed 4 Array.get per recursive call.

2 of them could easily be avoided (we were accessing every element twice in a row).
Of the remaining 2 memory accesses, one could be reused at the next recursive call. So I changed the logic of the functions so that the caller makes the memory accesses rather than the callee, which makes it possible to "pass along" to the next recursive call the value that will be reused.

→ we now perform one memory access per recursive call
→ downside: it clutters the code a bit…
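The "caller reads, callee reuses" idea can be sketched as follows. This is a simplified illustration and not the code of the PR: it merges into a separate destination array, has no galloping, and the wrapper merge is a hypothetical helper added only to make the example runnable. The argument order mimics the signature visible in the review diff, and slice ends are assumed to be inclusive.

```ocaml
(* Merge the sorted slices src0.(beg0..end0) and src1.(beg1..end1)
   (ends inclusive) into dest, writing from index beg upwards.
   Invariant kept by every caller: x0 = src0.(beg0) and x1 = src1.(beg1). *)
let rec merge_lo cmp dest beg src0 beg0 end0 x0 src1 beg1 end1 x1 =
  if cmp x0 x1 <= 0 then begin
    dest.(beg) <- x0;
    if beg0 = end0 then
      (* run0 is exhausted: copy what is left of run1. *)
      Array.blit src1 beg1 dest (beg + 1) (end1 - beg1 + 1)
    else
      let beg0 = beg0 + 1 in
      (* Only source read of this call; x1 is passed along unchanged. *)
      merge_lo cmp dest (beg + 1) src0 beg0 end0 src0.(beg0) src1 beg1 end1 x1
  end else begin
    dest.(beg) <- x1;
    if beg1 = end1 then
      (* run1 is exhausted: copy what is left of run0. *)
      Array.blit src0 beg0 dest (beg + 1) (end0 - beg0 + 1)
    else
      let beg1 = beg1 + 1 in
      (* Only source read of this call; x0 is passed along unchanged. *)
      merge_lo cmp dest (beg + 1) src0 beg0 end0 x0 src1 beg1 end1 src1.(beg1)
  end

(* Hypothetical wrapper: performs the two initial reads for the first call. *)
let merge cmp a b =
  match Array.length a, Array.length b with
  | 0, _ -> Array.copy b
  | _, 0 -> Array.copy a
  | la, lb ->
    let dest = Array.make (la + lb) a.(0) in
    merge_lo cmp dest 0 a 0 (la - 1) a.(0) b 0 (lb - 1) b.(0);
    dest
```

Each recursive step reads exactly one new source element and carries the other one over as an argument, at the cost of a longer parameter list.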

We achieve this by using a representation of the form (beginning index, ending index) for the slices rather than (beginning index, length). This mostly reduces the number of arithmetic operations in merge_hi, while merge_lo remains more or less untouched. This has a noticeable impact on the benchmark!
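A small sketch of why the slice representation matters when walking a slice from its high end, as merge_hi does. With (offset, length) the index of the last element has to be recomputed as ofs + len - 1 at every step, whereas with (beg, end_) it is available directly. Both functions are hypothetical illustrations, not code from timsort.ml, and end_ is assumed to be inclusive.

```ocaml
(* (offset, length) representation: one counter is updated per step, but
   the last index must be rebuilt with an addition and a subtraction. *)
let rec rev_iter_len f src ofs len =
  if len > 0 then begin
    f src.(ofs + len - 1);
    rev_iter_len f src ofs (len - 1)
  end

(* (beg, end_) representation, end_ inclusive: the last element is
   src.(end_), so each step costs a single decrement. *)
let rec rev_iter_end f src beg end_ =
  if end_ >= beg then begin
    f src.(end_);
    rev_iter_end f src beg (end_ - 1)
  end
```

Walking from the low end is cheap in both representations, which fits the observation that merge_lo is mostly unaffected.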

Kerl13 · 2021-02-17T01:31:26Z

And with the last commit:

================================== [ Unif ] ==================================
length     stdlib    timsort   speedup    lg(n!)    stdlib   timsort   speedup
------------------------------------------------------------------------------
 30000   0.006782   0.008458    -24.7%    402909    409955    446234     -8.8%
 60000   0.014632   0.017702    -21.0%    865809    879918    956292     -8.7%
 90000   0.022037   0.027066    -22.8%   1351355   1374016   1472769     -7.2%
120000   0.030500   0.037782    -23.9%   1851608   1879819   2042806     -8.7%
150000   0.038565   0.048995    -27.0%   2362797   2406175   2590843     -7.7%
------------------------------------------------------------------------------

================================= [ 5-Runs ] =================================
length     stdlib    timsort   speedup    lg(n!)    stdlib   timsort   speedup
------------------------------------------------------------------------------
 30000   0.003933   0.001502     61.8%    402909    261693     97596     62.7%
 60000   0.008345   0.003069     63.2%    865809    553642    194825     64.8%
 90000   0.012528   0.004685     62.6%   1351355    856814    292144     65.9%
120000   0.017935   0.005648     68.5%   1851608   1169300    391936     66.5%
150000   0.022365   0.006656     70.2%   2362797   1499453    487716     67.5%
------------------------------------------------------------------------------

Niols

A few comments, but I approve the changes! Don't we want to update merge as well to use two offsets? Where else would it be worth going from beg/len to beg/end?

Niols · 2021-02-18T15:21:01Z

src/lib/array/timsort.ml

-  src1 ofs1 len1
+  dest beg
+  src0 beg0 end0 x0
+  src1 beg1 end1 x1


I think it would be worth having comments like (* x0 = src0.(beg0) *) already at this point. It's readable in the assertions below but I think it's better to have it here!

Niols · 2021-02-18T15:28:30Z

src/lib/array/timsort.ml

+  assert (end0 >= beg0);
+  assert (end1 >= beg1);
+
+  (* This is used to optimise the case len0 = 1 below. *)


This comment does not make sense to me. I think it would deserve more text!

Niols · 2021-02-18T15:46:59Z

src/lib/array/timsort.ml

-  src1 ofs1 len1
+  dest end_
+  src0 beg0 end0 x0 (* run0 *)
+  src1 beg1 end1 x1 (* run1 *)


Niols · 2021-02-18T15:47:08Z

src/lib/array/timsort.ml

+  assert (x1 = src1.(end1));
+  assert (end0 >= beg0);
+  assert (end1 >= beg1);
+  (* This is used to optimise the case len1 = 1 below. *)


Same remark as above (+ missing space?)

Niols · 2021-02-18T15:48:03Z

src/lib/array/timsort.ml

+      merge_hi
+        cmp
+        dest (end_ - 1)
+        src0 beg0 (end0 - 1) src0.(end0 - 1)


We use end0 - 1 twice here, is it worth having an intermediary value? (Same in other branch; same in merge_lo.)
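One way the intermediary value could look, shown on a simplified merge_hi. This is a sketch under assumptions, not the code under review: it writes into a separate destination array, omits galloping, and assumes inclusive ends with x0 = src0.(end0) and x1 = src1.(end1). Shadowing end0 (or end1) with a let names the decremented index once. Whether this changes the generated code has not been measured here, so the benefit may be purely one of readability.

```ocaml
(* Merge the sorted slices src0.(beg0..end0) and src1.(beg1..end1)
   (ends inclusive) into dest, writing from index end_ downwards.
   Invariant kept by every caller: x0 = src0.(end0) and x1 = src1.(end1). *)
let rec merge_hi cmp dest end_ src0 beg0 end0 x0 src1 beg1 end1 x1 =
  if cmp x0 x1 > 0 then begin
    (* x0 is strictly greater, so it goes last. *)
    dest.(end_) <- x0;
    if end0 = beg0 then
      (* run0 is exhausted: the rest of run1 fills the cells below end_. *)
      let n = end1 - beg1 + 1 in
      Array.blit src1 beg1 dest (end_ - n) n
    else
      (* Intermediary value: the decremented index is computed once. *)
      let end0 = end0 - 1 in
      merge_hi cmp dest (end_ - 1) src0 beg0 end0 src0.(end0) src1 beg1 end1 x1
  end else begin
    (* On ties the element of run1 goes last, which keeps the merge stable. *)
    dest.(end_) <- x1;
    if end1 = beg1 then
      let n = end0 - beg0 + 1 in
      Array.blit src0 beg0 dest (end_ - n) n
    else
      let end1 = end1 - 1 in
      merge_hi cmp dest (end_ - 1) src0 beg0 end0 x0 src1 beg1 end1 src1.(end1)
  end
```

The same let-binding pattern would apply to beg0 + 1 and beg1 + 1 in merge_lo.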

Kerl13 added 3 commits February 17, 2021 00:56

Avoid two unnecessary blits

42156ba

in merge_* reduce the number of array accesses

c7271d0

merge_*: reuse memory access from previous calls

4388514

Kerl13 marked this pull request as ready for review February 17, 2021 01:32

Niols approved these changes Feb 18, 2021
