Wall clock time benchmark #67

@MilesCranmer

Description

As discussed in #62 with @lacava (and discussed a bit in #24 by others last year), I think a wall clock time benchmark would be a really nice complement to comparing over a fixed number of evaluations.

I think fixing the number of evaluations is only one way of enforcing a level playing field. One could also fix:

  • The number of mutations
  • The number of subtree evaluations (i.e., the total number of operators evaluated)
  • The number of FLOPs
  • The number of copies

or any other way of bottlenecking the internal processes of a search code. Measuring against only one of these artificially biases a comparison against algorithms which, for whatever reason, make heavier use of that particular resource.
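As a concrete illustration of the "subtree evaluations" budget above, here is a minimal sketch of counting operator applications during expression-tree evaluation. The names (`Node`, `evaluate`) are hypothetical, not taken from any particular search code:

```python
import operator

class Node:
    """A node in an expression tree: a callable plus child nodes.
    Leaves are zero-arity callables (constants/variables)."""
    def __init__(self, op, children=()):
        self.op = op
        self.children = children

def evaluate(node, counter):
    """Recursively evaluate the tree, incrementing counter["ops"]
    once per operator applied (leaves are not counted)."""
    args = [evaluate(c, counter) for c in node.children]
    if node.children:
        counter["ops"] += 1
    return node.op(*args)

# (x + 2) * 3 with x = 5
expr = Node(operator.mul, (
    Node(operator.add, (Node(lambda: 5.0), Node(lambda: 2.0))),
    Node(lambda: 3.0),
))
counter = {"ops": 0}
result = evaluate(expr, counter)  # result = 21.0, counter["ops"] = 2
```

A benchmark could then cap `counter["ops"]` across methods instead of (or in addition to) the number of full expression evaluations.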

Some algorithms are sample intensive, while others do a lot of work between steps. Comparing algorithms based only on the number of evaluations therefore biases the comparison against sample-intensive algorithms.

An algorithm and its implementation are not easily separable, so I would argue that you really need to measure wall clock time to see the whole picture. Not only is this much more helpful from a user's point of view, but it lets algorithms which are intrinsically designed for parallelism actually demonstrate their performance increase. The same can be said for other algorithmic trade-offs, such as those required for rigid data structures, data batching, etc.

Of course, there is no single best solution, and each type of benchmark provides additional information. So I think this wall clock benchmark should be run alongside the normal fixed-evaluation benchmark, using a separate set of tuned parameters, which would give a more complete picture of performance.
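For what it's worth, the time-budgeted side could be as simple as driving each implementation under a fixed wall-clock budget. This is only a sketch under the assumption that each method exposes some incremental `step()` callable; the names are made up:

```python
import time

def run_with_time_budget(step, budget_seconds):
    """Repeatedly call `step()` until the wall-clock budget is spent.
    Returns the number of completed steps, so one can report
    steps/second alongside the quality of the best expression found."""
    start = time.perf_counter()
    n_steps = 0
    while time.perf_counter() - start < budget_seconds:
        step()
        n_steps += 1
    return n_steps

# Toy usage: a no-op "search step" under a 0.05 s budget.
n_steps = run_with_time_budget(lambda: None, 0.05)
```

A fairer variant would time only the search itself (excluding data loading and setup) and pin the number of CPU cores/threads available to each method.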

Finally, I note I have a conflict of interest since PySR/SymbolicRegression.jl are designed for parallelism and fast evaluation, but hopefully my above points are not too influenced by this!

Eager to hear others' thoughts.
