Wall clock time benchmark #67

@MilesCranmer

Description

As discussed in #62 with @lacava (and discussed a bit in #24 by others last year), I think a wall clock time benchmark would be a really nice complement to comparing over a fixed number of evaluations.

I think fixing the number of evaluations is only one way of enforcing a level playing field. One could also fix:

  • The number of mutations
  • The number of subtree evaluations (i.e., the total number of operators evaluated)
  • The number of FLOPs
  • The number of copies

or any other way of bottlenecking the internal processes of a search code. Measuring against only one of these artificially biases a comparison against algorithms which, for whatever reason, make heavier use of that particular resource.
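As a concrete illustration of the "subtree evaluations" budget above, here is a minimal sketch of counting operator applications during expression-tree evaluation. The names (`Node`, `evaluate`) are hypothetical, not taken from any particular search code:

```python
import operator

class Node:
    """A node in an expression tree: a callable plus child nodes.
    Leaves are zero-arity callables (constants/variables)."""
    def __init__(self, op, children=()):
        self.op = op
        self.children = children

def evaluate(node, counter):
    """Recursively evaluate the tree, incrementing counter["ops"]
    once per operator applied (leaves are not counted)."""
    args = [evaluate(c, counter) for c in node.children]
    if node.children:
        counter["ops"] += 1
    return node.op(*args)

# (x + 2) * 3 with x = 5
expr = Node(operator.mul, (
    Node(operator.add, (Node(lambda: 5.0), Node(lambda: 2.0))),
    Node(lambda: 3.0),
))
counter = {"ops": 0}
result = evaluate(expr, counter)  # result = 21.0, counter["ops"] = 2
```

A benchmark could then cap `counter["ops"]` across methods instead of (or in addition to) the number of full expression evaluations.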

Some algorithms are sample intensive, while others do a lot of work between steps. Comparing algorithms based only on the number of evaluations therefore biases the comparison against sample-intensive algorithms.

An algorithm and its implementation are not easily separable, so I would argue that you really need to measure wall clock time to see the whole picture. Not only is this much more helpful from a user's point of view, but it lets algorithms which are intrinsically designed for parallelism actually demonstrate their performance increase. The same can be said for other algorithmic trade-offs, such as those required for rigid data structures, data batching, etc.

Of course, there is no single best solution, and each type of benchmark provides additional information. So I think this wall clock benchmark should be run alongside the normal fixed-evaluation benchmark, using a separate set of tuned parameters, which would give a more complete picture of performance.
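For what it's worth, the time-budgeted side could be as simple as driving each implementation under a fixed wall-clock budget. This is only a sketch under the assumption that each method exposes some incremental `step()` callable; the names are made up:

```python
import time

def run_with_time_budget(step, budget_seconds):
    """Repeatedly call `step()` until the wall-clock budget is spent.
    Returns the number of completed steps, so one can report
    steps/second alongside the quality of the best expression found."""
    start = time.perf_counter()
    n_steps = 0
    while time.perf_counter() - start < budget_seconds:
        step()
        n_steps += 1
    return n_steps

# Toy usage: a no-op "search step" under a 0.05 s budget.
n_steps = run_with_time_budget(lambda: None, 0.05)
```

A fairer variant would time only the search itself (excluding data loading and setup) and pin the number of CPU cores/threads available to each method.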

Finally, I note I have a conflict of interest since PySR/SymbolicRegression.jl are designed for parallelism and fast evaluation, but hopefully my above points are not too influenced by this!

Eager to hear others' thoughts.
