Performance statistics report #2444

@jeromekelleher

Description

It would be very useful to have some simple microbenchmarks we could run from time to time. We've tried using ASV, but it's very heavyweight and collects masses of data that are never used.

Requirements:

  • Make a script perf_benchmark.py which runs a standard set of microbenchmarks and outputs the results to a file. The idea is that this file should be updated before every release, so that we can spot any perf regressions and also keep a per-release record of the benchmarks in the repository history. (A minimal harness sketch follows this list.)
  • Update the developer docs to include how and when to run these benchmarks as part of the release process.
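A minimal sketch of how such a script could be structured, assuming a JSON output file and a placeholder input file name (neither is specified in this issue):

```python
# perf_benchmark.py -- minimal sketch of a microbenchmark harness.
# "benchmark.trees" and the JSON output format are illustrative assumptions.
import json
import time

import tskit

STANDARD_FILE = "benchmark.trees"


def benchmark(func, repeats=5):
    # Best-of-N wall-clock time is less noisy than a single run or the mean.
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        times.append(time.perf_counter() - start)
    return min(times)


def main():
    ts = tskit.load(STANDARD_FILE)
    results = {
        "load": benchmark(lambda: tskit.load(STANDARD_FILE)),
        "iterate_trees": benchmark(lambda: sum(1 for _ in ts.trees())),
    }
    with open("perf_results.json", "w") as f:
        json.dump(results, f, indent=2)


if __name__ == "__main__":
    main()
```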

CPU time performance benchmarks:

Given a standard file (the operations themselves are sketched in the snippet after this list):

  • time to load
  • time to save
  • time to access ts.tables in a loop
  • time to access tables.nodes, tables.individuals, etc.
  • time to access columns, e.g. nodes.flags
  • time to get first tree, ts.first()
  • time to seek to middle tree
  • time to iterate over all trees
  • time to access tree arrays, tree.parent_array etc.
  • time to decode all variants
  • time to iterate over all rows, with and without metadata
  • time to write to VCF (writing to devnull)
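Roughly what the measured bodies could look like against the tskit Python API; the file name is an assumption, and the timing wrapper from the harness sketch above is omitted:

```python
import os

import tskit

ts = tskit.load("benchmark.trees")            # assumed standard file

ts.dump("/tmp/benchmark_copy.trees")          # save

for _ in range(10):                           # ts.tables in a loop
    tables = ts.tables

nodes = tables.nodes                          # table access
flags = nodes.flags                           # column access (numpy array)

first = ts.first()                            # first tree
middle = ts.at(ts.sequence_length / 2)        # seek to the middle tree
n_trees = sum(1 for _ in ts.trees())          # iterate over all trees

parents = first.parent_array                  # tree arrays

n_variants = sum(1 for _ in ts.variants())    # decode all variants

for node in ts.nodes():                       # rows, decoding metadata
    _ = node.metadata
for node in ts.nodes():                       # rows, without touching metadata
    pass

with open(os.devnull, "w") as devnull:        # VCF output to devnull
    ts.write_vcf(devnull)
```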

Additional ones as I thought of them (BJ), sketched below:

  • tree node arrays (postorder et al)
  • tree sequence row accessors (ts.node(42))
  • tree accessors (tree.left_sib(42))
  • iterate over tree.nodes
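Again only the measured bodies, assuming a reasonably recent tskit with Tree.postorder() and the same placeholder input file:

```python
import tskit

ts = tskit.load("benchmark.trees")   # assumed standard file
tree = ts.first()

order = tree.postorder()             # tree node arrays (postorder traversal)
node = ts.node(42)                   # tree sequence row accessor
sib = tree.left_sib(42)              # single-node tree accessor
n = sum(1 for _ in tree.nodes())     # iterate over tree.nodes
```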

It would be nice to track the memory usage here, but this may be much more difficult and not worth the effort.
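If it ever does seem worth doing, one low-effort option would be to record the process peak RSS from the standard library rather than anything finer grained; a sketch, not a commitment:

```python
# Peak RSS captures allocations made in the C library as well as in Python,
# but it is process-wide and monotone, so it only bounds usage from above.
# Note the resource module is not available on Windows.
import resource
import sys

import tskit


def peak_rss_bytes():
    # ru_maxrss is reported in kilobytes on Linux and bytes on macOS.
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return rss if sys.platform == "darwin" else rss * 1024


ts = tskit.load("benchmark.trees")   # assumed standard file
print(f"peak RSS after load: {peak_rss_bytes() / 1e6:.1f} MB")
```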

Labels

Infrastructure and tools, Performance
