This repository was archived by the owner on Nov 13, 2025. It is now read-only.

Conversation

@connoraird
Contributor

Use mean instead of min, as it should be more reliable, and make the comparison less strict.

@connoraird self-assigned this Nov 12, 2025
@connoraird added the enhancement (New feature or request) label Nov 12, 2025
Member

@paddyroddy left a comment


Good idea. I assume 15% was required for this to pass? That seems a fairly high level of degradation.

@connoraird
Contributor Author

Good idea. I assume 15% was required for this to pass? That seems a fairly high level of degradation.

TBH the 15% is quite arbitrary. The benchmarks were very inconsistent when run on my local machine; the inconsistencies were usually less than 15%, but not always.

@paddyroddy
Member

Can the tolerance in the tests be changed?

@connoraird
Contributor Author

Can the tolerance in the tests be changed?

As in, on a per-test basis? Or do you mean not hard-coding it in nox and passing it in via posargs? Posargs would be easy; a sketch is below.
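
Something like this could do it (the session name, install line, and 15% default are illustrative, not the repository's actual setup; it leans on pytest-benchmark's --benchmark-compare-fail option):

```python
# Hypothetical noxfile.py sketch: read the failure threshold from posargs
# instead of hard-coding it, and apply it to the mean via pytest-benchmark's
# --benchmark-compare-fail option.
import nox


@nox.session
def benchmarks(session: nox.Session) -> None:
    session.install("-e", ".", "pytest", "pytest-benchmark")
    # Default to 15%, but allow e.g. `nox -s benchmarks -- 5` to tighten it.
    tolerance = session.posargs[0] if session.posargs else "15"
    session.run(
        "pytest",
        "--benchmark-only",
        "--benchmark-compare",
        f"--benchmark-compare-fail=mean:{tolerance}%",
    )
```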

@paddyroddy
Member

I had meant in the tests. But whatever you think is best. I just think 15% degradation seems too high.

@connoraird
Contributor Author

I had meant in the tests. But whatever you think is best. I just think 15% degradation seems too high.

I can't see any way to have a per-test comparison, unless we filtered the tests within nox via pytest -k ... A rough sketch of that is below.
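
For illustration only (the -k expressions and percentages are made up, not taken from the test suite):

```python
# Hypothetical sketch: give groups of benchmarks different tolerances by
# filtering with `pytest -k` and running each group as a separate invocation.
import nox


@nox.session
def benchmarks_grouped(session: nox.Session) -> None:
    session.install("-e", ".", "pytest", "pytest-benchmark")
    groups = {
        "broadcast": "25",      # noisier micro-benchmarks get more slack
        "not broadcast": "10",  # everything else is held to a tighter bound
    }
    for expression, tolerance in groups.items():
        session.run(
            "pytest",
            "-k", expression,
            "--benchmark-only",
            "--benchmark-compare",
            f"--benchmark-compare-fail=mean:{tolerance}%",
        )
```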

@connoraird
Contributor Author

Looking at the runs for this PR, most of the stats seem reasonable but you do get some strange ones. For example, this line has a standard deviation much larger than the mean

Name (time in us)                                                                                  Min                    Max                  Mean              StdDev                Median                 IQR            Outliers          OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_broadcast_first[numpy]                                                                    29.3750 (1.72)     43,014.5230 (661.64)      35.0362 (1.92)     382.9808 (134.39)      30.8480 (1.75)       0.8800 (2.31)       1;1213  28,541.9155 (0.52)      12597           1

@connoraird
Contributor Author

This snippet from pytest-benchmark docs seems relevant
[screenshot of a snippet from the pytest-benchmark documentation]

@paddyroddy
Member

Does asv have this issue? Or pytest-codspeed?

@connoraird
Contributor Author

Does asv have this issue? Or pytest-codspeed?

In both pytest-benchmark and pytest-codspeed, there is a "pedantic" mode in which you can specify how the stats should be collected. Perhaps that would be useful? A rough sketch is below.
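
A minimal sketch of pytest-benchmark's pedantic mode (the function under test and the rounds/iterations numbers are placeholders, not taken from this repository):

```python
import numpy as np


def work(x: np.ndarray) -> np.ndarray:
    # Stand-in for the real routine under test.
    return x @ x.T


def test_work_pedantic(benchmark):
    x = np.ones((200, 200))
    # Pin rounds/iterations/warmup instead of letting pytest-benchmark
    # calibrate them, which should make the collected stats more repeatable.
    benchmark.pedantic(work, args=(x,), rounds=100, iterations=5, warmup_rounds=2)
```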

Also we could try using a different timer function as suggested here

[screenshot of the pytest-benchmark documentation suggesting an alternative timer function]
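
Something along these lines might be a starting point (the session name, install line, and 15% threshold are illustrative; it assumes pytest-benchmark's --benchmark-timer option):

```python
# Hypothetical nox session running the benchmarks with a CPU-time timer,
# which should be less sensitive to other load on the machine/VM than the
# default wall-clock timer.
import nox


@nox.session
def benchmarks_cpu_time(session: nox.Session) -> None:
    session.install("-e", ".", "pytest", "pytest-benchmark")
    session.run(
        "pytest",
        "--benchmark-only",
        "--benchmark-timer=time.process_time",
        "--benchmark-compare",
        "--benchmark-compare-fail=mean:15%",
    )
```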

@connoraird
Contributor Author

Does asv have this issue? Or pytest-codspeed?

I can't see why there would be a difference in relation to this unreliability. In my opinion, the issue is that we cannot guarantee the state of the machine/VM we are running on, which will be the case no matter what tool we use.

@connoraird changed the title from "Attempt to make regression tests more reliable" to "gh-21: Attempt to make regression tests more reliable" Nov 13, 2025
@paddyroddy
Member

Also we could try using a different timer function as suggested here

Let's give this a go

@connoraird
Contributor Author

Closing, as this has moved to the glass repo: glass-dev/glass#780

@connoraird closed this Nov 13, 2025
@paddyroddy deleted the connor/make-benchmarks-more-reliable branch November 13, 2025 16:43