gh-21: Attempt to make regression tests more reliable #20
Conversation
paddyroddy left a comment
Good idea. I assume 15% was required for this to pass? That seems a fairly high level of degradation.
TBH the 15% is quite arbitrary. It was very inconsistent running on my local machine. The inconsistencies were usually less than 15%, but not always.
Can the tolerance in the tests be changed?
As in on a per-test basis? Or do you mean not hard-coding it in nox and passing it in via posargs? Posargs would be easy.
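For reference, a minimal sketch of what passing the tolerance through posargs could look like; the session name, the default threshold, and the `--benchmark-compare-fail` expression are assumptions, not the repo's actual noxfile:

```python
import nox


@nox.session
def benchmark(session: nox.Session) -> None:
    """Run the regression benchmarks against a saved baseline.

    The failure threshold can be overridden on the command line,
    e.g. ``nox -s benchmark -- mean:10%``, instead of being hard-coded here.
    """
    session.install(".", "pytest", "pytest-benchmark")
    # Fall back to an assumed default if no posargs are given.
    threshold = session.posargs[0] if session.posargs else "mean:15%"
    session.run(
        "pytest",
        "--benchmark-only",
        "--benchmark-autosave",
        "--benchmark-compare",
        f"--benchmark-compare-fail={threshold}",
    )
```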
I had meant in the tests. But whatever you think is best. I just think 15% degradation seems too high.
I can't see any way to have a per-test comparison. Unless we filtered the tests within nox via
Looking at the runs for this PR, most of the stats seem reasonable, but you do get some strange ones. For example, this line has a standard deviation much larger than the mean.
This snippet from pytest-benchmark docs seems relevant |
Does |
In both pytest-benchmark and pytest-codspeed, there is a "pedantic" mode in which you can specify how the stats should be collected. Perhaps that would be useful? We could also try using a different timer function, as suggested here.
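For illustration, a rough sketch of pytest-benchmark's pedantic mode; the function under test and the round/iteration counts here are made up. The timer can also be swapped via `--benchmark-timer`, e.g. `--benchmark-timer=time.process_time`:

```python
def test_regression_pedantic(benchmark):
    # Hypothetical function under test; replace with the real target.
    def compute():
        return sum(i * i for i in range(10_000))

    # Pedantic mode fixes how the stats are collected: a warmup phase,
    # then `rounds` rounds of `iterations` calls each, rather than
    # letting pytest-benchmark calibrate these automatically.
    result = benchmark.pedantic(compute, rounds=20, iterations=5, warmup_rounds=2)
    assert result == sum(i * i for i in range(10_000))
```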
…ure threshold a little more sensitive
I can't see why there would be a difference in relation to this unreliability. In my opinion, the issue is that we cannot guarantee the state of the machine/VM we are running on, which will be the case no matter what tool we use.
Let's give this a go
Closing as moved to the glass repo: glass-dev/glass#780


Use mean instead of min as it should be more reliable, and make it less strict
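As a rough illustration of the change described above (the exact threshold values are assumptions), the comparison expression passed to pytest-benchmark would switch from the min statistic to the mean:

```python
# Before (assumed): fail if the minimum time regresses by more than 5%.
"--benchmark-compare-fail=min:5%"
# After (assumed): compare on the mean, which is less noisy, with a looser threshold.
"--benchmark-compare-fail=mean:15%"
```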