feat: Support diffThreshold to handle comparison failures that depend on the image generation environment#176
Conversation
…e image generation environment
|
I missed CONTRIBUTING.md, reviewed it and addressed it accordingly 🙏 |
There was a problem hiding this comment.
Pull request overview
Adds a configurable pixel-diff tolerance (diffThreshold) for golden comparisons so minor rendering differences across environments don’t fail tests unexpectedly.
Changes:
- Introduces
diffThresholdonPlatformGoldensConfigandCiGoldensConfig(validated to be0.0 <= x < 1.0). - Adds
AlchemistFileComparatorand installs it during golden runs whendiffThreshold > 0. - Updates README and adds/extends unit tests around config + comparator behavior.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| lib/src/alchemist_file_comparator.dart | New comparator that tolerates diffs within a configured threshold. |
| lib/src/alchemist_config.dart | Adds diffThreshold to golden config base class and propagates it through variants/copy/merge. |
| lib/src/golden_test_runner.dart | Installs/restores comparator based on diffThreshold during golden runs. |
| lib/src/golden_test.dart | Passes variantConfig.diffThreshold into the runner. |
| lib/alchemist.dart | Exports the new comparator. |
| README.md | Documents the new diffThreshold option. |
| test/src/alchemist_file_comparator_test.dart | New tests for comparator construction and comparison behavior. |
| test/src/alchemist_config_test.dart | New tests for config default/validation/copy/merge of diffThreshold. |
| test/src/golden_test_runner_test.dart | New tests for comparator installation/restoration and unsupported comparator handling. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@mataku I think many (all?) of the comments from Copilot are valid here, can you take a look and address? Also, AFAICT, this is only configurable at the suite level, right? Meaning one cannot set a threshold at the test level? That may be OK for v1, but I could see a user wanting to only increase the threshold for a singular test as to not add too much tolerance across the whole suite and risk missing actual regressions. Another thing--It would be great if we could track the largest threshold across the suite and alert the user that they could lower it if the max diff is lower than the allowed threshold. That would help them avoid drift between the actual diff threshold and the one they've set. Curious for your thoughts on the feasibility of that. |
|
RE: my above comment, in this comment I had argued against a per-test threshold, and the reasons laid out there seem logical to me--Just wanted to express that I'm very OK landing a suite level threshold for now and we can see how that goes (I think whenever I've come across functionality like this in other frameworks, it's always suite-wide). A couple comments below that one the idea of a warning for a threshold being set unnecessarily high is mentioned... may be something to consider, but could always be a follow-up. |
…FileComparator type
…ntation The implementation does not print a warning when the diff passes within the threshold. Update the doc comments in AlchemistFileComparator, GoldensConfig, and README to reflect the actual behavior.
Basically yes, but as described in https://github.com/Betterment/alchemist#for-single-tests-or-groups, users can override AlchemistConfig per test by wrapping it in AlchemistConfig.runWithConfig. I think adding a diffThreshold argument directly to the goldenTest function would be a cleaner approach if do that. // some_screen_test.dart
goldenTest(
'my_test',
fileName: 'my_test',
builder: () => const HomeScreen(),
);
AlchemistConfig.runWithConfig(
config: AlchemistConfig.current().merge(
const AlchemistConfig(
ciGoldensConfig: CiGoldensConfig(diffThreshold: 0.5), // overridden in `my_test_2`
),
),
run: () {
goldenTest(
'my_test_2',
fileName: 'my_test_2',
builder: () => const HomeScreen(),
);
},
);
This could be achievable by tracking the diffPercent value and exposing it to users. Users can read this value in a tearDownAll in their flutter_test_config.dart to check if their configured threshold could be lowered. It may be difficult to hooking into the end of the test suite automatically from the library side because I don't know how to hook after all tests in flutter_test for now, so this requires a small amount of user setup. // AlchemistFileComparator
static double maxDiffPercent = 0;
// ... in compare method, track percent value
if (diffThreshold > 0 && result.diffPercent <= diffThreshold) {
if (result.diffPercent > maxDiffPercent) {
maxDiffPercent = result.diffPercent;
}
return true;
}
// flutter_test_config.dart
Future<void> testExecutable(FutureOr<void> Function() testMain) async {
tearDownAll(() {
final max = AlchemistFileComparator.maxDiffPercent;
if (max > 0) {
print('Max observed diff: $max. Consider lowering your diffThreshold.');
}
); |
|
@btrautmann I've addressed copilot comments. How about we discuss handling the case where the threshold is too high in a separate issue (or #90)? I will create a new one if needed. |
btrautmann
left a comment
There was a problem hiding this comment.
domain lgtm 🚀
platform lgtm 💪
Description
Closes #90
Adds a
diffThresholdoption toPlatformGoldensConfigandCiGoldensConfigto allow a configurable tolerance for pixel-level differences in golden tests.This is useful for handling minor rendering differences that can occur across platforms or Flutter versions such as the diff failures reported in above issue, where images are visually identical but fail due to sub-pixel rendering variations.
How it works
AlchemistFileComparatorwraps Flutter'sLocalFileComparatorand overrides the comparison logic.diffThreshold > 0and the diff percentage is within the threshold, the test passes and a warning is printed.diffThresholdmust be between0.0(inclusive) and1.0(exclusive). A value of0.0(the default) means no tolerance is applied and behavior is unchanged.Usage
Type of Change
Note
torelentuse may be OK, but I thinkthreshold(diffThreshold) use is better to understand.