feat: Support diffThreshold to handle comparison failures that depend on the image generation environment by mataku · Pull Request #176 · Betterment/alchemist

mataku · 2026-03-05T02:09:47Z

Description

Closes #90

Adds a diffThreshold option to PlatformGoldensConfig and CiGoldensConfig to allow a configurable tolerance for pixel-level differences in golden tests.

This is useful for handling minor rendering differences that can occur across platforms or Flutter versions, such as the diff failures reported in the issue above, where images are visually identical but fail due to sub-pixel rendering variations.

How it works

- A new AlchemistFileComparator wraps Flutter's LocalFileComparator and overrides the comparison logic.
- When diffThreshold > 0 and the diff percentage is within the threshold, the test passes.
- diffThreshold must be between 0.0 (inclusive) and 1.0 (exclusive). A value of 0.0 (the default) means no tolerance is applied and behavior is unchanged.
  - A value of 1.0 would make every test pass regardless of any pixel diff, so it is not a practical setting and is excluded from the valid range.
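A minimal, framework-free Dart sketch of the rule described above: validate the range, then let a comparison pass if the images matched exactly, or if diffThreshold > 0 and the diff is at or below the threshold. `validateDiffThreshold` and `shouldPass` are hypothetical names for illustration, not alchemist's API, and diffPercent is assumed to be a fraction from 0.0 to 1.0, matching the `0.001 // allow up to 0.1%` convention in the Usage example below.

```dart
/// Hypothetical helper (not alchemist's API): rejects values outside
/// the documented range 0.0 (inclusive) to 1.0 (exclusive).
void validateDiffThreshold(double diffThreshold) {
  if (diffThreshold < 0.0 || diffThreshold >= 1.0) {
    throw ArgumentError.value(
      diffThreshold,
      'diffThreshold',
      'must be >= 0.0 and < 1.0',
    );
  }
}

/// Hypothetical helper (not alchemist's API): decides whether a golden
/// comparison should pass.
///
/// [passed] is whether the images matched exactly, and [diffPercent] is
/// assumed to be the fraction of differing pixels (0.0 to 1.0).
bool shouldPass({
  required bool passed,
  required double diffPercent,
  required double diffThreshold,
}) {
  validateDiffThreshold(diffThreshold);
  // An exact match always passes, whatever the threshold.
  if (passed) return true;
  // With the default of 0.0 no tolerance is applied, so any diff fails.
  // Otherwise, diffs at or below the threshold are tolerated.
  return diffThreshold > 0 && diffPercent <= diffThreshold;
}
```

Under these assumptions, a comparison with a 0.05% diff (diffPercent 0.0005) passes with diffThreshold: 0.001 but still fails with the default of 0.0.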

Usage

AlchemistConfig.runWithConfig(
  config: AlchemistConfig(
    platformGoldensConfig: PlatformGoldensConfig(
      diffThreshold: 0.001, // allow up to 0.1% pixel difference
    ),
    ciGoldensConfig: CiGoldensConfig(
      diffThreshold: 0.1, // allow up to 10% pixel difference
    ),
  ),
  run: testMain,
);

Type of Change

✨ New feature (non-breaking change which adds functionality)
🛠️ Bug fix (non-breaking change which fixes an issue)
❌ Breaking change (fix or feature that would cause existing functionality to change)
🧹 Code refactor
✅ Build configuration change
📝 Documentation
🗑️ Chore

Note

Using "tolerance" might also be fine, but I think "threshold" (diffThreshold) is easier to understand.
I have also added descriptions for the parameters in the README.md. Please let me know if any further changes are required for documentation.
Tested with my sample repo: https://github.com/mataku/flutter_snippets/compare/feature/test-diff-threshold
- Changed the title, but the test still passed because the diff was within diffThreshold
- The test fails if diffThreshold is set to 0


mataku · 2026-03-06T12:28:11Z

I had missed CONTRIBUTING.md; I've reviewed it and updated the PR accordingly 🙏

Copilot

Pull request overview

Adds a configurable pixel-diff tolerance (diffThreshold) for golden comparisons so minor rendering differences across environments don’t fail tests unexpectedly.

Changes:

- Introduces diffThreshold on PlatformGoldensConfig and CiGoldensConfig (validated to be 0.0 <= x < 1.0).
- Adds AlchemistFileComparator and installs it during golden runs when diffThreshold > 0.
- Updates README and adds/extends unit tests around config + comparator behavior.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.


File	Description
lib/src/alchemist_file_comparator.dart	New comparator that tolerates diffs within a configured threshold.
lib/src/alchemist_config.dart	Adds `diffThreshold` to golden config base class and propagates it through variants/copy/merge.
lib/src/golden_test_runner.dart	Installs/restores comparator based on `diffThreshold` during golden runs.
lib/src/golden_test.dart	Passes `variantConfig.diffThreshold` into the runner.
lib/alchemist.dart	Exports the new comparator.
README.md	Documents the new `diffThreshold` option.
test/src/alchemist_file_comparator_test.dart	New tests for comparator construction and comparison behavior.
test/src/alchemist_config_test.dart	New tests for config default/validation/copy/merge of `diffThreshold`.
test/src/golden_test_runner_test.dart	New tests for comparator installation/restoration and unsupported comparator handling.



btrautmann · 2026-03-11T13:53:57Z

@mataku I think many (all?) of the comments from Copilot are valid here. Can you take a look and address them?

Also, AFAICT, this is only configurable at the suite level, right? Meaning one cannot set a threshold at the test level? That may be OK for v1, but I could see a user wanting to increase the threshold for only a single test so as not to add too much tolerance across the whole suite and risk missing actual regressions.

Another thing: it would be great if we could track the largest observed diff across the suite and alert the user that they could lower the threshold if that max diff is below the allowed threshold. That would help them avoid drift between the diff they actually need to tolerate and the threshold they've set. Curious for your thoughts on the feasibility of that.

btrautmann · 2026-03-11T13:57:35Z

Re: my comment above, in this comment I had argued against a per-test threshold, and the reasons laid out there still seem logical to me. I just wanted to say that I'm very OK landing a suite-level threshold for now and seeing how that goes (whenever I've come across functionality like this in other frameworks, it's always been suite-wide).

A couple of comments below that one, the idea of a warning for a threshold being set unnecessarily high is mentioned. That may be something to consider, but it could always be a follow-up.


mataku · 2026-03-11T15:06:14Z

@btrautmann

Also, AFAICT, this is only configurable at the suite level, right? Meaning one cannot set a threshold at the test level?

Basically yes, but as described in https://github.com/Betterment/alchemist#for-single-tests-or-groups, users can override AlchemistConfig per test by wrapping it in AlchemistConfig.runWithConfig. If we want to support per-test thresholds, I think adding a diffThreshold argument directly to the goldenTest function would be a cleaner approach.

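// some_screen_test.dart
import 'package:alchemist/alchemist.dart';
// Assumes HomeScreen is imported from your app; the import path is app-specific.

void main() {
  // Uses the suite-wide config (e.g. set in flutter_test_config.dart).
  goldenTest(
    'my_test',
    fileName: 'my_test',
    builder: () => const HomeScreen(),
  );

  AlchemistConfig.runWithConfig(
    config: AlchemistConfig.current().merge(
      const AlchemistConfig(
        // Applies only to tests registered inside `run`, i.e. `my_test_2`.
        ciGoldensConfig: CiGoldensConfig(diffThreshold: 0.5),
      ),
    ),
    run: () {
      goldenTest(
        'my_test_2',
        fileName: 'my_test_2',
        builder: () => const HomeScreen(),
      );
    },
  );
}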

A couple of comments below that one, the idea of a warning for a threshold being set unnecessarily high is mentioned. That may be something to consider, but it could always be a follow-up.

This could be achieved by tracking the largest diffPercent value and exposing it to users. Users could then read this value in a tearDownAll in their flutter_test_config.dart to check whether their configured threshold could be lowered.

Hooking into the end of the test suite automatically from the library side may be difficult, since I'm not currently aware of a way to run code after all tests in flutter_test, so this would require a small amount of user setup.

// AlchemistFileComparator
static double maxDiffPercent = 0;

// ... in compare method, track percent value
if (diffThreshold > 0 && result.diffPercent <= diffThreshold) {
  if (result.diffPercent > maxDiffPercent) {
    maxDiffPercent = result.diffPercent;
  }
  return true;
}

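// flutter_test_config.dart
import 'dart:async';

import 'package:alchemist/alchemist.dart';
import 'package:flutter_test/flutter_test.dart';

Future<void> testExecutable(FutureOr<void> Function() testMain) async {
  // Runs once per test file: each file runs in its own isolate, so the
  // static maxDiffPercent only covers the tests in that file.
  tearDownAll(() {
    final max = AlchemistFileComparator.maxDiffPercent;
    if (max > 0) {
      print('Max observed diff: $max. Consider lowering your diffThreshold.');
    }
  });
  await testMain();
}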
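The tearDownAll proposal above prints whenever any diff was tolerated. btrautmann's earlier suggestion was to alert only when the largest observed diff is below the configured threshold, i.e. when the threshold could actually be lowered. A minimal plain-Dart sketch of that comparison follows; `DiffThresholdTracker` and its methods are hypothetical and not part of alchemist.

```dart
/// Hypothetical helper (not part of alchemist): records the largest diff
/// that passed only because of the threshold, and reports whether the
/// configured threshold has unused headroom.
class DiffThresholdTracker {
  DiffThresholdTracker(this.diffThreshold);

  final double diffThreshold;

  /// Largest diffPercent tolerated so far (0 if none).
  double maxDiffPercent = 0;

  /// Call for every comparison that passed within the threshold.
  void record(double diffPercent) {
    if (diffPercent > maxDiffPercent) {
      maxDiffPercent = diffPercent;
    }
  }

  /// Returns a hint when the observed max diff is below the configured
  /// threshold, or null when there is nothing to suggest.
  String? suggestion() {
    // No threshold configured, or the threshold is already fully used.
    if (diffThreshold == 0 || maxDiffPercent >= diffThreshold) return null;
    return 'Max observed diff $maxDiffPercent is below diffThreshold '
        '$diffThreshold; consider lowering diffThreshold.';
  }
}
```

Comparing against the threshold, rather than against zero, keeps the message quiet for suites whose threshold is already as tight as the observed diffs allow.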

mataku · 2026-03-13T11:05:04Z

@btrautmann I've addressed the Copilot comments. How about we discuss handling the case where the threshold is set too high in a separate issue (or in #90)? I can create a new one if needed.

btrautmann

domain lgtm 🚀
platform lgtm 💪

Support diffThreshold to handle comparison failures that depend on the image generation environment

be3d6be

mataku changed the title from "Support diffThreshold to handle comparison failures that depend on the image generation environment" to "feat: Support diffThreshold to handle comparison failures that depend on the image generation environment" on Mar 5, 2026

mataku marked this pull request as ready for review March 5, 2026 02:26

mataku requested review from a team, Kirpal, btrautmann, jeroen-meijer, jolexxa and marcossevilla as code owners March 5, 2026 02:26

mataku added 2 commits March 5, 2026 23:58

Run dart format

f8c1079

Fix baseDir usage

8ec0c5c

mataku marked this pull request as draft March 5, 2026 16:04

fix: correct expected error type in golden test runner test

2357227

mataku marked this pull request as ready for review March 5, 2026 16:12

mataku added 2 commits March 6, 2026 01:33

Run dart analyzer issues

14547f6

Add missing alchemist_file_comparator method tests

6f65b0a

btrautmann requested a review from Copilot March 11, 2026 13:13

Copilot started reviewing on behalf of btrautmann March 11, 2026 13:14

Copilot AI reviewed Mar 11, 2026


mataku added 2 commits March 11, 2026 23:11

fix: move comparator setup into try block and restrict to exact LocalFileComparator type

6d827bc

docs: remove incorrect "warning is printed" from diffThreshold documentation

The implementation does not print a warning when the diff passes within the threshold. Update the doc comments in AlchemistFileComparator, GoldensConfig, and README to reflect the actual behavior.

9ad8504

btrautmann approved these changes Mar 13, 2026


btrautmann merged commit 571fd7f into Betterment:main Mar 13, 2026
8 checks passed

mataku deleted the feature/diff-threshold branch March 16, 2026 11:44

btrautmann merged 8 commits into Betterment:main from mataku:feature/diff-threshold
