Skip to content

feat: Support diffThreshold to handle comparison failures that depend on the image generation environment#176

Merged
btrautmann merged 8 commits intoBetterment:mainfrom
mataku:feature/diff-threshold
Mar 13, 2026
Merged

feat: Support diffThreshold to handle comparison failures that depend on the image generation environment#176
btrautmann merged 8 commits intoBetterment:mainfrom
mataku:feature/diff-threshold

Conversation

@mataku
Copy link
Contributor

@mataku mataku commented Mar 5, 2026

Description

Closes #90

Adds a diffThreshold option to PlatformGoldensConfig and CiGoldensConfig to allow a configurable tolerance for pixel-level differences in golden tests.

This is useful for handling minor rendering differences that can occur across platforms or Flutter versions such as the diff failures reported in above issue, where images are visually identical but fail due to sub-pixel rendering variations.

How it works

  • A new AlchemistFileComparator wraps Flutter's LocalFileComparator and overrides the comparison logic.
  • When diffThreshold > 0 and the diff percentage is within the threshold, the test passes and a warning is printed.
  • diffThreshold must be between 0.0 (inclusive) and 1.0 (exclusive). A value of 0.0 (the default) means no tolerance is applied and behavior is unchanged.
    • Since a value of 1.0 would always pass tests regardless of any pixel diff, it is not practical to configure, so excluded 1.0 from the valid range.

Usage

AlchemistConfig.runWithConfig(
  config: AlchemistConfig(
    platformGoldensConfig: PlatformGoldensConfig(
      diffThreshold: 0.001, // allow up to 0.1% pixel difference
    ),
    ciGoldensConfig: CiGoldensConfig(
      diffThreshold: 0.1, // allow up to 10% pixel difference
    ),
  ),
  run: testMain,
);

Type of Change

  • ✨ New feature (non-breaking change which adds functionality)
  • 🛠️ Bug fix (non-breaking change which fixes an issue)
  • ❌ Breaking change (fix or feature that would cause existing functionality to change)
  • 🧹 Code refactor
  • ✅ Build configuration change
  • 📝 Documentation
  • 🗑️ Chore

Note

  • torelent use may be OK, but I think threshold (diffThreshold) use is better to understand.
  • I have also added descriptions for the parameters in the README.md. Please let me know if any further changes are required for documentation.
  • tested with my sample repo: https://github.com/mataku/flutter_snippets/compare/feature/test-diff-threshold
    • changed title but test passed because of diffThreshold value
    • Test will fail if diffThreshold sets to 0

@mataku mataku changed the title Support diffThreshold to handle comparison failures that depend on the image generation environment feat: Support diffThreshold to handle comparison failures that depend on the image generation environment Mar 5, 2026
@mataku mataku marked this pull request as ready for review March 5, 2026 02:26
@mataku mataku marked this pull request as draft March 5, 2026 16:04
@mataku mataku marked this pull request as ready for review March 5, 2026 16:12
@mataku
Copy link
Contributor Author

mataku commented Mar 6, 2026

I missed CONTRIBUTING.md, reviewed it and addressed it accordingly 🙏

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a configurable pixel-diff tolerance (diffThreshold) for golden comparisons so minor rendering differences across environments don’t fail tests unexpectedly.

Changes:

  • Introduces diffThreshold on PlatformGoldensConfig and CiGoldensConfig (validated to be 0.0 <= x < 1.0).
  • Adds AlchemistFileComparator and installs it during golden runs when diffThreshold > 0.
  • Updates README and adds/extends unit tests around config + comparator behavior.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
lib/src/alchemist_file_comparator.dart New comparator that tolerates diffs within a configured threshold.
lib/src/alchemist_config.dart Adds diffThreshold to golden config base class and propagates it through variants/copy/merge.
lib/src/golden_test_runner.dart Installs/restores comparator based on diffThreshold during golden runs.
lib/src/golden_test.dart Passes variantConfig.diffThreshold into the runner.
lib/alchemist.dart Exports the new comparator.
README.md Documents the new diffThreshold option.
test/src/alchemist_file_comparator_test.dart New tests for comparator construction and comparison behavior.
test/src/alchemist_config_test.dart New tests for config default/validation/copy/merge of diffThreshold.
test/src/golden_test_runner_test.dart New tests for comparator installation/restoration and unsupported comparator handling.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@btrautmann
Copy link
Contributor

@mataku I think many (all?) of the comments from Copilot are valid here, can you take a look and address?

Also, AFAICT, this is only configurable at the suite level, right? Meaning one cannot set a threshold at the test level? That may be OK for v1, but I could see a user wanting to only increase the threshold for a singular test as to not add too much tolerance across the whole suite and risk missing actual regressions.

Another thing--It would be great if we could track the largest threshold across the suite and alert the user that they could lower it if the max diff is lower than the allowed threshold. That would help them avoid drift between the actual diff threshold and the one they've set. Curious for your thoughts on the feasibility of that.

@btrautmann
Copy link
Contributor

RE: my above comment, in this comment I had argued against a per-test threshold, and the reasons laid out there seem logical to me--Just wanted to express that I'm very OK landing a suite level threshold for now and we can see how that goes (I think whenever I've come across functionality like this in other frameworks, it's always suite-wide).

A couple comments below that one the idea of a warning for a threshold being set unnecessarily high is mentioned... may be something to consider, but could always be a follow-up.

mataku added 2 commits March 11, 2026 23:11
…ntation

The implementation does not print a warning when the diff passes within
the threshold. Update the doc comments in AlchemistFileComparator,
GoldensConfig, and README to reflect the actual behavior.
@mataku
Copy link
Contributor Author

mataku commented Mar 11, 2026

@btrautmann

Also, AFAICT, this is only configurable at the suite level, right? Meaning one cannot set a threshold at the test level?

Basically yes, but as described in https://github.com/Betterment/alchemist#for-single-tests-or-groups, users can override AlchemistConfig per test by wrapping it in AlchemistConfig.runWithConfig. I think adding a diffThreshold argument directly to the goldenTest function would be a cleaner approach if do that.

// some_screen_test.dart
goldenTest(
    'my_test',
    fileName: 'my_test',
    builder: () => const HomeScreen(),
);

AlchemistConfig.runWithConfig(
  config: AlchemistConfig.current().merge(
    const AlchemistConfig(
      ciGoldensConfig: CiGoldensConfig(diffThreshold: 0.5), // overridden in `my_test_2`
    ),
  ),
  run: () {
    goldenTest(
      'my_test_2',
      fileName: 'my_test_2',
      builder: () => const HomeScreen(),
    );
  },
);


A couple comments below that one the idea of a warning for a threshold being set unnecessarily high is mentioned... may be something to consider, but could always be a follow-up.

This could be achievable by tracking the diffPercent value and exposing it to users. Users can read this value in a tearDownAll in their flutter_test_config.dart to check if their configured threshold could be lowered.

It may be difficult to hooking into the end of the test suite automatically from the library side because I don't know how to hook after all tests in flutter_test for now, so this requires a small amount of user setup.

// AlchemistFileComparator
static double maxDiffPercent = 0;

// ... in compare method, track percent value
if (diffThreshold > 0 && result.diffPercent <= diffThreshold) {
  if (result.diffPercent > maxDiffPercent) {
    maxDiffPercent = result.diffPercent;
  }
  return true;
}

// flutter_test_config.dart
Future<void> testExecutable(FutureOr<void> Function() testMain) async {
  tearDownAll(() {
    final max = AlchemistFileComparator.maxDiffPercent;
    if (max > 0) {
      print('Max observed diff: $max. Consider lowering your diffThreshold.');
    }
 );

@mataku
Copy link
Contributor Author

mataku commented Mar 13, 2026

@btrautmann I've addressed copilot comments. How about we discuss handling the case where the threshold is too high in a separate issue (or #90)? I will create a new one if needed.

Copy link
Contributor

@btrautmann btrautmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

domain lgtm 🚀
platform lgtm 💪

@btrautmann btrautmann merged commit 571fd7f into Betterment:main Mar 13, 2026
8 checks passed
@mataku mataku deleted the feature/diff-threshold branch March 16, 2026 11:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

request: Add diff tolerance

3 participants