
Add script for test execution times comparison#411

Draft
wmlynik wants to merge 33 commits into master from wmlynik/tests_compare

Conversation

@wmlynik wmlynik commented Aug 11, 2025

Description

Motivation and Context

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Chore (refactoring, style fixes, git/CI config, submodule management, no code logic changes)

How Has This Been Tested?

  • Already covered by automatic testing.
  • New test added: (add PR link here).
  • Tested by hand on: (list targets here).

Checklist:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing linter checks and tests passed.
  • My changes generate no new compilation warnings for any of the targets.

Special treatment

  • This PR needs additional PRs to work (list the PRs, preferably in merge-order).
  • I will merge this PR by myself when appropriate.


@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @wmlynik, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

I've added a new Python script and a related configuration file to enhance our test analysis capabilities. The primary goal of this change is to provide a powerful tool for comparing test execution times between different runs, which will be invaluable for detecting performance changes. This new utility will allow us to quickly identify which tests have become slower or faster, and also track changes in test status or presence, streamlining our performance monitoring and debugging workflows.

Highlights

  • New Test Comparison Script: I've introduced a new Python script, cmp.py, which is a command-line utility designed to compare two JUnit XML test result files. This script is crucial for identifying performance regressions or improvements by analyzing the execution times of tests between different runs.
  • Flexible Filtering Options: The cmp.py script offers robust filtering capabilities, allowing users to focus comparisons on specific targets, directories, test suites, or individual test cases. This enables targeted analysis of performance changes within a large test suite.
  • Threshold-Based Performance Highlighting: The script can highlight significant performance changes by applying configurable absolute and relative thresholds. This means it can automatically flag tests that have increased or decreased in execution time beyond a specified percentage or absolute value, making it easier to spot critical changes.
  • Test Status and Failure Reporting: In addition to time comparisons, cmp.py can also report on differences in test status, such as tests that are present in one report but missing in another, or tests whose status has changed (e.g., from 'OK' to 'FAIL' or 'SKIP'). It can also list all failing tests from either report.
  • New Benchmark Definition File: I've added benchmarks/coremark_pro.yaml, a new YAML file that defines the structure and cases for the CoreMark-Pro benchmark. This file will likely be used by cmp.py to understand and filter test sets related to CoreMark-Pro, providing a structured way to manage benchmark definitions.
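The time comparison described in the highlights can be sketched with only the standard library. This is an illustrative assumption, not cmp.py's actual implementation: the load_times helper, the threshold values, and the "suite/case" key format are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical thresholds, mirroring the absolute/relative idea described above.
ABS_THRESHOLD_S = 0.5   # flag changes larger than 0.5 s
REL_THRESHOLD = 0.10    # flag changes larger than 10 %

def load_times(path):
    """Map 'suite/case' names to execution times from a JUnit XML file."""
    times = {}
    for suite in ET.parse(path).getroot().iter("testsuite"):
        for case in suite.iter("testcase"):
            key = f"{suite.get('name')}/{case.get('name')}"
            times[key] = float(case.get("time", 0.0))
    return times

def compare(old, new):
    """Yield (name, old_time, new_time) for tests exceeding either threshold."""
    for name in old.keys() & new.keys():
        delta = new[name] - old[name]
        rel = abs(delta) / old[name] if old[name] else float("inf")
        if abs(delta) > ABS_THRESHOLD_S or rel > REL_THRESHOLD:
            yield name, old[name], new[name]
```

The real script additionally reports tests missing from one report and status changes, which a sketch like this would handle by diffing the key sets and a status attribute as well.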
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature              | Command             | Description
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new script for comparing test execution times, along with a benchmark configuration file. The script is a great addition, but I've found several issues, including some critical bugs in the filtering logic that need to be addressed. I've also suggested improvements for maintainability and correctness in other parts of the script. Please review the comments for details.


github-actions bot commented Aug 11, 2025

Unit Test Results

9 462 tests  +16   8 873 ✅ +16   51m 7s ⏱️ + 3m 16s
  561 suites +16     589 💤 ± 0 
    1 files   ± 0       0 ❌ ± 0 

Results for commit 6638ab3. ± Comparison against base commit 7688d5f.

♻️ This comment has been updated with latest results.

@wmlynik wmlynik force-pushed the wmlynik/tests_compare branch 3 times, most recently from 6f4315c to 0d7c1d4 Compare August 11, 2025 14:56
@wmlynik
Author

wmlynik commented Aug 11, 2025

@gemini-code-assist review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new script for comparing test execution times, which is a valuable addition for performance tracking. The script is comprehensive and well-structured. My review focuses on improving correctness, robustness, and maintainability by addressing a potential bug in argument parsing, refactoring complex and duplicated code, and improving type hints for better clarity and static analysis.

@wmlynik wmlynik force-pushed the wmlynik/tests_compare branch from 5d73f2e to 18653f5 Compare August 11, 2025 15:16
@wmlynik wmlynik force-pushed the wmlynik/tests_compare branch 2 times, most recently from 97097ab to bf092b4 Compare September 3, 2025 14:54
Member

@nalajcie nalajcie left a comment


I know this is just a draft; I've just pointed out some places where the code might be improved, to make it more maintainable in the future.

I didn't look at the tests.

I'm not sure but maybe this tool could/should be put in a separate dir if it would help with extra tooling setup for auto-testing (mypy, tox, etc.). Not a requirement as we might just treat it as a development tool - but if we do have automatic tests, it might be good to run them in CI :)

import yaml
from junitparser import Error, Failure, JUnitXml, Skipped

ESCAPE_RED = "\033[31m"
Member


use colorama? this way it would be compatible also with e.g. Windows?

Author


Colorama doesn't support italics and underline, because they're not supported by cmd.exe on Windows. Windows Terminal (the default in modern versions of Windows) supports ANSI codes. Should we adjust the styling to maintain compatibility with the legacy console?

return only_old + only_new + with_children


def find_fails(results, args, unfiltered=False, level=0, path=None):
Member


could probably be implemented inside Testcase / Testsuite class

Author


I don't see how the entire function could be implemented in the Testcase / Testsuite class, as it starts recursing from the target level. It could make sense to move the part of the logic executed at the testsuite level to the Testsuite class.

Member


The code is really hard to follow; use classes as a code decomposition tool (and for hiding implementation details), not only as storage - in this particular example:

class Testsuite:
    # [...]
    def failures(self) -> something:
        """Returns failed testcases from this testsuite"""
        return [case for case in self.cases.values() if case.status == Status.FAIL]
  • If we want to filter something, we pass filter_fun(item) as an argument instead of using a global filtering function with some strange local variables as params
  • the Testcase should probably know its fully-qualified name - that way filtering would be much easier to implement (and more readable)

(the same approach should be done for other classes - move all code which depend only on the contents of the class to the class itself)
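The suggested decomposition might look roughly like this. It is a sketch of the reviewer's idea, not the script's actual classes: the Status enum, the full_name property, and the filter_fun signature are illustrative assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    OK = "OK"
    FAIL = "FAIL"
    SKIP = "SKIP"

@dataclass
class Testcase:
    name: str
    suite: str
    status: Status

    @property
    def full_name(self) -> str:
        """Fully-qualified name, so filters can match on a single string."""
        return f"{self.suite}/{self.name}"

@dataclass
class Testsuite:
    name: str
    cases: dict = field(default_factory=dict)

    def failures(self, filter_fun=lambda case: True):
        """Failed testcases from this suite, optionally narrowed by filter_fun."""
        return [c for c in self.cases.values()
                if c.status is Status.FAIL and filter_fun(c)]
```

With this shape, the target-level recursion in find_fails reduces to concatenating suite.failures(filter_fun) over the suites, with the filtering logic passed in rather than read from globals.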



def find_fails(results, args, unfiltered=False, level=0, path=None):
if level == 3:
Member


hard-coding the max recursion level in multiple places; it would be better to just always check whether we have sub-results or not

Author


The recursion levels are hard-coded because the hierarchy has a fixed set of levels and there are differences between the levels. It should now be clearer after replacing the numbers with an enum.
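The enum approach mentioned here might look roughly as follows. The level names are assumptions based on the hierarchy described in the PR (targets, directories, suites, cases), not the script's actual identifiers.

```python
from enum import IntEnum

class Level(IntEnum):
    """Fixed hierarchy of the results tree, replacing bare numbers like 3."""
    TARGET = 0
    DIRECTORY = 1
    SUITE = 2
    CASE = 3

def walk(results, level=Level.TARGET):
    """Recurse through the fixed levels; stop at Level.CASE, not 'level == 3'."""
    if level is Level.CASE:
        return [results]                     # leaf: a single testcase record
    found = []
    for child in results.values():           # descend one level per call
        found.extend(walk(child, Level(level + 1)))
    return found
```

An IntEnum keeps the arithmetic (level + 1) working while giving each hard-coded number a name, so per-level special cases read as `if level is Level.SUITE:` instead of `if level == 2:`.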

else:
print(
f"{ESCAPE_BOLD}{ESCAPE_YELLOW}Warning:{ESCAPE_RESET} "
f"{count_fails(fails)} tests failed in the {name} file "
Member


use the real filename, not an alias

Contributor


We were talking about that for a while and came to the conclusion that we want to keep the approach with file1 and file2 (instead of old and new), because the file path may be too long and the file names may be the same - we can compare XMLs with the same name from different directories.



if __name__ == "__main__":
main()
Member


think about when the program should exit with a non-zero return code
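One common convention for diff-style tools (like diff itself) is 0 for "no differences", 1 for "differences found", and 2 for errors. The sketch below is hypothetical; run_comparison and the exact codes are placeholders, not cmp.py's behavior.

```python
import sys

# Hypothetical exit codes; 0 only when nothing notable was found.
EXIT_OK = 0            # no differences above the thresholds
EXIT_DIFFERENCES = 1   # regressions / status changes found
EXIT_ERROR = 2         # bad arguments or unreadable input files

def run_comparison(argv):
    """Placeholder: parse argv, compare the two reports, return regressions."""
    if not argv:
        raise ValueError("two JUnit XML files are required")
    return []

def main(argv):
    try:
        regressions = run_comparison(argv)
    except (OSError, ValueError) as exc:
        print(f"error: {exc}", file=sys.stderr)
        return EXIT_ERROR
    return EXIT_DIFFERENCES if regressions else EXIT_OK

# In the script's entry point:
#     sys.exit(main(sys.argv[1:]))
```

A distinct non-zero code for "differences found" also makes the tool scriptable in CI, where a regression can fail the job directly.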

print(f"Failed tests in {name} file:")
print_fails(fails_filtered, args)
else:
print(
Member


this is a script for diff'ing 2 JUnit XML files - use-case-wise, I would like to see that a test started (or stopped) failing without any extra params, as this is crucial info.

Contributor


Since the test results come from CI, where failures are not expected, we print a warning whenever any failure happens; even if a failure occurs in both XML files, it's worth reporting. We want to keep the output clear and readable, but if there are failures (say, known issues), they could take up most of the output. We wanted to focus this script on one utility that isn't supported yet, so we chose time comparison - we already learn about failures from the ci-support chat. Please let us know whether you agree with that approach; here is how failure investigation looks:
[Screenshot from 2025-09-16 17-34-15]

@wmlynik wmlynik force-pushed the wmlynik/tests_compare branch from 71d449f to 3901410 Compare September 9, 2025 11:34
@wmlynik wmlynik force-pushed the wmlynik/tests_compare branch 3 times, most recently from 4f40784 to 1a753ad Compare September 17, 2025 11:40
@wmlynik wmlynik force-pushed the wmlynik/tests_compare branch from ff476e7 to 0997a80 Compare October 23, 2025 15:03
@wmlynik wmlynik force-pushed the wmlynik/tests_compare branch 2 times, most recently from a98ec57 to 518e55c Compare October 28, 2025 11:51
@wmlynik wmlynik force-pushed the wmlynik/tests_compare branch from abc542f to 86b02e8 Compare December 17, 2025 14:12
Władysław Młynik added 23 commits December 17, 2025 15:40
@wmlynik wmlynik force-pushed the wmlynik/tests_compare branch from 86b02e8 to 2527c98 Compare December 17, 2025 14:40
@wmlynik wmlynik force-pushed the wmlynik/tests_compare branch from 2527c98 to ee31bd2 Compare December 17, 2025 15:17
@wmlynik wmlynik force-pushed the wmlynik/tests_compare branch from ee31bd2 to 6638ab3 Compare December 17, 2025 15:53