Add option to allow missing or additional detected PyTorch test suites#4052

Open
Flamefire wants to merge 1 commit into easybuilders:develop from Flamefire:20260122124450_new_pr_pytorch
Conversation

@Flamefire
Contributor

@Flamefire Flamefire commented Jan 22, 2026

(created using eb --new-pr)

This relaxes the PyTorch test evaluation a bit. The easyblock parses the XML result files and compares them against the summary that the test command prints to stdout. There are 2 cases in which the two can disagree:

1: There are more failures in the XML files than in the summary -> PyTorch didn't consider something as failed that we do. This is very weird and might be an issue with the XML parser.
However, this is only a minor issue, as we counted more failures (from the XML files) than might actually be present. So if the allowed-test-failure-count check still succeeds, we can ignore this, at least for users.

2: The summary shows a failure we have not found in the XML files -> The XML report might be missing because the test crashed or otherwise didn't write its results.
This is an issue because a single test ("suite") might contain hundreds of test cases, many of which could have failed without us counting any of those failures.
Of course there might be only a single failure, but we cannot know for sure, hence we fail.
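
A minimal sketch of how the two cases can be told apart, assuming the failures are tracked as collections of failed test suite names. The function name `compare_failed_suites` and the suite names are hypothetical and not taken from the easyblock:

```python
def compare_failed_suites(xml_failed, summary_failed):
    """Split the disagreement between the two sources into the 2 cases.

    xml_failed:     suite names with failures found in the XML reports
    summary_failed: suite names reported as failed in the stdout summary
    (both inputs are assumptions about how the data might be represented)
    """
    xml_failed = set(xml_failed)
    summary_failed = set(summary_failed)
    # Case 1: failed according to the XML files, but not in the summary
    extra = sorted(xml_failed - summary_failed)
    # Case 2: failed according to the summary, but no failure in the XML files
    missing = sorted(summary_failed - xml_failed)
    return extra, missing


extra, missing = compare_failed_suites(
    xml_failed=["test_nn", "test_ops"],
    summary_failed=["test_ops", "test_cuda"],
)
print("extra (case 1):", extra)
print("missing (case 2):", missing)
```

Suites in `missing` are the worrying ones, since their real number of failed test cases is unknown.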

I added 2 options, allow_extra_failures & allow_missing_failures, for those 2 cases.

They can be set to True/False, or to a maximum number.
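
As a rough illustration of such a True/False-or-number option (a sketch, not the easyblock code): the helper `is_allowed` is hypothetical, and it assumes that `True` accepts any number of discrepancies, `False` accepts none, and an integer is an inclusive upper limit.

```python
def is_allowed(count, limit):
    """Check a discrepancy count against a bool-or-int option value.

    Assumed semantics (not confirmed by the PR text):
      True  -> any count is accepted
      False -> only a count of 0 is accepted
      int   -> count must not exceed the given maximum
    """
    # bool is a subclass of int in Python, so it has to be checked first
    if isinstance(limit, bool):
        return limit or count == 0
    return count <= limit


# Hypothetical values, e.g. for allow_extra_failures / allow_missing_failures
print(is_allowed(3, True))
print(is_allowed(1, False))
print(is_allowed(2, 2))
```

Checking for `bool` before comparing numerically avoids treating `True` as a limit of 1.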

@boegel boegel added this to the next release (5.2.1?) milestone Jan 28, 2026
@boegel
Member

boegel commented Jan 28, 2026

@boegelbot please test @ jsc-zen3-a100
CORE_CNT=16
EB_ARGS="--installpath /tmp/$USER/pr4052-PyTorch-2.7.1-CUDA PyTorch-2.7.1-foss-2024a-CUDA-12.6.0.eb"

@boegelbot

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=4052 EB_ARGS="--installpath /tmp/$USER/pr4052-PyTorch-2.7.1-CUDA PyTorch-2.7.1-foss-2024a-CUDA-12.6.0.eb" EB_CONTAINER= EB_REPO=easybuild-easyblocks EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_4052 --ntasks="16" --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 9510

Test results coming soon (I hope)...

Details

- notification for comment with ID 3809792319 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegel
Member

boegel commented Jan 28, 2026

@boegelbot please test @ jsc-zen3
CORE_CNT=16
EB_ARGS="--installpath /tmp/$USER/pr4052-PyTorch-2.6.0 PyTorch-2.6.0-foss-2024a.eb"

@boegelbot

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=4052 EB_ARGS="--installpath /tmp/$USER/pr4052-PyTorch-2.6.0 PyTorch-2.6.0-foss-2024a.eb" EB_CONTAINER= EB_REPO=easybuild-easyblocks EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_4052 --ntasks="16" ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 9520

Test results coming soon (I hope)...

Details

- notification for comment with ID 3811375867 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot

Test report by @boegelbot

Overview of tested easyconfigs (in order)

  • SUCCESS PyTorch-2.6.0-foss-2024a.eb

Build succeeded for 1 out of 1 (total: 47 hours 36 mins 39 secs) (1 easyconfigs in total)
jsczen3c2.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.7, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.23
See https://gist.github.com/boegelbot/890f3c9807ae1e02967f8c74b4c8d5a8 for a full test report.

@boegelbot

Test report by @boegelbot

Overview of tested easyconfigs (in order)

  • SUCCESS PyTorch-2.7.1-foss-2024a-CUDA-12.6.0.eb

Build succeeded for 1 out of 1 (total: 52 hours 4 mins 5 secs) (1 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.7, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 590.44.01, Python 3.9.23
See https://gist.github.com/boegelbot/3833453c591f0f8716185274bcee7d7f for a full test report.
