
[ci] Update rtol for test_classification #36556

Merged
vllm-bot merged 2 commits into vllm-project:main from angelayi:175928
Mar 11, 2026
Conversation

@angelayi (Contributor) commented Mar 9, 2026

Fixes pytorch/pytorch#175928

This test fails when updating torch to 2.11 on L4 GPUs, where the vLLM output is tensor([0.1023, 0.8977]) while the HF output is tensor([0.1024, 0.8976]). However, if we narrow the test down to just the failing prompt, it also fails on torch 2.10. After chatting with @zou3519, we decided to just update the tolerance.
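The failing comparison comes down to torch.allclose's element-wise criterion, |a - b| <= atol + rtol * |b|. As a rough illustration (a plain-Python stand-in, not vLLM's actual test code, and the tolerance values here are illustrative), the two outputs above differ by about 1e-4, which torch's default tolerances reject but a looser pair accepts:

```python
# Stand-in for torch.allclose's element-wise criterion:
# all elements satisfy |a - b| <= atol + rtol * |b|.
def close(a, b, rtol, atol=0.0):
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))

vllm_out = [0.1023, 0.8977]  # vLLM probabilities from the issue
hf_out = [0.1024, 0.8976]    # HuggingFace reference

# The outputs differ by ~1e-4 absolutely, so torch.allclose's defaults
# (rtol=1e-5, atol=1e-8) reject the pair:
print(close(vllm_out, hf_out, rtol=1e-5, atol=1e-8))  # False
# A looser tolerance (illustrative values) accepts it:
print(close(vllm_out, hf_out, rtol=1e-2, atol=1e-4))  # True
```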

@angelayi angelayi requested a review from noooop as a code owner March 9, 2026 23:04
@gemini-code-assist (bot) left a comment


Code Review

This pull request updates the tolerance for a failing test in test_classification.py. The change involves adding an absolute tolerance (atol=1e-4) to the torch.allclose call and explicitly naming the relative tolerance (rtol). This is a standard approach to handle minor floating-point discrepancies that can occur across different hardware or library versions. The change is well-contained and the updated tolerance seems reasonable for this test case. No high or critical severity issues were found in this pull request.
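As a sketch of the change the review describes (variable names are hypothetical; the rtol value below is illustrative, and numpy.allclose stands in for torch.allclose since both apply the same |a - b| <= atol + rtol * |b| criterion):

```python
import numpy as np

# Outputs reported in the PR description (vLLM vs. HF reference).
vllm_out = np.array([0.1023, 0.8977])
hf_out = np.array([0.1024, 0.8976])

# Before: a tight rtol with no explicit atol rejects the ~1e-4 gap
# (numpy's default atol of 1e-8 is far too small to absorb it).
before = np.allclose(vllm_out, hf_out, rtol=1e-5)
# After: explicitly named rtol plus atol=1e-4 absorbs the gap.
after = np.allclose(vllm_out, hf_out, rtol=1e-5, atol=1e-4)
print(before, after)  # False True
```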

Signed-off-by: angelayi <yiangela7@gmail.com>
@zou3519 zou3519 changed the title [ci] Update atol for test_classification [ci] Update rtol for test_classification Mar 10, 2026
@zou3519 (Collaborator) commented Mar 10, 2026

@noooop any thoughts on this?

We observed in experiments that the test is flaky on main:

  1. I've had this exact test fail on some PRs. After retrying, the failure goes away, so the numbers look non-deterministic.
  2. On PyTorch 2.10, we tried sending fewer sequences through the LLM; some combinations of fewer sequences also trigger the assertion error.

To fix the issue, we raised the tolerances, which I think is reasonable.

@zou3519 zou3519 added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 11, 2026
@noooop (Collaborator) commented Mar 11, 2026

> @noooop any thoughts on this?
>
> We observed in experiments that the test is flaky on main.
>
>   1. I've had this exact test fail on some PRs. After re-trying, the failure goes away. So it looks like the numbers are non-deterministic
>   2. on PyTorch 2.10, we have tried sending fewer sequences through the LLM. Some combination of fewer sequences also triggers the assertion error.
>
> To fix the issue, we raised the tolerances up, which I think seems reasonable.

Will seed_everything and flaky(reruns=3) help, like in #32909?


The recently merged #36385 modified this threshold, which may cause conflicts with this PR.
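noooop's suggestion above might look roughly like the following. This is a hypothetical sketch: seed_everything here is a local stand-in for whatever seeding helper #32909 uses (the real helper would also seed numpy, torch, and CUDA), and flaky(reruns=3) assumes the pytest-rerunfailures plugin is installed.

```python
import random

import pytest


def seed_everything(seed: int) -> None:
    # Stand-in: a real helper would also seed numpy, torch, and CUDA RNGs.
    random.seed(seed)


@pytest.mark.flaky(reruns=3)  # rerun up to 3 times before reporting failure
def test_classification_deterministic():
    seed_everything(0)
    # ... run the model and compare vLLM vs. HF outputs within tolerance ...
```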

@zou3519 (Collaborator) commented Mar 11, 2026

@noooop we can try. In the event that seed_everything() doesn't fix the issue, would you be OK with us updating the tolerances? An alternative could also be that we change the prompts.

@zou3519 (Collaborator) commented Mar 11, 2026

Actually, I just read #36385; changing the tolerances that way might make this problem go away too, so we'll test that as well.

@noooop (Collaborator) commented Mar 11, 2026

I’m okay with changing the tolerances. The threshold is very tight and often causes flakiness when upgrading dependencies. We also frequently catch minor numerical issues from this case, so I’m not sure whether it’s a bug or a feature.

e.g.

#27329 (comment)

However, I’ve been very busy recently and don’t have time to dig into the deep numerical precision issues.

@vllm-bot vllm-bot merged commit 13e79fc into vllm-project:main Mar 11, 2026
15 of 16 checks passed

Labels

ready ONLY add when PR is ready to merge/full CI is needed


Development

Successfully merging this pull request may close these issues.

[release 2.11][vllm] Language Models Tests (Extra Standard) 1 test_models[float-jason9693/Qwen2.5-1.5B-apeach]

4 participants