add testing for `Qwen3.5-0.8B` to GSM8K eval by puririshi98 · Pull Request #36291 · vllm-project/vllm

puririshi98 · 2026-03-06T22:53:41Z

Part of #36264.
Adds testing for Qwen3.5-0.8B to the GSM8K eval.

Signed-off-by: Rishi Puri <riship@nvidia.com>

gemini-code-assist

Code Review

This pull request introduces a new accuracy evaluation test for the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B model. The test suite is comprehensive, leveraging lm-evaluation-harness to check for correctness across various precisions (bfloat16, float16, fp8) and on multiple benchmarks (gsm8k, arc_easy, hellaswag). It also includes sanity checks and a smoke test. My review found a couple of inconsistencies in the documentation within the new test file that should be addressed for clarity and correctness.

gemini-code-assist · 2026-03-06T22:55:10Z

tests/v1/attention/test-qwen-deepseek-1.5b-accuracy.py

+Covers:
+  - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+  - deepseek-ai/DeepSeek-R1-Distill-Qwen-7B


The docstring states that this test covers both the 1.5B and 7B versions of the DeepSeek-R1-Distill-Qwen model. However, the SMALL_DEEPSEEK_MODELS list (line 149) only includes the configuration for the 1.5B model. Please either add the configuration for the 7B model to the test suite or update the docstring to accurately reflect the tested models.
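A mismatch like the one described above can be caught mechanically. The following is a minimal sketch, not code from this PR: `MODULE_DOC` and the contents of `SMALL_DEEPSEEK_MODELS` are hypothetical stand-ins for the test file's docstring and model list (the real file is not shown in this thread), and `coverage_mismatch` is an illustrative helper name.

```python
import re

# Hypothetical stand-ins for the test module's docstring and model list.
MODULE_DOC = """
Covers:
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
"""

SMALL_DEEPSEEK_MODELS = ["deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"]


def documented_models(doc: str) -> set[str]:
    """Collect 'org/name' bullet entries from a docstring."""
    pattern = r"^[ \t]*-[ \t]+(\S+/\S+)[ \t]*$"
    return set(re.findall(pattern, doc, flags=re.MULTILINE))


def coverage_mismatch(doc: str, configured: list[str]) -> tuple[set[str], set[str]]:
    """Return (documented but not configured, configured but not documented)."""
    documented = documented_models(doc)
    tested = set(configured)
    return documented - tested, tested - documented


if __name__ == "__main__":
    missing, extra = coverage_mismatch(MODULE_DOC, SMALL_DEEPSEEK_MODELS)
    print("documented but not tested:", sorted(missing))
    print("tested but not documented:", sorted(extra))
```

Either resolution suggested in the review (adding the 7B configuration or trimming the docstring) would make both returned sets empty.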

gemini-code-assist · 2026-03-06T22:55:10Z

tests/v1/attention/test-qwen-deepseek-1.5b-accuracy.py

+  - hellaswag (10-shot)                – sentence completion
+
+Usage:
+    pytest tests/lm_eval_correctness/test_deepseek_small_accuracy.py -v


The file path in the Usage example is incorrect. It points to tests/lm_eval_correctness/test_deepseek_small_accuracy.py, but the actual file path is tests/v1/attention/test-qwen-deepseek-1.5-accuracy.py. This should be updated to reflect the correct path to make the test runnable as documented.

Suggested change

-    pytest tests/lm_eval_correctness/test_deepseek_small_accuracy.py -v
+    pytest tests/v1/attention/test-qwen-deepseek-1.5-accuracy.py -v

mergify · 2026-03-06T23:06:33Z

Hi @puririshi98, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?

mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

mergify · 2026-03-09T03:48:34Z

Hi @puririshi98, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

mergify · 2026-03-09T04:55:20Z

Hi @puririshi98, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

- Test DeepSeek-R1-Distill-Qwen-1.5B across multiple precisions (bf16, fp16, fp8, auto)
- Evaluate on GSM8K (5-shot), ARC Easy (25-shot), and HellaSwag (10-shot)
- Use vLLM's native evaluate_gsm8k() harness instead of lm-eval for GSM8K
- Include FP8 regression test with 6% threshold
- Set default GPU memory utilization to 0.3 for test environments

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Rishi Puri <riship@nvidia.com>
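The FP8 regression check mentioned in the commit message above could be expressed roughly as follows. This is a sketch under stated assumptions, not the code from the commit: the commit message does not say whether the 6% threshold is an absolute drop in accuracy or a drop relative to the baseline, so the hypothetical helper `fp8_regression_ok` supports both, defaulting to absolute.

```python
def fp8_regression_ok(
    baseline_acc: float,
    fp8_acc: float,
    threshold: float = 0.06,
    relative: bool = False,
) -> bool:
    """Return True if FP8 accuracy has not dropped more than `threshold`.

    Accuracies are fractions in [0, 1]. With relative=False (an assumption,
    the commit does not specify), the drop is measured in absolute accuracy;
    with relative=True it is measured as a fraction of the baseline.
    """
    drop = baseline_acc - fp8_acc
    if relative:
        if baseline_acc <= 0:
            raise ValueError("baseline accuracy must be positive")
        drop /= baseline_acc
    return drop <= threshold


if __name__ == "__main__":
    # Illustrative numbers only; these are not measured results.
    print(fp8_regression_ok(0.70, 0.65))
    print(fp8_regression_ok(0.70, 0.60))
```

An FP8 score above the baseline yields a negative drop and therefore passes, which matches the intent of a one-sided regression check.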

jasonlizhengjian · 2026-03-10T16:19:09Z

/gemini review

gemini-code-assist

Code Review

The pull request introduces a new configuration file for the Qwen3.5-0.8B model and updates the models-small.txt list to include it. The changes are straightforward and align with the stated objective of adding new model evaluations. No critical or high-severity issues were identified in the changes.

Note: Security Review is unavailable for this PR.

jasonlizhengjian

LGTM, please change the title to reflect the contents. Also consider adding a more descriptive PR description.

Create test-qwen-deepseek-1.5-accuracy.py

877c553

Signed-off-by: Rishi Puri <riship@nvidia.com>

puririshi98 mentioned this pull request Mar 6, 2026

[Tracking issue]: NVIDIA CI improvements #36264

Open

mergify bot added the deepseek (Related to DeepSeek models), qwen (Related to Qwen models), and v1 labels Mar 6, 2026

gemini-code-assist bot reviewed Mar 6, 2026

Rename test-qwen-deepseek-1.5-accuracy.py to test-qwen-deepseek-1.5b-…

27705ef

…accuracy.py Signed-off-by: Rishi Puri <riship@nvidia.com>

puririshi98 and others added 4 commits March 8, 2026 21:45

Update test-qwen-deepseek-1.5b-accuracy.py

46cb85c

Signed-off-by: Rishi Puri <riship@nvidia.com>

Create __init__.py

8071388

Signed-off-by: Rishi Puri <riship@nvidia.com>

Update test-qwen-deepseek-1.5b-accuracy.py

043a58f

Signed-off-by: Rishi Puri <riship@nvidia.com>

Merge branch 'main' into patch-3

fd677e8

puririshi98 and others added 6 commits March 9, 2026 05:14

Merge branch 'main' into patch-3

5766643

Merge branch 'main' into patch-3

cfe9be2

Delete tests/v1/attention/test-qwen-deepseek-1.5b-accuracy.py

807825b

Signed-off-by: Rishi Puri <riship@nvidia.com>

Delete tests/v1/attention/__init__.py

445fb44

Signed-off-by: Rishi Puri <riship@nvidia.com>

Update models-small.txt

89c2828

Signed-off-by: Rishi Puri <riship@nvidia.com>

puririshi98 requested a review from mgoin as a code owner March 9, 2026 22:58

Create Qwen3.5-0.8B.yaml

b20ef27

Signed-off-by: Rishi Puri <riship@nvidia.com>

gemini-code-assist bot reviewed Mar 10, 2026

jasonlizhengjian reviewed Mar 10, 2026

puririshi98 changed the title ~~Create test-qwen-deepseek-1.5-accuracy.py~~ add testing for Qwen3.5-0.8B to GSM8K eval Mar 10, 2026

puririshi98 added 2 commits March 10, 2026 15:27

Merge branch 'main' into patch-3

1a50c2e

Merge branch 'main' into patch-3

6b8d135

puririshi98 wants to merge 15 commits into vllm-project:main from puririshi98:patch-3
