@aug6th aug6th commented Oct 17, 2025

The eval() method called _eval() with only 4 arguments (coordinate, boxes_type, boxes_size, boxes_coordinate) on line 31, but _eval() requires 5, including image_size (lines 35-41).

This caused a TypeError when the eval() method was called:
TypeError: _eval() missing 1 required positional argument: 'image_size'
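
The failure can be reproduced in isolation. The snippet below is a minimal sketch, not the code from eval.py: the `_eval` here is a hypothetical module-level stand-in that only mirrors the 5-parameter signature described above, and the argument values are made up.

```python
def _eval(coordinate, boxes_type, boxes_size, boxes_coordinate, image_size):
    # Hypothetical stand-in: only the signature matters for this demonstration.
    return image_size


def call_without_image_size():
    """Call _eval() with 4 arguments and return the resulting error message."""
    try:
        # image_size is omitted, as in the buggy call described in this PR.
        _eval((0.5, 0.5), "bbox", (10, 10), (0, 0))
    except TypeError as exc:
        return str(exc)
    return None


message = call_without_image_size()
print(message)
```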

The image_size parameter is essential because _eval() uses it on line 67 to scale normalized coordinates (values between 0 and 1) to pixel coordinates.
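
To illustrate the scaling step, here is a minimal sketch of converting a normalized coordinate to pixels. The helper name `scale_to_pixels` is hypothetical and does not appear in eval.py; it assumes image_size is a (width, height) tuple, which is what a PIL Image's `size` attribute provides.

```python
def scale_to_pixels(normalized_xy, image_size):
    """Scale an (x, y) pair in [0, 1] to pixel coordinates.

    image_size is assumed to be (width, height).
    """
    width, height = image_size
    x, y = normalized_xy
    return (x * width, y * height)


# A point at the horizontal center, one quarter down a 1920x1080 image.
pixel_xy = scale_to_pixels((0.5, 0.25), (1920, 1080))
print(pixel_xy)
```

Without image_size there is no way to perform this conversion, which is why the argument cannot simply be given a default.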

This fix adds:

  1. Line 30: Extract image_size from the PIL Image object
  2. Lines 32-34: Pass image_size as the 5th parameter to _eval()
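
The two steps can be sketched as follows. This is an illustration under stated assumptions, not the actual diff: `StubImage` stands in for a PIL Image (only its `size` attribute is used), the body of `_eval` is a simplified hypothetical hit test, `eval_prediction` is a hypothetical name chosen to avoid shadowing Python's built-in `eval`, and the meaning assigned to boxes_size and boxes_coordinate (pixel width/height and top-left corner) is assumed.

```python
class StubImage:
    """Stand-in for a PIL Image; exposes size as (width, height)."""

    def __init__(self, size):
        self.size = size


def _eval(coordinate, boxes_type, boxes_size, boxes_coordinate, image_size):
    # Hypothetical body: scale the normalized point to pixels, then test
    # whether it falls inside the box. boxes_type is unused in this sketch.
    width, height = image_size
    x, y = coordinate[0] * width, coordinate[1] * height
    left, top = boxes_coordinate      # assumed: top-left corner in pixels
    box_w, box_h = boxes_size         # assumed: box width and height in pixels
    return left <= x <= left + box_w and top <= y <= top + box_h


def eval_prediction(image, coordinate, boxes_type, boxes_size, boxes_coordinate):
    # Step 1: extract image_size from the image object.
    image_size = image.size
    # Step 2: pass image_size as the 5th argument to _eval().
    return _eval(coordinate, boxes_type, boxes_size, boxes_coordinate, image_size)


image = StubImage((1000, 500))
hit = eval_prediction(image, (0.5, 0.5), "bbox", (100, 100), (450, 200))
miss = eval_prediction(image, (0.1, 0.1), "bbox", (100, 100), (450, 200))
print(hit, miss)
```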

This makes eval.py consistent with the other evaluation scripts (qwen25_vllm_osworld_g_jedi.py, screenspot_v2, screenspot_pro, aguvis, operator), all of which correctly pass image_size to _eval().

Note: This bug only affected the eval() method used for standalone testing (lines 90-92). The _eval() method itself was always correct and is used properly by all production evaluation scripts.

@Timothyxxx Timothyxxx requested a review from YangJL2003 October 17, 2025 18:16