6 changes: 3 additions & 3 deletions ais_bench/benchmark/configs/datasets/mmmu/mmmu_gen.py
Original file line number Diff line number Diff line change
@@ -4,9 +4,9 @@
from ais_bench.benchmark.datasets import MMMUDataset, MMMUEvaluator


-START_TEXT_PROMPT = "Question: "
-END_TEXT_PROMPT = "Please select the correct answer from the options above. \n"
-OPTIONS_PROMPT = "\nOptions:\n"
+START_TEXT_PROMPT = ""
+END_TEXT_PROMPT = "Answer with the option's letter from the given choices directly."

Collaborator commented:

[review] Do not modify the existing config file; add a new config file, mmmu_gen_glm4.6v.py, instead.

+OPTIONS_PROMPT = "\n"
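
To make the effect of the prompt change concrete, the sketch below assembles the same question under both sets of constants. It is only an illustration: the dictionary names and the `build_prompt` helper are hypothetical and not part of ais_bench, and the way the real dataset class joins these strings is an assumption.

```python
# Hypothetical sketch: these names are illustrative, not ais_bench APIs.
# The string values are copied from the diff above.
DEFAULT_PROMPTS = {
    "start": "Question: ",
    "options": "\nOptions:\n",
    "end": "Please select the correct answer from the options above. \n",
}
GLM_PROMPTS = {
    "start": "",
    "options": "\n",
    "end": "Answer with the option's letter from the given choices directly.",
}


def build_prompt(question, options, prompts):
    """Join the question, lettered options, and closing instruction.

    Assumption: the real dataset code may assemble these pieces differently.
    """
    lettered = [f"{chr(ord('A') + i)}. {opt}" for i, opt in enumerate(options)]
    return (
        prompts["start"]
        + question
        + prompts["options"]
        + "\n".join(lettered)
        + "\n"
        + prompts["end"]
    )


if __name__ == "__main__":
    for name, prompts in (("default", DEFAULT_PROMPTS), ("glm", GLM_PROMPTS)):
        print(f"--- {name} ---")
        print(build_prompt("What is shown?", ["cat", "dog"], prompts))
```

Keeping both sets side by side, rather than overwriting the defaults, is what lets results for previously evaluated models stay reproducible.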

mmmu_reader_cfg = dict(
input_columns=['question', 'image'],
2 changes: 1 addition & 1 deletion ais_bench/benchmark/configs/datasets/mmstar/mmstar_gen.py
@@ -15,7 +15,7 @@
template=dict(
round=[
dict(role="HUMAN", prompt_mm={
-"text": {"type": "text", "text": "Question: {question}\n"},
+"text": {"type": "text", "text": "{question}\nAnswer with the option's letter from the given choices directly."},
"image": {"type": "image_url", "image_url": {"url": "file://{image}"}},
})
]
9 changes: 8 additions & 1 deletion ais_bench/benchmark/datasets/mmmu.py
@@ -15,6 +15,7 @@
from ais_bench.benchmark.datasets.utils.datasets import get_data_path, toliststr, decode_base64_to_image_file, get_content_str

from .base import BaseDataset
+import re

logger = AISLogger()
IMAGE_MAP_LEN = 64
@@ -349,7 +350,13 @@ def score(self, predictions, references):
pred = pred.replace(char, '')
detail = {'pred': pred, 'answer': refer, 'correct': False}
choices = json.loads(refer['choices'])
-infer_res = can_infer(pred, choices)
+# infer_res = can_infer(pred, choices)
+pattern = r'<\|begin_of_box\|>(.*?)<\|end_of_box\|>'
+match = re.search(pattern, pred, re.DOTALL)
+if match is not None:
+    infer_res = match.group(1)
+else:
+    infer_res = ""

Contributor commented on lines +356 to +359 (medium):

This if/else block can be simplified to a single conditional expression, which is more concise. For performance, also compile the regex pattern on line 354 once, outside the loop, so it is not recompiled on each iteration.

            infer_res = match.group(1) if match else ""
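
Both suggestions in this comment can be combined in a small helper. The following is a minimal sketch, not the code that was merged: the names `BOX_PATTERN` and `extract_boxed` are hypothetical, and only the pattern text comes from the diff.

```python
import re

# Compiled once at import time instead of inside the scoring loop.
# The pattern text is taken from the diff; the constant name is hypothetical.
BOX_PATTERN = re.compile(r'<\|begin_of_box\|>(.*?)<\|end_of_box\|>', re.DOTALL)


def extract_boxed(pred: str) -> str:
    """Return the text between the first pair of box markers, or "" if none."""
    match = BOX_PATTERN.search(pred)
    return match.group(1) if match else ""


if __name__ == "__main__":
    print(repr(extract_boxed("The answer is <|begin_of_box|>B<|end_of_box|>.")))
    print(repr(extract_boxed("B")))
```

Python's `re` module already caches recently compiled patterns, so the practical gain from precompiling is mostly a skipped cache lookup and a single named definition of the pattern. Note that the captured text is returned unstripped, as in the diff, so a prediction such as `<|begin_of_box|> B <|end_of_box|>` would not compare equal to `B`.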

overall_key = '[' + refer['split'] + ']: Overall'
key_category = '[' + refer['split'] + ']: ' + refer['category']
key_l2_category = '[' + refer['split'] + ']: ' + refer['l2-category']
10 changes: 8 additions & 2 deletions ais_bench/benchmark/datasets/mmstar.py
@@ -14,6 +14,7 @@
from ais_bench.benchmark.utils.prompt import AIS_CONTENT_TAG, AIS_TEXT_START, AIS_IMAGE_START

from .base import BaseDataset
+import re

IMAGE_MAP_LEN = 64
logger = AISLogger()
@@ -103,8 +104,13 @@ def score(self, predictions, references):
for pred, refer in zip(predictions, references):
detail = {'pred': pred, 'answer': refer, 'correct': False}
choices = json.loads(refer['choices'])
-infer_res = can_infer(pred, choices)
-
+# infer_res = can_infer(pred, choices)

Collaborator commented:

[review] Keep the original MMStarEvaluator logic. Add a new MMStarEvaluatorGLM that overrides the score function, and configure it in a new config file, mmstar_gen_glm.py.

+pattern = r'<\|begin_of_box\|>(.*?)<\|end_of_box\|>'
+match = re.search(pattern, pred, re.DOTALL)
+if match is not None:
+    infer_res = match.group(1)
+else:
+    infer_res = ""

Contributor commented on lines +110 to +113 (medium):

This if/else block can be simplified to a single conditional expression, which is more concise. For performance and maintainability, also consider compiling the regex pattern on line 108 once, outside the loop, and reusing it. The same pattern is duplicated in mmmu.py.

            infer_res = match.group(1) if match else ""
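
The two comments above point the same way: leave the existing evaluator alone and put the box-marker parsing in a GLM-specific subclass. The sketch below shows one possible shape under stated assumptions. `BaseChoiceEvaluator` is a simplified stand-in and not the real MMStarEvaluator, its `infer_choice` body is a placeholder for `can_infer`, and the returned dictionary layout is invented for illustration. The review asks for `score` itself to be overridden; this sketch instead overrides a smaller hypothetical hook so that the scoring loop is not duplicated.

```python
import re

# Single shared definition of the pattern from the diff (name is hypothetical).
BOX_PATTERN = re.compile(r'<\|begin_of_box\|>(.*?)<\|end_of_box\|>', re.DOTALL)


class BaseChoiceEvaluator:
    """Simplified stand-in for the existing evaluator, NOT the real class."""

    def infer_choice(self, pred, choices):
        # Placeholder for the original can_infer(pred, choices) logic:
        # accept the prediction only if it is exactly one of the option keys.
        candidate = pred.strip()
        return candidate if candidate in choices else ""

    def score(self, predictions, references):
        correct = 0
        for pred, refer in zip(predictions, references):
            if self.infer_choice(pred, refer['choices']) == refer['answer']:
                correct += 1
        total = len(predictions)
        return {'accuracy': 100 * correct / total if total else 0.0}


class GLMChoiceEvaluator(BaseChoiceEvaluator):
    """Variant that reads the answer from the box markers only."""

    def infer_choice(self, pred, choices):
        match = BOX_PATTERN.search(pred)
        return match.group(1) if match else ""


if __name__ == "__main__":
    refs = [{'choices': {'A': 'cat', 'B': 'dog'}, 'answer': 'B'}]
    boxed = ['<|begin_of_box|>B<|end_of_box|>']
    print(BaseChoiceEvaluator().score(['B'], refs))
    print(GLMChoiceEvaluator().score(boxed, refs))
```

With this split, a model that does not emit box markers keeps its previous score under the base class, while the GLM variant is selected only by the new config file.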

key_category = refer['category']
score = 1 if infer_res == refer['answer'] else 0
if score == 1:
2 changes: 1 addition & 1 deletion ais_bench/benchmark/datasets/textvqa.py
@@ -185,7 +185,7 @@ def __init__(self):
'?',
'!',
]
-self.special_tokens = ['☞', '☟', '☜', '<unk>', '<|im_end|>']
+self.special_tokens = ['☞', '☟', '☜', '<unk>', '<|im_end|>', '<|end_of_box|>', '<|begin_of_box|>']

Collaborator commented:

[review] This change may cause the accuracy of some models to differ from previously recorded results. The config ais_bench/benchmark/configs/datasets/textvqa/glm4v_textvqa_gen_base64.py can be used for testing instead.
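
The concern can be illustrated with a small sketch that keeps the default token list fixed and passes the box markers in only when needed. The helper and constant names here are hypothetical, and the assumption that special tokens are removed by plain string replacement is not confirmed by the diff.

```python
# Token values are copied from the diff; the names are hypothetical.
DEFAULT_SPECIAL_TOKENS = ['☞', '☟', '☜', '<unk>', '<|im_end|>']
GLM_BOX_TOKENS = ['<|end_of_box|>', '<|begin_of_box|>']


def strip_special_tokens(text, tokens):
    """Remove every occurrence of each token from text.

    Assumption: the real evaluator may treat special tokens differently.
    """
    for token in tokens:
        text = text.replace(token, '')
    return text


if __name__ == "__main__":
    pred = '<|begin_of_box|>stop<|end_of_box|>'
    print(strip_special_tokens(pred, DEFAULT_SPECIAL_TOKENS))
    print(strip_special_tokens(pred, DEFAULT_SPECIAL_TOKENS + GLM_BOX_TOKENS))
```

Passing the token list as a parameter means the default normalization, and therefore earlier scores, stays unchanged unless a config opts in to the extra tokens.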


def process_punctuation(self, in_text):
if len(in_text) > MAX_TARGET_LENGTH: