Description
Hi, thank you for your excellent work and for open-sourcing this project.
I am currently testing the VBench 5s setting (which I understand to be the more stable configuration), but my local reproduction results do not match those reported in the paper.
Here are the details of my setup:
Prompts: I used the all_dimension_extended.txt prompt file.
Comparison: In the attached Image 1, the top row represents the official reported results, while the second row shows my local test results.
Specific Scores: The screenshots also include the detailed scores for each individual dimension for MEMFLOW.
Observations:
As shown in the logs, my local scores for "Quality," "Semantic," and "Total" are all lower than the officially reported ones (e.g., Total Score: ~83.34 locally vs. 85.14 official, a gap of about 1.8 points).
I am not sure what causes this gap and would appreciate any insights you could share.
Thank you for your time and help!
