Discrepancy in VBench results for MEMFLOW (5s setting)

Hi, thank you for your excellent work and for open-sourcing this project.

I am currently testing the VBench 5s setting (which I understand to be the more stable configuration), but my local reproduction results differ from those reported in the paper.

Here are the details of my setup:

Prompts: I used the all_dimension_extended.txt prompt file.

Comparison: In the attached Image 1, the top row represents the official reported results, while the second row shows my local test results.

Specific Scores: The screenshots also include the detailed scores for each individual dimension for MEMFLOW.

Observations:
As shown in the screenshots, my local scores for "Quality," "Semantic," and "Total" are all lower than the officially reported ones (e.g., Total Score: ~83.34 locally vs. 85.14 official, a gap of about 1.8 points).
I am not sure what causes this gap and would appreciate any insights you could share.
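
To help narrow down where the gap comes from, here is a minimal sketch of a sanity check on the aggregate numbers. It assumes the Total score is a 4:1 weighted average of Quality and Semantic, which is my understanding of VBench's aggregation and not something confirmed in this thread. The function names are hypothetical, and the Quality/Semantic inputs in the usage line are placeholders, not values from my run.

```python
# Assumed VBench aggregation weights (my understanding; please correct if wrong).
QUALITY_WEIGHT = 4.0
SEMANTIC_WEIGHT = 1.0


def total_score(quality: float, semantic: float) -> float:
    """Weighted average of the Quality and Semantic scores."""
    weighted_sum = quality * QUALITY_WEIGHT + semantic * SEMANTIC_WEIGHT
    return weighted_sum / (QUALITY_WEIGHT + SEMANTIC_WEIGHT)


def score_gap(local: float, official: float) -> float:
    """Difference between the official and local scores, rounded to 2 decimals."""
    return round(official - local, 2)


# Total scores quoted above: ~83.34 (local) vs. 85.14 (official).
reported_gap = score_gap(83.34, 85.14)

# Placeholder inputs, only to show how the aggregate is formed.
example_total = total_score(80.0, 60.0)
```

If the Total recomputed from my local Quality and Semantic scores matches my local Total, the difference would come from the per-dimension scores rather than from the aggregation step.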

Thank you for your time and help!


<img width="1897" height="301" alt="Image" src="https://github.com/user-attachments/assets/29f33b63-d25e-46f6-b193-6aa9aa6469b6" />

<img width="1409" height="504" alt="Image" src="https://github.com/user-attachments/assets/170ad019-5f36-4d96-af97-b2630767a7d7" />
