Inconsistency in GLM4.7 Official Reported Mean Score #8

@Meteor-xx

The GLM4.7 mean score reported in the official report is 8.4%. However, the domain-specific mean scores are listed as:

  • 12.9% (IB)
  • 14.7% (Law)
  • 12.9% (MC)

Assuming there are 160 tasks per domain, these domain-level averages imply an overall mean of (12.9 + 14.7 + 12.9) / 3 = 13.5%, well above the reported 8.4%.
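For reference, the arithmetic can be reproduced with a short sketch. The 160-tasks-per-domain count is my assumption rather than something the report confirms, and the helper name `weighted_mean` is only illustrative.

```python
# Domain-level mean scores (%) as listed in the report.
domain_scores = {"IB": 12.9, "Law": 14.7, "MC": 12.9}

# Assumption: 160 tasks per domain (not confirmed by the report).
task_counts = {"IB": 160, "Law": 160, "MC": 160}


def weighted_mean(scores, counts):
    """Task-weighted mean of per-domain scores."""
    total_tasks = sum(counts[d] for d in scores)
    return sum(scores[d] * counts[d] for d in scores) / total_tasks


overall = weighted_mean(domain_scores, task_counts)
print(f"Implied overall mean: {overall:.1f}%")
print(f"Gap vs. reported 8.4%: {overall - 8.4:.1f} points")
```

With equal task counts this reduces to the simple average of the three domain scores.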

A gap of roughly 5 percentage points suggests the overall mean was computed or reported on a different basis from the domain-level means.


❓ Questions

  • Was the overall mean computed using a different weighting scheme (e.g., uneven number of samples per domain)?
  • Are there additional domains or tasks included in the overall score that are not listed here?
  • Could this be a reporting or calculation error?
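On the first question, one observation may narrow things down: a weighted mean with non-negative weights always lies between the smallest and largest value being averaged. So if only the three listed domains contribute, no choice of task counts can bring the overall mean down to 8.4%. A minimal check, using only the numbers quoted above (variable names are illustrative):

```python
# Domain-level mean scores (%) as listed in the report: IB, Law, MC.
domain_scores = [12.9, 14.7, 12.9]
reported_overall = 8.4

# Any non-negative weighting of these scores stays within [lowest, highest].
lowest, highest = min(domain_scores), max(domain_scores)
reachable = lowest <= reported_overall <= highest

print(f"Achievable range: {lowest}% to {highest}%")
print(f"Reported 8.4% reachable by re-weighting alone: {reachable}")
```

If this holds, the remaining explanations are additional domains or tasks in the overall score, a different metric, or a reporting error.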

📎 Suggestion

It would be helpful to clarify:

  • The exact methodology used to compute the overall mean score
  • Whether all domains are equally weighted
  • The full breakdown of tasks contributing to the final score
