The GLM4.7 mean score reported in the official report is 8.4%. However, the domain-specific mean scores are listed as:
- 12.9% (IB)
- 14.7% (Law)
- 12.9% (MC)
Assuming there are 160 tasks per domain (i.e., equal weighting), these domain-level averages imply an overall mean of (12.9% + 14.7% + 12.9%) / 3 ≈ 13.5%, which is significantly higher than the reported 8.4%.
This discrepancy suggests a potential inconsistency in how the overall mean score was calculated or reported.
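The arithmetic above can be sanity-checked with a short sketch. The domain names, scores, and the 160-tasks-per-domain assumption are taken from this description; the equal-task-count assumption is unconfirmed:

```python
# Sanity check of the mean-score discrepancy described above.
# Domain scores come from the issue text; the assumption that every
# domain contributes 160 tasks is hypothetical, not confirmed.
domain_scores = {"IB": 12.9, "Law": 14.7, "MC": 12.9}
tasks_per_domain = 160  # assumed equal across domains

# Equal weighting: simple average of the domain means.
equal_weight_mean = sum(domain_scores.values()) / len(domain_scores)
print(f"equal-weight mean: {equal_weight_mean:.1f}%")  # → 13.5%

# Note: any weighted average of these three domain means is bounded
# by their min and max (12.9% .. 14.7%), so uneven task counts across
# only these three domains cannot produce the reported 8.4%; that
# would require additional domains/tasks or a different formula.
print(f"bounds: {min(domain_scores.values())}% .. {max(domain_scores.values())}%")
```

As the bounds comment notes, reweighting these three domains alone cannot reach 8.4%, which is why the questions below about additional domains or a different methodology seem relevant.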
❓ Questions
- Was the overall mean computed using a different weighting scheme (e.g., uneven number of samples per domain)?
- Are there additional domains or tasks included in the overall score that are not listed here?
- Could this be a reporting or calculation error?
📎 Suggestion
It would be helpful to clarify:
- The exact methodology used to compute the overall mean score
- Whether all domains are equally weighted
- The full breakdown of tasks contributing to the final score