The GLM4.7 mean score reported in the official report is 8.4%. However, the domain-specific mean scores are listed as:
- 12.9% (IB)
- 14.7% (Law)
- 12.9% (MC)
Assuming there are 160 tasks per domain (i.e., equal weighting), these domain-level averages imply an overall mean of (12.9% + 14.7% + 12.9%) / 3 ≈ 13.5%, which is significantly higher than the reported 8.4%.
This discrepancy suggests a potential inconsistency in how the overall mean score was calculated or reported.
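The arithmetic above can be sanity-checked with a short sketch. The domain names, scores, and the 160-tasks-per-domain assumption are taken from this description; the equal-task-count assumption is unconfirmed:

```python
# Sanity check of the mean-score discrepancy described above.
# Domain scores come from the issue text; the assumption that every
# domain contributes 160 tasks is hypothetical, not confirmed.
domain_scores = {"IB": 12.9, "Law": 14.7, "MC": 12.9}
tasks_per_domain = 160  # assumed equal across domains

# Equal weighting: simple average of the domain means.
equal_weight_mean = sum(domain_scores.values()) / len(domain_scores)
print(f"equal-weight mean: {equal_weight_mean:.1f}%")  # → 13.5%

# Note: any weighted average of these three domain means is bounded
# by their min and max (12.9% .. 14.7%), so uneven task counts across
# only these three domains cannot produce the reported 8.4%; that
# would require additional domains/tasks or a different formula.
print(f"bounds: {min(domain_scores.values())}% .. {max(domain_scores.values())}%")
```

As the bounds comment notes, reweighting these three domains alone cannot reach 8.4%, which is why the questions below about additional domains or a different methodology seem relevant.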
❓ Questions
- Was the overall mean computed using a different weighting scheme (e.g., uneven number of samples per domain)?
- Are there additional domains or tasks included in the overall score that are not listed here?
- Could this be a reporting or calculation error?
📎 Suggestion
It would be helpful to clarify:
- The exact methodology used to compute the overall mean score
- Whether all domains are equally weighted
- The full breakdown of tasks contributing to the final score