Grok 4, Sonnet-4 and new run updates #27

I see a lot of exciting updates on X (https://x.com/METR_Evals/status/1950740117020389870) and on the top chart here: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/

Would love to see the runs behind those updates added to [`data/external/all_runs.jsonl`](https://github.com/METR/eval-analysis-public/blob/main/data/external/all_runs.jsonl)!

Thanks for this incredibly important project.
