It would be great if all high-scoring small models on MMLU-Pro could be validated so that their scores are reliable and complete. These small models are valuable because they are fast and cheap to run, and they showcase important trends in model and distillation efficiency.
Small, high-scoring models
QwQ Family
- QwQ-32B-Preview (32B)
- QwQ-32B (32B)
Microsoft/Phi Family
- Phi-4 (14B)
- Phi-4-mini (5.6B)
- Phi3-medium-4k (14B)
Qwen Family
- Qwen2.5-32B (32B)
- Qwen2.5-14B (14B)
Google Family
- Gemma-3-27B-it (27B)
- Gemma-3-12B-it (12B)
- Gemma-2-27B-it (27B)
Mistral Family
- Mistral-Small-instruct (24B)
- Mistral-Small-base (24B)
Other Models
- SkyThought-T1 (32B)
- Reka 3 (21B)
- RRD2.5-9B (9B)
- EXAONE-3.5-32B-Instruct (32B)
- Internlm3-8B-Instruct (8B)