Skip to content

pwspen/nuggetbench

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

29 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

How well can LLMs recognize what geographical areas that chicken nuggets resemble?

Test yourself on the dataset here (18 images)

ModelAccuracy
Human baseline17/18
google/gemini-3-pro-preview9/18
anthropic/claude-opus-4.57/18
qwen/qwen3-vl-235b-a22b-instruct5/18
x-ai/grok-4-fast4/18
openai/gpt-5.22/18

What does this look like to you?

On one hand, it's a chicken nugget. On the other, it.. looks strangely familiar?

To GPT-5, it's Great Britain. To Gemini 3 and Qwen 3, it's Italy. Grok 4 thinks it's Taiwan. Opus gets it right: Argentina.

This benchmark uses all the images I can find on the internet that show chicken nuggets that are clearly shaped like prominent geographical regions (US states, countries, and continents).

Today, we're in the benchmaxxing, Goodhart's Law era of AI progress. If it can be verified, it will be trained on. This causes models to be better at things that are commonly used as measures of their intelligence, but it's unclear to what extent the capability gain from training on narrow tasks applies outside of that domain (like it would for humans). For example, models are fantastic at reading text, but horrible at basic visual tasks.

This benchmark tests for something that is pointless and stupid to train for, while also requiring visual acuity and world knowledge. The hope is that this gives a better check of model ability than more sensible or common measures.

See /tables for per-model results.

See /tables/answers.md for the dataset and to try it for yourself.

To run the benchmark for yourself, clone this repo. You must have uv installed, and an OPENROUTER_API_KEY set as an environment variable. Then, do uv run main.py.

About

Can LLMs see when chicken nuggets resemble geographical areas?

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published