Hi, amazing work! Do you have an explanation for why the ADE20K linear probing performance drops as the number of samples seen increases from 2B to 4B to 8B?

<img width="995" alt="Image" src="https://github.com/user-attachments/assets/4a9a9dd5-0032-4d86-8827-5ca971c677ba" />

Thanks!