@NavneelSinghal

This adds a few linear-probing experiments that also make use of the patch tokens:

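  1. Pooling each k × k block of spatially adjacent patch tokens into a single token, giving fewer, larger patches: effectively 5 + n/k^2 tokens instead of 5 + n (the 5 non-patch tokens being the CLS token and the register tokens).
  2. The same pooling, but without the register tokens (effectively 1 + n/k^2 tokens).
  3. A tied (shared) low-rank projection, random or learned, from d to d' applied to each patch token, followed by a single linear probe over all tokens, which gives effectively 5 + n * d'/d tokens.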
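The three feature constructions above can be sketched in NumPy as follows. This is a minimal sketch, not the code in this PR. It assumes the backbone returns tokens laid out as [CLS, registers, patches] with 4 registers (inferred from the 5 + n count) on a square grid. It uses average pooling for the k × k merge (max pooling would be an alternative) and a fixed Gaussian random projection for method 3; the learned variant would instead train `proj` jointly with the probe. All function names (`split_tokens`, `pool_patches`, `pooled_features`, `projected_features`) are hypothetical.

```python
import numpy as np

# Assumed token layout: [CLS, R register tokens, n patch tokens] on a grid x grid map.
NUM_REGISTERS = 4  # hypothetical default; matches the "5 + n" count (1 CLS + 4 registers)


def split_tokens(tokens, num_registers=NUM_REGISTERS):
    """Split (B, 1 + R + n, d) backbone outputs into CLS, register and patch tokens."""
    cls = tokens[:, :1]
    regs = tokens[:, 1 : 1 + num_registers]
    patches = tokens[:, 1 + num_registers :]
    return cls, regs, patches


def pool_patches(patches, grid, k):
    """Average non-overlapping k x k blocks of the grid x grid patch map: n -> n / k^2."""
    b, n, d = patches.shape
    assert n == grid * grid and grid % k == 0
    g = grid // k
    # Row-major patch order: patch row = gi * k + i, patch col = gj * k + j.
    blocks = patches.reshape(b, g, k, g, k, d)
    return blocks.mean(axis=(2, 4)).reshape(b, g * g, d)


def pooled_features(tokens, grid, k, keep_registers=True):
    """Methods 1 and 2: CLS (+ registers) + pooled patches, flattened for one linear probe."""
    cls, regs, patches = split_tokens(tokens)
    parts = [cls, regs] if keep_registers else [cls]
    x = np.concatenate(parts + [pool_patches(patches, grid, k)], axis=1)
    return x.reshape(x.shape[0], -1)


def projected_features(tokens, proj):
    """Method 3: one shared (tied) d -> d' projection applied to every patch token."""
    cls, regs, patches = split_tokens(tokens)
    b = tokens.shape[0]
    low = patches @ proj  # (B, n, d')
    return np.concatenate(
        [cls.reshape(b, -1), regs.reshape(b, -1), low.reshape(b, -1)], axis=1
    )


rng = np.random.default_rng(0)
B, grid, d, k, d_low = 2, 16, 64, 2, 8
n = grid * grid
tokens = rng.standard_normal((B, 1 + NUM_REGISTERS + n, d))  # stand-in for backbone output
# Gaussian random projection with N(0, 1/d') entries (norm-preserving in expectation).
proj = rng.standard_normal((d, d_low)) / np.sqrt(d_low)

f1 = pooled_features(tokens, grid, k)                        # d * (5 + n/k^2) = 4416
f2 = pooled_features(tokens, grid, k, keep_registers=False)  # d * (1 + n/k^2) = 4160
f3 = projected_features(tokens, proj)                        # d * (5 + n*d'/d) = 2368
print(f1.shape, f2.shape, f3.shape)
```

Per image, the probe input shrinks from d·(5 + n) features to d·(5 + n/k^2) for method 1 and d·(1 + n/k^2) for method 2, while method 3 gives 5d + n·d' = d·(5 + n·d'/d).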

In preliminary experiments on BACH, the second method appears to outperform a CLS-only linear probe, the first roughly matches it, and the third underperforms it by a considerable margin.
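A CLS-only baseline and the pooled-token probes can share one training routine: under a [CLS, ..., patches] token layout (an assumption), the CLS-only input is just the first token of the same feature tensor. The sketch below is a minimal full-batch softmax-regression probe in NumPy, trained on random synthetic features (not BACH data) with 4 classes to match BACH's four categories. The standardization, weight decay, and step size are illustrative choices, not the settings used in these experiments, and `train_linear_probe` and `predict` are hypothetical names.

```python
import numpy as np


def train_linear_probe(x, y, num_classes, steps=300, weight_decay=1e-4):
    """Softmax-regression probe on frozen features x (N, F) with integer labels y (N,).

    Full-batch gradient descent; the bias is folded into the weight matrix.
    """
    n = x.shape[0]
    mu, sd = x.mean(axis=0), x.std(axis=0) + 1e-6     # standardize with training stats
    xb = np.hstack([(x - mu) / sd, np.ones((n, 1))])  # constant column acts as the bias
    w = np.zeros((xb.shape[1], num_classes))
    onehot = np.eye(num_classes)[y]
    # Step size 1/L, with L = ||X||_2^2 / (2n) + weight_decay bounding the curvature
    # (Boehning's bound), so the regularized loss decreases at every step.
    lr = 1.0 / (0.5 * np.linalg.norm(xb, 2) ** 2 / n + weight_decay)
    losses = []
    for _ in range(steps):
        logits = xb @ w
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        losses.append(-np.mean(np.log(p[np.arange(n), y] + 1e-12)))  # cross-entropy
        w -= lr * (xb.T @ (p - onehot) / n + weight_decay * w)
    return (w, mu, sd), losses


def predict(params, x):
    w, mu, sd = params
    xb = np.hstack([(x - mu) / sd, np.ones((x.shape[0], 1))])
    return np.argmax(xb @ w, axis=1)


rng = np.random.default_rng(0)
N, T, d, num_classes = 64, 65, 32, 4          # synthetic stand-in, 4 classes as in BACH
tokens = rng.standard_normal((N, T, d))       # e.g. [CLS, pooled patches] (method 2)
y = rng.integers(0, num_classes, size=N)

probes = {
    "cls_only": tokens[:, 0],                 # first token = CLS (assumed layout)
    "cls+pooled": tokens.reshape(N, -1),      # all tokens, flattened
}
for name, feats in probes.items():
    params, losses = train_linear_probe(feats, y, num_classes)
    train_acc = (predict(params, feats) == y).mean()
    print(f"{name}: loss {losses[0]:.3f} -> {losses[-1]:.3f}, train acc {train_acc:.2f}")
```

Training accuracy is printed only as a sanity check; a comparison like the BACH one above would evaluate each probe on a held-out split.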

@CLAassistant

CLAassistant commented Sep 12, 2025

CLA assistant check
All committers have signed the CLA.
