Description
Hi, thank you for open-sourcing this model.
I am encountering a `KeyError` when trying to load `nvidia/C-RADIOv3-L` from Hugging Face using `AutoModel.from_pretrained` with `trust_remote_code=True`.
It seems that the custom loading logic in `dinov2_arch.py` expects one of two specific keys per LayerScale module (`ls1.gamma` or `ls1.grandma`), and neither is present in the provided state dict (weights).
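For context, the failing check appears to boil down to something like the sketch below (simplified; `load_layerscale_param` is my illustrative name, not the actual function in `dinov2_arch.py` — only the two key names and the error message come from the traceback):

```python
def load_layerscale_param(state_dict, prefix):
    """Simplified sketch of the LayerScale key lookup described above.

    Tries the expected '.gamma' key first, then the '.grandma' fallback
    that appears in the traceback, and raises if neither exists.
    """
    key_a = prefix + "gamma"
    key_b = prefix + "grandma"
    if key_a in state_dict:
        return state_dict[key_a]
    if key_b in state_dict:
        return state_dict[key_b]
    raise KeyError(f"Couldn't find the key {key_a} nor {key_b} in the state dict!")
```

With the downloaded checkpoint, neither key exists for `blocks.0.ls1.`, so the `KeyError` below is raised.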
Reproduction Code
```python
import torch
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor

hf_repo = "nvidia/C-RADIOv3-L"

# Load model
image_processor = CLIPImageProcessor.from_pretrained(hf_repo)
model = AutoModel.from_pretrained(hf_repo, trust_remote_code=True)
model.eval().cuda()

# Test run
image = Image.open('./assets/radio.png').convert('RGB')
pixel_values = image_processor(images=image, return_tensors='pt', do_resize=True).pixel_values
pixel_values = pixel_values.cuda()
summary, features = model(pixel_values)
```
Traceback
```
File "/home/user/.cache/huggingface/modules/transformers_modules/nvidia/C-RADIOv3-L/be4e27f93f34e86072243d8596fa06cbea669fd7/dinov2_arch.py", line 309, in _load_from_state_dict
    raise KeyError(f"Couldn't find the key {key_a} nor {key_b} in the state dict!")
KeyError: "Couldn't find the key blocks.0.ls1.gamma nor blocks.0.ls1.grandma in the state dict!"
```
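A guess at the cause (not verified against this repo): some `transformers` versions rename checkpoint keys containing `gamma` to `weight` (and `beta` to `bias`) while loading, which would strip exactly the `ls1.gamma` keys the custom code looks for. If so, renaming the affected keys back before loading might sidestep the error; `restore_gamma_keys` below is a hypothetical helper, not part of the repo:

```python
def restore_gamma_keys(state_dict):
    """Rename LayerScale keys like 'blocks.0.ls1.weight' back to '...ls1.gamma'.

    Assumption: only keys ending in '.ls1.weight' / '.ls2.weight' were
    affected by the gamma -> weight renaming; all other keys pass through.
    """
    fixed = {}
    for key, value in state_dict.items():
        if key.endswith(".ls1.weight") or key.endswith(".ls2.weight"):
            key = key[: -len("weight")] + "gamma"
        fixed[key] = value
    return fixed
```

Happy to test a proper fix on my end if this diagnosis is wrong.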