Summary
Failed on commit: 5c2e86b
System: macOS.
________________________________________________________________________________ test_multiple_texts_batch_not_nan[mps] _________________________________________________________________________________

device = 'mps'

    @pytest.mark.parametrize("device", DEVICES)
    def test_multiple_texts_batch_not_nan(device):
        """Test that attention does look back - model uses previous context."""
        model, processor, collator = setup_model()
        # model.eval()

        # Move model to specified device
        model = model.to(torch.device(device))

        # Test sequences with shared suffix but different prefix
        texts = ["1", "1 2 3"]
        dataset = make_dataset(texts)
        batch = dataset_to_batch(model, processor, collator, dataset)

        # TODO: this fails on mps device, because of the attention mask
        # ONLY when no_grad is used https://github.com/huggingface/transformers/issues/40858
        with torch.no_grad():
            outputs = model(**batch)

>       assert not torch.isnan(outputs.loss).any(), "Loss contains NaN values"
E AssertionError: Loss contains NaN values
E assert not tensor(True, device='mps:0')
E + where tensor(True, device='mps:0') = <built-in method any of Tensor object at 0x14d4cc690>()
E + where <built-in method any of Tensor object at 0x14d4cc690> = tensor(True, device='mps:0').any
E + where tensor(True, device='mps:0') = <built-in method isnan of type object at 0x108970ae8>(tensor(nan, device='mps:0'))
E + where <built-in method isnan of type object at 0x108970ae8> = torch.isnan
E + and tensor(nan, device='mps:0') = CausalLMOutput(loss=tensor(nan, device='mps:0'), logits=tensor([[[[ nan, nan, nan, ..., nan, nan, nan],\n [ nan, nan, nan, ..., nan, nan, nan],\n [ nan, nan, nan, ..., nan, nan, nan]],\n\n [[ nan, nan, nan, ..., nan, nan, nan],\n [ nan, nan, nan, ..., nan, nan, nan],\n [ nan, nan, nan, ..., nan, nan, nan]],\n\n [[ nan, nan, nan, ..., nan, nan, nan],\n [ nan, nan, nan, ..., nan, nan, nan],\n [ nan, nan, nan, ..., nan, nan, nan]],\n\n [[ nan, nan, nan, ..., nan, nan, nan],\n [ nan, nan, nan, ..., nan, nan, nan],\n [ nan, nan, nan, ..., nan, nan, nan]]],\n\n\n [[[16.8472, -1.3858, 4.5274, ..., -5.5535, -3.3135, -6.5287],\n [16.9890, 0.1578, 4.9075, ..., -8.6196, -1.3776, -6.3025],\n [17.2965, -0.3873, 4.3442, ..., -5.2206, -1.9989, -6.2980]],\n\n [[16.8405, -1.4201, 4.5283, ..., -5.528....5410, -2.6632, 4.4903, ..., -4.9362, -2.2061, -7.2465]],\n\n [[16.8396, -1.4124, 4.5395, ..., -5.5245, -3.2927, -6.5416],\n [16.5975, -1.3105, 5.6921, ..., -7.9449, -2.9000, -5.9860],\n [17.2833, -0.1983, 6.2825, ..., -6.3641, -2.2251, -5.7550]],\n\n [[16.8234, -1.4100, 4.5218, ..., -5.5249, -3.3002, -6.5371],\n [16.3851, -1.0723, 5.3863, ..., -5.4043, -0.5154, -6.8901],\n [17.6096, -2.5716, 5.6546, ..., -5.3469, -2.6412, -7.0366]]]],\n device='mps:0'), hidden_states=(tensor([[[ nan, nan, nan, ..., nan, nan, nan],\n [ nan, nan, nan, ..., nan, nan, nan],\n [ nan, nan, nan, ..., nan, nan, nan],\n [ nan, nan, nan, ..., nan, nan, nan]],\n\n [[ -6.4927, -7.4816, -2.3053, ..., -12.4484, 12.1303, -6.5202],\n [ -6.7761, -7.4153, -1.9971, ..., -12.7430, 12.1251, -6.6804],\n [ -6.5594, -7.4046, -1.9136, ..., -12.6101, 12.1708, -6.5584],\n [ -6.5865, -7.3691, -2.0802, ..., -12.8613, 11.9366, -6.5969]]],\n device='mps:0'),), attentions=None).loss
tests/test_model.py:140: AssertionError
----------------------------------------------------------------------------------------- Captured stdout call ------------------------------------------------------------------------------------------
Loading pretrained model from WinKawaks/vit-tiny-patch16-224
Loading pretrained model from prajjwal1/bert-tiny
Loading pretrained model from EleutherAI/pythia-70m
Loading pretrained model from EleutherAI/pythia-70m
Image Encoder Total parameters: 5,561,472
Bytes Encoder Total parameters: 529,152
Latent Transformer Total parameters: 18,915,328
Bytes Decoder Total parameters: 19,312,640
Final Model Total parameters: 44,745,600
----------------------------------------------------------------------------------------- Captured stderr call ------------------------------------------------------------------------------------------
Fetching 1 files: 100%|██████████| 1/1 [00:00<00:00, 2016.49it/s]
Fetching 1 files: 100%|██████████| 1/1 [00:00<00:00, 2007.80it/s]
Some weights of ViTModel were not initialized from the model checkpoint at WinKawaks/vit-tiny-patch16-224 and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
------------------------------------------------------------------------------------------- Captured log call -------------------------------------------------------------------------------------------
WARNING image_latent_transformer.model:model_utils.py:33 Flash Attention not available, using default attention
FAILED tests/test_model.py::test_multiple_texts_batch_not_nan[mps] - AssertionError: Loss contains NaN values

Reproduce
pytest
(or run just the failing case: pytest "tests/test_model.py::test_multiple_texts_batch_not_nan[mps]")
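Until the upstream transformers issue is resolved, a hedged sketch of one way to keep the suite green without hiding the bug: mark only the mps parametrization as an expected failure. `MPS_XFAIL` is a hypothetical name, and `DEVICES` is assumed to be the list parametrizing this test in tests/test_model.py.

```python
# Hypothetical sketch: mark the mps parameter as an expected failure.
import pytest

MPS_XFAIL = pytest.param(
    "mps",
    marks=pytest.mark.xfail(
        reason="NaN loss under torch.no_grad() on MPS; see "
        "https://github.com/huggingface/transformers/issues/40858",
        strict=False,  # don't fail the suite once the upstream bug is fixed
    ),
)

# Assumed usage (the real DEVICES list lives in tests/test_model.py):
# DEVICES = ["cpu", MPS_XFAIL]
```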
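For evaluation code that hits the same NaN outside the test suite, a minimal workaround sketch, assuming the no_grad-only behavior described in the test's TODO holds. `safe_eval_loss` is a hypothetical helper, not part of this repo:

```python
import torch

def safe_eval_loss(model, batch):
    """Evaluation forward pass that retries with autograd enabled when the
    no_grad pass yields a NaN loss on MPS (the failure mode reported above)."""
    with torch.no_grad():
        loss = model(**batch).loss
    if loss.device.type == "mps" and torch.isnan(loss).any():
        # The TODO in the test says the NaN appears ONLY under no_grad,
        # so a grad-enabled forward (graph discarded via detach) avoids it.
        loss = model(**batch).loss.detach()
    return loss
```

The second forward pass is wasteful, so this is only sensible as a stopgap on MPS until the attention-mask issue is fixed upstream.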