
[test] failed test_multiple_texts_batch_not_nan[mps] #27

@rggdmonk

Description

Summary

Failed on commit: 5c2e86b

System: macOS.

________________________________________________________________________________ test_multiple_texts_batch_not_nan[mps] _________________________________________________________________________________

device = 'mps'

    @pytest.mark.parametrize("device", DEVICES)
    def test_multiple_texts_batch_not_nan(device):
        """Test that attention does look back - model uses previous context."""
        model, processor, collator = setup_model()
        # model.eval()
    
        # Move model to specified device
        model = model.to(torch.device(device))
    
        # Test sequences with shared suffix but different prefix
        texts = ["1", "1 2 3"]
    
        dataset = make_dataset(texts)
        batch = dataset_to_batch(model, processor, collator, dataset)
    
        # TODO: this fails on mps device, because of the attention mask
        #   ONLY when no_grad is used https://github.com/huggingface/transformers/issues/40858
        with torch.no_grad():
            outputs = model(**batch)
>       assert not torch.isnan(outputs.loss).any(), "Loss contains NaN values"
E       AssertionError: Loss contains NaN values
E       assert not tensor(True, device='mps:0')
E        +  where tensor(True, device='mps:0') = <built-in method any of Tensor object at 0x14d4cc690>()
E        +    where <built-in method any of Tensor object at 0x14d4cc690> = tensor(True, device='mps:0').any
E        +      where tensor(True, device='mps:0') = <built-in method isnan of type object at 0x108970ae8>(tensor(nan, device='mps:0'))
E        +        where <built-in method isnan of type object at 0x108970ae8> = torch.isnan
E        +        and   tensor(nan, device='mps:0') = CausalLMOutput(loss=tensor(nan, device='mps:0'), logits=tensor([[[[    nan,     nan,     nan,  ...,     nan,     nan,     nan],\n          [    nan,     nan,     nan,  ...,     nan,     nan,     nan],\n          [    nan,     nan,     nan,  ...,     nan,     nan,     nan]],\n\n         [[    nan,     nan,     nan,  ...,     nan,     nan,     nan],\n          [    nan,     nan,     nan,  ...,     nan,     nan,     nan],\n          [    nan,     nan,     nan,  ...,     nan,     nan,     nan]],\n\n         [[    nan,     nan,     nan,  ...,     nan,     nan,     nan],\n          [    nan,     nan,     nan,  ...,     nan,     nan,     nan],\n          [    nan,     nan,     nan,  ...,     nan,     nan,     nan]],\n\n         [[    nan,     nan,     nan,  ...,     nan,     nan,     nan],\n          [    nan,     nan,     nan,  ...,     nan,     nan,     nan],\n          [    nan,     nan,     nan,  ...,     nan,     nan,     nan]]],\n\n\n        [[[16.8472, -1.3858,  4.5274,  ..., -5.5535, -3.3135, -6.5287],\n          [16.9890,  0.1578,  4.9075,  ..., -8.6196, -1.3776, -6.3025],\n          [17.2965, -0.3873,  4.3442,  ..., -5.2206, -1.9989, -6.2980]],\n\n         [[16.8405, -1.4201,  4.5283,  ..., -5.528....5410, -2.6632,  4.4903,  ..., -4.9362, -2.2061, -7.2465]],\n\n         [[16.8396, -1.4124,  4.5395,  ..., -5.5245, -3.2927, -6.5416],\n          [16.5975, -1.3105,  5.6921,  ..., -7.9449, -2.9000, -5.9860],\n          [17.2833, -0.1983,  6.2825,  ..., -6.3641, -2.2251, -5.7550]],\n\n         [[16.8234, -1.4100,  4.5218,  ..., -5.5249, -3.3002, -6.5371],\n          [16.3851, -1.0723,  5.3863,  ..., -5.4043, -0.5154, -6.8901],\n          [17.6096, -2.5716,  5.6546,  ..., -5.3469, -2.6412, -7.0366]]]],\n       device='mps:0'), hidden_states=(tensor([[[     nan,      nan,      nan,  ...,      nan,      nan,      nan],\n         [     nan,      nan,      nan,  ...,      nan,      nan,      nan],\n         [     nan,   
   nan,      nan,  ...,      nan,      nan,      nan],\n         [     nan,      nan,      nan,  ...,      nan,      nan,      nan]],\n\n        [[ -6.4927,  -7.4816,  -2.3053,  ..., -12.4484,  12.1303,  -6.5202],\n         [ -6.7761,  -7.4153,  -1.9971,  ..., -12.7430,  12.1251,  -6.6804],\n         [ -6.5594,  -7.4046,  -1.9136,  ..., -12.6101,  12.1708,  -6.5584],\n         [ -6.5865,  -7.3691,  -2.0802,  ..., -12.8613,  11.9366,  -6.5969]]],\n       device='mps:0'),), attentions=None).loss

tests/test_model.py:140: AssertionError
----------------------------------------------------------------------------------------- Captured stdout call ------------------------------------------------------------------------------------------
Loading pretrained model from WinKawaks/vit-tiny-patch16-224
Loading pretrained model from prajjwal1/bert-tiny
Loading pretrained model from EleutherAI/pythia-70m
Loading pretrained model from EleutherAI/pythia-70m
Image Encoder Total parameters: 5,561,472
Bytes Encoder Total parameters: 529,152
Latent Transformer Total parameters: 18,915,328
Bytes Decoder Total parameters: 19,312,640
Final Model Total parameters: 44,745,600
----------------------------------------------------------------------------------------- Captured stderr call ------------------------------------------------------------------------------------------
Fetching 1 files: 100%|██████████| 1/1 [00:00<00:00, 2016.49it/s]
Fetching 1 files: 100%|██████████| 1/1 [00:00<00:00, 2007.80it/s]
Some weights of ViTModel were not initialized from the model checkpoint at WinKawaks/vit-tiny-patch16-224 and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
------------------------------------------------------------------------------------------- Captured log call -------------------------------------------------------------------------------------------
WARNING  image_latent_transformer.model:model_utils.py:33 Flash Attention not available, using default attention
FAILED tests/test_model.py::test_multiple_texts_batch_not_nan[mps] - AssertionError: Loss contains NaN values

Reproduce

pytest
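Since the TODO in the test notes that the NaNs appear only when `torch.no_grad()` is active on the `mps` device (upstream report: https://github.com/huggingface/transformers/issues/40858), one possible interim workaround is to make the autograd context device-conditional in the test. This is a hypothetical sketch, not the project's actual fix; `forward_context` is an assumed helper name, and the trade-off is extra autograd bookkeeping on MPS:

```python
import contextlib


def forward_context(device: str):
    """Return the autograd context to wrap the forward pass in.

    Hypothetical workaround: torch.no_grad() triggers NaNs in the MPS
    attention-mask path (transformers issue #40858), so on 'mps' we fall
    back to a null context and accept the extra autograd overhead.
    """
    if device == "mps":
        return contextlib.nullcontext()
    import torch  # assumed available in the test environment
    return torch.no_grad()


# Intended usage inside the test (sketch):
#
#     with forward_context(device):
#         outputs = model(**batch)
#     assert not torch.isnan(outputs.loss).any()
```

Whether this masks a deeper numerical problem or merely sidesteps the upstream bug is worth confirming once the transformers issue is resolved.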
