
[test] failed test_multiple_texts_batch_not_nan[mps] #27

@rggdmonk

Description

Summary

Failed on commit: 5c2e86b

System: macOS.

________________________________________________________________________________ test_multiple_texts_batch_not_nan[mps] _________________________________________________________________________________

device = 'mps'

    @pytest.mark.parametrize("device", DEVICES)
    def test_multiple_texts_batch_not_nan(device):
        """Test that attention does look back - model uses previous context."""
        model, processor, collator = setup_model()
        # model.eval()
    
        # Move model to specified device
        model = model.to(torch.device(device))
    
        # Test sequences with shared suffix but different prefix
        texts = ["1", "1 2 3"]
    
        dataset = make_dataset(texts)
        batch = dataset_to_batch(model, processor, collator, dataset)
    
        # TODO: this fails on mps device, because of the attention mask
        #   ONLY when no_grad is used https://github.com/huggingface/transformers/issues/40858
        with torch.no_grad():
            outputs = model(**batch)
>       assert not torch.isnan(outputs.loss).any(), "Loss contains NaN values"
E       AssertionError: Loss contains NaN values
E       assert not tensor(True, device='mps:0')
E        +  where tensor(True, device='mps:0') = <built-in method any of Tensor object at 0x14d4cc690>()
E        +    where <built-in method any of Tensor object at 0x14d4cc690> = tensor(True, device='mps:0').any
E        +      where tensor(True, device='mps:0') = <built-in method isnan of type object at 0x108970ae8>(tensor(nan, device='mps:0'))
E        +        where <built-in method isnan of type object at 0x108970ae8> = torch.isnan
E        +        and   tensor(nan, device='mps:0') = CausalLMOutput(loss=tensor(nan, device='mps:0'), logits=tensor([[[[    nan,     nan,     nan,  ...,     nan,     nan,     nan],\n          [    nan,     nan,     nan,  ...,     nan,     nan,     nan],\n          [    nan,     nan,     nan,  ...,     nan,     nan,     nan]],\n\n         [[    nan,     nan,     nan,  ...,     nan,     nan,     nan],\n          [    nan,     nan,     nan,  ...,     nan,     nan,     nan],\n          [    nan,     nan,     nan,  ...,     nan,     nan,     nan]],\n\n         [[    nan,     nan,     nan,  ...,     nan,     nan,     nan],\n          [    nan,     nan,     nan,  ...,     nan,     nan,     nan],\n          [    nan,     nan,     nan,  ...,     nan,     nan,     nan]],\n\n         [[    nan,     nan,     nan,  ...,     nan,     nan,     nan],\n          [    nan,     nan,     nan,  ...,     nan,     nan,     nan],\n          [    nan,     nan,     nan,  ...,     nan,     nan,     nan]]],\n\n\n        [[[16.8472, -1.3858,  4.5274,  ..., -5.5535, -3.3135, -6.5287],\n          [16.9890,  0.1578,  4.9075,  ..., -8.6196, -1.3776, -6.3025],\n          [17.2965, -0.3873,  4.3442,  ..., -5.2206, -1.9989, -6.2980]],\n\n         [[16.8405, -1.4201,  4.5283,  ..., -5.528....5410, -2.6632,  4.4903,  ..., -4.9362, -2.2061, -7.2465]],\n\n         [[16.8396, -1.4124,  4.5395,  ..., -5.5245, -3.2927, -6.5416],\n          [16.5975, -1.3105,  5.6921,  ..., -7.9449, -2.9000, -5.9860],\n          [17.2833, -0.1983,  6.2825,  ..., -6.3641, -2.2251, -5.7550]],\n\n         [[16.8234, -1.4100,  4.5218,  ..., -5.5249, -3.3002, -6.5371],\n          [16.3851, -1.0723,  5.3863,  ..., -5.4043, -0.5154, -6.8901],\n          [17.6096, -2.5716,  5.6546,  ..., -5.3469, -2.6412, -7.0366]]]],\n       device='mps:0'), hidden_states=(tensor([[[     nan,      nan,      nan,  ...,      nan,      nan,      nan],\n         [     nan,      nan,      nan,  ...,      nan,      nan,      nan],\n         [     nan,   
   nan,      nan,  ...,      nan,      nan,      nan],\n         [     nan,      nan,      nan,  ...,      nan,      nan,      nan]],\n\n        [[ -6.4927,  -7.4816,  -2.3053,  ..., -12.4484,  12.1303,  -6.5202],\n         [ -6.7761,  -7.4153,  -1.9971,  ..., -12.7430,  12.1251,  -6.6804],\n         [ -6.5594,  -7.4046,  -1.9136,  ..., -12.6101,  12.1708,  -6.5584],\n         [ -6.5865,  -7.3691,  -2.0802,  ..., -12.8613,  11.9366,  -6.5969]]],\n       device='mps:0'),), attentions=None).loss

tests/test_model.py:140: AssertionError
----------------------------------------------------------------------------------------- Captured stdout call ------------------------------------------------------------------------------------------
Loading pretrained model from WinKawaks/vit-tiny-patch16-224
Loading pretrained model from prajjwal1/bert-tiny
Loading pretrained model from EleutherAI/pythia-70m
Loading pretrained model from EleutherAI/pythia-70m
Image Encoder Total parameters: 5,561,472
Bytes Encoder Total parameters: 529,152
Latent Transformer Total parameters: 18,915,328
Bytes Decoder Total parameters: 19,312,640
Final Model Total parameters: 44,745,600
----------------------------------------------------------------------------------------- Captured stderr call ------------------------------------------------------------------------------------------
Fetching 1 files: 100%|██████████| 1/1 [00:00<00:00, 2016.49it/s]
Fetching 1 files: 100%|██████████| 1/1 [00:00<00:00, 2007.80it/s]
Some weights of ViTModel were not initialized from the model checkpoint at WinKawaks/vit-tiny-patch16-224 and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
------------------------------------------------------------------------------------------- Captured log call -------------------------------------------------------------------------------------------
WARNING  image_latent_transformer.model:model_utils.py:33 Flash Attention not available, using default attention
FAILED tests/test_model.py::test_multiple_texts_batch_not_nan[mps] - AssertionError: Loss contains NaN values

Reproduce

pytest
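Since the TODO in the test notes that the NaNs appear only when `torch.no_grad()` is active on the `mps` device (upstream report: https://github.com/huggingface/transformers/issues/40858), one possible interim workaround is to make the autograd context device-conditional in the test. This is a hypothetical sketch, not the project's actual fix; `forward_context` is an assumed helper name, and the trade-off is extra autograd bookkeeping on MPS:

```python
import contextlib


def forward_context(device: str):
    """Return the autograd context to wrap the forward pass in.

    Hypothetical workaround: torch.no_grad() triggers NaNs in the MPS
    attention-mask path (transformers issue #40858), so on 'mps' we fall
    back to a null context and accept the extra autograd overhead.
    """
    if device == "mps":
        return contextlib.nullcontext()
    import torch  # assumed available in the test environment
    return torch.no_grad()


# Intended usage inside the test (sketch):
#
#     with forward_context(device):
#         outputs = model(**batch)
#     assert not torch.isnan(outputs.loss).any()
```

Whether this masks a deeper numerical problem or merely sidesteps the upstream bug is worth confirming once the transformers issue is resolved.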
