I am trying to get the model's attention weights like this:
```python
outputs = self.model(
    vision_x=vision_x,
    lang_x=lang_x,
    attention_mask=attention_mask,
    clear_conditioned_layers=clear_conditioned_layers,
    past_key_values=past_key_values,
    use_cache=(past_key_values is not None),
    output_attentions=True,
)
```
However, the attention tuple it returns contains only `None` entries.
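For reference, this is roughly how I check the result (assuming the output exposes the standard Hugging Face `attentions` field):

```python
# Each entry should be a per-layer attention tensor, but every one is None.
attns = outputs.attentions
print(len(attns))                      # number of layers
print(all(a is None for a in attns))   # prints True -> no weights were returned
```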
I stepped into the code and it looks like a bug in the MPT code under "huggingface/modules/transformers_modules/": the `output_attentions` parameter is dropped when `MPTBlock.forward()` is called in blocks.py.
I tried to patch this, but when I run the model again the code reverts to its original version.
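The change I attempted looks roughly like the sketch below (hypothetical; the argument name `output_attentions` and the exact call-site signature are my assumptions for this MPT revision and may differ in the cached copy):

```python
# Forward the output_attentions flag into each block and collect the
# per-layer attention weights instead of silently dropping them.
all_self_attns = () if output_attentions else None
for block in self.blocks:
    x, attn_weights, past_key_value = block(
        x,
        past_key_value=past_key_value,
        attn_bias=attn_bias,
        attention_mask=attention_mask,
        is_causal=self.is_causal,
        output_attentions=output_attentions,  # currently not passed through to the block
    )
    if output_attentions:
        all_self_attns = all_self_attns + (attn_weights,)
```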
Is there a solution or workaround for this?