In the following code in deberta/bert.py, why is the mask not passed to MaskedLayerNorm on line 38? If the mask is not needed there, couldn't we call hidden_states = self.LayerNorm(hidden_states) directly?
DeBERTa/DeBERTa/deberta/bert.py
Lines 26 to 39 in 4d7fe0b
```python
class BertSelfOutput(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.LayerNorm = LayerNorm(config.hidden_size, config.layer_norm_eps)
        self.dropout = StableDropout(config.hidden_dropout_prob)
        self.config = config

    def forward(self, hidden_states, input_states, mask=None):
        hidden_states = self.dense(hidden_states)
        hidden_states = self.dropout(hidden_states)
        hidden_states += input_states
        hidden_states = MaskedLayerNorm(self.LayerNorm, hidden_states)
        return hidden_states
```
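For context, here is a minimal NumPy sketch of the behavior I am assuming for the wrapper (the names `layer_norm` and `masked_layer_norm` are hypothetical stand-ins, not the actual DeBERTa implementation): normalize over the hidden dimension, and zero out padded positions only if a mask is supplied. Under that assumption, calling it without a mask is equivalent to calling the layer norm directly.

```python
import numpy as np

def layer_norm(x, eps=1e-7):
    # Normalize over the last (hidden) dimension.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def masked_layer_norm(layer_norm_fn, hidden_states, mask=None):
    # Hypothetical sketch: apply layer norm, then zero out
    # masked (padded) positions if a mask is given.
    out = layer_norm_fn(hidden_states)
    if mask is None:
        return out  # identical to calling layer_norm_fn directly
    return out * mask[..., None]

x = np.random.randn(2, 4, 8)
# Without a mask, the wrapper reduces to a plain layer-norm call:
assert np.allclose(masked_layer_norm(layer_norm, x), layer_norm(x))
```

If that assumption holds, the question stands: the wrapper call in forward seems redundant when no mask is ever passed.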
Thank you in advance!