Bug description
When we train an XLNet model with CLM masking, the Trainer reports its own evaluation metrics (ndcg@k, recall@k, etc.) from the trainer.evaluate() step. If we compute the same metrics with our own NumPy-based function, like the one below, the values do not match the Trainer's. With MLM masking, however, the two sets of values do match.
```python
import numpy as np

def recall(predicted_items: np.ndarray, real_items: np.ndarray) -> float:
    bs, top_k = predicted_items.shape
    # rows whose label id is 0 are padding and are excluded from the mean
    valid_rows = real_items != 0
    # reshape predictions and labels to compare
    # the top-k predicted item-ids with the label id
    real_items = real_items.reshape(bs, 1, -1)
    predicted_items = predicted_items.reshape(bs, 1, top_k)
    num_relevant = real_items.shape[-1]
    predicted_correct_sum = (predicted_items == real_items).sum(-1)
    predicted_correct_sum = predicted_correct_sum[valid_rows]
    recall_per_row = predicted_correct_sum / num_relevant
    return np.mean(recall_per_row)
```
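
For reference, here is a minimal, self-contained sketch of how we call this function. The array shapes and values are made up for illustration: predicted_items holds the top-k predicted item ids per row, real_items holds the single ground-truth next-item id per row, with 0 used as padding.

```python
import numpy as np

# hypothetical toy data: batch of 3 sessions, top-3 predictions each
predicted_items = np.array(
    [
        [10, 4, 7],   # label 4 appears in the top-3 -> hit
        [2, 9, 5],    # label 1 is not predicted -> miss
        [3, 8, 6],    # label 0 -> padded row, excluded from the mean
    ]
)
real_items = np.array([4, 1, 0])

# 1 hit out of 2 valid rows -> recall@3 = 0.5
print(recall(predicted_items, real_items))  # 0.5
```

This is how we sanity-check the function on its own; the mismatch appears when the same function is applied to the predictions and labels produced during evaluation with CLM masking.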
Steps/Code to reproduce bug
Coming soon.
Expected behavior
The custom metric computed from the predictions and labels returned at evaluation should match the metrics reported by the Trainer under CLM masking, just as it does under MLM masking.
Environment details
- Transformers4Rec version:
- Platform:
- Python version:
- Huggingface Transformers version:
- PyTorch version (GPU?):
- Tensorflow version (GPU?):