In the code below, we used two models of quite different capacities: for bert-score and bertscore-sentence-MNLI, we used RoBERTa-large, which is about 1.6 GB (the default for bert-score as implemented in HF's `evaluate` library). But for bertscore-sentence, which is built on top of Sentence-BERT, we used all-MiniLM-L6-v2, which is only about 80 MB. This gives our bertscore-sentence approach a huge disadvantage. Of course, we picked that model to be fast in pilot studies.
I think if we use a larger-capacity model for bertscore-sentence, we can further boost our sentence-based pair-wise approach.
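For context, the sentence-based pair-wise scoring mentioned above could look like the sketch below. It assumes a BERTScore-style greedy matching over sentence embeddings (precision/recall/F1); the actual bertscore-sentence code may aggregate differently, and `pairwise_f1` is a hypothetical helper name.

```python
import numpy as np

def pairwise_f1(cand_emb: np.ndarray, ref_emb: np.ndarray) -> float:
    """BERTScore-style greedy matching over sentence embeddings.

    cand_emb: (n_cand, dim) embeddings of candidate sentences
    ref_emb:  (n_ref, dim)  embeddings of reference sentences
    """
    # Normalize rows so dot products become cosine similarities.
    cand = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = cand @ ref.T                   # (n_cand, n_ref) cosine matrix
    recall = sim.max(axis=0).mean()      # best match for each reference sentence
    precision = sim.max(axis=1).mean()   # best match for each candidate sentence
    return 2 * precision * recall / (precision + recall)

# Usage with the current (small) model:
# sent_embedder = sentence_transformers.SentenceTransformer("all-MiniLM-L6-v2")
# cand_emb = sent_embedder.encode(cand_sentences)  # list[str] -> np.ndarray
# ref_emb  = sent_embedder.encode(ref_sentences)
# score = pairwise_f1(cand_emb, ref_emb)
```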
There are two directions we can try:
- A quick one: just use a larger model trained by the Sentence-BERT project. Let's try two: `all-mpnet-base-v2` and `all-roberta-large-v1`. The former is still much smaller than RoBERTa-large but has higher scores on the Sentence-BERT leaderboard, while the latter is just RoBERTa-large trained with Sentence-BERT's dot-product loss. So let's test both of these two versions:

  ```python
  sent_embedder = sentence_transformers.SentenceTransformer("all-mpnet-base-v2")
  sent_embedder = sentence_transformers.SentenceTransformer("all-roberta-large-v1")
  ```
  BTW, we can use HF's `transformers` library for Sentence-BERT as well. That way, we don't have to import both `transformers` and `sentence_transformers`, and we can consolidate all code under one framework.

- A slower but completely fair approach: also use RoBERTa-large (generally trained, not fine-tuned on MNLI) to embed each sentence and extract the embedding corresponding to the [CLS] token. For how to do it, see here.
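Both `transformers`-only routes above (Sentence-BERT-style mean pooling, and taking the [CLS] embedding from a generally trained RoBERTa-large) reduce to pooling over the model's `last_hidden_state`. A minimal sketch, with the pooling helpers written so they work on any HF encoder output:

```python
import torch

def cls_embedding(last_hidden_state: torch.Tensor) -> torch.Tensor:
    """Embedding of the <s>/[CLS] token (position 0) for each sequence."""
    return last_hidden_state[:, 0, :]                  # (batch, dim)

def mean_pooling(last_hidden_state: torch.Tensor,
                 attention_mask: torch.Tensor) -> torch.Tensor:
    """Sentence-BERT-style mean pooling over non-padding tokens."""
    mask = attention_mask.unsqueeze(-1).float()        # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)     # (batch, dim)
    counts = mask.sum(dim=1).clamp(min=1e-9)           # (batch, 1)
    return summed / counts

# Real usage (not run here; downloads RoBERTa-large, ~1.6 GB):
# from transformers import AutoModel, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("roberta-large")
# model = AutoModel.from_pretrained("roberta-large")
# batch = tok(sentences, padding=True, truncation=True, return_tensors="pt")
# with torch.no_grad():
#     out = model(**batch)
# sent_emb = cls_embedding(out.last_hidden_state)      # or mean_pooling(...)

# Shape check with a dummy hidden-state tensor:
hidden = torch.randn(2, 5, 8)                          # (batch=2, seq=5, dim=8)
mask = torch.tensor([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])
print(cls_embedding(hidden).shape)                     # torch.Size([2, 8])
print(mean_pooling(hidden, mask).shape)                # torch.Size([2, 8])
```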