@hengran hengran commented Jan 30, 2026

Checklist

  • My model has a model sheet, report, or similar
  • My model has a reference implementation in mteb/models/model_implementations/, this can be as an API. Instruction on how to add a model can be found here
    • No, but there is an existing PR ___
  • The results submitted are obtained using the reference implementation
  • My model is available, either as a publicly accessible API or publicly on e.g., Huggingface
  • I solemnly swear that for all results submitted I have not trained on the evaluation dataset including training splits. If I have, I have disclosed it clearly.

@KennethEnevoldsen
Contributor

@hengran we need the model implemented in mteb to merge PRs here (see the guide in the checklist).

@KennethEnevoldsen added the "waiting for review of implementation" label (This PR is waiting for an implementation review before merging the results.) on Jan 30, 2026
@hengran
Author

hengran commented Jan 31, 2026

> @hengran we need the model implemented in mteb to merge PRs here (see the guide in the checklist).

Okay, I have submitted the PR that implements the BOOM model in mteb.

@github-actions

github-actions bot commented Jan 31, 2026

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: ICT-TIME-and-Querit/BOOM_4B_v1
Tasks: AILACasedocs, AILAStatutes, AfriSentiClassification, AlloProfClusteringS2S.v2, AlloprofReranking, AmazonCounterfactualClassification, AppsRetrieval, ArXivHierarchicalClusteringP2P, ArXivHierarchicalClusteringS2S, ArguAna, ArmenianParaphrasePC, AskUbuntuDupQuestions, BIOSSES, BUCC.v2, Banking77Classification, BelebeleRetrieval, BibleNLPBitextMining, BigPatentClustering.v2, BiorxivClusteringP2P.v2, BornholmBitextMining, BrazilianToxicTweetsClassification, BulgarianStoreReviewSentimentClassfication, CEDRClassification, CLSClusteringP2P.v2, COIRCodeSearchNetRetrieval, CQADupstackGamingRetrieval, CQADupstackUnixRetrieval, CSFDSKMovieReviewSentimentClassification, CTKFactsNLI, CUREv1, CataloniaTweetClassification, ChatDoctorRetrieval, ClimateFEVERHardNegatives, CodeEditSearchRetrieval, CodeFeedbackMT, CodeFeedbackST, CodeSearchNetCCRetrieval, CodeSearchNetRetrieval, CodeTransOceanContest, CodeTransOceanDL, Core17InstructionRetrieval, CosQA, CovidRetrieval, CyrillicTurkicLangClassification, CzechProductReviewSentimentClassification, DBpediaClassification, DS1000Retrieval, DalajClassification, DiaBlaBitextMining, EstonianValenceClassification, FEVERHardNegatives, FaroeseSTS, FiQA2018, FilipinoShopeeReviewsClassification, FinParaSTS, FinQARetrieval, FinanceBenchRetrieval, FinancialPhrasebankClassification, FloresBitextMining, FreshStackRetrieval, GermanSTSBenchmark, GreekLegalCodeClassification, GujaratiNewsClassification, HALClusteringS2S.v2, HC3FinanceRetrieval, HagridRetrieval, HotpotQAHardNegatives, HumanEvalRetrieval, IN22GenBitextMining, ImdbClassification, IndicCrosslingualSTS, IndicGenBenchFloresBitextMining, IndicLangClassification, IndonesianIdClickbaitClassification, IsiZuluNewsClassification, ItaCaseholdClassification, JSICK, KorHateSpeechMLClassification, KorSarcasmClassification, KurdishSentimentClassification, LEMBPasskeyRetrieval, LegalBenchCorporateLobbying, LegalQuAD, LegalSummarization, MBPPRetrieval, MIRACLRetrievalHardNegatives, MLQARetrieval, 
MTOPDomainClassification, MacedonianTweetSentimentClassification, MalteseNewsClassification, MasakhaNEWSClassification, MasakhaNEWSClusteringS2S, MassiveIntentClassification, MassiveScenarioClassification, MedrxivClusteringP2P.v2, MedrxivClusteringS2S.v2, MindSmallReranking, MultiEURLEXMultilabelClassification, MultiHateClassification, NTREXBitextMining, NepaliNewsClassification, News21InstructionRetrieval, NollySentiBitextMining, NordicLangClassification, NorwegianCourtsBitextMining, NusaParagraphEmotionClassification, NusaTranslationBitextMining, NusaX-senti, NusaXBitextMining, OdiaNewsClassification, OpusparcusPC, PAC, PawsXPairClassification, PlscClusteringP2P.v2, PoemSentimentClassification, PolEmo2.0-OUT, PpcPC, PunjabiNewsClassification, RTE3, Robust04InstructionRetrieval, RomaniBibleClustering, RuBQReranking, SCIDOCS, SIB200ClusteringS2S, SICK-R, STS12, STS13, STS14, STS15, STS17, STS22.v2, STSB, STSBenchmark, STSES, ScalaClassification, SemRel24STS, SentimentAnalysisHindi, SinhalaNewsClassification, SiswatiNewsClassification, SlovakMovieReviewSentimentClassification, SpartQA, SprintDuplicateQuestions, StackExchangeClustering.v2, StackExchangeClusteringP2P.v2, StackOverflowQA, StatcanDialogueDatasetRetrieval, SummEvalSummarization.v2, SwahiliNewsClassification, SwednClusteringP2P, SwissJudgementClassification, SyntheticText2SQL, T2Reranking, TERRa, TRECCOVID, Tatoeba, TempReasonL1, Touche2020Retrieval.v3, ToxicConversationsClassification, TswanaNewsClassification, TweetSentimentExtractionClassification, TweetTopicSingleClassification, TwentyNewsgroupsClustering.v2, TwitterHjerneRetrieval, TwitterSemEval2015, TwitterURLCorpus, VoyageMMarcoReranking, WebLINXCandidatesReranking, WikiCitiesClustering, WikiClusteringP2P.v2, WikiSQLRetrieval, WikipediaRerankingMultilingual, WikipediaRetrievalMultilingual, WinoGrande, XNLI, indonli

Results for ICT-TIME-and-Querit/BOOM_4B_v1

task_name ICT-TIME-and-Querit/BOOM_4B_v1 google/gemini-embedding-001 intfloat/multilingual-e5-large Max result Model with max result In Training Data
AILACasedocs 0.3751 0.4833 0.2643 0.6541 bflhc/Octen-Embedding-8B False
AILAStatutes 0.3553 0.4877 0.2084 0.9313 bflhc/Octen-Embedding-8B False
AfriSentiClassification 0.4443 0.5356 0.455 0.5688 tencent/KaLM-Embedding-Gemma3-12B-2511 False
AlloProfClusteringS2S.v2 0.5922 0.5636 0.3328 0.5965 Qwen/Qwen3-Embedding-8B False
AlloprofReranking 0.7831 0.8177 0.6944 0.8540 bflhc/Octen-Embedding-8B False
AmazonCounterfactualClassification 0.8966 0.8820 0.6965 0.9696 GeoGPT-Research-Project/GeoEmbedding False
AppsRetrieval 0.5765 0.9375 0.3255 0.9729 voyageai/voyage-4-large (embed_dim=2048) False
ArXivHierarchicalClusteringP2P 0.6498 0.6492 0.5569 0.6869 NovaSearch/jasper_en_vision_language_v1 False
ArXivHierarchicalClusteringS2S 0.6390 0.6384 0.5367 0.6548 Qwen/Qwen3-Embedding-8B False
ArguAna 0.5783 0.8644 0.5436 0.8979 voyageai/voyage-3-m-exp False
ArmenianParaphrasePC 0.9479 0.9689 0.9493 0.9703 tencent/KaLM-Embedding-Gemma3-12B-2511 False
AskUbuntuDupQuestions 0.6566 0.6424 0.5924 0.7528 IEITYuan/Yuan-embedding-2.0-en False
BIOSSES 0.8522 0.8897 0.8457 0.9692 Gameselo/STS-multilingual-mpnet-base-v2 False
BUCC.v2 0.9886 0.9899 0.9878 0.9902 GritLM/GritLM-7B False
Banking77Classification 0.8699 0.9427 0.7492 0.9427 google/gemini-embedding-001 False
BelebeleRetrieval 0.7728 0.9073 0.7791 0.9380 clips/e5-base-trm-nl False
BibleNLPBitextMining 0.1212 0.2072 0.1665 0.9899 deepvk/USER-bge-m3 False
BigPatentClustering.v2 0.4195 0.3806 0.3147 0.4453 Salesforce/SFR-Embedding-2_R False
BiorxivClusteringP2P.v2 0.5438 0.5386 0.372 0.8417 codefuse-ai/F2LLM-4B False
BornholmBitextMining 0.5491 0.5169 0.4416 0.7633 Qwen/Qwen3-Embedding-8B False
BrazilianToxicTweetsClassification 0.2014 0.2802 0.2123 0.3157 tencent/KaLM-Embedding-Gemma3-12B-2511 False
BulgarianStoreReviewSentimentClassfication 0.7907 0.7813 0.6385 0.8044 Linq-AI-Research/Linq-Embed-Mistral False
CEDRClassification 0.5000 0.5742 0.4484 0.7301 sergeyzh/BERTA False
CLSClusteringP2P.v2 0.4644 0.4268 0.4037 0.7572 Qwen/Qwen3-Embedding-8B False
COIRCodeSearchNetRetrieval 0.7486 0.8106 nan 0.8979 codefuse-ai/C2LLM-7B False
CQADupstackGamingRetrieval 0.6253 0.7068 0.587 0.8161 IEITYuan/Yuan-embedding-2.0-en False
CQADupstackUnixRetrieval 0.4954 0.5369 0.3988 0.7198 voyageai/voyage-3-m-exp False
CSFDSKMovieReviewSentimentClassification 0.4863 0.4938 0.3484 0.6456 tencent/KaLM-Embedding-Gemma3-12B-2511 False
CTKFactsNLI 0.8316 0.8759 0.7984 0.8993 omarelshehy/arabic-english-sts-matryoshka False
CUREv1 0.6078 0.5957 0.5162 0.6782 voyageai/voyage-4-large (embed_dim=2048) False
CataloniaTweetClassification 0.4998 0.5451 0.504 0.7790 Bytedance/Seed1.6-embedding-1215 False
ChatDoctorRetrieval 0.6935 0.7352 0.5687 0.7722 voyageai/voyage-4-large (embed_dim=2048) False
ClimateFEVERHardNegatives 0.3444 0.3106 0.26 0.5905 IEITYuan/Yuan-embedding-2.0-en False
CodeEditSearchRetrieval 0.6608 0.8161 0.5038 0.9214 Bytedance/Seed1.6-embedding-1215 False
CodeFeedbackMT 0.8340 0.5628 0.4278 0.9432 codefuse-ai/C2LLM-7B False
CodeFeedbackST 0.8217 0.8533 0.7426 0.9067 voyageai/voyage-code-3 False
CodeSearchNetCCRetrieval 0.7675 0.8469 0.7783 0.9790 codefuse-ai/C2LLM-7B False
CodeSearchNetRetrieval 0.8999 0.9133 0.8412 0.9397 voyageai/voyage-code-3 False
CodeTransOceanContest 0.8885 0.8953 0.7403 0.9496 voyageai/voyage-code-3 False
CodeTransOceanDL 0.3444 0.3147 0.3128 0.4419 jinaai/jina-embeddings-v4 False
Core17InstructionRetrieval 0.0577 0.0769 -0.0162 0.1461 nvidia/llama-embed-nemotron-8b False
CosQA 0.2634 0.5024 0.348 0.5218 google/text-embedding-005 False
CovidRetrieval 0.8008 0.7913 0.7561 0.9606 TencentBAC/Conan-embedding-v2 False
CyrillicTurkicLangClassification 0.6736 0.9530 0.4085 0.9905 tencent/KaLM-Embedding-Gemma3-12B-2511 False
CzechProductReviewSentimentClassification 0.6374 0.6816 0.5714 0.7667 Bytedance/Seed1.6-embedding-1215 False
DBpediaClassification 0.9114 0.9476 0.8828 0.9926 Qwen/Qwen3-Embedding-8B False
DS1000Retrieval 0.6366 0.6870 nan 0.7129 voyageai/voyage-4-large (embed_dim=2048) False
DalajClassification 0.4992 0.5047 0.5001 0.6213 tencent/KaLM-Embedding-Gemma3-12B-2511 False
DiaBlaBitextMining 0.8519 0.8723 0.8483 0.8865 nvidia/llama-embed-nemotron-8b False
EstonianValenceClassification 0.4922 0.5352 0.4289 0.6456 tencent/KaLM-Embedding-Gemma3-12B-2511 False
FEVERHardNegatives 0.8935 0.8898 0.8379 0.9453 ByteDance-Seed/Seed1.5-Embedding False
FaroeseSTS 0.7527 0.8612 0.7239 0.9739 Gameselo/STS-multilingual-mpnet-base-v2 False
FiQA2018 0.5489 0.6178 0.4381 0.8206 ai-sage/Giga-Embeddings-instruct False
FilipinoShopeeReviewsClassification 0.4477 0.4845 0.3527 0.5159 tencent/KaLM-Embedding-Gemma3-12B-2511 False
FinParaSTS 0.2179 0.2860 0.2492 0.3456 bflhc/Octen-Embedding-8B False
FinQARetrieval 0.5316 0.6464 nan 0.8897 voyageai/voyage-4-large (embed_dim=2048) False
FinanceBenchRetrieval 0.8177 0.9157 nan 0.9459 bflhc/Octen-Embedding-8B False
FinancialPhrasebankClassification 0.9117 0.8864 0.8394 0.9515 Qwen/Qwen3-Embedding-8B False
FloresBitextMining 0.5954 0.8371 0.8108 0.8596 intfloat/multilingual-e5-large-instruct False
FreshStackRetrieval 0.3583 0.3979 0.2519 0.5776 bflhc/Octen-Embedding-8B False
GermanSTSBenchmark 0.8257 0.8809 0.8408 0.9541 Gameselo/STS-multilingual-mpnet-base-v2 False
GreekLegalCodeClassification 0.3602 0.4376 0.3713 0.8052 Bytedance/Seed1.6-embedding-1215 False
GujaratiNewsClassification 0.8999 0.9205 0.7674 0.9343 Bytedance/Seed1.6-embedding-1215 False
HALClusteringS2S.v2 0.3345 0.3200 0.2261 0.3228 Qwen/Qwen3-Embedding-8B False
HC3FinanceRetrieval 0.7003 0.7758 nan 0.8242 nvidia/NV-Embed-v2 False
HagridRetrieval 0.9893 0.9931 0.9891 0.9931 google/gemini-embedding-001 False
HotpotQAHardNegatives 0.7592 0.8701 0.7055 0.8701 google/gemini-embedding-001 False
HumanEvalRetrieval 0.9714 0.9910 nan 0.9977 bflhc/MoD-Embedding False
IN22GenBitextMining 0.7681 0.9375 0.7675 0.9375 google/gemini-embedding-001 False
ImdbClassification 0.9617 0.9498 0.8867 0.9737 Qwen/Qwen3-Embedding-8B False
IndicCrosslingualSTS 0.5332 0.6287 0.4387 0.8477 Gameselo/STS-multilingual-mpnet-base-v2 False
IndicGenBenchFloresBitextMining 0.8843 0.9677 0.8875 0.9881 Sailesh97/Hinvec False
IndicLangClassification 0.8139 0.8769 0.2025 0.9930 Bytedance/Seed1.6-embedding-1215 False
IndonesianIdClickbaitClassification 0.6026 0.6700 0.6122 0.7560 nvidia/llama-embed-nemotron-8b False
IsiZuluNewsClassification 0.2350 0.4053 0.3241 0.4053 google/gemini-embedding-001 False
ItaCaseholdClassification 0.7213 0.7330 0.6679 0.9439 bigscience/sgpt-bloom-7b1-msmarco False
JSICK 0.8257 0.8499 0.7981 0.8963 bflhc/Octen-Embedding-8B False
KorHateSpeechMLClassification 0.1831 0.1769 0.1049 0.7625 Bytedance/Seed1.6-embedding-1215 False
KorSarcasmClassification 0.8194 0.6051 0.5679 0.6479 tencent/KaLM-Embedding-Gemma3-12B-2511 False
KurdishSentimentClassification 0.7218 0.8639 0.7708 0.9403 Bytedance/Seed1.6-embedding-1215 False
LEMBPasskeyRetrieval 0.8500 0.3850 0.3825 1.0000 tencent/KaLM-Embedding-Gemma3-12B-2511 False
LegalBenchCorporateLobbying 0.9428 0.9598 0.8972 0.9696 voyageai/voyage-3-large False
LegalQuAD 0.5849 0.6553 0.4317 0.7675 baseline/bm25s False
LegalSummarization 0.6468 0.7122 0.621 0.7921 voyageai/voyage-3.5 False
MBPPRetrieval 0.9144 0.9416 nan 0.9608 voyageai/voyage-4-large (embed_dim=2048) False
MIRACLRetrievalHardNegatives 0.6560 0.7042 0.5923 0.7305 nvidia/llama-embed-nemotron-8b False
MLQARetrieval 0.7887 0.8416 0.7566 0.8416 google/gemini-embedding-001 False
MTOPDomainClassification 0.9702 0.9927 0.9097 0.9995 voyageai/voyage-3-m-exp False
MacedonianTweetSentimentClassification 0.6431 0.7183 0.6192 0.7547 Qwen/Qwen3-Embedding-4B False
MalteseNewsClassification 0.3491 0.3738 0.2395 0.6938 Bytedance/Seed1.6-embedding-1215 False
MasakhaNEWSClassification 0.7828 0.8355 0.7754 0.9009 Bytedance/Seed1.6-embedding-1215 False
MasakhaNEWSClusteringS2S 0.4711 0.5745 0.3804 0.7365 Bytedance/Seed1.6-embedding-1215 False
MassiveIntentClassification 0.6705 0.8192 0.6025 0.9194 voyageai/voyage-3-m-exp False
MassiveScenarioClassification 0.8145 0.9208 0.7178 0.9930 voyageai/voyage-3-m-exp False
MedrxivClusteringP2P.v2 0.4796 0.4716 0.3431 0.7199 codefuse-ai/F2LLM-4B False
MedrxivClusteringS2S.v2 0.4606 0.4501 0.3152 0.7023 codefuse-ai/F2LLM-4B False
MindSmallReranking 0.3295 0.3295 0.3024 0.3437 Kingsoft-LLM/QZhou-Embedding False
MultiEURLEXMultilabelClassification 0.0479 0.0528 0.0516 0.0968 Bytedance/Seed1.6-embedding-1215 False
MultiHateClassification 0.6808 0.7247 0.6357 0.8374 tencent/KaLM-Embedding-Gemma3-12B-2511 False
NTREXBitextMining 0.7869 0.9364 0.914 0.9456 tencent/KaLM-Embedding-Gemma3-12B-2511 False
NepaliNewsClassification 0.9564 0.9814 0.8847 0.9817 tencent/KaLM-Embedding-Gemma3-12B-2511 False
News21InstructionRetrieval 0.0032 0.1026 -0.0006 0.1145 google/embeddinggemma-300m False
NollySentiBitextMining 0.3598 0.6871 0.675 0.8083 nvidia/llama-embed-nemotron-8b False
NordicLangClassification 0.6913 0.8597 0.8015 0.9384 tencent/KaLM-Embedding-Gemma3-12B-2511 False
NorwegianCourtsBitextMining 0.9430 0.9342 0.9404 0.9447 OrdalieTech/Solon-embeddings-large-0.1 False
NusaParagraphEmotionClassification 0.4976 0.5638 0.4166 0.8374 Bytedance/Seed1.6-embedding-1215 False
NusaTranslationBitextMining 0.7761 0.7752 0.672 0.9222 Qwen/Qwen3-Embedding-8B False
NusaX-senti 0.7222 0.8031 0.7055 0.8482 Bytedance/Seed1.6-embedding-1215 False
NusaXBitextMining 0.7121 0.8252 0.7267 0.9056 Bytedance/Seed1.6-embedding-1215 False
OdiaNewsClassification 0.8741 0.9184 0.8001 0.9715 Bytedance/Seed1.6-embedding-1215 False
OpusparcusPC 0.9453 0.9662 0.9451 0.9696 tencent/KaLM-Embedding-Gemma3-12B-2511 False
PAC 0.7083 0.7168 0.7033 0.8811 Bytedance/Seed1.6-embedding-1215 False
PawsXPairClassification 0.5730 0.5999 0.5473 0.7557 Bytedance/Seed1.6-embedding-1215 False
PlscClusteringP2P.v2 0.7498 0.7431 0.7161 0.7542 tencent/KaLM-Embedding-Gemma3-12B-2511 False
PoemSentimentClassification 0.6670 0.5966 0.5067 0.8642 Bytedance/Seed1.6-embedding-1215 False
PolEmo2.0-OUT 0.6864 0.7753 0.3648 0.8006 nvidia/llama-embed-nemotron-8b False
PpcPC 0.9190 0.9550 0.9218 0.9554 tencent/KaLM-Embedding-Gemma3-12B-2511 False
PunjabiNewsClassification 0.8471 0.8261 0.807 0.8879 Bytedance/Seed1.6-embedding-1215 False
RTE3 0.8926 0.8955 0.8752 0.9173 Bytedance/Seed1.6-embedding-1215 False
Robust04InstructionRetrieval -0.0771 -0.0241 -0.0748 0.1244 Qwen/Qwen3-Embedding-4B False
RomaniBibleClustering 0.4268 0.4322 0.4092 0.4589 tencent/KaLM-Embedding-Gemma3-12B-2511 False
RuBQReranking 0.6666 0.7384 0.756 0.8051 ai-sage/Giga-Embeddings-instruct False
SCIDOCS 0.1189 0.2515 0.1745 0.5986 IEITYuan/Yuan-embedding-2.0-en False
SIB200ClusteringS2S 0.3814 0.4174 0.3945 0.5126 sbintuitions/sarashina-embedding-v2-1b False
SICK-R 0.7988 0.8275 0.8023 0.9465 Gameselo/STS-multilingual-mpnet-base-v2 False
STS12 0.7716 0.8155 0.8002 0.9546 Gameselo/STS-multilingual-mpnet-base-v2 False
STS13 0.8681 0.8989 0.8155 0.9776 Gameselo/STS-multilingual-mpnet-base-v2 False
STS14 0.7950 0.8541 0.7772 0.9753 Gameselo/STS-multilingual-mpnet-base-v2 False
STS15 0.8706 0.9044 0.8931 0.9811 Gameselo/STS-multilingual-mpnet-base-v2 False
STS17 0.8494 0.8858 0.8214 0.9342 infgrad/Jasper-Token-Compression-600M False
STS22.v2 0.6748 0.7169 0.643 0.7718 Kingsoft-LLM/QZhou-Embedding False
STSB 0.8131 0.8550 0.8236 0.9199 Gameselo/STS-multilingual-mpnet-base-v2 False
STSBenchmark 0.8706 0.8908 0.8729 0.9504 Kingsoft-LLM/QZhou-Embedding False
STSES 0.7882 0.8175 0.8021 0.8231 google/embeddinggemma-300m False
ScalaClassification 0.5129 0.5185 0.5157 0.8626 tencent/KaLM-Embedding-Gemma3-12B-2511 False
SemRel24STS 0.6503 0.7314 0.6266 0.8112 VPLabs/SearchMap_Preview False
SentimentAnalysisHindi 0.7521 0.7606 0.642 0.8001 Qwen/Qwen3-Embedding-8B False
SinhalaNewsClassification 0.6842 0.8229 0.6682 0.8547 tencent/KaLM-Embedding-Gemma3-12B-2511 False
SiswatiNewsClassification 0.4838 0.6238 0.535 0.7837 Lajavaness/bilingual-embedding-small False
SlovakMovieReviewSentimentClassification 0.9033 0.9035 0.7441 0.9539 Bytedance/Seed1.6-embedding-1215 False
SpartQA 0.1104 0.1030 0.0565 0.8483 tencent/KaLM-Embedding-Gemma3-12B-2511 False
SprintDuplicateQuestions 0.9162 0.9690 0.9314 0.9838 Kingsoft-LLM/QZhou-Embedding False
StackExchangeClustering.v2 0.7834 0.9207 0.4643 0.9207 google/gemini-embedding-001 False
StackExchangeClusteringP2P.v2 0.4999 0.5091 0.3854 0.5510 Kingsoft-LLM/QZhou-Embedding False
StackOverflowQA 0.9334 0.9671 0.8889 0.9720 Bytedance/Seed1.6-embedding-1215 False
StatcanDialogueDatasetRetrieval 0.4938 0.5111 0.1063 0.5807 jinaai/jina-embeddings-v4 False
SummEvalSummarization.v2 0.3573 0.3828 0.3141 0.3893 annamodels/LGAI-Embedding-Preview False
SwahiliNewsClassification 0.6590 0.6605 0.5969 0.6753 Qwen/Qwen3-Embedding-8B False
SwednClusteringP2P 0.3764 0.4584 0.3691 0.6213 Qwen/Qwen3-Embedding-4B False
SwissJudgementClassification 0.5616 0.5786 0.5362 0.7791 Bytedance/Seed1.6-embedding-1215 False
SyntheticText2SQL 0.7277 0.6996 0.5307 0.7875 Qwen/Qwen3-Embedding-8B False
T2Reranking 0.6494 0.6795 0.6632 0.7315 tencent/Youtu-Embedding False
TERRa 0.5755 0.6392 0.5842 0.7957 ai-sage/Giga-Embeddings-instruct False
TRECCOVID 0.8206 0.8631 0.7115 0.9833 IEITYuan/Yuan-embedding-2.0-en False
Tatoeba 0.6802 0.8197 0.7573 0.9394 OrlikB/KartonBERT-USE-base-v1 False
TempReasonL1 0.0117 0.0296 0.0114 0.0805 nvidia/llama-embed-nemotron-8b False
Touche2020Retrieval.v3 0.5062 0.5239 0.4959 0.7465 Qwen/Qwen3-Embedding-4B False
ToxicConversationsClassification 0.8887 0.8875 0.6601 0.9759 voyageai/voyage-3-m-exp False
TswanaNewsClassification 0.3205 0.5337 0.47 0.6417 Bytedance/Seed1.6-embedding-1215 False
TweetSentimentExtractionClassification 0.7849 0.6988 0.628 0.8823 voyageai/voyage-3-m-exp False
TweetTopicSingleClassification 0.7791 0.7111 0.6532 0.8561 Bytedance/Seed1.6-embedding-1215 False
TwentyNewsgroupsClustering.v2 0.6588 0.5737 0.3921 0.8758 GeoGPT-Research-Project/GeoEmbedding False
TwitterHjerneRetrieval 0.7393 0.9802 0.3522 0.9802 google/gemini-embedding-001 False
TwitterSemEval2015 0.7345 0.7917 0.7528 0.8946 voyageai/voyage-large-2-instruct False
TwitterURLCorpus 0.8668 0.8705 0.8583 0.9571 TencentBAC/Conan-embedding-v2 False
VoyageMMarcoReranking 0.6003 0.6673 0.6821 0.7351 jinaai/jina-reranker-v3 False
WebLINXCandidatesReranking 0.1414 0.1097 0.0778 0.1792 Bytedance/Seed1.6-embedding-1215 False
WikiCitiesClustering 0.9015 0.9163 0.755 0.9357 Qwen/Qwen3-Embedding-4B False
WikiClusteringP2P.v2 0.2954 0.2823 0.256 0.3295 tencent/KaLM-Embedding-Gemma3-12B-2511 False
WikiSQLRetrieval 0.8865 0.8814 nan 0.9892 bflhc/Octen-Embedding-8B False
WikipediaRerankingMultilingual 0.8721 0.9224 0.8981 0.9308 jinaai/jina-reranker-v3 False
WikipediaRetrievalMultilingual 0.9098 0.9420 0.9111 0.9420 google/gemini-embedding-001 False
WinoGrande 0.2485 0.6052 0.5498 0.8989 tencent/KaLM-Embedding-Gemma3-12B-2511 False
XNLI 0.8059 0.8526 0.7477 0.9291 Bytedance/Seed1.6-embedding-1215 False
indonli 0.5763 0.6069 0.5174 0.6722 Bytedance/Seed1.6-embedding-1215 False
Average 0.6434 0.6895 0.5779 0.7960 nan -

The model has high performance on these tasks: KorSarcasmClassification, HALClusteringS2S.v2
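The two flagged tasks are the only rows in the table above where the BOOM_4B_v1 score is higher than the "Max result" column. The thread does not state the bot's actual criterion, so the following is only a minimal sketch under that assumption; the `Row` type and the `tasks_above_previous_max` helper are hypothetical names, and the scores are copied from a few rows of the table.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One table row: task name, new model score, previous max result."""
    task: str
    new_score: float
    max_score: float


def tasks_above_previous_max(rows: list[Row]) -> list[str]:
    # Assumed rule (not confirmed by the thread): flag a task when the
    # new model scores strictly above the previous max result.
    return [r.task for r in rows if r.new_score > r.max_score]


# Scores copied from the comparison table for a handful of tasks.
rows = [
    Row("KorSarcasmClassification", 0.8194, 0.6479),
    Row("HALClusteringS2S.v2", 0.3345, 0.3228),
    Row("AlloProfClusteringS2S.v2", 0.5922, 0.5965),
    Row("NorwegianCourtsBitextMining", 0.9430, 0.9447),
    Row("Banking77Classification", 0.8699, 0.9427),
]

flagged = tasks_above_previous_max(rows)
print(flagged)
```

Rows flagged this way are worth a closer look during review, since a score above every previously submitted model can indicate either a genuine improvement or an evaluation issue.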


@KennethEnevoldsen
Contributor

KennethEnevoldsen commented Jan 31, 2026

Related PR: embeddings-benchmark/mteb#4022

zhanghengran and others added 2 commits February 1, 2026 00:00