Hi,
Thank you for your great work and for open-sourcing the code!
In issue #12 you mentioned:
"In our paper and code we used BEIR dataset. We used test split for NQ and HotpotQA, and train split for MS-MARCO."
I'm still a bit confused: Table 14 in the paper lists 6,980 queries for MS MARCO, but the official MS MARCO train split contains far more (roughly 500k+ unique queries with relevance labels). Incidentally, 6,980 is exactly the size of the standard MS MARCO passage dev (small) set, so I wonder whether the dev split was actually used there?
Could you please share the exact 6,980-query subset used in the paper, or point to the file/link/processing step that selects these queries?
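For context, this is roughly how I'm counting queries per split on my side: a minimal sketch that counts unique query ids in a BEIR-style qrels TSV (header row `query-id  corpus-id  score`). The TSV layout is my assumption based on the BEIR download format; please correct me if your pipeline differs.

```python
import csv
import io

def count_unique_queries(qrels_tsv: str) -> int:
    """Count unique query ids in a BEIR-style qrels TSV string
    (assumed header row: query-id, corpus-id, score)."""
    reader = csv.reader(io.StringIO(qrels_tsv), delimiter="\t")
    next(reader)  # skip the header row
    return len({row[0] for row in reader if row})

# Tiny synthetic example (not real MS MARCO data):
sample = (
    "query-id\tcorpus-id\tscore\n"
    "q1\td1\t1\n"
    "q1\td2\t1\n"  # same query, second judged passage
    "q2\td3\t1\n"
)
print(count_unique_queries(sample))  # prints 2
```

With the real qrels files, this count is what I'm comparing against the 6,980 figure from Table 14.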
Thank you so much for your time and for any clarification!