
How to build an index for a BEIR dataset? #7

@BluesPizza

Description

I downloaded scifact and tried to build a FAISS index over the corpus.json in the zip file with Contriever (from Hugging Face), but no matter which index type I choose, retrieval performance is very poor. In fact, I'm not even sure which parts of corpus.json should go into "contents", so I only used "abstract" as "contents" and "doc_id" as "id".
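
For reference, here is roughly the conversion script I used to turn the corpus into Pyserini's {"id", "contents"} JSONL format (a sketch; the paths are placeholders, and the doc_id/abstract field names are just what I found in my copy of the corpus, where "abstract" is sometimes a list of sentences):

```python
import json

# Placeholder paths -- adjust to wherever the downloaded corpus and the
# desired Pyserini encode input actually live.
SRC = "/home/scifact/corpus.json"   # one JSON object per line in my copy
DST = "/home/scifact.jsonl"         # input for pyserini.encode

with open(SRC) as fin, open(DST, "w") as fout:
    for line in fin:
        doc = json.loads(line)
        abstract = doc["abstract"]
        # In my copy "abstract" is a list of sentences, so join it into one string.
        if isinstance(abstract, list):
            abstract = " ".join(abstract)
        # I only used the abstract as "contents" -- should the title be prepended?
        record = {"id": str(doc["doc_id"]), "contents": abstract}
        fout.write(json.dumps(record) + "\n")
```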
Below are the commands I ran, following the Pyserini guidelines:

  1. python -m pyserini.encode input --corpus /home/scifact.jsonl --fields text --delimiter "\n" --shard-id 0 --shard-num 1 output --embeddings /home/encoding --to-faiss encoder --encoder /home/facebook/contriever --fields text --batch 32 --fp16
  2. python -m pyserini.index.faiss --input /home/encoding --output /home/index --hnsw (I tried every index type Pyserini supports)

Could you tell me how to preprocess the various BEIR datasets and use Pyserini to build indexes for them? (It would be great if there were step-by-step instructions or sample preprocessed .jsonl files.)
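
For completeness, this is roughly how I query the resulting index to check retrieval quality (a sketch; I'm assuming FaissSearcher and AutoQueryEncoder from pyserini.search.faiss, with mean pooling since Contriever is a mean-pooled encoder):

```python
from pyserini.search.faiss import FaissSearcher, AutoQueryEncoder

# The query encoder should match the document encoder used at indexing time;
# Contriever uses mean pooling over token embeddings and no L2 normalization.
encoder = AutoQueryEncoder("/home/facebook/contriever", pooling="mean", l2_norm=False)
searcher = FaissSearcher("/home/index", encoder)

hits = searcher.search("example scifact query text", k=10)
for hit in hits:
    print(hit.docid, hit.score)
```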
