The EM and Precision are too low. #78

gpt-4.1-mini/quality/Dalk: Accuracy = 0.3936
gpt-4.1-mini/Popqa/RAPTOR: Acc = 0.0222, EM = 0.0000, F1 = 0.0189, P = 0.0143, R = 0.0556
gpt-4o/multihop-rag/RAPTOR: Acc = 0.5814, EM = 0.0012, F1 = 0.0263, P = 0.0146, R = 0.3602
gpt-4o/multihop-rag/Dalk: Acc = 0.6491, EM = 0.0814, F1 = 0.1258, P = 0.1080, R = 0.3347
gpt-4o/multihop-rag/HippoRAG: Acc = 0.6463, EM = 0.0000, F1 = 0.0213, P = 0.0111, R = 0.3550
gpt-4o/quality/RAPTOR: Accuracy = 0.4752
gpt-4-turbo/multihop-rag/default: Acc = 0.4730, EM = 0.1166, F1 = 0.1752, P = 0.1585, R = 0.2452

These are some of the results I have so far. The Acc and Recall are close to the values in the paper's table, but the EM and Precision are far too low.
Do you have any suggestions for this issue? For example, do we need to limit the output length in the prompts, since the generated answers are currently redundant?
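To illustrate why redundant answers could produce exactly this pattern (high Acc and Recall, near-zero EM and Precision), here is a minimal sketch of SQuAD-style token-overlap scoring. It assumes the benchmark computes EM, F1, P, and R on normalized tokens and treats Acc as "gold answer contained in the prediction"; the repository's actual evaluation code may differ, and the names `normalize` and `score` are hypothetical.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, strip punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def score(prediction: str, gold: str) -> dict[str, float]:
    """Assumed SQuAD-style metrics; not necessarily what the repository uses."""
    pred, ref = normalize(prediction), normalize(gold)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    # Assumption: accuracy means the gold answer appears as a token span.
    contained = f" {' '.join(ref)} " in f" {' '.join(pred)} "
    return {
        "acc": float(contained),
        "em": float(pred == ref),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


verbose = score("Based on the retrieved context, the capital of France is Paris.", "Paris")
concise = score("Paris", "Paris")
print(verbose)
print(concise)
```

Under these assumptions, the verbose answer keeps recall and accuracy at 1.0, but only one of its nine normalized tokens matches the gold answer, so precision drops to about 0.11, F1 to 0.2, and EM to 0. The concise answer scores 1.0 on every metric, which is consistent with the idea that constraining the output length would raise EM and Precision.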
