The EM and Precision are too low. #78

@rockcor

gpt-4.1-mini/quality/Dalk: Accuracy = 0.3936
gpt-4.1-mini/Popqa/RAPTOR: Acc = 0.0222, EM = 0.0000, F1 = 0.0189, P = 0.0143, R = 0.0556
gpt-4o/multihop-rag/RAPTOR: Acc = 0.5814, EM = 0.0012, F1 = 0.0263, P = 0.0146, R = 0.3602
gpt-4o/multihop-rag/Dalk: Acc = 0.6491, EM = 0.0814, F1 = 0.1258, P = 0.1080, R = 0.3347
gpt-4o/multihop-rag/HippoRAG: Acc = 0.6463, EM = 0.0000, F1 = 0.0213, P = 0.0111, R = 0.3550
gpt-4o/quality/RAPTOR: Accuracy = 0.4752
gpt-4-turbo/multihop-rag/default: Acc = 0.4730, EM = 0.1166, F1 = 0.1752, P = 0.1585, R = 0.2452

Here are some results I have. Acc and Recall are close to the table in the paper, but EM and Precision are far too low.
Do you have any suggestions for this issue? For example, do we need to limit the output length in the prompts, since the generated answers are currently redundant?
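
To illustrate why redundant answers could produce exactly this pattern (Recall intact, EM and Precision collapsed), here is a minimal sketch. It assumes the evaluation uses SQuAD-style token-overlap metrics, which is not confirmed against this repo's evaluation code. The `normalize` and `token_scores` helpers and the example question are hypothetical, not taken from the repo or the datasets.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_scores(prediction: str, gold: str) -> dict[str, float]:
    """Token-level EM / precision / recall / F1 (assumed SQuAD-style definition)."""
    pred, ref = normalize(prediction), normalize(gold)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both the prediction and the reference.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"em": float(pred == ref), "precision": precision, "recall": recall, "f1": f1}


# Hypothetical example: same correct fact, short vs. verbose phrasing.
gold = "Paris"
short = token_scores("Paris", gold)
verbose = token_scores(
    "Based on the retrieved context, the capital of France is Paris.", gold
)

print("short  :", short)
print("verbose:", verbose)
```

Under these assumptions, the verbose answer keeps full recall because the gold token is present, but its precision is diluted by every extra token and its EM is zero. If the repo's metrics work this way, constraining the answer length in the prompt (or extracting a short answer span before scoring) should raise EM and Precision without affecting Acc much.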
