Problems Reproducing HippoRAG Results #87

Hi,

I am currently trying to reproduce the HippoRAG results reported in the paper using the MultihopRAG dataset.

However, I am observing significantly lower performance:
- Accuracy: ~42 (vs. 53 in the paper)
- Recall: ~21 (vs. 47 in the paper)

I have a few questions regarding reproducibility:

1. Is there a specific commit hash corresponding to the experiments reported in the paper?

2. Could you clarify which hyperparameters in `HippoRAG.yaml` were used for the reported results?
   - I first ran it with the default hyperparameters, then a second time with:
     - `llm_model_max_token_size = 8000`
     - `top_k = 4` (for all three parameters)
     - all other parameters unchanged
   - Both runs gave very similar (low) results.

3. My generated knowledge graph has ~22k nodes and ~15k edges,
   while the preprint reports ~35,953 nodes and ~37,173 edges.
   - Were those numbers obtained using Llama-3-8B, or a different model?
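
To make the size of the gaps easier to compare, here is a quick sketch that computes how far each of my numbers falls below the reported one. All values are copied from this issue; my graph sizes are the rounded figures (~22k and ~15k), so the graph percentages are approximate. The helper `relative_gap` is just an illustrative name, not something from the HippoRAG code.

```python
# Values copied from this issue. The "mine" graph sizes are rounded (~22k, ~15k).
reported = {"accuracy": 53, "recall": 47, "nodes": 35_953, "edges": 37_173}
mine = {"accuracy": 42, "recall": 21, "nodes": 22_000, "edges": 15_000}


def relative_gap(mine_value, reported_value):
    """Return the shortfall as a fraction of the reported value."""
    return (reported_value - mine_value) / reported_value


# Shortfall per metric, e.g. 0.5 means "half of the reported value is missing".
gaps = {name: relative_gap(mine[name], reported[name]) for name in reported}

for name, gap in gaps.items():
    print(f"{name}: {gap:.0%} below the reported value")
```

By this measure, the shortfall is largest for the edge count and for recall, and smallest for accuracy.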

Any clarification would be greatly appreciated!

Thanks for making the code available.
