
Conversation

@mkhi238 (Contributor) commented on Oct 9, 2025

Description

This PR adds the FEVER (Fact Extraction and VERification) metric to the evaluate library.

What does this metric do?

The FEVER metric evaluates fact verification systems against evidence retrieved from Wikipedia. It includes:

  • Label accuracy: Measures how often predicted labels match gold labels
  • FEVER score: The official metric, which requires both a correct label and complete evidence retrieval
  • Evidence precision/recall/F1: Micro-averaged metrics for evidence retrieval
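
For context, here is a minimal usage sketch. It assumes the standard evaluate.load / compute API; the prediction and reference schema shown below (a label plus evidence given as page/sentence pairs) is illustrative only, see the README.md in this PR for the actual field names.

import evaluate

# Load the FEVER metric added by this PR.
fever = evaluate.load("fever")

# Hypothetical input schema: each prediction carries a label and the retrieved
# evidence as [page, sentence_id] pairs; each reference carries the gold label
# and one or more complete gold evidence sets. Check README.md for the real format.
predictions = [
    {"label": "SUPPORTS", "evidence": [["Some_Wikipedia_Page", 0]]},
    {"label": "NOT ENOUGH INFO", "evidence": []},
]
references = [
    {"label": "SUPPORTS", "evidence": [[["Some_Wikipedia_Page", 0]]]},
    {"label": "NOT ENOUGH INFO", "evidence": [[]]},
]

results = fever.compute(predictions=predictions, references=references)
# Expected outputs, per the description above: label accuracy, FEVER score,
# and micro-averaged evidence precision/recall/F1.
print(results)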

Checklist

  • Implementation in fever.py
  • Comprehensive test suite with 12 tests (all passing)
  • README.md with metric card and examples
  • Gradio app for interactive testing
  • Code formatted with black and isort
  • All tests pass

Tests:

All tests pass successfully:

test_fever.py
............
----------------------------------------------------------------------
Ran 12 tests in 0.142s

OK

Note: This is my first contribution to the evaluate library. I'd appreciate any feedback or suggestions for improvement!

@lhoestq (Member) left a comment:

lgtm !

@lhoestq merged commit 9e0a446 into huggingface:main on Nov 14, 2025
@lhoestq (Member) commented on Nov 14, 2025

the metric is available now, with its page at https://huggingface.co/spaces/evaluate-metric/fever :)
