Skip to content

Add BARTScore metric (NeurIPS 2021) #336

@vkehfdl1

Description

@vkehfdl1

Description

Implement BARTScore (Yuan et al., NeurIPS 2021) as a generation evaluation metric.

What it measures

Directional semantic evaluation: faithfulness, precision, recall, F1.

How it works

  • Uses pretrained BART model to compute log-probability of one text given another
  • Three configurations:
    • BARTScore(context → answer): faithfulness to retrieved context
    • BARTScore(reference → answer): precision
    • BARTScore(answer → reference): recall
  • Information-theoretic grounding (conditional log-likelihood)
  • Deterministic, no LLM-as-judge

Why

  • Published at NeurIPS — strong venue credibility
  • Provides directional evaluation (faithfulness vs precision vs recall)
  • Complements AlignScore (NLI-based) with probability-based approach → triangulation
  • NeurIPS 2026 ED Track submission

Reference

Yuan et al., "BARTScore: Evaluating Generated Text as Text Generation," NeurIPS 2021

Metadata

Metadata

Assignees

No one assigned

    Labels

    New MetricAdding new metric supportneurips-2026NeurIPS 2026 ED Track submission priority

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions