
Add AlignScore metric (ACL 2023) #335

@vkehfdl1

Description


Implement AlignScore (Zha et al., ACL 2023) as a generation evaluation metric.

What it measures

Factual consistency / faithfulness of generated text to source context.

How it works

  • Uses a RoBERTa-based alignment model trained on 4.7M examples drawn from 15 datasets spanning NLI, fact verification, QA, and paraphrase detection
  • Splits the source context into chunks and the generated text into sentences; each sentence is scored against every chunk, taking the max over chunks and averaging over sentences
  • Outperforms GPT-3.5-based evaluation on the SummaC and TRUE benchmarks
  • Fully deterministic; no LLM-as-judge
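The chunk/score/aggregate scheme above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `align_prob` is a toy token-overlap scorer standing in for the trained RoBERTa alignment model, and the function names are hypothetical.

```python
import re

def align_prob(context_chunk: str, sentence: str) -> float:
    """Toy stand-in for the RoBERTa alignment model: fraction of
    sentence tokens that also appear in the context chunk."""
    chunk_tokens = set(context_chunk.lower().split())
    sent_tokens = sentence.lower().split()
    if not sent_tokens:
        return 0.0
    return sum(t in chunk_tokens for t in sent_tokens) / len(sent_tokens)

def align_score(context: str, generated: str, chunk_size: int = 350) -> float:
    """Split context into ~chunk_size-word chunks and generated text into
    sentences; score each sentence against every chunk, take the max over
    chunks, and average over sentences."""
    words = context.split()
    chunks = [" ".join(words[i:i + chunk_size])
              for i in range(0, len(words), chunk_size)] or [""]
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", generated.strip()) if s]
    if not sentences:
        return 0.0
    return sum(max(align_prob(c, s) for c in chunks)
               for s in sentences) / len(sentences)
```

With the real alignment model plugged in, the aggregation step is unchanged; only the per-pair scorer differs.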

Why

  • Replaces RAGAS Faithfulness with an academically rigorous alternative
  • Reference-free (evaluates against retrieved context only)
  • NeurIPS 2026 ED Track submission

Reference

Zha et al., "AlignScore: Evaluating Factual Consistency with a Unified Alignment Function," ACL 2023

Labels

  • New Metric (adding new metric support)
  • neurips-2026 (NeurIPS 2026 ED Track submission priority)