Description
Implement AlignScore (Zha et al., ACL 2023) as a generation evaluation metric.
What it measures
Factual consistency / faithfulness of generated text to source context.
How it works
- RoBERTa-based model trained on 15 datasets (4.7M examples): NLI, fact verification, QA, paraphrase
- Splits generated text into chunks, scores each against source, aggregates
- Outperforms GPT-3.5-based evaluation on SummaC and TRUE benchmarks
- Fully deterministic, no LLM-as-judge
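The chunk-score-aggregate flow above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `align_fn` stands in for the trained RoBERTa alignment model, and `toy_overlap` is a hypothetical stand-in scorer used only to make the example self-contained. The aggregation (max over context chunks per generated sentence, then mean over sentences) follows the scheme described in the paper.

```python
import re

def split_sentences(text):
    """Naive sentence splitter; the real system uses its own chunking
    (context is split into roughly 350-token chunks)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def align_score(context_chunks, generated_text, align_fn):
    """Mean-over-sentences of max-over-context-chunks alignment scores."""
    sentences = split_sentences(generated_text)
    if not sentences:
        return 0.0
    per_sentence = [
        max(align_fn(chunk, sent) for chunk in context_chunks)
        for sent in sentences
    ]
    return sum(per_sentence) / len(per_sentence)

def toy_overlap(chunk, claim):
    """Hypothetical alignment function: fraction of claim tokens
    present in the context chunk. The real metric uses a trained model."""
    ctx = set(chunk.lower().split())
    tokens = [t.strip(".,!?") for t in claim.lower().split()]
    return sum(t in ctx for t in tokens) / len(tokens)

# A fully supported sentence scores 1.0; an unsupported one drags the mean down.
score = align_score(
    ["the cat sat on the mat"],
    "The cat sat on the mat. Dogs fly south.",
    toy_overlap,
)
```

Swapping `toy_overlap` for the trained alignment model recovers the deterministic, judge-free property claimed above: the same inputs always yield the same score.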
Why
- Replaces RAGAS Faithfulness with an academically rigorous alternative
- Reference-free (evaluates against retrieved context only)
- NeurIPS 2026 ED Track submission
Reference
Zha et al., "AlignScore: Evaluating Factual Consistency with a Unified Alignment Function," ACL 2023