Skip to content

[论文讨论] Correctness-Optimized Residual Activation Lens (CORAL): Transferrable and Calibration-Aware Inference-Time Steering #63

@gqy20

Description

@gqy20

论文信息

标题: Correctness-Optimized Residual Activation Lens (CORAL): Transferrable and Calibration-Aware Inference-Time Steering
作者: Miranda Muqing Miao, Young-Min Cho, Lyle Ungar
发布时间: 2026-02-05
分类: cs.AI
PDF: Download

简介

Large language models (LLMs) exhibit persistent miscalibration, especially after instruction tuning and preference alignment. Modified training objectives can improve calibration, but retraining is expensive. Inference-time steering offers a lightweight alternative, yet most existing methods optimize proxies for correctness rather than correctness itself. We introduce CORAL (Correctness-Optimized Residual Activation Lens), a regularized inference-time steering method that captures distributed correctness signals from model internal activations using weight-decay MLP probes. We evaluate CORAL across three 7B-parameter models and find that it consistently improves accuracy by 10% and expected calibration error (ECE) by 50% on average. We additionally demonstrate that these gains transfer without retraining to the complete published test sets of four held-out benchmarks (ARC-Challenge, HellaSwag, Math-MC, OpenBookQA), averaging 14% accuracy improvements and 49% ECE improvements. Our results support the hypothesis that distributed information in model internals can be extracted using regularized probes when individual neurons are insufficient. CORAL thus provides a compute-efficient, transferable, and calibration-aware approach to improve MCQA performance during inference.

推荐理由

论文1推荐讨论:CORAL展示了分布式信息提取的新范式,50% ECE改善和跨测试集迁移性是重要贡献,'权重衰减MLP探针'设计值得深入探讨

讨论

请对这篇论文发表您的见解:

  • 论文的创新点是什么?
  • 方法是否合理?
  • 实验结果是否可信?
  • 有哪些可以改进的地方?

由 arXiv Monitor 自动创建

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions