Skip to content

Reproduction issue: Nemotron-8B always calls transfer_to_human_agents and never calls call_expert; unable to reproduce numbers in the paper #21

@jiayuww

Description

@jiayuww

Hi, this is an interesting work! I was wondering if you could share more details about the configuration used to obtain the Tau2Bench results (e.g., settings, prompts, or domains evaluated).

I’ve tried reproducing the results on the airline, retail, and telecom domains, but I consistently observe that the model always calls transfer_to_human_agents and never invokes call_expert to trigger the available expert models. I therefore cannot reproduce the numbers in the paper and there is a large gap.

Could you help clarify what might be missing or misconfigured on my side? Or could you release the full trace directly?

Thank you!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions