Reproduction issue: Nemotron-8B always calls transfer_to_human_agents and never calls call_expert; unable to reproduce numbers in the paper #21

Hi, this is interesting work! Could you share more details about the configuration used to obtain the Tau2Bench results (e.g., settings, prompts, and domains evaluated)?

I’ve tried reproducing the results on the airline, retail, and telecom domains, but in every run the model calls `transfer_to_human_agents` and never invokes `call_expert` to trigger the available expert models. As a result, I cannot reproduce the numbers in the paper, and there is a large gap between my results and the reported ones.

Could you help clarify what might be missing or misconfigured on my side? Alternatively, could you release the full traces?

Thank you!
