Hi authors,
Thank you for releasing this codebase and for the great work on RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents (ICLR 2026 submission).
I am trying to reproduce the RLVER PPO experiments using this repo, but I ran into an inconsistency between the paper and the released code, and I would really appreciate some clarification.
1. Multi-turn rollout & custom_repeat_by_counts
In the RL training loop, there is a version of fit() (in ray_trainer_think.py) that uses multi-turn rollouts and operates on per-sample dialogue turn counts:
```python
turn_count_list = gen_batch_output.non_tensor_batch['dialogue_turns'].tolist()
print("turn_count_list", turn_count_list)

consolidated_turn_count = []
current_count = turn_count_list[0]
remaining = current_count
for count in turn_count_list:
    if count == current_count and remaining > 0:
        remaining -= 1
    else:
        consolidated_turn_count.append(current_count)
        current_count = count
        remaining = current_count - 1
if remaining == 0:
    consolidated_turn_count.append(current_count)

batch = batch.repeat(repeat_times=self.config.actor_rollout_ref.rollout.n, interleave=True)
batch = batch.custom_repeat_by_counts(repeat_counts=consolidated_turn_count, interleave=True)
batch = batch.union(gen_batch_output)
```
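If I read it correctly, the consolidation loop collapses the per-turn-expanded list of counts back to one count per dialogue, assuming a dialogue with k turns contributes k consecutive copies of k. A toy check of that reading (my own standalone code, purely for illustration, not from the repo):

```python
# My own standalone check of how I read the consolidation loop above;
# the input list is made up and not taken from an actual rollout.
def consolidate(turn_count_list):
    consolidated = []
    current_count = turn_count_list[0]
    remaining = current_count
    for count in turn_count_list:
        if count == current_count and remaining > 0:
            remaining -= 1
        else:
            consolidated.append(current_count)
            current_count = count
            remaining = current_count - 1
    if remaining == 0:
        consolidated.append(current_count)
    return consolidated

# a 3-turn, a 2-turn and a 4-turn dialogue, flattened per turn:
assert consolidate([3, 3, 3, 2, 2, 4, 4, 4, 4]) == [3, 2, 4]
```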
However, in the DataProto class shipped in this repo, there is no custom_repeat_by_counts method, so running this version of fit() raises:
```
AttributeError: 'DataProto' object has no attribute 'custom_repeat_by_counts'
```
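For reference, this is what I currently assume the missing helper is meant to do: repeat the i-th sample repeat_counts[i] times so the prompt batch lines up with the per-turn generation output. The sketch below is only my guess at the intended semantics, written against the upstream verl DataProto layout (batch as a TensorDict, non_tensor_batch as a dict of numpy arrays), e.g. as something one could monkey-patch onto DataProto just to get this version of fit() to run; please correct me if the real implementation differs:

```python
# Purely my guess at the intended semantics of the missing helper, assuming
# the upstream verl DataProto dataclass (batch: TensorDict, non_tensor_batch:
# dict of numpy arrays, meta_info: dict). Not the authors' implementation.
import numpy as np
import torch
from tensordict import TensorDict


def custom_repeat_by_counts(self, repeat_counts, interleave=True):
    """Repeat the i-th sample of the DataProto repeat_counts[i] times."""
    repeats = torch.as_tensor(repeat_counts, dtype=torch.long)
    new_size = int(repeats.sum())

    repeated_batch = None
    if self.batch is not None:
        assert self.batch.batch_size[0] == len(repeat_counts), "one count per sample expected"
        if interleave:
            # sample 0 repeated counts[0] times, then sample 1, and so on,
            # which is how I assume the multi-turn generation output is laid out
            repeated_batch = TensorDict(
                {k: torch.repeat_interleave(v, repeats, dim=0) for k, v in self.batch.items()},
                batch_size=[new_size],
            )
        else:
            raise NotImplementedError("only the interleave=True path is exercised in fit()")

    repeated_non_tensor = {
        k: np.repeat(v, repeat_counts, axis=0) for k, v in self.non_tensor_batch.items()
    }

    return type(self)(
        batch=repeated_batch,
        non_tensor_batch=repeated_non_tensor,
        meta_info=self.meta_info,
    )
```

If the intended semantics differ (for example, a different row ordering than consecutive per-sample blocks), knowing that would already help a lot.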
At the same time, there is another simpler version of fit() which only does:
```python
batch = batch.repeat(repeat_times=self.config.actor_rollout_ref.rollout.n, interleave=True)
batch = batch.union(gen_batch_output)
self._balance_batch(batch, metrics=metrics)
```
This version does not use dialogue_turns or per-sample repeat counts, and seems more like the standard VERL PPO loop (without the multi-turn alignment logic described in the RLVER paper).
- My questions
To faithfully reproduce the RLVER results on multi-turn conversations, I'm not sure which implementation I should follow or what the intended behavior is. Concretely, could you please clarify:
1. Where is DataProto.custom_repeat_by_counts defined?
   - Is there an internal / modified verl version that includes this method but hasn't been pushed to this repo yet?
   - If possible, could you share the implementation or the intended semantics of custom_repeat_by_counts (e.g., how it uses consolidated_turn_count to expand the batch)?
2. Which fit() implementation corresponds to the RLVER paper experiments?
   - Does the RLVER paper use the multi-turn code path that relies on dialogue_turns + custom_repeat_by_counts?
   - Or are the released results based on the simpler batch.repeat(...); _balance_batch(...) version?
3. Multi-turn reward shaping details: since RLVER uses multi-turn simulated users and turn-level emotion scores, it would be very helpful if you could confirm
   - how the dialogue is grouped into episodes when dialogue_turns varies per sample;
   - how the per-turn rewards are aggregated and aligned with the PPO rollout n (e.g., rollout.n=1 vs. n>1); see the small sketch after this list for the two readings I have in mind.
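To make point 3 concrete, these are the two readings I am hesitating between; the snippet is purely my own illustration with made-up variable names, not code from this repo:

```python
# Hypothetical illustration only (variable names are made up, not from this repo):
# two prompts, rollout.n = 1, with dialogues of 3 and 2 turns respectively.
dialogue_turns = [3, 2]
per_turn_scores = [[0.2, 0.5, 0.9], [0.4, 0.7]]  # simulated-user emotion score after each turn

# Reading A: every turn becomes its own PPO sample, so after the repeat-by-counts
# expansion the reward tensor has sum(dialogue_turns) rows, one per turn.
rewards_reading_a = [score for scores in per_turn_scores for score in scores]
assert len(rewards_reading_a) == sum(dialogue_turns)

# Reading B: each dialogue is a single episode and only an aggregate score
# (e.g. the final turn's emotion score) is assigned to the whole trajectory,
# giving one reward per dialogue regardless of its length.
rewards_reading_b = [scores[-1] for scores in per_turn_scores]
assert len(rewards_reading_b) == len(dialogue_turns)
```

With rollout.n > 1 I would additionally expect each prompt's n rollouts to carry their own turn counts, so the expansion would happen per (prompt, rollout) pair; please let me know if either reading matches the paper's setup.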
- Minimal reproduction
- Task: RLVER PPO training with the multi-turn environment (vllm_multi_turn_via_chat)
- Model: e.g., Qwen/Qwen2.5-3B-Instruct
- Code path: the fit() version that calls batch.custom_repeat_by_counts(...)
- Error:

```
AttributeError: 'DataProto' object has no attribute 'custom_repeat_by_counts'
```
If I instead switch to the simpler fit() version (without custom_repeat_by_counts), the code runs, but then I’m not confident that this matches the multi-turn rollout and reward handling described in the RLVER paper.
Thank you very much for your time and for any guidance you can provide. Having these details (or the missing helper function) would greatly help the community reproduce and build on RLVER.