Repetition and shuffling on vllm rollout

Thanks for the great work! I noticed that the samples are reordered before being distributed across the GPUs, to reduce the long-tail effect of load imbalance between them. As a result, compared to the original verl, the batch is repeated without interleaving [here](https://github.com/SkyworkAI/Skywork-OR1/blob/64e96afa213ae89d0ad21932106d3b8aafe9ace2/verl/trainer/ppo/ray_trainer.py#L971).

My question is this: the original `rollout.n` still appears to be passed to the vLLM rollout, and I can't find anywhere that sets it to 1. Wouldn't the extra repeat cause each prompt to be sampled n * n times instead of n? The vLLM sampling config is taken from the rollout config [here](https://github.com/SkyworkAI/Skywork-OR1/blob/64e96afa213ae89d0ad21932106d3b8aafe9ace2/verl/workers/rollout/vllm_rollout/vllm_rollout.py#L122-L124).
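To make the concern concrete, here is a minimal pure-Python sketch of what I think might be happening. It is not the repository's code: `repeat` and `fake_rollout` are hypothetical stand-ins for the trainer-side repeat and for a sampler that returns `n` responses per input row, and the prompt names are made up.

```python
from collections import Counter


def repeat(prompts, n, interleave):
    """Hypothetical stand-in for the trainer-side batch repeat."""
    if interleave:
        # Grouped order: p0, p0, p0, p1, p1, p1
        return [p for p in prompts for _ in range(n)]
    # Tiled order (repeat without interleave): p0, p1, p0, p1, p0, p1
    return [p for _ in range(n) for p in prompts]


def fake_rollout(batch, n):
    """Hypothetical sampler that returns n responses for every input row."""
    return [(p, i) for p in batch for i in range(n)]


prompts = ["p0", "p1"]
n = 3

tiled = repeat(prompts, n, interleave=False)
grouped = repeat(prompts, n, interleave=True)

# If the sampler still uses n after the batch was already repeated n times,
# every prompt ends up with n * n responses.
per_prompt = Counter(p for p, _ in fake_rollout(tiled, n))

# If the sampler uses n = 1 on the repeated batch, every prompt gets n responses.
per_prompt_fixed = Counter(p for p, _ in fake_rollout(tiled, 1))

print(tiled)
print(grouped)
print(dict(per_prompt))
print(dict(per_prompt_fixed))
```

Under these assumptions each prompt gets 9 responses in the first case and 3 in the second, so I would expect the sampler's `n` to be overridden to 1 somewhere if the repeat already happens in the trainer.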

Repetition and shuffling on vllm rollout #48
