
Conversation

@befelix commented Jan 16, 2026

Description

Currently the replay buffer only works for `Episode` inputs, which makes it rather inflexible for downstream adaptations of torchforge. This PR proposes to make the `ReplayBuffer` generic in terms of what data is stored.

The main downside of this generality is that one needs to explicitly specify the policy age when adding data.
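For illustration, here is a minimal sketch of what a type-generic buffer could look like, assuming staleness-based eviction is the only reason the buffer needs to know about policy age. The class, method, and argument names below are made up for this sketch and are not torchforge's actual API:

```python
# Minimal sketch of a type-generic replay buffer (illustrative names only,
# not torchforge's actual API). The buffer stores opaque items of type T and
# therefore needs the policy version passed in explicitly when adding data.
from dataclasses import dataclass, field
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass
class _Entry(Generic[T]):
    data: T
    policy_version: int


@dataclass
class GenericReplayBuffer(Generic[T]):
    max_policy_age: int
    _entries: list[_Entry[T]] = field(default_factory=list)

    def add(self, data: T, policy_version: int) -> None:
        """Store arbitrary data; the caller supplies the policy age explicitly."""
        self._entries.append(_Entry(data, policy_version))

    def evict(self, current_version: int) -> None:
        """Drop entries that are older than the allowed policy age."""
        self._entries = [
            e
            for e in self._entries
            if current_version - e.policy_version <= self.max_policy_age
        ]

    def sample(self, batch_size: int) -> list[T]:
        """Return up to batch_size stored items (no shuffling, for brevity)."""
        return [e.data for e in self._entries[:batch_size]]
```

With a shape like this, the GRPO app could keep storing `Episode` objects unchanged, while downstream users store whatever type their collate function expects.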

Test plan

n/a

@meta-cla bot commented Jan 16, 2026

Hi @befelix!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@meta-cla bot added the CLA Signed label Jan 16, 2026
@meta-cla bot commented Jan 16, 2026

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@felipemello1 (Contributor) commented Jan 16, 2026

Thanks for the PR! This is good feedback. I need to look into it a bit more in detail, but I imagine that this would also require changes in the collate fn. I am working on refactoring some components, including the collate and episode class. Do you mind if I get back to you next week?

@felipemello1 self-assigned this Jan 16, 2026
@befelix (Author) commented Jan 16, 2026

> I need to look into it a bit more in detail, but I imagine that this would also require changes in the collate fn.

No rush, but at least in the current PR the collate for the GRPO app should be the same; it only adds the flexibility to do something else.

> I am working on refactoring some components, including the collate and episode class.

Regarding collates: we also have a custom implementation of both that removes torchforge's left-padding to reduce log-prob mismatches between vllm and torchtitan (at least for the default Qwen models). This is also something we could upstream if you'd like, but it's a larger change since log_prob extraction doesn't work as conveniently anymore. Not sure if you'd want this or would prefer to directly switch to sequence packing.
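As a rough illustration of the padding point, a right-padded collate might look like the sketch below (names and tensor layout are assumptions for this example, not the actual implementation). The inconvenience mentioned above comes from the fact that with right padding the response tokens presumably start at a different offset in each row, so log-prob extraction needs per-row indexing rather than a fixed slice from the end.

```python
# Rough sketch of a right-padded collate for variable-length token sequences.
# Names and layout are illustrative; this is not torchforge's collate_fn.
import torch


def collate_right_padded(
    token_seqs: list[torch.Tensor], pad_id: int
) -> tuple[torch.Tensor, torch.Tensor]:
    """Pad on the right so every sequence starts at position 0; returns the
    padded batch and a boolean mask marking real (non-pad) tokens."""
    max_len = max(seq.numel() for seq in token_seqs)
    batch = torch.full((len(token_seqs), max_len), pad_id, dtype=torch.long)
    mask = torch.zeros((len(token_seqs), max_len), dtype=torch.bool)
    for i, seq in enumerate(token_seqs):
        batch[i, : seq.numel()] = seq
        mask[i, : seq.numel()] = True
    return batch, mask
```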

@befelix marked this pull request as ready for review Jan 16, 2026 17:19
@felipemello1 (Contributor) commented Jan 16, 2026

Thanks @befelix! I became aware yesterday that we are doing left padding -.-`. Working on it now. Also, FYI, I fixed this today: #706. And I am going to merge our loss PR soon: #699.

Since you guys are doing so much work with Forge, it would be nice to touch base and gather some feedback to see what we can prioritize to help you (it's fine if you're busy and can't at the moment). Just let me know and I can send you an email to schedule a 30-minute meeting.

@felipemello1 (Contributor) commented Jan 16, 2026
> Not sure if you'd want this or would prefer to directly switch to sequence packing.

I am a big fan of sequence packing, but there are so many other bottlenecks that we haven't been able to get there yet. We will, though.

@befelix (Author) commented Jan 16, 2026

It would be nice to chat. Can you ping me at felix.berkenkamp@aleph-alpha-research.com to schedule a call?
