
Conversation

@befelix commented Jan 16, 2026

Description

Currently the replay buffer only works for `Episode` inputs, which makes it rather inflexible for downstream adaptations of torchforge. This PR proposes to make the `ReplayBuffer` generic in terms of what data is stored.

The main downside of this generality is that one needs to explicitly specify the policy age when adding data.
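For illustration, here is a minimal sketch of what a type-generic buffer could look like, assuming staleness-based eviction is the only reason the buffer needs to know about policy age. The class, method, and argument names below are made up for this sketch and are not torchforge's actual API:

```python
# Minimal sketch of a type-generic replay buffer (illustrative names only,
# not torchforge's actual API). The buffer stores opaque items of type T and
# therefore needs the policy version passed in explicitly when adding data.
from dataclasses import dataclass, field
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass
class _Entry(Generic[T]):
    data: T
    policy_version: int


@dataclass
class GenericReplayBuffer(Generic[T]):
    max_policy_age: int
    _entries: list[_Entry[T]] = field(default_factory=list)

    def add(self, data: T, policy_version: int) -> None:
        """Store arbitrary data; the caller supplies the policy age explicitly."""
        self._entries.append(_Entry(data, policy_version))

    def evict(self, current_version: int) -> None:
        """Drop entries that are older than the allowed policy age."""
        self._entries = [
            e
            for e in self._entries
            if current_version - e.policy_version <= self.max_policy_age
        ]

    def sample(self, batch_size: int) -> list[T]:
        """Return up to batch_size stored items (no shuffling, for brevity)."""
        return [e.data for e in self._entries[:batch_size]]
```

With a shape like this, the GRPO app could keep storing `Episode` objects unchanged, while downstream users store whatever type their collate function expects.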

Test plan

n/a

@meta-cla bot commented Jan 16, 2026

Hi @befelix!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@meta-cla bot added the CLA Signed label Jan 16, 2026
@meta-cla bot commented Jan 16, 2026

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@felipemello1 (Contributor) commented Jan 16, 2026

Thanks for the PR! This is good feedback. I need to look into it a bit more in detail, but I imagine that this would also require changes in the collate fn. I am working on refactoring some components, including the collate and episode class. Do you mind if I get back to you next week?

@felipemello1 self-assigned this Jan 16, 2026
@befelix (Author) commented Jan 16, 2026

> I need to look into it a bit more in detail, but I imagine that this would also require changes in the collate fn.

No rush, but at least in the current PR the collate for the GRPO app should be the same; it only adds the flexibility to do something else.

> I am working on refactoring some components, including the collate and episode class.

Regarding collates: we also have a custom implementation of both that removes torchforge's left-padding to reduce log-prob mismatches between vllm and torchtitan (at least for the default Qwen models). This is also something we could upstream if you'd like, but it's a larger change since log_prob extraction doesn't work as conveniently anymore. Not sure if you'd want this or would prefer to directly switch to sequence packing.
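As a rough illustration of the padding point, a right-padded collate might look like the sketch below (names and tensor layout are assumptions for this example, not the actual implementation). The inconvenience mentioned above comes from the fact that with right padding the response tokens presumably start at a different offset in each row, so log-prob extraction needs per-row indexing rather than a fixed slice from the end.

```python
# Rough sketch of a right-padded collate for variable-length token sequences.
# Names and layout are illustrative; this is not torchforge's collate_fn.
import torch


def collate_right_padded(
    token_seqs: list[torch.Tensor], pad_id: int
) -> tuple[torch.Tensor, torch.Tensor]:
    """Pad on the right so every sequence starts at position 0; returns the
    padded batch and a boolean mask marking real (non-pad) tokens."""
    max_len = max(seq.numel() for seq in token_seqs)
    batch = torch.full((len(token_seqs), max_len), pad_id, dtype=torch.long)
    mask = torch.zeros((len(token_seqs), max_len), dtype=torch.bool)
    for i, seq in enumerate(token_seqs):
        batch[i, : seq.numel()] = seq
        mask[i, : seq.numel()] = True
    return batch, mask
```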

@befelix marked this pull request as ready for review Jan 16, 2026 17:19
@felipemello1 (Contributor) commented Jan 16, 2026

Thanks @befelix! I became aware yesterday that we are doing left padding -.-`. Working on it now. Also, FYI, I fixed this today: #706. And I am going to merge our loss PR soon: #699.

Since you guys are doing so much work with Forge, it would be nice to touch base and gather some feedback to see what we can prioritize to help you (it's fine if you're busy and can't at the moment). Just let me know and I can send you an email to schedule a 30-minute meeting.

@felipemello1 (Contributor) commented Jan 16, 2026
> Not sure if you'd want this or would prefer to directly switch to sequence packing.

I am a big fan of sequence packing, but there are so many other bottlenecks that we haven't been able to get there yet. We will, though.

@befelix (Author) commented Jan 16, 2026

It would be nice to chat. Can you ping me at felix.berkenkamp@aleph-alpha-research.com to schedule a call?
