
ScaleRL needs FP32 on the inference side too #19

@hallerite

Description


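#16 takes the first step toward ScaleRL-style stability improvements by upcasting the logits to FP32 on the training backend, but the inference backend needs the same change. Otherwise the sampler still computes token log-probabilities from lower-precision logits, and those will not match the trainer's FP32 log-probabilities as closely.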
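To show why the inference side matters, here is a dependency-free sketch. It emulates bf16 logits by rounding Python floats to the nearest bfloat16 value with `struct`, then compares the trainer/sampler importance ratio `exp(logp_trainer - logp_sampler)` when the sampler uses bf16 logits versus full-precision logits. The helpers `to_bf16` and `token_logprob` and the logit values are hypothetical illustrations. Python floats (float64) stand in for the FP32 path, and none of this is the code from #16 or from any inference engine.

```python
import math
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round-to-nearest-even).

    bfloat16 keeps the top 16 bits of an IEEE-754 float32, so we first round
    to float32 and then round away the low 16 bits. Assumes finite inputs.
    """
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    lsb = (bits >> 16) & 1  # lowest kept bit, used for ties-to-even
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000
    (y,) = struct.unpack(">f", struct.pack(">I", bits))
    return y


def log_softmax(logits):
    """Numerically stable log-softmax via the log-sum-exp trick."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def token_logprob(logits, token, use_bf16):
    """Log-prob of `token`, optionally after rounding the logits to bf16."""
    if use_bf16:
        logits = [to_bf16(z) for z in logits]
    return log_softmax(logits)[token]


# Hypothetical logits for one decoding step (not real model output).
logits = [12.3456, 11.9876, 3.21, -4.5]
token = 0

lp_trainer = token_logprob(logits, token, use_bf16=False)       # FP32-style trainer
lp_sampler_bf16 = token_logprob(logits, token, use_bf16=True)   # bf16 sampler
lp_sampler_fp32 = token_logprob(logits, token, use_bf16=False)  # upcast sampler

ratio_bf16 = math.exp(lp_trainer - lp_sampler_bf16)
ratio_fp32 = math.exp(lp_trainer - lp_sampler_fp32)
print(f"ratio with bf16 sampler: {ratio_bf16:.4f}")
print(f"ratio with fp32 sampler: {ratio_fp32:.4f}")
```

When both sides compute log-probabilities at the same precision the ratio is exactly 1. Rounding only the sampler's logits to bf16 shifts it away from 1 even though the model and the sampled token are identical, and that spurious off-policy signal is what matching the precision on the inference side removes.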
