Open
Description
Responses from the training engines include templates and chain-of-thought (CoT) text. OpenClaw does not filter these out for every model, so unwanted text ends up in responses. To work around this, the inference backends strip CoT and templates. For training, however, we need exact-token rollouts so that training stays on-policy. The current workaround is a hacky cache in the inference backends that retrieves the unfiltered response from the filtered one. We should replace this hack with a more robust approach.
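To make the fragility concrete, here is a minimal sketch of one possible direction, not a description of the existing backends. It assumes CoT is delimited by `<think>...</think>` (real models may use other markers), and every name in it (`strip_cot`, `Rollout`, `RolloutStore`) is hypothetical. The idea is to record the exact tokens under an explicit rollout ID at generation time, instead of trying to recover them later from the filtered text.

```python
import re
import uuid
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

# Assumption: CoT is wrapped in <think>...</think>. Other models may differ.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_cot(text: str) -> str:
    """Return the user-facing text with CoT spans removed."""
    return THINK_RE.sub("", text).strip()


@dataclass(frozen=True)
class Rollout:
    """Hypothetical record pairing exact tokens with the filtered view."""
    rollout_id: str
    token_ids: Tuple[int, ...]  # exact sampled tokens, needed for on-policy training
    raw_text: str               # unfiltered response
    display_text: str           # filtered response shown to the client


class RolloutStore:
    """Hypothetical in-memory store keyed by rollout ID, not by filtered text."""

    def __init__(self) -> None:
        self._by_id: Dict[str, Rollout] = {}

    def record(self, token_ids: Sequence[int], raw_text: str) -> Rollout:
        rollout = Rollout(
            rollout_id=uuid.uuid4().hex,
            token_ids=tuple(token_ids),
            raw_text=raw_text,
            display_text=strip_cot(raw_text),
        )
        self._by_id[rollout.rollout_id] = rollout
        return rollout

    def get(self, rollout_id: str) -> Rollout:
        return self._by_id[rollout_id]


# Two rollouts whose filtered text is identical but whose tokens differ.
raw_a = "<think>plan A</think>Hello"
raw_b = "<think>plan B</think>Hello"

# A cache keyed by filtered text keeps only the most recent entry.
text_keyed_cache = {}
text_keyed_cache[strip_cot(raw_a)] = raw_a
text_keyed_cache[strip_cot(raw_b)] = raw_b

# An ID-keyed store keeps both rollouts distinct.
store = RolloutStore()
a = store.record([1, 2, 3], raw_a)
b = store.record([4, 5, 3], raw_b)
```

In this sketch the client would carry `rollout_id` alongside the filtered text, so the trainer can look up the exact tokens without depending on the filtered string being unique or unmodified.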