Major update #48

lintangsutawika · 2025-12-28T19:21:21Z

Add more rewards (including cosine rewards)
Handle both step-wise and non-step-wise training
Add non-thinking mode
Keep the max length for rollouts but train on a shorter, adjustable train length, so that rollout length and training length are decoupled.
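To illustrate the last point, here is a minimal sketch of how a rollout generated under a larger max length could be cut down to a shorter train length before the loss is computed. It is not taken from this PR's diff: the `TrainSample` type, the `truncate_for_training` helper, and the choice to keep the prefix of the sequence are all assumptions made for illustration. Only the name `max_train_length` comes from the commit list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrainSample:
    """Hypothetical container for one training sequence."""
    input_ids: List[int]
    loss_mask: List[int]  # 1 = token contributes to the loss, 0 = masked out


def truncate_for_training(
    input_ids: List[int],
    loss_mask: List[int],
    max_train_length: int,
) -> TrainSample:
    """Cut a rollout down to at most max_train_length tokens.

    Assumption: the prefix is kept and the tail is dropped. The rollout
    itself may have been generated under a larger max length.
    """
    if len(input_ids) != len(loss_mask):
        raise ValueError("input_ids and loss_mask must have the same length")
    if max_train_length <= 0:
        raise ValueError("max_train_length must be positive")
    return TrainSample(
        input_ids=input_ids[:max_train_length],
        loss_mask=loss_mask[:max_train_length],
    )


# A 10-token rollout trained on at most 6 tokens.
rollout_ids = list(range(10))
rollout_mask = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]  # first 3 tokens are prompt
sample = truncate_for_training(rollout_ids, rollout_mask, max_train_length=6)
```

Under this sketch the rollout budget and the training budget are two independent settings: generation can run to the full max length while each training sequence is capped separately.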

Major update branch with cherry-picked custom finish tool

lintangsutawika added 30 commits December 28, 2025 13:56

add utility to truncate prompts

84ab1f4

add swe-smith option

8f60b52

adjust so trajectories can be saved to gcs

337ee36

add step_wise flag

2ee6bd1

process both step wise or non-step wise. Mask non-step wise

c7c0e49

fix working_dir

4601e0c

fix reward return

06e10b1

add max_train_length

767ae8a

apply patch for swe-smith

5bc750f

add additional args

f648708

many different types of rewards

a6773b2

cosine_rewards with option to turn off parts

ec0ce46

fix turn count

86646ea

fix turn count

3277171

add length reward

ec86b20

hf checkpoint every 10

a78bc49

add format reward

e07cb9c

temporarily hardcode tools

c2b6bd4

latest version

f22d1a7

add templates

fe05c86

add starter for no think

fc338ef

temporarily remove tool

f89fcf7

rename

ff7c5d0

removed unused lines

5de22ae

update message

5a5648f

add weight calculation for reward

fc04637

fix padding

97bc1d8

add new config

01568bc

checkpoint for working system

b1c4376

add weighted f05

97efad2

lintangsutawika and others added 4 commits January 20, 2026 21:58

adjust esp

57c2b37

code with instruct-style masking logic

de73b60

code with Lintang's original masking logic

0cd54c1

Merge pull request #63 from OpenHands/major-update-custom-finish

9a8c0dd

Major update branch with cherry-picked custom finish tool

3 participants
