Commit c98fee9

Add rl example (#49)
* update sampler
* update sampler
* update
* update sampler
* update cpu env
* update compat
* update
* update
* update
* update
* update server
* fix
* fix
* fix sample
* fix sample
* update
* update
* update server
1 parent 197cd2a commit c98fee9

20 files changed, 533 insertions(+), 187 deletions(-)
cookbook/client/tinker/transformer/grpo.py

Lines changed: 4 additions & 4 deletions
@@ -34,14 +34,14 @@
 logger = get_logger()
 
 # ========== Configuration ==========
-BASE_MODEL = 'Qwen/Qwen2.5-0.5B-Instruct'
+BASE_MODEL = 'Qwen/Qwen2.5-3B-Instruct'
 NUM_GENERATIONS = 4
 MAX_NEW_TOKENS = 1024
 LEARNING_RATE = 1e-5
-MAX_STEPS = 10
-BATCH_SIZE = 1
+MAX_STEPS = 100
+BATCH_SIZE = 2
 TEMPERATURE = 1.0
-SYNC_INTERVAL = 5 # Save weights for sampler every N steps
+SYNC_INTERVAL = 2 # Save weights for sampler every N steps
 LORA_RANK = 8
 
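
To get a feel for what the new configuration values imply, here is a minimal sketch and not code from this commit. It assumes that a weight sync fires whenever `step % SYNC_INTERVAL == 0` with 0-based steps, and that each prompt in a batch receives `NUM_GENERATIONS` sampled completions. The training loop in `grpo.py` is not part of this hunk and may count differently. The helpers `sync_steps` and `completions_per_step` are hypothetical names introduced only for illustration.

```python
# Values copied from the new side of the diff.
NUM_GENERATIONS = 4
MAX_STEPS = 100
BATCH_SIZE = 2
SYNC_INTERVAL = 2


def sync_steps(max_steps: int, interval: int) -> list[int]:
    """Return the 0-based steps at which weights would be saved for the sampler.

    Assumption: a sync fires when step % interval == 0. The real loop
    in grpo.py is not shown in this diff and may behave differently.
    """
    return [step for step in range(max_steps) if step % interval == 0]


def completions_per_step(batch_size: int, num_generations: int) -> int:
    """Assumption: every prompt in the batch gets num_generations samples."""
    return batch_size * num_generations


# Old values from the removed lines: MAX_STEPS = 10, SYNC_INTERVAL = 5.
old_syncs = sync_steps(10, 5)
new_syncs = sync_steps(MAX_STEPS, SYNC_INTERVAL)

print(len(old_syncs), len(new_syncs))  # 2 50
print(completions_per_step(1, NUM_GENERATIONS),
      completions_per_step(BATCH_SIZE, NUM_GENERATIONS))  # 4 8
```

Under these assumptions the change moves the run from 2 sampler syncs to 50 and doubles the completions sampled per step from 4 to 8, on top of the switch to a larger base model.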
