Commit aa86181

Fix moe (#58)
* update load * update * update * update * update * update * update * update * update * update * fix lint * update * update * update * update * update * update * update * fix lint * update * fix * update
1 parent d2c0ab7 commit aa86181

32 files changed, with 1180 additions and 942 deletions.

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -152,3 +152,4 @@ megatron_output/
 ast_index_file.py
 test_cookbook/
 /test*.py
+swanlog/

.pre-commit-config.yaml

Lines changed: 8 additions & 8 deletions
@@ -22,23 +22,23 @@ repos:
 hooks:
 - id: pyupgrade
 args: [--py38-plus]
-exclude: ^client_tools/
+exclude: ^(examples/|cookbook/|client_tools/|src/twinkle_client/)

 - repo: https://github.com/pre-commit/pre-commit-hooks
 rev: v6.0.0
 hooks:
 - id: trailing-whitespace
-exclude: ^client_tools/
+exclude: ^(client_tools/|src/twinkle_client/)
 - id: check-yaml
-exclude: ^client_tools/
+exclude: ^(client_tools/|src/twinkle_client/)
 - id: end-of-file-fixer
-exclude: ^client_tools/
+exclude: ^(client_tools/|src/twinkle_client/)
 - id: requirements-txt-fixer
-exclude: ^client_tools/
+exclude: ^(client_tools/|src/twinkle_client/)
 - id: double-quote-string-fixer
-exclude: ^client_tools/
+exclude: ^(client_tools/|src/twinkle_client/)
 - id: check-merge-conflict
-exclude: ^client_tools/
+exclude: ^(client_tools/|src/twinkle_client/)
 - id: mixed-line-ending
 args: ["--fix=lf"]
-exclude: ^client_tools/
+exclude: ^(client_tools/|src/twinkle_client/)
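
The widened `exclude` patterns in the diff above can be sanity-checked outside pre-commit. The sketch below assumes pre-commit treats `exclude` as a Python regular expression searched against each repo-relative path. The helper `is_excluded` and the sample paths are hypothetical and only illustrate which prefixes each pattern covers.

```python
import re

# Patterns copied from the .pre-commit-config.yaml diff.
OLD_EXCLUDE = re.compile(r'^client_tools/')
NEW_HOOKS_EXCLUDE = re.compile(r'^(client_tools/|src/twinkle_client/)')
NEW_PYUPGRADE_EXCLUDE = re.compile(
    r'^(examples/|cookbook/|client_tools/|src/twinkle_client/)')


def is_excluded(pattern, path):
    """Return True if the repo-relative path matches the exclude pattern."""
    return pattern.search(path) is not None


# Hypothetical paths, chosen only to exercise each prefix.
SAMPLE_PATHS = [
    'client_tools/gen.py',
    'src/twinkle_client/api.py',
    'cookbook/client/tinker/grpo.py',
    'src/twinkle/model.py',
]

if __name__ == '__main__':
    for path in SAMPLE_PATHS:
        print(path,
              is_excluded(NEW_HOOKS_EXCLUDE, path),
              is_excluded(NEW_PYUPGRADE_EXCLUDE, path))
```

Under that assumption, only the `pyupgrade` hook skips `examples/` and `cookbook/`, while every hook touched by the diff now skips `src/twinkle_client/` as well as `client_tools/`.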

cookbook/client/tinker/grpo.py

Lines changed: 0 additions & 278 deletions
This file was deleted.

cookbook/client/tinker/megatron/server_config.yaml

Lines changed: 11 additions & 5 deletions
@@ -56,6 +56,9 @@ applications:
 device_mesh:
 device_type: cuda
 dp_size: 4
+queue_config:
+rps_limit: 20 # Max requests per second
+tps_limit: 10000 # Max tokens per second
 deployments:
 - name: SamplerManagement
 autoscaling_config:
@@ -77,7 +80,9 @@ applications:
 args:
 use_megatron: true # Use HuggingFace Transformers backend
 model_id: "ms://Qwen/Qwen3-30B-A3B-Instruct-2507" # ModelScope model identifier
-nproc_per_node: 4 # Number of GPU processes per node
+max_length: 10240 # model max length
+max_loras: 5 # model max loras
+nproc_per_node: 4 # Number of GPU processes per node
 device_group:
 name: model
 ranks: [4,5,6,7] # GPU rank indices
@@ -88,11 +93,12 @@ applications:
 ep_size: 2

 queue_config:
-rps_limit: 100 # Max requests per second
-tps_limit: 100000 # Max tokens per second
+rps_limit: 20 # Max requests per second
+tps_limit: 10000 # Max tokens per second
 adapter_config:
-per_token_adapter_limit: 30 # Max concurrent LoRA adapters
-adapter_timeout: 1800 # Seconds before idle adapter unload
+per_token_adapter_limit: 3 # Max concurrent LoRA adapters
+adapter_timeout: 30 # Seconds before idle adapter unload
+adapter_max_lifetime: 36000 # Maximum lifetime of an adapter in seconds (e.g., 10 hours)
 deployments:
 - name: ModelManagement
 autoscaling_config:
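
The commit does not show how the server enforces `queue_config`, so the following is only a sketch of one way `rps_limit` and `tps_limit` could behave, assuming a rolling one-second window. The class `SlidingWindowLimiter` and its `allow` method are hypothetical names, not part of the project.

```python
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical limiter: caps requests and tokens in a rolling window."""

    def __init__(self, rps_limit, tps_limit, window=1.0):
        self.rps_limit = rps_limit
        self.tps_limit = tps_limit
        self.window = window
        self._events = deque()  # (timestamp, tokens) of admitted requests
        self._tokens = 0        # tokens admitted inside the current window

    def allow(self, tokens, now):
        """Admit a request of `tokens` tokens at time `now` (seconds)."""
        # Drop admitted requests that have left the window.
        while self._events and now - self._events[0][0] >= self.window:
            _, old_tokens = self._events.popleft()
            self._tokens -= old_tokens
        # Reject if either the request budget or the token budget is exceeded.
        if len(self._events) + 1 > self.rps_limit:
            return False
        if self._tokens + tokens > self.tps_limit:
            return False
        self._events.append((now, tokens))
        self._tokens += tokens
        return True


if __name__ == '__main__':
    # Values taken from the new queue_config in the diff.
    limiter = SlidingWindowLimiter(rps_limit=20, tps_limit=10000)
    print(limiter.allow(9000, now=0.0))
    print(limiter.allow(2000, now=0.1))
```

With the new values, a single 9000-token request leaves room for only 1000 more tokens in the same second, which is a far tighter budget than the previous `tps_limit: 100000`.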

cookbook/client/tinker/megatron/server_config_7b.yaml

Lines changed: 5 additions & 3 deletions
@@ -50,10 +50,12 @@ applications:
 dp_size: 2
 queue_config:
 rps_limit: 100 # Max requests per second
-tps_limit: 100000 # Max tokens per second
+tps_limit: 10000 # Max tokens per second for a single user
+max_input_tokens: 10000 # Maximum input tokens per request
 adapter_config:
-per_token_adapter_limit: 30 # Max concurrent LoRA adapters
-adapter_timeout: 1800 # Seconds before idle adapter unload
+adapter_timeout: 30 # Seconds before idle adapter unload
+adapter_max_lifetime: 36000 # Maximum lifetime of an adapter in seconds (e.g., 10 hours)
+per_token_adapter_limit: 30
 deployments:
 - name: ModelManagement
 autoscaling_config:
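
The two adapter expiry settings in the diff above interact: `adapter_timeout` bounds idle time, while `adapter_max_lifetime` bounds total age regardless of activity. The sketch below shows one plausible reading of that rule. `AdapterRecord` and `should_unload` are hypothetical names, and the inclusive comparison at the boundary is an assumption; the commit does not show the actual eviction code.

```python
from dataclasses import dataclass


@dataclass
class AdapterRecord:
    """Hypothetical bookkeeping for one loaded LoRA adapter."""
    created_at: float    # time the adapter was loaded, in seconds
    last_used_at: float  # time of the most recent request, in seconds


def should_unload(record, now, adapter_timeout=30, adapter_max_lifetime=36000):
    """Return True if the adapter is idle too long or has outlived its lifetime."""
    idle = now - record.last_used_at
    age = now - record.created_at
    return idle >= adapter_timeout or age >= adapter_max_lifetime


if __name__ == '__main__':
    # An adapter loaded at t=0 and still in use shortly before the 10-hour mark.
    busy = AdapterRecord(created_at=0.0, last_used_at=35990.0)
    print(should_unload(busy, now=35999.0))
    print(should_unload(busy, now=36000.0))
```

Under this reading, lowering `adapter_timeout` from 1800 to 30 means an adapter is dropped after half a minute without requests, and `adapter_max_lifetime` caps even a continuously used adapter at 10 hours.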
