Commit 5cba3a1

Fix router (#69)
1 parent ce24a2a commit 5cba3a1

File tree

22 files changed: +110 −74 lines


ROADMAP.md

Lines changed: 2 additions & 0 deletions
@@ -65,6 +65,7 @@
 - [ ] 支持DPO对齐训练
 - [ ] 支持colocate RL训练
 - [ ] Preprocess支持batched
+- [ ] 对多replica的支持和粘滞路由

 ### 网络能力

@@ -84,5 +85,6 @@
 - [ ] Support for DPO alignment training
 - [ ] Support for colocate RL training
 - [ ] Support for batched preprocessing
+- [ ] Support for multiple replicas and sticky routing

 ### Networking Capabilities

cookbook/client/tinker/megatron/server_config.yaml

Lines changed: 2 additions & 5 deletions
@@ -33,7 +33,6 @@ applications:
   runtime_env:
     env_vars:
       TWINKLE_TRUST_REMOTE_CODE: "0"
-      DEVICE_COUNT_PER_PHYSICAL_NODE: "8"

 # 3. Sampler Service - Runs inference / sampling using vLLM engine
 # Used for generating text from the model (e.g., evaluating LoRA results).
@@ -52,7 +51,7 @@ applications:
   device_group: # Logical device group for the sampler
     name: sampler
     gpus_per_worker: 1
-    ranks: [0,1,2,3] # GPU rank indices to use
+    ranks: 4 # GPU rank indices to use
     device_type: cuda
     device_mesh:
       device_type: cuda
@@ -71,7 +70,6 @@ applications:
   runtime_env:
     env_vars:
       TWINKLE_TRUST_REMOTE_CODE: "0"
-      DEVICE_COUNT_PER_PHYSICAL_NODE: "8"

 # 2. Model Service (commented out) - Would host the base model for training.
 # Uncomment and configure if you need a training model worker.
@@ -86,7 +84,7 @@ applications:
   nproc_per_node: 4 # Number of GPU processes per node
   device_group:
     name: model
-    ranks: [4,5,6,7] # GPU rank indices
+    ranks: 4 # GPU rank indices
     device_type: cuda
     device_mesh:
       device_type: cuda
@@ -111,4 +109,3 @@ applications:
   runtime_env:
     env_vars:
       TWINKLE_TRUST_REMOTE_CODE: "0"
-      DEVICE_COUNT_PER_PHYSICAL_NODE: "8"
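After this change, the sampler's device group is declared with a single number instead of an explicit index list, and `DEVICE_COUNT_PER_PHYSICAL_NODE` is dropped from the env vars. A minimal sketch of the new form (field names come from the diff; the reading of `ranks: 4` as a GPU count resolved by the scheduler, rather than explicit card numbers, is an assumption, as is the surrounding nesting):

```yaml
# Hypothetical fragment mirroring the post-change sampler config.
device_group:          # Logical device group for the sampler
  name: sampler
  gpus_per_worker: 1
  ranks: 4             # was `ranks: [0,1,2,3]`; concrete GPUs appear to be
                       # resolved at schedule time, so the machine-wide
                       # DEVICE_COUNT_PER_PHYSICAL_NODE hint is no longer set
  device_type: cuda
```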

cookbook/client/tinker/megatron/server_config_7b.yaml

Lines changed: 0 additions & 2 deletions
@@ -67,7 +67,6 @@ applications:
   runtime_env:
     env_vars:
       TWINKLE_TRUST_REMOTE_CODE: "0"
-      DEVICE_COUNT_PER_PHYSICAL_NODE: "8"

 # 3. Sampler Service - Runs inference / sampling using vLLM engine
 # Used for generating text from the model (e.g., evaluating LoRA results).
@@ -104,4 +103,3 @@ applications:
   runtime_env:
     env_vars:
       TWINKLE_TRUST_REMOTE_CODE: "0"
-      DEVICE_COUNT_PER_PHYSICAL_NODE: "8"

cookbook/client/tinker/self_congnition.py

Lines changed: 6 additions & 5 deletions
@@ -44,7 +44,7 @@ def train():

     # Connect to the Twinkle server running locally
     service_client = init_tinker_compat_client(
-        base_url='http://www.modelscope.cn/twinkle', api_key=os.environ.get('MODELSCOPE_TOKEN'))
+        base_url='localhost:9000', api_key=os.environ.get('MODELSCOPE_TOKEN'))

     # Create a LoRA training client for the base model (rank=16 for the LoRA adapter)
     training_client = service_client.create_lora_training_client(base_model=base_model, rank=16)
@@ -68,9 +68,10 @@ def train():
     optim_result = optim_future.result()

     # Compute weighted average log-loss per token for monitoring
-    logprobs = np.concatenate([output['logprobs'].tolist() for output in fwdbwd_result.loss_fn_outputs])
-    weights = np.concatenate([example.loss_fn_inputs['weights'].tolist() for example in input_datum])
-    print(f'Loss per token: {-np.dot(logprobs, weights) / weights.sum():.4f}')
+    # logprobs = np.concatenate([output['logprobs'].tolist() for output in fwdbwd_result.loss_fn_outputs])
+    # weights = np.concatenate([example.loss_fn_inputs['weights'].tolist() for example in input_datum])
+    # print(f'Loss per token: {-np.dot(logprobs, weights) / weights.sum():.4f}')
+    print(f'Training Metrics: {optim_result}')

     # Save a checkpoint after each epoch
     save_future = training_client.save_state(f'twinkle-lora-{epoch}')
@@ -85,7 +86,7 @@ def eval():
     weight_path = 'twinkle://20260212_174205-Qwen_Qwen2_5-7B-Instruct-51edc9ed/weights/twinkle-lora-2'

     # Connect to the server and create a sampling client with the trained weights
-    service_client = init_tinker_compat_client(base_url='http://localhost:8000')
+    service_client = init_tinker_compat_client(base_url='http://localhost:9000')
     sampling_client = service_client.create_sampling_client(model_path=weight_path, base_model=base_model)

     # Step 2: Prepare the chat prompt
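The hunk above comments out the per-token loss computation in favor of printing `optim_result`. For reference, the commented-out metric is a weighted average negative log-likelihood; a standalone sketch with made-up values (the real code gathers `logprobs` from `fwdbwd_result.loss_fn_outputs` and `weights` from each example's `loss_fn_inputs`):

```python
import numpy as np

# Stand-in per-token log-probabilities and loss weights; purely
# illustrative, in place of the values the training client returns.
logprobs = np.array([-0.5, -1.0, -0.25, -2.0])
weights = np.array([1.0, 1.0, 0.0, 1.0])  # 0.0 masks a token out of the loss

# Weighted average log-loss per token, as in the commented-out lines:
# negate the weighted sum of logprobs, divide by the total weight.
loss_per_token = -np.dot(logprobs, weights) / weights.sum()
print(f'Loss per token: {loss_per_token:.4f}')  # -(-3.5)/3 → 1.1667
```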

cookbook/client/tinker/transformer/server_config.yaml

Lines changed: 0 additions & 2 deletions
@@ -65,7 +65,6 @@ applications:
   runtime_env:
     env_vars:
       TWINKLE_TRUST_REMOTE_CODE: "0"
-      DEVICE_COUNT_PER_PHYSICAL_NODE: "8"

 # 3. Sampler Service - Runs inference / sampling using vLLM engine
 # Used for generating text from the model (e.g., evaluating LoRA results).
@@ -102,4 +101,3 @@ applications:
   runtime_env:
     env_vars:
       TWINKLE_TRUST_REMOTE_CODE: "0"
-      DEVICE_COUNT_PER_PHYSICAL_NODE: "8"

cookbook/client/twinkle/transformer/server_config.yaml

Lines changed: 0 additions & 3 deletions
@@ -61,7 +61,6 @@ applications:
   runtime_env:
     env_vars:
       TWINKLE_TRUST_REMOTE_CODE: "0"
-      DEVICE_COUNT_PER_PHYSICAL_NODE: "8"

 # 3. Processor Service - Handles data preprocessing on CPU
 # Runs tokenization, template application, and other CPU-bound tasks.
@@ -90,7 +89,6 @@ applications:
   runtime_env:
     env_vars:
       TWINKLE_TRUST_REMOTE_CODE: "0"
-      DEVICE_COUNT_PER_PHYSICAL_NODE: "8"

 # 4. Sampler Service - Handles text generation inference
 # Uses vLLM for efficient batched generation with optional LoRA adapters.
@@ -125,4 +123,3 @@ applications:
   runtime_env:
     env_vars:
       TWINKLE_TRUST_REMOTE_CODE: "0"
-      DEVICE_COUNT_PER_PHYSICAL_NODE: "8"

docs/source_en/Usage Guide/Server and Client/Server.md

Lines changed: 0 additions & 7 deletions
@@ -55,12 +55,9 @@ This configuration starts 3 nodes:
 Before starting the Server, you need to set the following environment variables:

 ```bash
-export DEVICE_COUNT_PER_PHYSICAL_NODE=8 # Specify the total number of GPUs on each physical machine
 export TWINKLE_TRUST_REMOTE_CODE=0 # Whether to trust remote code (security consideration)
 ```

-> **Important Note**: `DEVICE_COUNT_PER_PHYSICAL_NODE` must be set to the actual number of physical GPUs on the machine, which is crucial for correctly parsing the `ranks` configuration.
-
 ### Node Rank in YAML Configuration

 In the YAML configuration file, **each component needs to occupy a separate Node**.
@@ -117,7 +114,6 @@ applications:
 **Important notes:**
 - The `ranks` configuration uses **physical GPU card numbers**, directly corresponding to the actual GPU devices on the machine
 - The `device_mesh` configuration uses parameters like `dp_size`, `tp_size`, `pp_size`, `ep_size` instead of the original `mesh` and `mesh_dim_names`
-- The environment variable `DEVICE_COUNT_PER_PHYSICAL_NODE` must be set to inform the system of the total number of physical GPUs on each machine
 - Different components will be automatically assigned to different Nodes
 - Ray will automatically schedule to the appropriate Node based on resource requirements (`num_gpus`, `num_cpus` in `ray_actor_options`)

@@ -393,7 +389,6 @@ applications:
   num_cpus: 0.1
   runtime_env:
     env_vars:
-      DEVICE_COUNT_PER_PHYSICAL_NODE: "8" # Total number of physical GPUs on each machine

 # 3. Sampler service (optional, for inference sampling)
 - name: sampler-Qwen2.5-0.5B-Instruct
@@ -425,7 +420,6 @@ applications:
   num_gpus: 1 # Sampler needs independent GPU
   runtime_env:
     env_vars:
-      DEVICE_COUNT_PER_PHYSICAL_NODE: "8" # Total number of physical GPUs on each machine
 ```

 ## Configuration Item Description
@@ -471,6 +465,5 @@ device_mesh:
 **Environment variables:**

 ```bash
-export DEVICE_COUNT_PER_PHYSICAL_NODE=8 # Total number of GPUs on each physical machine (must be set)
 export TWINKLE_TRUST_REMOTE_CODE=0 # Whether to trust remote code
 ```
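The "Important notes" retained in the doc above can be illustrated with a minimal device-group fragment (a sketch based on the described conventions; the surrounding fields, the specific rank numbers, and the mesh sizes are assumptions for illustration only):

```yaml
# Hypothetical fragment following the conventions in the notes above.
device_group:
  name: model
  ranks: [0, 1, 2, 3]   # physical GPU card numbers on this machine
  device_type: cuda
  device_mesh:
    device_type: cuda
    dp_size: 2          # data parallel — replaces the old `mesh` /
    tp_size: 2          # tensor parallel    `mesh_dim_names` style
    pp_size: 1          # pipeline parallel
    ep_size: 1          # expert parallel
```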

docs/source_en/Usage Guide/Server and Client/Tinker-Compatible-Client.md

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ for item in service_client.get_server_capabilities().supported_models:
 When calling `init_tinker_compat_client`, the following operations are automatically executed:

 1. **Patch Tinker SDK**: Bypass Tinker's `tinker://` prefix validation, allowing it to connect to standard HTTP addresses
-2. **Set Request Headers**: Inject necessary authentication headers such as `X-Ray-Serve-Request-Id` and `Authorization`
+2. **Set Request Headers**: Inject necessary authentication headers such as `serve_multiplexed_model_id` and `Authorization`
 3. **Return `ServiceClient`**: Returns a standard Tinker `ServiceClient` object, subsequent operations are completely identical to native Tinker

 This means that after initialization, **all existing Tinker training code can be used directly** without any modifications.
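As a rough illustration of step 2 after this change, the injected headers might look like the following. The header names come from the doc text above; the helper function and the Bearer token format are hypothetical, not the client's actual implementation:

```python
import os

def build_compat_headers(model_id: str, api_key: str) -> dict:
    """Sketch of per-request headers the compat client injects."""
    return {
        # Lets Ray Serve multiplexing route the request to the replica
        # that has this model/LoRA loaded.
        'serve_multiplexed_model_id': model_id,
        # Hypothetical bearer-token scheme for authentication.
        'Authorization': f'Bearer {api_key}',
    }

headers = build_compat_headers('Qwen/Qwen2.5-7B-Instruct',
                               os.environ.get('MODELSCOPE_TOKEN', 'sk-demo'))
print(sorted(headers))
```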

docs/source_zh/使用指引/服务端和客户端/Tinker兼容客户端.md

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ for item in service_client.get_server_capabilities().supported_models:
 调用 `init_tinker_compat_client` 时,会自动执行以下操作:

 1. **Patch Tinker SDK**:绕过 Tinker 的 `tinker://` 前缀校验,使其可以连接到标准 HTTP 地址
-2. **设置请求头**:注入 `X-Ray-Serve-Request-Id`、`Authorization` 等必要的认证头
+2. **设置请求头**:注入 `serve_multiplexed_model_id`、`Authorization` 等必要的认证头
 3. **返回 `ServiceClient`**:返回一个标准的 Tinker `ServiceClient` 对象,后续操作与原生 Tinker 完全一致

 这意味着在初始化之后,**所有已有的 Tinker 训练代码都可以直接使用**,无需任何修改。

docs/source_zh/使用指引/服务端和客户端/服务端.md

Lines changed: 0 additions & 7 deletions
@@ -55,12 +55,9 @@ ray start --address=10.28.252.9:6379 --num-gpus=0
 在启动 Server 之前,需要设置以下环境变量:

 ```bash
-export DEVICE_COUNT_PER_PHYSICAL_NODE=8 # 指定每台物理机上的 GPU 总数
 export TWINKLE_TRUST_REMOTE_CODE=0 # 是否信任远程代码(安全考虑)
 ```

-> **重要提示**:`DEVICE_COUNT_PER_PHYSICAL_NODE` 必须设置为机器上实际的物理 GPU 数量,这对于正确解析 `ranks` 配置至关重要。
-
 ### YAML 配置中的 Node Rank

 在 YAML 配置文件中,**每个组件需要占用一个独立的 Node**。
@@ -117,7 +114,6 @@ applications:
 **重要提示:**
 - `ranks` 配置使用**物理 GPU 卡号**,直接对应机器上的实际 GPU 设备
 - `device_mesh` 配置使用 `dp_size`、`tp_size`、`pp_size`、`ep_size` 等参数替代原来的 `mesh` 和 `mesh_dim_names`
-- 必须设置环境变量 `DEVICE_COUNT_PER_PHYSICAL_NODE` 来告知系统每台机器的物理 GPU 总数
 - 不同组件会自动分配到不同的 Node 上
 - Ray 会根据资源需求(`ray_actor_options` 中的 `num_gpus`、`num_cpus`)自动调度到合适的 Node

@@ -336,7 +332,6 @@ applications:
   num_cpus: 0.1
   runtime_env:
     env_vars:
-      DEVICE_COUNT_PER_PHYSICAL_NODE: "8" # 每台机器的物理 GPU 总数

 # 3. Sampler 服务(可选,用于推理采样)
 - name: sampler-Qwen2.5-0.5B-Instruct
@@ -368,7 +363,6 @@ applications:
   num_gpus: 1 # Sampler 需要独立 GPU
   runtime_env:
     env_vars:
-      DEVICE_COUNT_PER_PHYSICAL_NODE: "8" # 每台机器的物理 GPU 总数
 ```

 ## 配置项说明
@@ -414,6 +408,5 @@ device_mesh:
 **环境变量:**

 ```bash
-export DEVICE_COUNT_PER_PHYSICAL_NODE=8 # 每台物理机上的 GPU 总数(必须设置)
 export TWINKLE_TRUST_REMOTE_CODE=0 # 是否信任远程代码
 ```
