Commit 5071196

Author: Yingda Chen (committed)
Merge remote-tracking branch 'origin' into update
2 parents: 09216a7 + eb91bbb

File tree: 100 files changed, +3401 −1568 lines


.github/copilot-instructions.md

Lines changed: 4 additions & 3 deletions

```diff
@@ -40,10 +40,11 @@ These instructions help AI agents work productively in this repo. Focus on concr
 - Initialize infra: `twinkle.initialize(mode='local', seed=42)`
 - Inspect device placement: call `twinkle.infra.get_device_placement()`.
 - **Ray Serve demo (HTTP services):**
-  - Config and launcher: [cookbook/client/server.py](cookbook/client/server.py), [cookbook/client/server_config.yaml](cookbook/client/server_config.yaml)
+  - Config and launcher: [cookbook/client/tinker/megatron/server.py](https://github.com/modelscope/twinkle/blob/main/cookbook/client/tinker/megatron/server.py), [cookbook/client/tinker/megatron/server_config.yaml](https://github.com/modelscope/twinkle/blob/main/cookbook/client/tinker/megatron/server_config.yaml)
   - Start:
-    - `python cookbook/client/server.py`
-    - Endpoints print on startup (default `localhost:8000`).
+    - `cd cookbook/client/tinker/megatron`
+    - `python server.py`
+    - Endpoints print on startup (default `localhost:8000` or `https://www.modelscope.cn/twinkle`).
 - Model app binds `MultiLoraTransformersModel` and exposes routes like `/add_adapter_to_model`, `/forward`, `/calculate_loss`, etc. See [src/twinkle/server/twinkle/model.py](src/twinkle/server/twinkle/model.py).
 - **vLLM inference:** Use `VLLMEngine` with engine args; LoRA weight sync via `patch.vllm_lora_weights`. See [src/twinkle/sampler/vllm_engine.py](src/twinkle/sampler/vllm_engine.py).

```

INSTALL_MEGATRON.sh

Lines changed: 116 additions & 0 deletions (new file)

```bash
#!/bin/bash

# Installation script - installs the Megatron- and vLLM-related dependencies,
# which often fail to install cleanly on their own.

set -e  # Exit immediately on error

echo "=========================================="
echo "Starting deep learning dependencies installation..."
echo "=========================================="

# Detect GPU architecture from nvidia-smi
echo ""
echo "Detecting GPU architecture..."
GPU_NAME=$(nvidia-smi --query-gpu=name --format=csv,noheader | head -n 1)
echo "Detected GPU: $GPU_NAME"

# Map GPU name to CUDA architecture
get_cuda_arch() {
    local gpu_name="$1"
    case "$gpu_name" in
        *H100*|*H200*|*H20*|*H800*)
            echo "9.0"
            ;;
        *A100*|*A800*|*A30*)
            echo "8.0"
            ;;
        *A10*|*A40*|*A16*|*A2*)
            echo "8.6"
            ;;
        *L40*|*L4*|*Ada*|*RTX\ 40*|*RTX\ 50*)
            echo "8.9"
            ;;
        *V100*)
            echo "7.0"
            ;;
        *T4*)
            echo "7.5"
            ;;
        *RTX\ 30*|*A6000*|*A5000*)
            echo "8.6"
            ;;
        *RTX\ 20*)
            echo "7.5"
            ;;
        *)
            echo "8.0;9.0"  # Default fallback
            ;;
    esac
}

TORCH_CUDA_ARCH_LIST=$(get_cuda_arch "$GPU_NAME")
export TORCH_CUDA_ARCH_LIST
echo "Using CUDA architecture: $TORCH_CUDA_ARCH_LIST"

# Install latest base packages
echo ""
echo "Installing peft, accelerate, transformers, modelscope, oss2..."
pip install --upgrade peft accelerate transformers "modelscope[framework]" oss2

# Install latest vllm
echo ""
echo "Installing latest vllm..."
pip install --upgrade vllm

# Get site-packages path and install transformer_engine and megatron_core
echo ""
echo "Installing transformer_engine and megatron_core..."
SITE_PACKAGES=$(python -c "import site; print(site.getsitepackages()[0])")
echo "Site-packages path: $SITE_PACKAGES"

CUDNN_PATH=$SITE_PACKAGES/nvidia/cudnn \
CPLUS_INCLUDE_PATH=$SITE_PACKAGES/nvidia/cudnn/include \
pip install --no-build-isolation "transformer_engine[pytorch]" megatron_core --no-cache-dir

# Install flash-attention (force local build)
echo ""
echo "Installing flash-attention (local build for $GPU_NAME)..."
TORCH_CUDA_ARCH_LIST="$TORCH_CUDA_ARCH_LIST" \
MAX_JOBS=8 \
FLASH_ATTENTION_FORCE_BUILD=TRUE \
pip install flash-attn --no-build-isolation --no-cache-dir

# Install numpy and deep_gemm
echo ""
echo "Installing numpy==2.2 and deep_gemm..."
pip install numpy==2.2
pip uninstall deep_gemm -y
cd /tmp
git clone --recursive https://github.com/deepseek-ai/DeepGEMM.git
cd DeepGEMM
pip install . --no-build-isolation

# Verify installation
echo ""
echo "Verifying installation..."
echo ""
python -c "
import pkg_resources

packages = ['peft', 'accelerate', 'transformers', 'modelscope', 'oss2', 'vllm', 'transformer_engine', 'megatron_core', 'flash_attn', 'numpy']

print('Installed package versions:')
print('-' * 40)
for pkg in packages:
    try:
        version = pkg_resources.get_distribution(pkg).version
        print(f'{pkg}: {version}')
    except pkg_resources.DistributionNotFound:
        print(f'{pkg}: Not installed')
"

echo ""
echo "=========================================="
echo "Installation complete!"
echo "=========================================="
```
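The verification step above uses `pkg_resources`, which is deprecated in recent setuptools releases and emits a warning on import. A stdlib-only alternative sketch using `importlib.metadata` (available since Python 3.8), should that warning become a problem:

```python
# Version check without pkg_resources, using only the standard library.
from importlib.metadata import version, PackageNotFoundError


def report(pkg: str) -> str:
    """Return the installed version of pkg, or 'Not installed'."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return "Not installed"


packages = ['peft', 'accelerate', 'transformers', 'modelscope', 'oss2',
            'vllm', 'transformer_engine', 'megatron_core', 'flash_attn', 'numpy']
print('Installed package versions:')
print('-' * 40)
for pkg in packages:
    print(f'{pkg}: {report(pkg)}')
```

Note that `importlib.metadata` looks up distribution names, so packages whose distribution and import names differ (e.g. `flash-attn` vs `flash_attn`) may need the distribution spelling.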

README.md

Lines changed: 8 additions & 5 deletions

```diff
@@ -90,7 +90,7 @@ Tinker-compatible APIs.
 We are rolling out training service built atop Twinkle✨ on ModelScope. It is currently in _Beta_. You may
 sign up for free access by joining the [Twinkle-Explorers](https://modelscope.cn/organization/twinkle-explorers) organization, and
 train via API endpoint `base_url=https://www.modelscope.cn/twinkle`. For more details, please refer to
-our [documentation](docs/source_en/Usage%20Guide/ModelScope-Official-Resources.md).
+our [documentation](docs/source_en/Usage%20Guide/Train-as-a-Service.md).

 ## Supported Hardware

@@ -134,7 +134,7 @@ supported on Twinkle✨ framework.
 | | [deepseek-ai/DeepSeek-R1](https://modelscope.cn/models/deepseek-ai/DeepSeek-R1) | transformers>=4.39.3 || [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) |
 | deepSeek-r1-distill | [deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B](https://modelscope.cn/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) ~32B | transformers>=4.37 || [deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) |

-For a more detailed model support list 👉 [Quick Start.md](https://github.com/modelscope/twinkle/blob/dev/docs/source/%E4%BD%BF%E7%94%A8%E6%8C%87%E5%BC%95/%E5%BF%AB%E9%80%9F%E5%BC%80%E5%A7%8B.md)
+For a more detailed model support list 👉 [Quick Start](docs/source_en/Usage%20Guide/Quick-Start.md)

 ## Sample Code

@@ -207,7 +207,7 @@ if __name__ == '__main__':
 import os
 from tqdm import tqdm
 from tinker import types
-from twinkle_client import init_tinker_compat_client
+from twinkle_client import init_tinker_client
 from twinkle.dataloader import DataLoader
 from twinkle.dataset import Dataset, DatasetMeta
 from twinkle.preprocessor import SelfCognitionProcessor
@@ -224,8 +224,11 @@ dataset.map(SelfCognitionProcessor('twinkle Model', 'twinkle Team'), load_from_c
 dataset.encode(batched=True, load_from_cache_file=False)
 dataloader = DataLoader(dataset=dataset, batch_size=8)

-# Initialize tinker client
-service_client = init_tinker_compat_client(base_url, api_key)
+# Initialize Tinker client before importing ServiceClient
+init_tinker_client()
+from tinker import ServiceClient
+
+service_client = ServiceClient(base_url=base_url, api_key=api_key)
 training_client = service_client.create_lora_training_client(base_model=base_model[len('ms://'):], rank=16)

 # Training loop: use input_feature_to_datum to transfer the input format
```
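The new README snippet calls `init_tinker_client()` before `from tinker import ServiceClient`, i.e. the initialization must run before the consumer class is imported, because init installs state that the client resolves later. A self-contained sketch of this "initialize before import" pattern (the `fake_tinker` module, `init_client`, and `_BACKEND` below are fabricated for illustration and are not the real `twinkle_client` API):

```python
# Demonstrates why an init call must precede the import that consumes it.
import sys
import types

# Global state that init installs and the client reads, mimicking the
# twinkle_client -> tinker relationship (names here are hypothetical).
_BACKEND = {"base_url": None}


def init_client(base_url: str) -> None:
    """Must run before constructing clients from the consumer module."""
    _BACKEND["base_url"] = base_url


# Build a fake "tinker"-like module and register it so it can be imported.
mod = types.ModuleType("fake_tinker")


class ServiceClient:
    def __init__(self):
        # Resolved from state installed by init_client().
        self.base_url = _BACKEND["base_url"]


mod.ServiceClient = ServiceClient
sys.modules["fake_tinker"] = mod

init_client("https://www.modelscope.cn/twinkle")  # init first ...
from fake_tinker import ServiceClient             # ... then import

client = ServiceClient()
print(client.base_url)
```

If `init_client` were skipped (or called after construction), `client.base_url` would be `None`, which is the failure mode the README's ordering guards against.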

README_ZH.md

Lines changed: 7 additions & 4 deletions

```diff
@@ -113,7 +113,7 @@ pip install -e .
 | | [deepseek-ai/DeepSeek-R1](https://modelscope.cn/models/deepseek-ai/DeepSeek-R1) | transformers>=4.39.3 || [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) |
 | deepSeek-r1-distill | [deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B](https://modelscope.cn/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) ~32B | transformers>=4.37 || [deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) |

-For a more detailed model support list 👉 [快速开始.md](https://github.com/modelscope/twinkle/blob/dev/docs/source/%E4%BD%BF%E7%94%A8%E6%8C%87%E5%BC%95/%E5%BF%AB%E9%80%9F%E5%BC%80%E5%A7%8B.md)
+For a more detailed model support list 👉 [快速开始.md](docs/source_zh/使用指引/快速开始.md)

 ## Sample Code

@@ -186,7 +186,7 @@ if __name__ == '__main__':
 import os
 from tqdm import tqdm
 from tinker import types
-from twinkle_client import init_tinker_compat_client
+from twinkle_client import init_tinker_client
 from twinkle.dataloader import DataLoader
 from twinkle.dataset import Dataset, DatasetMeta
 from twinkle.preprocessor import SelfCognitionProcessor
@@ -203,8 +203,11 @@ dataset.map(SelfCognitionProcessor('twinkle Model', 'twinkle Team'), load_from_c
 dataset.encode(batched=True, load_from_cache_file=False)
 dataloader = DataLoader(dataset=dataset, batch_size=8)

-# Initialize tinker client
-service_client = init_tinker_compat_client(base_url, api_key)
+# Initialize Tinker client before importing ServiceClient
+init_tinker_client()
+from tinker import ServiceClient
+
+service_client = ServiceClient(base_url=base_url, api_key=api_key)
 training_client = service_client.create_lora_training_client(base_model=base_model[len('ms://'):], rank=16)

 # Training loop: use input_feature_to_datum to transfer the input format
```

ROADMAP.md

Lines changed: 2 additions & 0 deletions

```diff
@@ -65,6 +65,7 @@
 - [ ] Support DPO alignment training
 - [ ] Support colocate RL training
 - [ ] Support batched preprocessing
+- [ ] Support for multiple replicas and sticky routing

 ### Networking Capabilities

@@ -84,5 +85,6 @@
 - [ ] Support for DPO alignment training
 - [ ] Support for colocate RL training
 - [ ] Support for batched preprocessing
+- [ ] Support for multiple replicas and sticky routing

 ### Networking Capabilities
```

(The first hunk is the Chinese-language roadmap section, translated here; the second is its English counterpart.)

cookbook/client/tinker/lora.py

Lines changed: 17 additions & 12 deletions

```diff
@@ -13,20 +13,25 @@

 import os

-from twinkle_client import init_tinker_compat_client
+# Step 2: Initialize Tinker client before importing ServiceClient
+from twinkle_client import init_tinker_client

-# Step 2: Initialize the Tinker-compatible client to communicate with the server.
-# - base_url: the address of the running server
-# - api_key: authentication token (loaded from environment variable)
-service_client = init_tinker_compat_client(
-    base_url='http://www.modelscope.cn/twinkle', api_key=os.environ.get('MODELSCOPE_TOKEN'))
+init_tinker_client()

-# Step 3: List models available on the server to verify the connection
+# Step 3: Use ServiceClient directly from tinker
+from tinker import ServiceClient
+
+service_client = ServiceClient(
+    base_url='http://www.modelscope.cn/twinkle',
+    api_key=os.environ.get('MODELSCOPE_TOKEN')
+)
+
+# Step 4: List models available on the server to verify the connection
 print('Available models:')
 for item in service_client.get_server_capabilities().supported_models:
     print('- ' + item.model_name)

-# Step 4: Create a REST client for querying training runs and checkpoints.
+# Step 5: Create a REST client for querying training runs and checkpoints.
 # This is useful for inspecting previous training sessions or resuming training.
 rest_client = service_client.create_rest_client()
@@ -51,7 +56,7 @@
 # Uncomment the line below to resume from the last checkpoint:
 # resume_path = chpt.tinker_path

-# Step 5: Create or resume a training client.
+# Step 6: Create or resume a training client.
 # If resume_path is set, it restores both model weights and optimizer state.
 base_model = 'Qwen/Qwen2.5-7B-Instruct'
 if not resume_path:
@@ -60,7 +65,7 @@
     print('Resuming from ' + resume_path)
     training_client = service_client.create_training_client_from_state_with_optimizer(path=resume_path)

-# Step 6: Prepare training data manually
+# Step 7: Prepare training data manually
 #
 # This example teaches the model to translate English into Pig Latin.
 # Each example has an "input" (English phrase) and "output" (Pig Latin).
@@ -146,7 +151,7 @@ def process_example(example: dict, tokenizer) -> types.Datum:
                            datum0.loss_fn_inputs['weights'].tolist())):
     print(f'{repr(tokenizer.decode([inp])):<20} {repr(tokenizer.decode([tgt])):<20} {wgt:<10}')

-# Step 7: Run the training loop
+# Step 8: Run the training loop
 #
 # For each epoch, iterate over multiple batches:
 # - forward_backward: sends data to the server, computes loss & gradients
@@ -174,7 +179,7 @@ def process_example(example: dict, tokenizer) -> types.Datum:
     save_result = save_future.result()
     print(f'Saved checkpoint for epoch {epoch} to {save_result.path}')

-# Step 8: Publish the final checkpoint to ModelScope Hub.
+# Step 9: Publish the final checkpoint to ModelScope Hub.
 # NOTE: Requires a valid ModelScope token set as api_key when initializing the client.
 # The published model name will be: {run_id}_{checkpoint_name}
 rest_client.publish_checkpoint_from_tinker_path(save_result.path).result()
```
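The data-preparation step in this cookbook trains on English → Pig Latin pairs. For readers unfamiliar with the task, here is a minimal pair generator under one common Pig Latin convention (move the leading consonant cluster to the end and add "ay"; vowel-initial words get "way"). The cookbook's actual dataset may use different rules, so treat this as an illustrative sketch only:

```python
# Generate (input, output) training pairs for the Pig Latin task.
def to_pig_latin(word: str) -> str:
    """Translate a single lowercase word to Pig Latin (one common convention)."""
    vowels = "aeiou"
    if word[0] in vowels:
        return word + "way"          # vowel-initial: append "way"
    for i, ch in enumerate(word):
        if ch in vowels:
            # Move the leading consonant cluster to the end, add "ay".
            return word[i:] + word[:i] + "ay"
    return word + "ay"               # no vowels at all


phrases = ["hello world", "eat apples"]
examples = [
    {"input": p, "output": " ".join(to_pig_latin(w) for w in p.split())}
    for p in phrases
]
print(examples)
```

Each dict mirrors the "input"/"output" shape the script's `process_example` expects before tokenization.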

cookbook/client/tinker/megatron/server_config.yaml

Lines changed: 4 additions & 6 deletions

```diff
@@ -21,6 +21,8 @@ applications:
     route_prefix: /api/v1  # API endpoint prefix (Tinker-compatible)
     import_path: server  # Python module to import
     args:
+      server_config:
+        per_token_model_limit: 3  # Maximum number of models (adapters) per token (server-globally enforced)

     deployments:
       - name: TinkerCompatServer
@@ -33,7 +35,6 @@
         runtime_env:
           env_vars:
             TWINKLE_TRUST_REMOTE_CODE: "0"
-            DEVICE_COUNT_PER_PHYSICAL_NODE: "8"

   # 3. Sampler Service - Runs inference / sampling using vLLM engine
   # Used for generating text from the model (e.g., evaluating LoRA results).
@@ -52,7 +53,7 @@
       device_group:  # Logical device group for the sampler
         name: sampler
         gpus_per_worker: 1
-        ranks: [0,1,2,3]  # GPU rank indices to use
+        ranks: 4  # GPU rank indices to use
         device_type: cuda
       device_mesh:
         device_type: cuda
@@ -71,7 +72,6 @@
         runtime_env:
           env_vars:
             TWINKLE_TRUST_REMOTE_CODE: "0"
-            DEVICE_COUNT_PER_PHYSICAL_NODE: "8"

   # 2. Model Service (commented out) - Would host the base model for training.
   # Uncomment and configure if you need a training model worker.
@@ -86,7 +86,7 @@
       nproc_per_node: 4  # Number of GPU processes per node
       device_group:
         name: model
-        ranks: [4,5,6,7]  # GPU rank indices
+        ranks: 4  # GPU rank indices
         device_type: cuda
       device_mesh:
         device_type: cuda
@@ -97,7 +97,6 @@
       rps_limit: 20  # Max requests per second
       tps_limit: 16000  # Max tokens per second
      adapter_config:
-        per_token_adapter_limit: 3  # Max concurrent LoRA adapters
        adapter_timeout: 30  # Seconds before idle adapter unload
        adapter_max_lifetime: 36000  # Maximum lifetime of an adapter in seconds (e.g., 10 hours)
     deployments:
@@ -111,4 +110,3 @@
       runtime_env:
         env_vars:
           TWINKLE_TRUST_REMOTE_CODE: "0"
-          DEVICE_COUNT_PER_PHYSICAL_NODE: "8"
```
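Two changes recur across these hunks: the per-token adapter cap moves from the sampler's `adapter_config` (`per_token_adapter_limit`) up to the server app's `args.server_config` (`per_token_model_limit`), and `ranks` now takes a rank count rather than an explicit index list. A consolidated sketch of the resulting shape, keeping only the fields this commit touches (the `name` values and surrounding structure are assumptions, not the full file):

```yaml
applications:
  - name: tinker-compat-server      # assumed name; see server_config.yaml for the full entry
    route_prefix: /api/v1
    import_path: server
    args:
      server_config:
        per_token_model_limit: 3    # adapter cap, now enforced server-wide per token

  - name: sampler                   # assumed name
    args:
      device_group:
        name: sampler
        gpus_per_worker: 1
        ranks: 4                    # number of GPU ranks (previously a list, e.g. [0,1,2,3])
        device_type: cuda
```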

0 commit comments