Commit 0d5da72

bump model to qwen3.5-4b

1 parent a07fc14 commit 0d5da72

File tree

19 files changed: +38 -38 lines changed
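All 19 files carry the same mechanical change: every occurrence of the old model name is replaced with the new one, both in plain `Qwen/...` IDs and in the underscore form embedded in `twinkle://` checkpoint URIs. A minimal sketch of that substitution (the `bump_model_id` helper is hypothetical, not part of the repository):

```python
OLD_NAME = 'Qwen3-30B-A3B-Instruct-2507'
NEW_NAME = 'Qwen3.5-4B'

def bump_model_id(text: str) -> str:
    # A plain substring substitution covers every form seen in this diff:
    # 'ms://Qwen/<name>', 'Qwen/<name>', 'sampler-<name>', and the
    # underscore variant 'Qwen_<name>' inside twinkle:// URIs.
    return text.replace(OLD_NAME, NEW_NAME)

print(bump_model_id('ms://Qwen/Qwen3-30B-A3B-Instruct-2507'))
# → ms://Qwen/Qwen3.5-4B
```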

README.md

Lines changed: 2 additions & 2 deletions

@@ -129,7 +129,7 @@ supported on Twinkle✨ framework.
 > For serverless training service accessed via `base_url=https://www.modelscope.cn/twinkle`, it
 > is currently provided via the Tinker-compatible APIs. We will be rolling out services that support
 > both Tinker APIs, as well as the full-fledged Twinkle✨ native APIs. The serverless endpoint is backed
-> by one training base at a time, and currently it is [Qwen3-30B-A3B-Instruct-2507](https://modelscope.cn/models/Qwen/Qwen3-30B-A3B-Instruct-2507).
+> by one training base at a time, and currently it is [Qwen3.5-4B](https://modelscope.cn/models/Qwen/Qwen3.5-4B).
 
 | Model Type | Model ID on [ModelScope](https://modelscope.cn) | Model Size | Requires | Support Megatron | HF Model ID |
 |---------------------|-----------------------------------------------------------------------------------------------------------------|:---------------------------------------:|----------------------|:----------------:|:---------------------------------------------------------------------------------------------------------:|

@@ -234,7 +234,7 @@ from twinkle.dataset import Dataset, DatasetMeta
 from twinkle.preprocessor import SelfCognitionProcessor
 from twinkle.server.common import input_feature_to_datum
 
-base_model = 'ms://Qwen/Qwen3-30B-A3B-Instruct-2507'
+base_model = 'ms://Qwen/Qwen3.5-4B'
 base_url='your-base-url'
 api_key='your-api-key'
README_ZH.md

Lines changed: 2 additions & 2 deletions

@@ -112,7 +112,7 @@ Twinkle✨ supports running the same algorithm interfaces on a single GPU, multi-node torchrun, Ray, Cl
 As new models are released, we will add support for more of them. The table below lists the models currently supported by the Twinkle✨ framework.
 
 >[!Note]
-> The serverless training service accessed via `base_url=https://www.modelscope.cn/twinkle` is currently provided through the Tinker-compatible APIs. We will be rolling out services that support both the Tinker APIs and the full Twinkle✨ native APIs. The serverless endpoint is backed by one training base at a time; currently it is [Qwen3-30B-A3B-Instruct-2507](https://modelscope.cn/models/Qwen/Qwen3-30B-A3B-Instruct-2507)
+> The serverless training service accessed via `base_url=https://www.modelscope.cn/twinkle` is currently provided through the Tinker-compatible APIs. We will be rolling out services that support both the Tinker APIs and the full Twinkle✨ native APIs. The serverless endpoint is backed by one training base at a time; currently it is [Qwen3.5-4B](https://modelscope.cn/models/Qwen/Qwen3.5-4B)
 
 | Model Type | Model ID examples | Model Size | Requires | Support Megatron | HF Model ID |
 |---------------------|-----------------------------------------------------------------------------------------------------------------|:---------------------------------------:|----------------------|:----------------:|:---------------------------------------------------------------------------------------------------------:|

@@ -216,7 +216,7 @@ from twinkle.dataset import Dataset, DatasetMeta
 from twinkle.preprocessor import SelfCognitionProcessor
 from twinkle.server.common import input_feature_to_datum
 
-base_model = 'ms://Qwen/Qwen3-30B-A3B-Instruct-2507'
+base_model = 'ms://Qwen/Qwen3.5-4B'
 base_url='your-base-url'
 api_key='your-api-key'
cookbook/client/server/megatron/server_config.yaml

Lines changed: 6 additions & 6 deletions

@@ -36,11 +36,11 @@ applications:
 
   # 3. Sampler Service - Runs inference / sampling using vLLM engine
   # Used for generating text from the model (e.g., evaluating LoRA results).
-  - name: sampler-Qwen3-30B-A3B-Instruct-2507
-    route_prefix: /api/v1/sampler/Qwen/Qwen3-30B-A3B-Instruct-2507
+  - name: sampler-Qwen3.5-4B
+    route_prefix: /api/v1/sampler/Qwen/Qwen3.5-4B
     import_path: sampler
     args:
-      model_id: "ms://Qwen/Qwen3-30B-A3B-Instruct-2507" # ModelScope model identifier
+      model_id: "ms://Qwen/Qwen3.5-4B" # ModelScope model identifier
       nproc_per_node: 4 # Number of GPU processes per node
       sampler_type: vllm # Inference engine: 'vllm' (fast) or 'torch' (TorchSampler)
       engine_args: # vLLM engine-specific settings

@@ -73,12 +73,12 @@ applications:
 
   # 2. Model Service - Hosts the base model for training.
   # Configure this block if you need a training model worker.
-  - name: models-Qwen3-30B-A3B-Instruct-2507
-    route_prefix: /api/v1/model/Qwen/Qwen3-30B-A3B-Instruct-2507
+  - name: models-Qwen3.5-4B
+    route_prefix: /api/v1/model/Qwen/Qwen3.5-4B
     import_path: model
     args:
       use_megatron: true # Use the Megatron backend
-      model_id: "ms://Qwen/Qwen3-30B-A3B-Instruct-2507" # ModelScope model identifier
+      model_id: "ms://Qwen/Qwen3.5-4B" # ModelScope model identifier
       max_length: 16000 # model max length
       max_loras: 5 # model max loras
       nproc_per_node: 4 # Number of GPU processes per node
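The routes in this config follow a visible convention: strip the `ms://` scheme from `model_id` and mount the remainder under `/api/v1/<service>/`. A quick sketch of that mapping (the `route_prefix_for` helper is hypothetical, for illustration only):

```python
def route_prefix_for(model_id: str, service: str) -> str:
    # Maps a ModelScope identifier to the route prefix used in the config,
    # e.g. 'ms://Qwen/Qwen3.5-4B' + 'sampler' -> '/api/v1/sampler/Qwen/Qwen3.5-4B'
    if not model_id.startswith('ms://'):
        raise ValueError('expected a ModelScope ms:// identifier')
    return f"/api/v1/{service}/{model_id[len('ms://'):]}"

print(route_prefix_for('ms://Qwen/Qwen3.5-4B', 'model'))
# → /api/v1/model/Qwen/Qwen3.5-4B
```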

cookbook/client/server/megatron/server_config_4b.yaml

Lines changed: 1 addition & 1 deletion

@@ -38,7 +38,7 @@ applications:
     route_prefix: /api/v1/model/Qwen/Qwen3.5-4B
     import_path: model
     args:
-      use_megatron: false
+      use_megatron: true
       model_cls: Qwen3_5ForConditionalGeneration
       model_id: "ms://Qwen/Qwen3.5-4B" # ModelScope model identifier
      max_length: 10240

cookbook/client/tinker/modelscope/sample.py

Lines changed: 2 additions & 2 deletions

@@ -16,7 +16,7 @@
 
 from tinker import ServiceClient
 
-base_model = 'Qwen/Qwen3-30B-A3B-Instruct-2507'
+base_model = 'Qwen/Qwen3.5-4B'
 base_url = 'http://www.modelscope.cn/twinkle'
 
 # Step 2: Define the base model and connect to the server

@@ -29,7 +29,7 @@
 # The model_path is a twinkle:// URI pointing to a previously saved LoRA checkpoint.
 # The server will load the base model and apply the LoRA adapter weights.
 sampling_client = service_client.create_sampling_client(
-    model_path='twinkle://xxx-Qwen_Qwen3-30B-A3B-Instruct-2507-xxx/weights/twinkle-lora-1',
+    model_path='twinkle://xxx-Qwen_Qwen3.5-4B-xxx/weights/twinkle-lora-1',
     base_model=base_model
 )
 
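Note the coupling in this hunk: the first path segment of the `twinkle://` checkpoint URI embeds the base model name with `/` flattened to `_`, so it has to be bumped together with `base_model`. A sketch of a consistency check (the `checkpoint_matches_base` helper is hypothetical, not part of the repository):

```python
def checkpoint_matches_base(model_path: str, base_model: str) -> bool:
    # The run segment (before '/weights/...') embeds the base model
    # with '/' replaced by '_', e.g. 'xxx-Qwen_Qwen3.5-4B-xxx'.
    run_segment = model_path.removeprefix('twinkle://').split('/', 1)[0]
    return base_model.replace('/', '_') in run_segment
```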

cookbook/client/tinker/modelscope/self_cognition.py

Lines changed: 1 addition & 1 deletion

@@ -23,7 +23,7 @@
 from tinker import ServiceClient
 
 # The base model to fine-tune / evaluate
-base_model = 'Qwen/Qwen3-30B-A3B-Instruct-2507'
+base_model = 'Qwen/Qwen3.5-4B'
 base_url = 'http://www.modelscope.cn/twinkle'
 
 
cookbook/client/tinker/modelscope/short_math_grpo.py

Lines changed: 1 addition & 1 deletion

@@ -38,7 +38,7 @@
 logger = get_logger()
 
 # ========== Configuration ==========
-BASE_MODEL = 'Qwen/Qwen3-30B-A3B-Instruct-2507'
+BASE_MODEL = 'Qwen/Qwen3.5-4B'
 NUM_GENERATIONS = 8
 MAX_NEW_TOKENS = 4096
 LEARNING_RATE = 1e-4

cookbook/client/tinker/self_host/sample.py

Lines changed: 1 addition & 1 deletion

@@ -27,7 +27,7 @@
 # The model_path is a twinkle:// URI pointing to a previously saved LoRA checkpoint.
 # The server will load the base model and apply the LoRA adapter weights.
 sampling_client = service_client.create_sampling_client(
-    model_path='twinkle://xxx-Qwen_Qwen3-30B-A3B-Instruct-2507-xxx/weights/twinkle-lora-1',
+    model_path='twinkle://xxx-Qwen_Qwen3.5-4B-xxx/weights/twinkle-lora-1',
     base_model=base_model
 )

cookbook/client/twinkle/modelscope/self_congnition.py

Lines changed: 1 addition & 1 deletion

@@ -21,7 +21,7 @@
 
 logger = get_logger()
 
-base_model = 'Qwen/Qwen3-30B-A3B-Instruct-2507'
+base_model = 'Qwen/Qwen3.5-4B'
 base_url = 'http://www.modelscope.cn/twinkle'
 
 # Step 2: Initialize the Twinkle client to communicate with the remote server.

cookbook/transformers/ep_fsdp_qwen3_moe.py

Lines changed: 1 addition & 1 deletion

@@ -11,7 +11,7 @@
 
 logger = get_logger()
 
-MODEL_ID = os.environ.get('QWEN3_MODEL_ID', 'ms://Qwen/Qwen3-30B-A3B-Instruct-2507')
+MODEL_ID = os.environ.get('QWEN3_MODEL_ID', 'ms://Qwen/Qwen3.5-4B')
 DATASET_ID = os.environ.get('DATASET_ID', 'ms://swift/self-cognition')
 TEMPLATE_ID = os.environ.get('TEMPLATE_ID', 'Template')
 _num_layers_env = os.environ.get('NUM_LAYERS')
