both Tinker APIs, as well as the full-fledged Twinkle✨ native APIs. The serverless endpoint is backed by one training base at a time; currently it is [Qwen3-30B-A3B-Instruct-2507](https://modelscope.cn/models/Qwen/Qwen3-30B-A3B-Instruct-2507).
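Model identifiers in the examples below carry a hub prefix: `ms://` selects [ModelScope](https://modelscope.cn), while `hf://` selects Hugging Face (as the sample-code comments note). A minimal sketch of how such a prefix splits — the `split_model_uri` helper is purely illustrative and not part of the twinkle API:

```python
def split_model_uri(uri: str) -> tuple:
    """Split 'ms://Qwen/Qwen3-0.6B-Base' into ('ms', 'Qwen/Qwen3-0.6B-Base').

    Hypothetical helper for illustration only; twinkle resolves these
    prefixes internally.
    """
    scheme, sep, model_id = uri.partition("://")
    if not sep:
        # No prefix given; assume the default hub (ModelScope here).
        return ("ms", uri)
    return (scheme, model_id)

print(split_model_uri("ms://Qwen/Qwen3-30B-A3B-Instruct-2507"))  # ('ms', 'Qwen/Qwen3-30B-A3B-Instruct-2507')
print(split_model_uri("hf://Qwen/Qwen3-0.6B-Base"))              # ('hf', 'Qwen/Qwen3-0.6B-Base')
```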

| Model Type | Model ID on [ModelScope](https://modelscope.cn) | Model Size | Requires | Megatron Support | HF Model ID |
| ------------------- | ------------------------------------------------------------ | :-------------------------------------: | -------------------- | :--------------: | :----------------------------------------------------------: |
| qwen3 series | [Qwen/Qwen3-14B-Base](https://modelscope.cn/models/Qwen/Qwen3-14B-Base) | 0.6B/1.7B/4B/8B/14B | transformers>=4.51 | ✔ | [Qwen/Qwen3-14B-Base](https://huggingface.co/Qwen/Qwen3-14B-Base) |
| | [Qwen/Qwen3-32B](https://modelscope.cn/models/Qwen/Qwen3-32B) | 0.6B/1.7B/4B/8B/14B/32B | transformers>=4.51 | ✔ | [Qwen/Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) |
| qwen3_moe series | [Qwen/Qwen3-30B-A3B-Base](https://modelscope.cn/models/Qwen/Qwen3-30B-A3B-Base) | 30B-A3B/30B-A3B-Base/235B-A22B | transformers>=4.51 | ✔ | [Qwen/Qwen3-30B-A3B-Base](https://huggingface.co/Qwen/Qwen3-30B-A3B-Base) |
| qwen2 series | [Qwen/Qwen2-0.5B-Instruct](https://modelscope.cn/models/Qwen/Qwen2-0.5B-Instruct) | 0.5B/1.5B/7B/72B | transformers>=4.37 | ✔ | [Qwen/Qwen2-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct) |
| | [Qwen/Qwen2-1.5B](https://modelscope.cn/models/Qwen/Qwen2-1.5B) | 0.5B/1.5B/7B/72B | transformers>=4.37 | ✔ | [Qwen/Qwen2-1.5B](https://huggingface.co/Qwen/Qwen2-1.5B) |
| | [Qwen/Qwen2.5-1.5B-Instruct](https://modelscope.cn/models/Qwen/Qwen2.5-1.5B-Instruct) | 0.5B/1.5B/3B/7B/14B/32B/72B | transformers>=4.37 | ✔ | [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) |
| | [Qwen/Qwen2.5-0.5B](https://modelscope.cn/models/Qwen/Qwen2.5-0.5B) | 0.5B/1.5B/3B/7B/14B/32B | transformers>=4.37 | ✔ | [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) |
| qwen2_moe series | [Qwen/Qwen1.5-MoE-A2.7B-Chat](https://modelscope.cn/models/Qwen/Qwen1.5-MoE-A2.7B-Chat) | - | transformers>=4.40 | ✔ | [Qwen/Qwen1.5-MoE-A2.7B-Chat](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B-Chat) |
| | [Qwen/Qwen1.5-MoE-A2.7B](https://modelscope.cn/models/Qwen/Qwen1.5-MoE-A2.7B) | - | transformers>=4.40 | ✔ | [Qwen/Qwen1.5-MoE-A2.7B](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B) |
| chatglm3 series | [ZhipuAI/chatglm3-6b](https://modelscope.cn/models/ZhipuAI/chatglm3-6b) | 6b/6b-base/6b-32k/6b-128k | transformers<4.42 | ✘ | [zai-org/chatglm3-6b](https://huggingface.co/zai-org/chatglm3-6b) |
| chatglm4 series | [ZhipuAI/glm-4-9b-chat](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat) | glm-4-9b/glm-4-9b-chat/glm-4-9b-chat-1m | transformers>=4.42 | ✘ | [zai-org/glm-4-9b-chat](https://huggingface.co/zai-org/glm-4-9b-chat) |
| | [ZhipuAI/LongWriter-glm4-9b](https://modelscope.cn/models/ZhipuAI/LongWriter-glm4-9b) | - | transformers>=4.42 | ✘ | [zai-org/LongWriter-glm4-9b](https://huggingface.co/zai-org/LongWriter-glm4-9b) |
| glm_edge series | [ZhipuAI/glm-edge-1.5b-chat](https://modelscope.cn/models/ZhipuAI/glm-edge-1.5b-chat) | 1.5b-chat/4b-chat | transformers>=4.46 | ✘ | [zai-org/glm-edge-1.5b-chat](https://huggingface.co/zai-org/glm-edge-1.5b-chat) |
| internlm2 series | [Shanghai_AI_Laboratory/internlm2-1_8b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-1_8b) | 1_8b/chat-1_8b-sft/base-7b/7b/chat-7b | transformers>=4.38 | ✘ | [internlm/internlm2-1_8b](https://huggingface.co/internlm/internlm2-1_8b) |
| deepseek_v1 | [deepseek-ai/DeepSeek-V2-Lite](https://modelscope.cn/models/deepseek-ai/DeepSeek-V2-Lite) | V2/V2-Lite/V2-Chat/V2-Lite-Chat/V2.5 | transformers>=4.39.3 | ✔ | [deepseek-ai/DeepSeek-V2-Lite](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite) |
| | [deepseek-ai/DeepSeek-Prover-V2-7B](https://modelscope.cn/models/deepseek-ai/DeepSeek-Prover-V2-7B) | - | transformers>=4.39.3 | ✔ | [deepseek-ai/DeepSeek-Prover-V2-7B](https://huggingface.co/deepseek-ai/DeepSeek-Prover-V2-7B) |
| | [deepseek-ai/DeepSeek-R1](https://modelscope.cn/models/deepseek-ai/DeepSeek-R1) | - | transformers>=4.39.3 | ✔ | [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) |
| deepseek-r1-distill | [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://modelscope.cn/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) | 1.5B/7B/14B/32B | transformers>=4.37 | ✔ | [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) |

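The `Requires` column above can be checked against the installed `transformers` version before picking a model. A minimal, standard-library-only sketch — the `meets_requirement` helper is illustrative and not part of twinkle:

```python
def version_tuple(v: str) -> tuple:
    """Parse a dotted version string like '4.51' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def meets_requirement(installed: str, spec: str) -> bool:
    """Check a simple 'op + version' spec such as '>=4.51' or '<4.42'."""
    if spec.startswith(">="):
        return version_tuple(installed) >= version_tuple(spec[2:])
    if spec.startswith("<"):
        return version_tuple(installed) < version_tuple(spec[1:])
    raise ValueError(f"unsupported spec: {spec}")

# qwen3 series needs transformers>=4.51; chatglm3 needs transformers<4.42
print(meets_requirement("4.51.3", ">=4.51"))  # True
print(meets_requirement("4.51.3", "<4.42"))   # False
```

In real environments, `importlib.metadata.version("transformers")` supplies the installed version string.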
For a more detailed model support list 👉 [Quick Start](docs/source_en/Usage%20Guide/Quick-Start.md)

## Sample Code

Below are some of the capabilities demonstrated in the example code. For a complete introduction to training capabilities, please refer to [Quick Start](docs/source_en/Usage%20Guide/Quick-Start.md) and the [cookbook](cookbook).

### Train with Ray

```python
# ... (device group and mesh setup omitted) ...
twinkle.initialize(mode='ray', groups=device_group, global_device_mesh=device_mesh)

def train():
    # To load a model from Hugging Face, use 'hf://...'
    base_model = 'ms://Qwen/Qwen2.5-7B-Instruct'
    # 1000 samples
    dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(1000)))
    # Set template to prepare encoding
    # ... (training loop omitted) ...


if __name__ == '__main__':
    train()
```

The serverless endpoint is used through the Tinker-compatible client:

```python
import os
from tqdm import tqdm
from tinker import types
from twinkle_client import init_tinker_client
from twinkle.dataloader import DataLoader
from twinkle.dataset import Dataset, DatasetMeta
from twinkle.preprocessor import SelfCognitionProcessor
from twinkle.server.tinker.common import input_feature_to_datum

base_model = 'ms://Qwen/Qwen3-30B-A3B-Instruct-2507'
base_url = 'http://www.modelscope.cn/twinkle'
api_key = os.environ.get('MODELSCOPE_TOKEN')

# Use a twinkle dataset to load the data
dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(500)))
dataset.set_template('Template', model_id=base_model, max_length=256)
dataset.map(SelfCognitionProcessor('twinkle Model', 'twinkle Team'), load_from_cache_file=False)
dataset.encode(batched=True, load_from_cache_file=False)
dataloader = DataLoader(dataset=dataset, batch_size=8)
229225
0 commit comments