update #57

Changes from all commits
@@ -51,7 +51,7 @@ be reused in [ms-swift](https://github.com/modelscope/ms-swift).
pip install 'twinkle-kit'
```

-### Installation from Source:
+### Install from Source:

```shell
git clone https://github.com/modelscope/twinkle.git

@@ -75,13 +75,14 @@ pip install -e .

## Changelog

-- 🎉2026-02-10 Initial version of Twinkle✨ released, including SFT/PT/RL for text models and serverless training capabilities on [ModelScope](https://modelscope.cn).
+- 🎉2026-02-13 Initial version of Twinkle✨ released, including SFT/PT/RL support for text models and serverless training capabilities on [ModelScope](https://modelscope.cn).

-# ModelScope Community
+## Training as a Service on ModelScope

-## ModelScope Official Environment
-
-The ModelScope community provides an official environment for running Twinkle. The API endpoint is: [base_url](https://www.modelscope.cn/twinkle). Developers can refer to our [documentation](docs/source_en/Usage%20Guide/ModelScope-Official-Resources.md) for usage instructions.
+We are rolling out a training service built atop Twinkle✨ on ModelScope. It is currently in _Beta_. You may
+sign up for free access by joining the [Twinkle-Explorers](https://modelscope.cn/organization/twinkle-explorers) organization, and
+train via the API endpoint `base_url=https://www.modelscope.cn/twinkle`. For more details, please refer to
+our [documentation](docs/source_en/Usage%20Guide/ModelScope-Official-Resources.md).
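The access pattern described above can be sketched in a few lines of client-side Python. This is purely illustrative, not Twinkle API; the `MODELSCOPE_TOKEN` environment-variable name is borrowed from the example later in this README and may differ for your setup.

```python
import os

def service_config(env_var: str = 'MODELSCOPE_TOKEN') -> dict:
    """Assemble endpoint and credentials for the Beta training service.

    The endpoint is the one documented above; the token comes from your
    ModelScope account after joining Twinkle-Explorers.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f'Set {env_var} to your ModelScope API token')
    return {'base_url': 'https://www.modelscope.cn/twinkle', 'api_key': api_key}
```

A client such as the Tinker-compatible one shown later in this README would then be constructed from these two values.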

## Supported Hardware

@@ -95,29 +96,33 @@ The ModelScope community provides an official environment for running Twinkle. T
## Supported Models

We will be adding support for more models as new models are released. The following table lists current models
-supported on Twinkle✨ framework. However, the models supported on our serverless training backend may be a
-much smaller subset. Please refer to the [doc](link) section for more information.
+supported on Twinkle✨ framework.
+
+>[!Note]
+> For the serverless training service accessed via `base_url=https://www.modelscope.cn/twinkle`, only one
+> training base is supported at a time; currently it is [Qwen3-30B-A3B-Instruct-2507](https://modelscope.cn/models/Qwen/Qwen3-30B-A3B-Instruct-2507).

-| Model Type | Model ID on[ModelScope](https://modelscope.cn) | Requires | Megatron Support | HF Model ID |
-| ------------------- | --------------------------------------------------------------------------------------------------------------------- | -------------------- | ---------------- | ---------------------------------------------------------------------------------------------------------- |
+| Model Type | Model ID on [ModelScope](https://modelscope.cn) | Requires | Megatron Support | HF Model ID |
+| ------------------- |--------------------------------------------------------------------------------------------------------------------------| -------------------- | ---------------- | ---------------------------------------------------------------------------------------------------------- |

> **Reviewer (Contributor), on lines +106 to +107:** The markdown table formatting appears to be broken. The separator for the second column is not aligned and lacks spacing, which can cause rendering issues on some platforms. Please adjust the separator line to align with the headers for better readability and consistent rendering.

-| qwen3 series | [Qwen/Qwen3-0.6B-Base](https://modelscope.cn/models/Qwen/Qwen3-0.6B-Base)~32B | transformers>=4.51 | ✔ | [Qwen/Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base) |
-| qwen3_moe series | [Qwen/Qwen3-30B-A3B-Base](https://modelscope.cn/models/Qwen/Qwen3-30B-A3B-Base) | transformers>=4.51 | ✔ | [Qwen/Qwen3-30B-A3B-Base](https://huggingface.co/Qwen/Qwen3-30B-A3B-Base) |
-| | [Qwen/Qwen3-30B-A3B](https://modelscope.cn/models/Qwen/Qwen3-30B-A3B)~235B | transformers>=4.51 | ✔ | [Qwen/Qwen3-30B-A3B](https://huggingface.co/Qwen/Qwen3-30B-A3B) |
-| qwen2 series | [Qwen/Qwen2-0.5B-Instruct](https://modelscope.cn/models/Qwen/Qwen2-0.5B-Instruct) ~72B | transformers>=4.37 | ✔ | [Qwen/Qwen2-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct) |
-| | [Qwen/Qwen2.5-0.5B-Instruct](https://modelscope.cn/models/Qwen/Qwen2.5-0.5B-Instruct)~72B | transformers>=4.37 | ✔ | [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) |
-| | [Qwen/Qwen2.5-0.5B](https://modelscope.cn/models/Qwen/Qwen2.5-0.5B)~72B | transformers>=4.37 | ✔ | [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) |
-| qwen2_moe series | [Qwen/Qwen1.5-MoE-A2.7B-Chat](https://modelscope.cn/models/Qwen/Qwen1.5-MoE-A2.7B-Chat) | transformers>=4.40 | ✔ | [Qwen/Qwen1.5-MoE-A2.7B-Chat](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B-Chat) |
+| qwen3 series | [Qwen/Qwen3-0.6B-Base](https://modelscope.cn/models/Qwen/Qwen3-0.6B-Base)~32B | transformers>=4.51 | ✅ | [Qwen/Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base) |
+| qwen3_moe series | [Qwen/Qwen3-30B-A3B-Base](https://modelscope.cn/models/Qwen/Qwen3-30B-A3B-Base) | transformers>=4.51 | ✅ | [Qwen/Qwen3-30B-A3B-Base](https://huggingface.co/Qwen/Qwen3-30B-A3B-Base) |
+| | [Qwen/Qwen3-30B-A3B](https://modelscope.cn/models/Qwen/Qwen3-30B-A3B)~235B | transformers>=4.51 | ✅ | [Qwen/Qwen3-30B-A3B](https://huggingface.co/Qwen/Qwen3-30B-A3B) |
+| qwen2 series | [Qwen/Qwen2-0.5B-Instruct](https://modelscope.cn/models/Qwen/Qwen2-0.5B-Instruct) ~72B | transformers>=4.37 | ✅ | [Qwen/Qwen2-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct) |
+| | [Qwen/Qwen2.5-0.5B-Instruct](https://modelscope.cn/models/Qwen/Qwen2.5-0.5B-Instruct)~72B | transformers>=4.37 | ✅ | [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) |
+| | [Qwen/Qwen2.5-0.5B](https://modelscope.cn/models/Qwen/Qwen2.5-0.5B)~72B | transformers>=4.37 | ✅ | [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) |
+| qwen2_moe series | [Qwen/Qwen1.5-MoE-A2.7B-Chat](https://modelscope.cn/models/Qwen/Qwen1.5-MoE-A2.7B-Chat) | transformers>=4.40 | ✅ | [Qwen/Qwen1.5-MoE-A2.7B-Chat](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B-Chat) |
| chatglm4 series | [ZhipuAI/glm-4-9b-chat](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat) | transformers>=4.42 | ✘ | [zai-org/glm-4-9b-chat](https://huggingface.co/zai-org/glm-4-9b-chat) |
| | [ZhipuAI/LongWriter-glm4-9b](https://modelscope.cn/models/ZhipuAI/LongWriter-glm4-9b) | transformers>=4.42 | ✘ | [zai-org/LongWriter-glm4-9b](https://huggingface.co/zai-org/LongWriter-glm4-9b) |
| glm_edge series | [ZhipuAI/glm-edge-1.5b-chat](https://modelscope.cn/models/ZhipuAI/glm-edge-1.5b-chat) | transformers>=4.46 | ✘ | [zai-org/glm-edge-1.5b-chat](https://huggingface.co/zai-org/glm-edge-1.5b-chat) |
| | [ZhipuAI/glm-edge-4b-chat](https://modelscope.cn/models/ZhipuAI/glm-edge-4b-chat) | transformers>=4.46 | ✘ | [zai-org/glm-edge-4b-chat](https://huggingface.co/zai-org/glm-edge-4b-chat) |
| internlm2 series | [Shanghai_AI_Laboratory/internlm2-1_8b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-1_8b) | transformers>=4.38 | ✘ | [internlm/internlm2-1_8b](https://huggingface.co/internlm/internlm2-1_8b) |
| | [Shanghai_AI_Laboratory/internlm2-chat-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-7b) | transformers>=4.38 | ✘ | [internlm/internlm2-chat-7b](https://huggingface.co/internlm/internlm2-chat-7b) |
-| deepseek_v1 | [deepseek-ai/deepseek-vl-7b-chat](https://modelscope.cn/models/deepseek-ai/deepseek-vl-7b-chat) | transformers>=4.39.4 | ✔ | —— |
-| | [deepseek-ai/DeepSeek-V2-Lite](https://modelscope.cn/models/deepseek-ai/DeepSeek-V2-Lite) | transformers>=4.39.3 | ✔ | [deepseek-ai/DeepSeek-V2-Lite](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite) |
-| | [deepseek-ai/DeepSeek-V2.5](https://modelscope.cn/models/deepseek-ai/DeepSeek-V2.5) | transformers>=4.39.3 | ✔ | [deepseek-ai/DeepSeek-V2.5](https://huggingface.co/deepseek-ai/DeepSeek-V2.5) |
-| | [deepseek-ai/DeepSeek-R1](https://modelscope.cn/models/deepseek-ai/DeepSeek-R1) | transformers>=4.39.3 | ✔ | [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) |
-| deepSeek-r1-distill | [deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B](https://modelscope.cn/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) ~32B | transformers>=4.37 | ✔ | [deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) |
+| deepseek_v1 | [deepseek-ai/deepseek-vl-7b-chat](https://modelscope.cn/models/deepseek-ai/deepseek-vl-7b-chat) | transformers>=4.39.4 | ✅ | —— |
+| | [deepseek-ai/DeepSeek-V2-Lite](https://modelscope.cn/models/deepseek-ai/DeepSeek-V2-Lite) | transformers>=4.39.3 | ✅ | [deepseek-ai/DeepSeek-V2-Lite](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite) |
+| | [deepseek-ai/DeepSeek-V2.5](https://modelscope.cn/models/deepseek-ai/DeepSeek-V2.5) | transformers>=4.39.3 | ✅ | [deepseek-ai/DeepSeek-V2.5](https://huggingface.co/deepseek-ai/DeepSeek-V2.5) |
+| | [deepseek-ai/DeepSeek-R1](https://modelscope.cn/models/deepseek-ai/DeepSeek-R1) | transformers>=4.39.3 | ✅ | [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) |
+| deepSeek-r1-distill | [deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B](https://modelscope.cn/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) ~32B | transformers>=4.37 | ✅ | [deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) |

For a more detailed model support list 👉 [Quick Start.md](https://github.com/modelscope/twinkle/blob/dev/docs/source/%E4%BD%BF%E7%94%A8%E6%8C%87%E5%BC%95/%E5%BF%AB%E9%80%9F%E5%BC%80%E5%A7%8B.md)
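Since the serverless backend accepts only one training base at a time (see the note above), client code can fail fast before submitting a job. A hypothetical sketch; the function and the hard-coded supported set are illustrative, not part of the Twinkle API, and would need to track the service:

```python
# Hypothetical guard; the supported set mirrors the note above and must be
# kept in sync with the serverless backend.
SERVERLESS_SUPPORTED = {'Qwen/Qwen3-30B-A3B-Instruct-2507'}

def check_serverless_base(model_id: str) -> str:
    """Return the bare model id, raising if the backend does not serve it."""
    bare = model_id.split('://', 1)[-1]  # tolerate an 'ms://' or 'hf://' prefix
    if bare not in SERVERLESS_SUPPORTED:
        raise ValueError(
            f'{model_id!r} is not available on the serverless backend; '
            f'supported: {sorted(SERVERLESS_SUPPORTED)}')
    return bare
```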
@@ -141,18 +146,20 @@ twinkle.initialize(mode='ray', groups=device_group, global_device_mesh=device_me

def train():
+    # to load model from Hugging Face, use 'hf://...'
+    base_model = 'ms://Qwen/Qwen2.5-7B-Instruct'
    # 1000 samples
    dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(1000)))
    # Set template to prepare encoding
-    dataset.set_template('Template', model_id='ms://Qwen/Qwen2.5-7B-Instruct')
+    dataset.set_template('Template', model_id=base_model)
    # Preprocess the dataset to standard format
    dataset.map(SelfCognitionProcessor('twinkle LLM', 'ModelScope Community'))
    # Encode dataset
    dataset.encode()
    # Global batch size = 8, for 8 GPUs, so 1 sample per GPU
    dataloader = DataLoader(dataset=dataset, batch_size=8, min_batch_size=8)
    # Use a TransformersModel
-    model = TransformersModel(model_id='ms://Qwen/Qwen2.5-7B-Instruct', remote_group='default')
+    model = TransformersModel(model_id=base_model, remote_group='default')

    lora_config = LoraConfig(
        r=8,
@@ -184,7 +191,7 @@ if __name__ == '__main__':
    train()
```
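The batch-size comment in the training example above (global batch size 8 across 8 GPUs, so 1 sample per GPU) generalizes as simple arithmetic. The helper below is purely illustrative, not part of Twinkle:

```python
def per_device_batch(global_batch: int, world_size: int) -> int:
    """Split a global batch evenly across devices."""
    if world_size <= 0 or global_batch % world_size != 0:
        raise ValueError('global batch must divide evenly across devices')
    return global_batch // world_size

# The example above: batch_size=8 on 8 GPUs gives 1 sample per GPU.
```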

-### Tinker-Like Remote API
+### Using Tinker-Like API

```python
import os
@@ -196,17 +203,19 @@ from twinkle.dataset import Dataset, DatasetMeta
from twinkle.preprocessor import SelfCognitionProcessor
from twinkle.server.tinker.common import input_feature_to_datum

-base_model = "Qwen/Qwen2.5-0.5B-Instruct"
+base_model = 'ms://Qwen/Qwen2.5-0.5B-Instruct'
+base_url = 'http://www.modelscope.cn/twinkle'
+api_key = os.environ.get('MODELSCOPE_TOKEN')

# Use twinkle dataset to load the data
dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(500)))
-dataset.set_template('Template', model_id=f'ms://{base_model}', max_length=256)
+dataset.set_template('Template', model_id=base_model, max_length=256)
dataset.map(SelfCognitionProcessor('twinkle Model', 'twinkle Team'), load_from_cache_file=False)
dataset.encode(batched=True, load_from_cache_file=False)
dataloader = DataLoader(dataset=dataset, batch_size=8)

# Initialize tinker client
-service_client = init_tinker_compat_client(base_url='http://www.modelscope.cn/twinkle', api_key=os.environ.get('MODELSCOPE_SDK_TOKEN'))
+service_client = init_tinker_compat_client(base_model, api_key)

# Reviewer note (Contributor): there seems to be a bug in this example; the
# init_tinker_compat_client call passes base_model as its first argument,
# where base_url is expected.

training_client = service_client.create_lora_training_client(base_model=base_model, rank=16)

# Training loop: use input_feature_to_datum to transfer the input format

@@ -223,12 +232,6 @@ for epoch in range(3):
    training_client.save_state(f"twinkle-lora-{epoch}").result()
```
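The model-id changes in this PR standardize on scheme-prefixed ids (`ms://` for ModelScope, `hf://` for Hugging Face). A tiny illustrative helper for that convention (hypothetical, not part of the Twinkle API):

```python
def with_hub_scheme(model_id: str, scheme: str = 'ms') -> str:
    """Prefix a bare model id with a hub scheme; leave prefixed ids unchanged."""
    return model_id if '://' in model_id else f'{scheme}://{model_id}'
```

For example, `with_hub_scheme('Qwen/Qwen2.5-0.5B-Instruct')` yields `'ms://Qwen/Qwen2.5-0.5B-Instruct'`, matching the updated `base_model` value above.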

-Launch training:
-
-```shell
-python3 train.py
-```

## Architecture Design

<img src="assets/framework.jpg" style="max-width: 500px; width: 100%;" />

> **Reviewer (Contributor):** The linked English documentation file (`docs/source_en/Usage Guide/ModelScope-Official-Resources.md`) appears to be outdated. It still contains the old instructions, which are inconsistent with the new process described in this README and the updated Chinese documentation (`docs/source_zh/使用指引/训练服务.md`). To avoid confusion for English-speaking users, please consider updating this document and renaming it (e.g., to `Training-Service.md`) to match the changes made to the Chinese documentation.