11 changes: 7 additions & 4 deletions README.md
@@ -138,6 +138,9 @@ For more detailed model support list 👉 [Quick Start](docs/source_en/Usage%20

## Sample Code

Below are some of the capabilities demonstrated by the sample code. For a complete introduction to the training capabilities,
please refer to [Quick Start](docs/source_en/Usage%20Guide/Quick-Start.md) and the [cookbook](cookbook).

### Train with Ray

```python
@@ -157,7 +160,7 @@ twinkle.initialize(mode='ray', groups=device_group, global_device_mesh=device_me

def train():
# to load model from Hugging Face, use 'hf://...'
base_model = 'ms://Qwen/Qwen2.5-7B-Instruct'
base_model = 'ms://Qwen/Qwen3-4B'
# 1000 samples
dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(1000)))
# Set template to prepare encoding
@@ -214,13 +217,13 @@ from twinkle.preprocessor import SelfCognitionProcessor
from twinkle.server.tinker.common import input_feature_to_datum

base_model = 'ms://Qwen/Qwen3-30B-A3B-Instruct-2507'
base_url='http://www.modelscope.cn/twinkle'
api_key=os.environ.get('MODELSCOPE_TOKEN')
base_url='your-base-url'
api_key='your-api-key'

# Use twinkle dataset to load the data
dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(500)))
dataset.set_template('Template', model_id=base_model, max_length=256)
dataset.map(SelfCognitionProcessor('twinkle Model', 'twinkle Team'), load_from_cache_file=False)
dataset.map(SelfCognitionProcessor('twinkle Model', 'ModelScope Team'), load_from_cache_file=False)
dataset.encode(batched=True, load_from_cache_file=False)
dataloader = DataLoader(dataset=dataset, batch_size=8)

20 changes: 12 additions & 8 deletions README_ZH.md
@@ -67,9 +67,11 @@ pip install -e .
| twinkle client fine-tuning | megatron | [script](cookbook/client/twinkle/megatron) |
| twinkle client fine-tuning | transformer | [script](cookbook/client/twinkle/transformer) |

Twinkle✨ supports running the same algorithm interfaces across single-GPU, multi-node torchrun, Ray, client, and other scenarios. Its algorithm flow is exposed, which makes it easy to modify and debug. For a complete introduction to the framework, see [Quick Start](docs/source_zh/使用指引/快速开始.md).
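
As a minimal sketch of what this means in code (based only on the snippets later in this README; whether `groups` may be omitted from `initialize` is an assumption), switching scenarios comes down to the `twinkle.initialize` call:

```python
import twinkle
from twinkle import DeviceMesh

# Describe the parallel layout once (tp=pp=cp=2, dp=1, as in the Megatron
# cookbook example); the same algorithm code then runs under whichever
# launcher `initialize` selects, 'ray' in this README's samples.
device_mesh = DeviceMesh.from_sizes(dp_size=1, tp_size=2, pp_size=2, cp_size=2)
twinkle.initialize(mode='ray', global_device_mesh=device_mesh)
```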

## Changelog

- 🎉2026-02-13 Twinkle✨ initial release, including SFT/PT/RL support for text models and serverless training capability on [ModelScope](https://modelscope.cn)
- 🎉2026-02-13 Twinkle✨ initial release, supporting SFT/PT/RL training for text models. We also provide serverless training on ModelScope through a Tinker-compatible API

## Training Services on ModelScope

@@ -88,8 +90,8 @@

As new models are released, we will add support for more of them. The table below lists the models currently supported by the Twinkle✨ framework.

>[!Note]
> For the serverless training service accessed via `base_url=https://www.modelscope.cn/twinkle`, only one training base model is supported at a time, currently [Qwen3-30B-A3B-Instruct-2507](https://modelscope.cn/models/Qwen/Qwen3-30B-A3B-Instruct-2507).
>[!Note]
> The serverless training service accessed via `base_url=https://www.modelscope.cn/twinkle` is currently provided through a Tinker-compatible API. We will gradually roll out services that support both the Tinker API and the full Twinkle✨ native API. The serverless endpoint is backed by one training base model at a time, currently [Qwen3-30B-A3B-Instruct-2507](https://modelscope.cn/models/Qwen/Qwen3-30B-A3B-Instruct-2507).


| Model Type | Model ID on [ModelScope](https://modelscope.cn) | Requirements | Megatron Support | HF Model ID |
@@ -117,6 +119,8 @@ pip install -e .

## Sample Code

Below are some of the capabilities demonstrated by the sample code. For a complete introduction to the training capabilities, please refer to [Quick Start](docs/source_zh/使用指引/快速开始.md) and the [cookbook](cookbook).

### Train with Ray

```python
@@ -136,7 +140,7 @@ twinkle.initialize(mode='ray', groups=device_group, global_device_mesh=device_me

def train():
# to load model from Hugging Face, use 'hf://...'
base_model = 'ms://Qwen/Qwen2.5-7B-Instruct'
base_model = 'ms://Qwen/Qwen3-4B'
# 1000 samples
dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(1000)))
# Set template to prepare encoding
@@ -180,7 +184,7 @@ if __name__ == '__main__':
train()
```

### Use the Tinker-like API
### Serverless Training with the Tinker-like API

```python
import os
@@ -193,13 +197,13 @@ from twinkle.preprocessor import SelfCognitionProcessor
from twinkle.server.tinker.common import input_feature_to_datum

base_model = 'ms://Qwen/Qwen3-30B-A3B-Instruct-2507'
base_url='http://www.modelscope.cn/twinkle'
api_key=os.environ.get('MODELSCOPE_TOKEN')
base_url='your-base-url'
api_key='your-api-key'

# Use twinkle dataset to load the data
dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(500)))
dataset.set_template('Template', model_id=base_model, max_length=256)
dataset.map(SelfCognitionProcessor('twinkle Model', 'twinkle Team'), load_from_cache_file=False)
dataset.map(SelfCognitionProcessor('twinkle Model', 'ModelScope Team'), load_from_cache_file=False)
dataset.encode(batched=True, load_from_cache_file=False)
dataloader = DataLoader(dataset=dataset, batch_size=8)

Binary file modified assets/framework.jpg
46 changes: 0 additions & 46 deletions cookbook/megatron/qwen3_5.py

This file was deleted.

10 changes: 7 additions & 3 deletions cookbook/megatron/tp.py
@@ -8,6 +8,8 @@
from twinkle.dataset import Dataset, DatasetMeta
from twinkle.model import MegatronModel
from twinkle.preprocessor import SelfCognitionProcessor
from twinkle.server.tinker.common import input_feature_to_datum
from twinkle.server.tinker.common.compat_base import TwinkleCompatModelBase

# Construct a device_mesh, tp=pp=cp=2, dp=1
device_mesh = DeviceMesh.from_sizes(dp_size=1, tp_size=2, pp_size=2, cp_size=2)
@@ -20,7 +22,7 @@
def eval(model):
# 100 Samples
dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(100)))
dataset.set_template('Template', model_id='ms://Qwen/Qwen2.5-7B-Instruct')
dataset.set_template('Template', model_id='ms://Qwen/Qwen3-4B')
dataset.map(SelfCognitionProcessor('twinkle大模型', 'ModelScope社区'))
dataset.encode()
dataloader = DataLoader(dataset=dataset, batch_size=16)
@@ -34,15 +36,15 @@ def train():
# 1000 samples
dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(1000)))
# Set template to prepare encoding
dataset.set_template('Template', model_id='ms://Qwen/Qwen2.5-7B-Instruct')
dataset.set_template('Template', model_id='ms://Qwen/Qwen3-4B')
# Preprocess the dataset to standard format
dataset.map(SelfCognitionProcessor('twinkle大模型', 'ModelScope社区'))
# Encode dataset
dataset.encode()
# Global batch size = 1, dp_size = 1
dataloader = DataLoader(dataset=dataset, batch_size=16)
# Use a MegatronModel
model = MegatronModel(model_id='ms://Qwen/Qwen2.5-7B-Instruct')
model = MegatronModel(model_id='ms://Qwen/Qwen3-4B')

lora_config = LoraConfig(r=8, lora_alpha=32, target_modules='all-linear')

@@ -63,6 +65,8 @@ def train():
for step, batch in enumerate(dataloader):
# Do forward and backward
model.forward_backward(inputs=batch)
_inputs = [input_feature_to_datum(b) for b in batch]
_temp = TwinkleCompatModelBase._get_forward_output(_inputs, model.optimizer_group['default'].outputs['logits'])
# Step
model.clip_grad_and_step()
if step % 5 == 0:
8 changes: 4 additions & 4 deletions cookbook/megatron/tp_moe.py
@@ -20,7 +20,7 @@
def eval(model):
# 100 Samples
dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(100)))
dataset.set_template('Template', model_id='ms://Qwen/Qwen3-30B-A3B-Instruct-2507')
dataset.set_template('Template', model_id='ms://Qwen/Qwen3.5-35B-A3B')
dataset.map(SelfCognitionProcessor('twinkle大模型', 'ModelScope社区'))
dataset.encode()
dataloader = DataLoader(dataset=dataset, batch_size=16)
@@ -34,15 +34,15 @@ def train():
# 1000 samples
dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(1000)))
# Set template to prepare encoding
dataset.set_template('Template', model_id='ms://Qwen/Qwen3-30B-A3B-Instruct-2507')
dataset.set_template('Template', model_id='ms://Qwen/Qwen3.5-35B-A3B')
# Preprocess the dataset to standard format
dataset.map(SelfCognitionProcessor('twinkle大模型', 'ModelScope社区'))
# Encode dataset
dataset.encode()
# Global batch size = 1, dp_size = 1
dataloader = DataLoader(dataset=dataset, batch_size=16)
# Use a MegatronModel
model = MegatronModel(model_id='ms://Qwen/Qwen3-30B-A3B-Instruct-2507')
model = MegatronModel(model_id='ms://Qwen/Qwen3.5-35B-A3B')

lora_config = LoraConfig(r=8, lora_alpha=32, target_modules='all-linear')

@@ -75,7 +75,7 @@ def train():
if loss_metric > float(metrics['loss']):
model.save(f'checkpoint-{step}')
loss_metric = float(metrics['loss'])
model.save(f'last-checkpoint')
model.save('last-checkpoint', merge_lora=True)


if __name__ == '__main__':
6 changes: 3 additions & 3 deletions cookbook/ray/single_controller.py
@@ -26,7 +26,7 @@
def eval(model):
# 100 Samples
dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(100)))
dataset.set_template('Template', model_id='ms://Qwen/Qwen2.5-7B-Instruct')
dataset.set_template('Template', model_id='ms://Qwen/Qwen3.5-35B-A3B')
dataset.map(SelfCognitionProcessor('twinkle大模型', 'ModelScope社区'))
dataset.encode()
dataloader = DataLoader(dataset=dataset, batch_size=8, min_batch_size=8)
@@ -41,15 +41,15 @@ def train():
# 1000 samples
dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(1000)))
# Set template to prepare encoding
dataset.set_template('Template', model_id='ms://Qwen/Qwen2.5-7B-Instruct')
dataset.set_template('Template', model_id='ms://Qwen/Qwen3-4B')
# Preprocess the dataset to standard format
dataset.map(SelfCognitionProcessor('twinkle大模型', 'ModelScope社区'))
# Encode dataset
dataset.encode()
# Global batch size = 8, for GPUs, so 1 sample per GPU
dataloader = DataLoader(dataset=dataset, batch_size=8, min_batch_size=8)
# Use a TransformersModel
model = TransformersModel(model_id='ms://Qwen/Qwen2.5-7B-Instruct', remote_group='default')
model = TransformersModel(model_id='ms://Qwen/Qwen3-4B', remote_group='default')

lora_config = LoraConfig(r=8, lora_alpha=32, target_modules='all-linear')

2 changes: 1 addition & 1 deletion cookbook/rl/grpo.py
@@ -20,7 +20,7 @@

logger = get_logger()

MODEL_ID = os.environ.get('MODEL_ID', 'ms://Qwen/Qwen2.5-3B-Instruct')
MODEL_ID = os.environ.get('MODEL_ID', 'ms://Qwen/Qwen3-4B')
USE_MEGATRON = bool(int(os.environ.get('USE_MEGATRON', '1')))

MODEL_GPUS = int(os.environ.get('MODEL_GPUS', 4))
6 changes: 3 additions & 3 deletions cookbook/transformers/fsdp2.py
@@ -20,7 +20,7 @@
def eval(model):
# 100 Samples
dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(100)))
dataset.set_template('Template', model_id='ms://Qwen/Qwen2.5-7B-Instruct')
dataset.set_template('Template', model_id='ms://Qwen/Qwen3-4B')
dataset.map(SelfCognitionProcessor('twinkle大模型', 'ModelScope社区'))
dataset.encode()
dataloader = DataLoader(dataset=dataset, batch_size=8)
@@ -35,15 +35,15 @@ def train():
# 1000 samples
dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(1000)))
# Set template to prepare encoding
dataset.set_template('Template', model_id='ms://Qwen/Qwen2.5-7B-Instruct')
dataset.set_template('Template', model_id='ms://Qwen/Qwen3-4B')
# Preprocess the dataset to standard format
dataset.map(SelfCognitionProcessor('twinkle大模型', 'ModelScope社区'))
# Encode dataset
dataset.encode()
# Global batch size = 8, for GPUs, so 1 sample per GPU
dataloader = DataLoader(dataset=dataset, batch_size=8)
# Use a TransformersModel
model = TransformersModel(model_id='ms://Qwen/Qwen2.5-7B-Instruct')
model = TransformersModel(model_id='ms://Qwen/Qwen3-4B')

lora_config = LoraConfig(r=8, lora_alpha=32, target_modules='all-linear')

2 changes: 1 addition & 1 deletion cookbook/transformers/sp_fsdp_dense.py
@@ -10,7 +10,7 @@
from twinkle.preprocessor import SelfCognitionProcessor

logger = get_logger()
MODEL_ID = 'ms://Qwen/Qwen2.5-7B-Instruct'
MODEL_ID = 'ms://Qwen/Qwen3-4B'
DATASETS = 'ms://swift/self-cognition'

device_group = [DeviceGroup(
4 changes: 2 additions & 2 deletions docs/source_en/Components/Advantage/GRPOAdvantage.md
@@ -41,8 +41,8 @@ from twinkle.sampler import vLLMSampler
from twinkle.reward import MathReward

# Create components
actor = TransformersModel(model_id='Qwen/Qwen2.5-7B-Instruct')
sampler = vLLMSampler(model_id='Qwen/Qwen2.5-7B-Instruct')
actor = TransformersModel(model_id='ms://Qwen/Qwen3-4B')
sampler = vLLMSampler(model_id='ms://Qwen/Qwen3-4B')
reward_fn = MathReward()
advantage_fn = GRPOAdvantage()

7 changes: 4 additions & 3 deletions docs/source_en/Components/Advantage/RLOOAdvantage.md
@@ -29,7 +29,7 @@ RLOO advantages:
- More accurate counterfactual baseline estimation
- Better performance when there are more samples

## Complete Training Example
## Training Example

```python
from twinkle.advantage import RLOOAdvantage
@@ -38,10 +38,11 @@ from twinkle.sampler import vLLMSampler
from twinkle.reward import MathReward

# Create components
actor = TransformersModel(model_id='Qwen/Qwen2.5-7B-Instruct')
sampler = vLLMSampler(model_id='Qwen/Qwen2.5-7B-Instruct')
actor = TransformersModel(model_id='ms://Qwen/Qwen3-4B')
sampler = vLLMSampler(model_id='ms://Qwen/Qwen3-4B')
reward_fn = MathReward()
advantage_fn = RLOOAdvantage()
dataloader = ...

# Training loop
for batch in dataloader:
2 changes: 1 addition & 1 deletion docs/source_en/Components/Data Format/Sampling.md
@@ -62,7 +62,7 @@ Usage example:
from twinkle.data_format import SamplingParams, SampleResponse
from twinkle.sampler import vLLMSampler

sampler = vLLMSampler(model_id='Qwen/Qwen2.5-7B-Instruct')
sampler = vLLMSampler(model_id='ms://Qwen/Qwen3-4B')
params = SamplingParams(max_tokens=512, temperature=0.7, top_p=0.9)
response: SampleResponse = sampler.sample(trajectories, sampling_params=params, num_samples=4)

2 changes: 1 addition & 1 deletion docs/source_en/Components/Dataset/Dataset.md
@@ -60,7 +60,7 @@ dataset = Dataset(DatasetMeta(dataset_id='my/custom/dataset.jsonl', data_slice=r
The Template component is responsible for converting string/image multimodal raw data into model input tokens. The dataset can set a Template to complete the `encode` process.

```python
dataset.set_template('Template', model_id='ms://Qwen/Qwen2.5-7B-Instruct', max_length=512)
dataset.set_template('Template', model_id='ms://Qwen/Qwen3-4B', max_length=512)
```

The set_template method supports passing `kwargs` (such as `max_length` in the example) to be used as constructor parameters for `Template`.
2 changes: 1 addition & 1 deletion docs/source_en/Components/Model/MegatronModel.md
@@ -35,7 +35,7 @@ from twinkle.model import MegatronModel
from twinkle import DeviceMesh
from twinkle.dataloader import DataLoader
dataloader = DataLoader(...)
model = MegatronModel(model_id='ms://Qwen/Qwen2.5-7B-Instruct', device_mesh=DeviceMesh.from_sizes(dp_size=2, tp_size=2, pp_size=2), remote_group='actor')
model = MegatronModel(model_id='ms://Qwen/Qwen3-4B', device_mesh=DeviceMesh.from_sizes(dp_size=2, tp_size=2, pp_size=2), remote_group='actor')
model.add_adapter_to_model(...)
model.set_optimizer('default', adapter_name='...')
for data in dataloader:
2 changes: 1 addition & 1 deletion docs/source_en/Components/Model/TransformersModel.md
@@ -41,7 +41,7 @@ from twinkle.model import TransformersModel
from twinkle import DeviceMesh
from twinkle.dataloader import DataLoader
dataloader = DataLoader(...)
model = TransformersModel(model_id='ms://Qwen/Qwen2.5-7B-Instruct', device_mesh=DeviceMesh.from_sizes(dp_size=2, fsdp_size=2), remote_group='actor')
model = TransformersModel(model_id='ms://Qwen/Qwen3-4B', device_mesh=DeviceMesh.from_sizes(dp_size=2, fsdp_size=2), remote_group='actor')
model.add_adapter_to_model(...)
model.set_optimizer(..., adapter_name='...')
for data in dataloader:
2 changes: 1 addition & 1 deletion docs/source_en/Components/Reward/Reward.md
@@ -87,7 +87,7 @@ from twinkle.sampler import vLLMSampler
from twinkle.reward import MathReward
from twinkle.advantage import GRPOAdvantage

sampler = vLLMSampler(model_id='Qwen/Qwen2.5-7B-Instruct')
sampler = vLLMSampler(model_id='ms://Qwen/Qwen3-4B')
reward_fn = MathReward()
advantage_fn = GRPOAdvantage()

2 changes: 1 addition & 1 deletion docs/source_en/Components/Sampler/TorchSampler.md
@@ -9,7 +9,7 @@ from twinkle.sampler import TorchSampler
from twinkle import DeviceMesh

sampler = TorchSampler(
model_id='ms://Qwen/Qwen2.5-7B-Instruct',
model_id='ms://Qwen/Qwen3-4B',
device_mesh=DeviceMesh.from_sizes(dp_size=1),
)
