Commit 57e4482

Merge branch 'add_rl_example' of https://github.com/modelscope/twinkle into add_rl_example
2 parents 546ae09 + 994d5d8

File tree

155 files changed, +5085 −515 lines


.github/copilot-instructions.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -57,7 +57,7 @@ These instructions help AI agents work productively in this repo. Focus on concr
 ## Examples
 - **Visualize a custom mesh:** create `DeviceMesh` and call `get_device_placement()`; example in [tests/infra/test_infra_graph.py](tests/infra/test_infra_graph.py).
 - **Add LoRA adapter via HTTP:** POST to `/add_adapter_to_model` with serialized `LoraConfig`; see server routes in [src/twinkle/server/twinkle/model.py](src/twinkle/server/twinkle/model.py).
-- **Sample with vLLM:** Configure `VLLMSampler`, set `Template`/`Processor`, then `sample()` on `Trajectory` list; see [src/twinkle/sampler/vllm_sampler.py](src/twinkle/sampler/vllm_sampler.py).
+- **Sample with vLLM:** Configure `vLLMSampler`, set `Template`/`Processor`, then `sample()` on `Trajectory` list; see [src/twinkle/sampler/vllm_sampler.py](src/twinkle/sampler/vllm_sampler.py).
 
 ---
 Questions or gaps? Tell us where guidance is unclear (e.g., missing run scripts, Ray cluster setup), and we'll refine this document.
```
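The "Add LoRA adapter via HTTP" bullet in the diff above describes POSTing a serialized `LoraConfig` to `/add_adapter_to_model`. A minimal sketch of building such a request body — the field names (`adapter_name`, `lora_config`) and the hyperparameter values here are assumptions for illustration, not the server's actual schema:

```python
import json

# Hypothetical payload for POST /add_adapter_to_model; field names and
# LoRA hyperparameters below are assumptions, not the real wire format.
payload = {
    "adapter_name": "my_adapter",
    "lora_config": {"r": 8, "lora_alpha": 16, "target_modules": ["q_proj", "v_proj"]},
}
body = json.dumps(payload).encode("utf-8")
```

The actual route and accepted fields are defined in `src/twinkle/server/twinkle/model.py`.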

README.md

Lines changed: 227 additions & 5 deletions
Large diffs are not rendered by default.

README_ZH.md

Lines changed: 50 additions & 7 deletions
```diff
@@ -21,7 +21,7 @@
 </p>
 
 <p align="center">
-<a href="https://swift.readthedocs.io/en/latest/">English Documentation</a> &nbsp | &nbsp <a href="https://swift.readthedocs.io/zh-cn/latest/">中文文档</a> &nbsp
+<a href="https://twinkle-kit.readthedocs.io/en/latest/">English Documentation</a> &nbsp | &nbsp <a href="https://twinkle-kit.readthedocs.io/zh-cn/latest/">中文文档</a> &nbsp
 </p>
 
 <div align="center">
@@ -191,14 +191,57 @@ twinkle's architecture consists of a client and a server; the client contains two
 
 This allows developers to use the Tinker API directly to call backend training services deployed with twinkle.
 
+## Multi-Tenant Support
+
+Twinkle supports multiple tenants training on a single base model at the same time. This is currently limited to [LoRA](https://github.com/huggingface/peft/blob/main/src/peft/tuners/lora/config.py#L323).
+Twinkle adopts a LoRA-pool-plus-tenant-request design. Up to N tenants can train in parallel without interfering with one another; from the model's perspective each tenant's training workflow may differ, and the data padding scheme, optimizer, and loss type used in the base model can differ as well.
+
+<img src="assets/multi_lora.png" style="max-width: 500px; width: 100%;" />
+
+For example:
+
+- Tenant A: loads a private local dataset, LoRA rank=8, runs SFT on the base model
+- Tenant B: remotely loads an open-source dataset from the Hub, LoRA rank=32, runs PT on the base model
+- Tenant C: computes the GRPO loss on the base model, sampling with the Sampler
+- Tenant D: runs logps inference on the base model
+
+All of this can happen on one base model simultaneously, because the model and the Sampler are themselves twinkle components and are task-agnostic. After training, checkpoints can be pushed to HuggingFace/ModelScope model repositories, private by default. twinkle provides a complete multi-tenant training solution: the server side supports cluster management and dynamic scaling, and with light customization it can run as an enterprise-grade service.
+
+> As a modular framework, twinkle itself also supports temporary exclusive remote training, i.e., full-parameter training.
+
 ## Supported Components
 
-| | | | | |
-| :--------------------------------------------------------: | :-------------------------------------------------------: | :----------------------------------------------------------: | :--------------------------------------------------------: | :-------------------------------------------------------------: |
-| **Dataset**`<br><sub>`Data loading and preprocessing`</sub>` | **Template**`<br><sub>`Encoding and decoding`</sub>` | **DataLoader**`<br><sub>`Data distribution and batching`</sub>` | **Preprocessor**`<br><sub>`Data ETL`</sub>` | **InputProcessor**`<br><sub>`Task-specific input handling`</sub>` |
-| **Model**`<br><sub>`Large models, multiple frameworks`</sub>` | **Sampler**`<br><sub>`Sampling`</sub>` | **Loss**`<br><sub>`Residuals`</sub>` | **Metric**`<br><sub>`Training metrics`</sub>` | **Reward**`<br><sub>`Reward functions`</sub>` |
-| **Advantage**`<br><sub>`Advantage functions`</sub>` | **CheckpointEngine**`<br><sub>`Weight synchronization`</sub>` | **Patch**`<br><sub>`Patches for model fixes`</sub>` | **Module**`<br><sub>`Components such as Optimizer`</sub>` | **Kernel**`<br><sub>`Operators`</sub>` |
-| **Server**`<br><sub>`Launches the backend cluster`</sub>` | **Client**`<br><sub>`Client-side code`</sub>` | **Infra**`<br><sub>`Abstracts ray/torchrun differences`</sub>` | **Plugin**`<br><sub>`Uses hub-side components`</sub>` | **Hub**`<br><sub>`Connects to the HF/MS hubs`</sub>` |
+<table>
+<tr>
+<td align="center"><b>Dataset</b><br><sub>Data loading and preprocessing</sub></td>
+<td align="center"><b>Template</b><br><sub>Encoding and decoding</sub></td>
+<td align="center"><b>DataLoader</b><br><sub>Data distribution and batching</sub></td>
+<td align="center"><b>Preprocessor</b><br><sub>Data ETL</sub></td>
+<td align="center"><b>InputProcessor</b><br><sub>Task-specific input handling</sub></td>
+</tr>
+<tr>
+<td align="center"><b>Model</b><br><sub>Large models, multiple frameworks</sub></td>
+<td align="center"><b>Sampler</b><br><sub>Sampling</sub></td>
+<td align="center"><b>Loss</b><br><sub>Residuals</sub></td>
+<td align="center"><b>Metric</b><br><sub>Training metrics</sub></td>
+<td align="center"><b>Reward</b><br><sub>Reward functions</sub></td>
+</tr>
+<tr>
+<td align="center"><b>Advantage</b><br><sub>Advantage functions</sub></td>
+<td align="center"><b>CheckpointEngine</b><br><sub>Weight synchronization</sub></td>
+<td align="center"><b>Patch</b><br><sub>Patches for model fixes</sub></td>
+<td align="center"><b>Module</b><br><sub>Components such as Optimizer</sub></td>
+<td align="center"><b>Kernel</b><br><sub>Operators</sub></td>
+</tr>
+<tr>
+<td align="center"><b>Server</b><br><sub>Launches the backend cluster</sub></td>
+<td align="center"><b>Client</b><br><sub>Client-side code</sub></td>
+<td align="center"><b>Infra</b><br><sub>Abstracts ray/torchrun differences</sub></td>
+<td align="center"><b>Plugin</b><br><sub>Uses hub-side components</sub></td>
+<td align="center"><b>Hub</b><br><sub>Connects to the HF/MS hubs</sub></td>
+</tr>
+</table>
 
 ## Community Components
 
```
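The LoRA-pool-plus-tenant-request scheme added in the README diff above can be illustrated with a toy allocator: a fixed number of LoRA slots on one base model, granted to tenants on request. This is a schematic sketch of the idea only — the class and method names are hypothetical, not twinkle's actual implementation:

```python
class LoraPool:
    """Toy illustration of a fixed-size LoRA slot pool shared by tenants.

    Names and structure are hypothetical; twinkle's real pool also tracks
    per-tenant padding, optimizer, and loss settings.
    """

    def __init__(self, max_slots: int):
        self.max_slots = max_slots      # the "N" in "up to N tenants in parallel"
        self.slots = {}                 # tenant_id -> adapter config

    def acquire(self, tenant_id: str, rank: int) -> bool:
        """Grant a LoRA slot to a tenant if capacity remains."""
        if tenant_id in self.slots:
            return True
        if len(self.slots) >= self.max_slots:
            return False                # pool exhausted; tenant must wait
        self.slots[tenant_id] = {"r": rank}
        return True

    def release(self, tenant_id: str) -> None:
        """Free the tenant's slot, e.g. after training finishes."""
        self.slots.pop(tenant_id, None)


pool = LoraPool(max_slots=2)
assert pool.acquire("tenant_a", rank=8)      # SFT tenant
assert pool.acquire("tenant_b", rank=32)     # PT tenant, different rank
assert not pool.acquire("tenant_c", rank=8)  # pool full, request denied
pool.release("tenant_a")
assert pool.acquire("tenant_c", rank=8)      # freed slot is reusable
```

Each tenant's adapter is independent, which is what lets rank, optimizer, and loss type differ per tenant while the base model weights are shared.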

ROADMAP.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -64,6 +64,7 @@
 - [ ] Support distillation algorithms such as GKD and on-policy-distill
 - [ ] Support DPO alignment training
 - [ ] Support colocated RL training
+- [ ] Support batched preprocessing
 
 ### Networking Capabilities
 
@@ -82,6 +83,7 @@
 - [ ] Support for distillation algorithms such as GKD and on-policy distillation
 - [ ] Support for DPO alignment training
 - [ ] Support for colocate RL training
+- [ ] Support for batched preprocessing
 
 ### Networking Capabilities
 
```
assets/multi_lora.png

178 KB

client_tools/client_generator.py

Lines changed: 2 additions & 10 deletions
```diff
@@ -240,7 +240,6 @@ def build_imports() -> Tuple[List[str], str]:
     if typing_imports:
         lines.append(f"from typing import {', '.join(sorted(typing_imports))}")
     lines.extend([
-        "from twinkle_client.http import TWINKLE_SERVER_URL",
         "from twinkle_client.http import http_post, heartbeat_manager",
     ])
     lines.extend(sorted(twinkle_imports))
@@ -447,7 +446,6 @@ def generate_models():
 
     model_code = AUTO_GEN_WARNING + '''from typing import Any, Optional, Union, Type, Dict, Literal, List
 import uuid
-from twinkle_client.http import TWINKLE_SERVER_URL
 from twinkle_client.http import http_post, heartbeat_manager
 from twinkle import DeviceMesh
 from twinkle.data_format import InputFeature, Trajectory
@@ -724,18 +722,13 @@ def generate_samplers():
     client_module_path.mkdir(parents=True, exist_ok=True)
 
     sampler_code = AUTO_GEN_WARNING + '''from typing import Any, Optional, List, Dict, Union
-import uuid
-from twinkle_client.http import TWINKLE_SERVER_URL
 from twinkle_client.http import http_post, heartbeat_manager
 from twinkle.sampler.base import Sampler
-from twinkle.sampler.types import SamplingParams, SampleResponse
-from twinkle import DeviceMesh
 from peft import PeftConfig
 from twinkle.data_format import Trajectory, InputFeature
-import json
 
 
-class VLLMSampler(Sampler):
+class vLLMSampler(Sampler):
     """Client wrapper for Sampler that calls server HTTP endpoints.
 
     This client manages sampling operations and adapter synchronization with the sampler server.
@@ -756,7 +749,6 @@ def __init__(self, model_id: str, **kwargs):
             json_data=kwargs
         )
         response.raise_for_status()
-        return response.json()
 
     def _send_adapter_heartbeat(self):
         """Internal method to send adapter heartbeat."""
@@ -859,7 +851,7 @@ def set_template(self, template_cls: str, adapter_name: str = '', **kwargs):
 
     # Create/overwrite __init__.py
     init_file = client_module_path / '__init__.py'
-    init_content = AUTO_GEN_WARNING + "from .vllm_sampler import VLLMSampler\n"
+    init_content = AUTO_GEN_WARNING + "from .vllm_sampler import vLLMSampler\n"
     print(f"Writing {init_file}...")
     with open(init_file, 'w', encoding='utf-8') as f:
         f.write(init_content)
```
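The rename from `VLLMSampler` to `vLLMSampler` flows through both the generated class and the generated `__init__.py`. A small self-contained sketch of the export string the generator now emits — the `AUTO_GEN_WARNING` text below is a stand-in, not the repo's actual constant:

```python
# Stand-in for the generator's AUTO_GEN_WARNING constant (actual text differs).
AUTO_GEN_WARNING = "# This file is auto-generated. Do not edit.\n"

# Mirrors the updated line in generate_samplers(): the exported name is now
# vLLMSampler, matching the renamed client class.
init_content = AUTO_GEN_WARNING + "from .vllm_sampler import vLLMSampler\n"
```

Keeping the class name and the `__init__.py` export in sync is what makes `from <client package> import vLLMSampler` resolve after regeneration.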

cookbook/client/tinker/megatron/lora.py

Lines changed: 0 additions & 158 deletions
This file was deleted.

cookbook/client/tinker/transformer/grpo.py

Lines changed: 7 additions & 7 deletions
```diff
@@ -35,25 +35,25 @@
 logger = get_logger()
 
 # ========== Configuration ==========
-BASE_MODEL = "Qwen/Qwen2.5-3B-Instruct"
+MODEL_ID = 'ms://Qwen/Qwen2.5-0.5B-Instruct'
 NUM_GENERATIONS = 8
 MAX_NEW_TOKENS = 1024
 LEARNING_RATE = 1e-5
-MAX_STEPS = 2000
+MAX_STEPS = 10
 BATCH_SIZE = 4
 TEMPERATURE = 1.0
-SYNC_INTERVAL = 1  # Save weights for sampler every N steps
-LORA_RANK = 8
+SYNC_INTERVAL = 5  # Save weights for sampler every N steps
+GRADIENT_ACCUMULATION_STEPS = 4
 
 
 def create_countdown_dataset():
     """Create Countdown Game dataset for GRPO training."""
-    from twinkle.preprocessor import CountdownProcessor
+
     dataset = Dataset(DatasetMeta(
-        "ms://zouxuhong/Countdown-Tasks-3to4", data_slice=range(50000)))
+        "ms://zouxuhong/Countdown-Tasks-3to4", data_slice=range(500)))
     dataset.set_template(
         "Template", model_id=f'ms://{BASE_MODEL}', max_length=8192)
-    dataset.map(CountdownProcessor())
+    dataset.map('CountdownProcessor')
     dataset.encode(add_generation_prompt=True)
     return dataset
```
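Under the revised configuration, the effective batch per optimizer step is `BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS`, and sampler weights are refreshed every `SYNC_INTERVAL` steps. A quick check of what the new values imply over the shortened 10-step run:

```python
# Values from the updated configuration block above.
BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 4
MAX_STEPS = 10
SYNC_INTERVAL = 5  # save weights for the sampler every N steps

effective_batch = BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS  # samples per optimizer step
weight_syncs = MAX_STEPS // SYNC_INTERVAL                   # sampler refreshes in the run
```

So the demo run sees an effective batch of 16 and syncs sampler weights twice, versus every step at the old `SYNC_INTERVAL = 1`.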

cookbook/client/twinkle/transformer/grpo.py

Lines changed: 5 additions & 6 deletions
```diff
@@ -41,26 +41,25 @@
 logger = get_logger()
 
 # ========== Configuration ==========
-MODEL_ID = 'ms://Qwen/Qwen2.5-3B-Instruct'
+MODEL_ID = 'ms://Qwen/Qwen2.5-0.5B-Instruct'
 NUM_GENERATIONS = 8
 MAX_NEW_TOKENS = 1024
 LEARNING_RATE = 1e-5
-MAX_STEPS = 2000
+MAX_STEPS = 10
 BATCH_SIZE = 4
 TEMPERATURE = 1.0
-SYNC_INTERVAL = 1  # Save weights for sampler every N steps
+SYNC_INTERVAL = 5  # Save weights for sampler every N steps
 GRADIENT_ACCUMULATION_STEPS = 4
 
 
 def create_countdown_dataset():
     """Create Countdown Game dataset for GRPO training."""
-    from twinkle.preprocessor import CountdownProcessor
 
     dataset = Dataset(dataset_meta=DatasetMeta(
-        "ms://zouxuhong/Countdown-Tasks-3to4", data_slice=range(50000)))
+        "ms://zouxuhong/Countdown-Tasks-3to4", data_slice=range(500)))
     dataset.set_template(
         'Template', model_id=MODEL_ID, max_length=8192)
-    dataset.map(CountdownProcessor())
+    dataset.map('CountdownProcessor')
     dataset.encode(add_generation_prompt=True, batched=True)
     return dataset
```
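Both cookbook scripts train with GRPO: each prompt is sampled `NUM_GENERATIONS` times and each completion's advantage is its reward normalized within that sampling group. A minimal sketch of the group-relative computation — a standalone illustration of the technique, not twinkle's `Advantage` component:

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: z-score each reward within its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)           # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt sampled NUM_GENERATIONS=8 times; rewards could come from a
# Countdown answer checker (1.0 = correct expression, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
advantages = group_relative_advantages(rewards)
```

Correct completions get positive advantages and incorrect ones negative, and the advantages of each group sum to zero, so the policy gradient pushes probability mass toward the better completions within the group.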
