Commit 994d5d8

Merge branch 'dev' into add_rl_example

2 parents fe82fda + ef7ec14
139 files changed: +5027 −271 lines

.github/copilot-instructions.md

Lines changed: 1 addition & 1 deletion

@@ -57,7 +57,7 @@ These instructions help AI agents work productively in this repo. Focus on concr
 ## Examples
 - **Visualize a custom mesh:** create `DeviceMesh` and call `get_device_placement()`; example in [tests/infra/test_infra_graph.py](tests/infra/test_infra_graph.py).
 - **Add LoRA adapter via HTTP:** POST to `/add_adapter_to_model` with serialized `LoraConfig`; see server routes in [src/twinkle/server/twinkle/model.py](src/twinkle/server/twinkle/model.py).
-- **Sample with vLLM:** Configure `VLLMSampler`, set `Template`/`Processor`, then `sample()` on `Trajectory` list; see [src/twinkle/sampler/vllm_sampler.py](src/twinkle/sampler/vllm_sampler.py).
+- **Sample with vLLM:** Configure `vLLMSampler`, set `Template`/`Processor`, then `sample()` on `Trajectory` list; see [src/twinkle/sampler/vllm_sampler.py](src/twinkle/sampler/vllm_sampler.py).

 ---
 Questions or gaps? Tell us where guidance is unclear (e.g., missing run scripts, Ray cluster setup), and we’ll refine this document.

README.md

Lines changed: 227 additions & 5 deletions
Large diffs are not rendered by default.

README_ZH.md

Lines changed: 50 additions & 7 deletions

@@ -21,7 +21,7 @@
 </p>

 <p align="center">
-<a href="https://swift.readthedocs.io/en/latest/">English Documentation</a> &nbsp | &nbsp <a href="https://swift.readthedocs.io/zh-cn/latest/">Chinese Documentation</a> &nbsp
+<a href="https://twinkle-kit.readthedocs.io/en/latest/">English Documentation</a> &nbsp | &nbsp <a href="https://twinkle-kit.readthedocs.io/zh-cn/latest/">Chinese Documentation</a> &nbsp
 </p>

 <div align="center">

@@ -191,14 +191,57 @@ twinkle's architecture consists of a client and a server; the client side contains two

 This lets developers use the Tinker API directly to call a backend training service deployed by twinkle.

+## Multi-Tenant Support
+
+Twinkle supports multiple tenants training on a single base model at the same time. This currently applies only to [LoRA](https://github.com/huggingface/peft/blob/main/src/peft/tuners/lora/config.py#L323).
+Twinkle adopts a LoRA-pool plus tenant-request design. This scheme supports up to N tenants training in parallel without interfering with each other; from the model's perspective, each tenant's training pipeline may differ, and the data padding strategy, optimizer, and loss type used with the base model can differ as well.
+
+<img src="assets/multi_lora.png" style="max-width: 500px; width: 100%;" />
+
+For example:
+
+- Tenant A: loads a private local dataset, LoRA rank=8, runs SFT on the base model
+- Tenant B: loads an open-source dataset from a remote hub, LoRA rank=32, runs PT on the base model
+- Tenant C: computes the GRPO loss on the base model, sampling via the Sampler
+- Tenant D: runs logps inference on the base model
+
+All of this can happen on one base model simultaneously, because models and Samplers are themselves twinkle components and can be task-agnostic. After training, checkpoints can be pushed to HuggingFace/ModelScope model repositories, private by default. twinkle provides a complete multi-tenant training solution; the server side supports cluster management and dynamic scaling, and with light customization it can serve as an enterprise-grade service.
+
+> As a modular framework, twinkle itself also supports temporary exclusive remote training, i.e. full-parameter training.
+
 ## Supported Components

-| | | | | |
-| :---: | :---: | :---: | :---: | :---: |
-| **Dataset**`<br><sub>`data loading and preprocessing`</sub>` | **Template**`<br><sub>`encoding and decoding`</sub>` | **DataLoader**`<br><sub>`data distribution and batching`</sub>` | **Preprocessor**`<br><sub>`data ETL`</sub>` | **InputProcessor**`<br><sub>`task-specific input handling`</sub>` |
-| **Model**`<br><sub>`large models; multiple frameworks supported`</sub>` | **Sampler**`<br><sub>`sampler`</sub>` | **Loss**`<br><sub>`residuals`</sub>` | **Metric**`<br><sub>`training metric collection`</sub>` | **Reward**`<br><sub>`reward functions`</sub>` |
-| **Advantage**`<br><sub>`advantage functions`</sub>` | **CheckpointEngine**`<br><sub>`weight synchronization`</sub>` | **Patch**`<br><sub>`patches for model fixes`</sub>` | **Module**`<br><sub>`components such as Optimizer`</sub>` | **Kernel**`<br><sub>`operators`</sub>` |
-| **Server**`<br><sub>`launches the backend cluster`</sub>` | **Client**`<br><sub>`client-side code`</sub>` | **Infra**`<br><sub>`abstracts over ray and torchrun differences`</sub>` | **Plugin**`<br><sub>`uses hub-side components`</sub>` | **Hub**`<br><sub>`integrates with the HF/MS network libraries`</sub>` |
+<table>
+<tr>
+<td align="center"><b>Dataset</b><br><sub>data loading and preprocessing</sub></td>
+<td align="center"><b>Template</b><br><sub>encoding and decoding</sub></td>
+<td align="center"><b>DataLoader</b><br><sub>data distribution and batching</sub></td>
+<td align="center"><b>Preprocessor</b><br><sub>data ETL</sub></td>
+<td align="center"><b>InputProcessor</b><br><sub>task-specific input handling</sub></td>
+</tr>
+<tr>
+<td align="center"><b>Model</b><br><sub>large models; multiple frameworks supported</sub></td>
+<td align="center"><b>Sampler</b><br><sub>sampler</sub></td>
+<td align="center"><b>Loss</b><br><sub>residuals</sub></td>
+<td align="center"><b>Metric</b><br><sub>training metric collection</sub></td>
+<td align="center"><b>Reward</b><br><sub>reward functions</sub></td>
+</tr>
+<tr>
+<td align="center"><b>Advantage</b><br><sub>advantage functions</sub></td>
+<td align="center"><b>CheckpointEngine</b><br><sub>weight synchronization</sub></td>
+<td align="center"><b>Patch</b><br><sub>patches for model fixes</sub></td>
+<td align="center"><b>Module</b><br><sub>components such as Optimizer</sub></td>
+<td align="center"><b>Kernel</b><br><sub>operators</sub></td>
+</tr>
+<tr>
+<td align="center"><b>Server</b><br><sub>launches the backend cluster</sub></td>
+<td align="center"><b>Client</b><br><sub>client-side code</sub></td>
+<td align="center"><b>Infra</b><br><sub>abstracts over ray and torchrun differences</sub></td>
+<td align="center"><b>Plugin</b><br><sub>uses hub-side components</sub></td>
+<td align="center"><b>Hub</b><br><sub>integrates with the HF/MS network libraries</sub></td>
+</tr>
+</table>

 ## Community Components

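The LoRA-pool plus tenant-request scheme described in the README diff above can be sketched as a small allocator: tenants request an adapter slot from a fixed-size pool, attach their own settings (rank, task, optimizer, loss), and release the slot when done. This is a hypothetical illustration of the idea, not twinkle's actual implementation; the `LoraPool` and `LoraSlot` names are invented.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LoraSlot:
    """One adapter slot in the pool; holds per-tenant training settings."""
    slot_id: int
    tenant: Optional[str] = None
    config: dict = field(default_factory=dict)

class LoraPool:
    """Fixed-size pool of LoRA adapter slots shared over one base model."""
    def __init__(self, max_tenants: int):
        self.slots = [LoraSlot(i) for i in range(max_tenants)]

    def acquire(self, tenant: str, **config) -> LoraSlot:
        # Hand the first free slot to the requesting tenant.
        for slot in self.slots:
            if slot.tenant is None:
                slot.tenant, slot.config = tenant, config
                return slot
        raise RuntimeError("LoRA pool exhausted: all slots are in use")

    def release(self, slot: LoraSlot) -> None:
        # Free the slot so another tenant can claim it.
        slot.tenant, slot.config = None, {}

pool = LoraPool(max_tenants=4)
a = pool.acquire("tenant_a", rank=8, task="sft")   # private data, SFT
b = pool.acquire("tenant_b", rank=32, task="pt")   # hub data, PT
print(a.slot_id, a.config["rank"])  # 0 8
pool.release(a)
```

Because each slot carries its own config, tenants with different padding, optimizer, or loss choices never touch each other's state; only the frozen base weights are shared.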
ROADMAP.md

Lines changed: 2 additions & 0 deletions

@@ -64,6 +64,7 @@
 - [ ] Support distillation algorithms such as GKD and on-policy-distill
 - [ ] Support DPO alignment training
 - [ ] Support colocate RL training
+- [ ] Support batched preprocessing

 ### Networking Capabilities

@@ -82,6 +83,7 @@
 - [ ] Support for distillation algorithms such as GKD and on-policy distillation
 - [ ] Support for DPO alignment training
 - [ ] Support for colocate RL training
+- [ ] Support for batched preprocessing

 ### Networking Capabilities

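The new roadmap item, batched preprocessing, usually means letting a preprocessor map over a list of samples in one call rather than one sample at a time, which amortizes per-call overhead. A minimal sketch of the two paths (hypothetical helpers, not twinkle's `Preprocessor` API):

```python
def preprocess_one(sample: dict) -> dict:
    # Per-sample path: normalize a single record.
    return {"text": sample["text"].strip().lower()}

def preprocess_batched(samples: list) -> list:
    # Batched path: one call handles many records at once.
    return [{"text": s["text"].strip().lower()} for s in samples]

batch = [{"text": "  Hello "}, {"text": "World  "}]
assert preprocess_batched(batch) == [preprocess_one(s) for s in batch]
print(preprocess_batched(batch))  # [{'text': 'hello'}, {'text': 'world'}]
```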
assets/multi_lora.png

178 KB

client_tools/client_generator.py

Lines changed: 2 additions & 2 deletions

@@ -728,7 +728,7 @@ def generate_samplers():
 from twinkle.data_format import Trajectory, InputFeature


-class VLLMSampler(Sampler):
+class vLLMSampler(Sampler):
     """Client wrapper for Sampler that calls server HTTP endpoints.

     This client manages sampling operations and adapter synchronization with the sampler server.

@@ -851,7 +851,7 @@ def set_template(self, template_cls: str, adapter_name: str = '', **kwargs):

     # Create/overwrite __init__.py
     init_file = client_module_path / '__init__.py'
-    init_content = AUTO_GEN_WARNING + "from .vllm_sampler import VLLMSampler\n"
+    init_content = AUTO_GEN_WARNING + "from .vllm_sampler import vLLMSampler\n"
     print(f"Writing {init_file}...")
     with open(init_file, 'w', encoding='utf-8') as f:
         f.write(init_content)
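The rename from `VLLMSampler` to `vLLMSampler` in the generated client is a breaking change for any downstream code importing the old name. A conventional mitigation, not part of this commit and shown here only as a sketch with a stand-in class, is a deprecated alias that forwards to the new name:

```python
import warnings

class vLLMSampler:
    """Stand-in for the real sampler class; only the aliasing is illustrated."""
    def __init__(self, *args, **kwargs):
        self.args, self.kwargs = args, kwargs

def VLLMSampler(*args, **kwargs):
    # Deprecated alias kept for backward compatibility with the old name.
    warnings.warn(
        "VLLMSampler has been renamed to vLLMSampler",
        DeprecationWarning,
        stacklevel=2,
    )
    return vLLMSampler(*args, **kwargs)

old_style = VLLMSampler()  # still works, but emits a DeprecationWarning
assert isinstance(old_style, vLLMSampler)
```

Shipping such an alias for one release gives callers of the old spelling time to migrate before the name is removed.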

cookbook/legacy/grpo/lora.py

Lines changed: 2 additions & 2 deletions

@@ -15,7 +15,7 @@
 from twinkle.dataset import Dataset, DatasetMeta
 from twinkle.model import TransformersModel
 from twinkle.processor import InputProcessor
-from twinkle.sampler import VLLMSampler
+from twinkle.sampler import vLLMSampler
 from twinkle.template import Template
 from twinkle.metric import CompletionRewardMetric

@@ -126,7 +126,7 @@ def main():
     model.set_processor(InputProcessor, adapter_name=ADAPTER_NAME)
     model.set_template('Template', model_id=MODEL_ID, adapter_name=ADAPTER_NAME)

-    sampler = VLLMSampler(
+    sampler = vLLMSampler(
         model_id=MODEL_ID,
         engine_args={
             'load_format': 'dummy',

cookbook/legacy/grpo/lora_gpu.py

Lines changed: 10 additions & 10 deletions

@@ -3,7 +3,7 @@

 This script tests the twinkle RL training capabilities on GPU:
 1. TransformersModel backend
-2. VLLMSampler / TorchSampler integration
+2. vLLMSampler / TorchSampler integration
 3. GRPOLoss and advantage computation
 4. Weight synchronization between model and sampler

@@ -16,7 +16,7 @@
 # Test with multiple GPUs (Ray mode)
 CUDA_VISIBLE_DEVICES=0,1 TWINKLE_MODE=ray python lora_gpu.py

-# Use VLLMSampler (requires more GPU memory)
+# Use vLLMSampler (requires more GPU memory)
 TWINKLE_USE_TORCH_SAMPLER=0 python lora_gpu.py

 # Debug mode

@@ -27,14 +27,14 @@
 TWINKLE_MAX_LENGTH: Max sequence length (default: 2048)
 TWINKLE_MAX_STEPS: Max training steps (default: 3)
 TWINKLE_USE_REF_MODEL: Use reference model for KL (default: 0)
-TWINKLE_USE_TORCH_SAMPLER: Use TorchSampler instead of VLLMSampler (default: 1)
+TWINKLE_USE_TORCH_SAMPLER: Use TorchSampler instead of vLLMSampler (default: 1)
 TWINKLE_DEBUG: Enable debug logging (default: 0)
 TWINKLE_MODE: 'local' or 'ray' (default: local)

 Test Results (as of 2026-01-30):
 - TransformersModel + TorchSampler: PASS
-- VLLMSampler sampling: PASS
-- VLLMSampler LoRA weight sync: IN PROGRESS (needs more debugging)
+- vLLMSampler sampling: PASS
+- vLLMSampler LoRA weight sync: IN PROGRESS (needs more debugging)
 """
 import numpy as np
 from peft import LoraConfig

@@ -52,7 +52,7 @@
 from twinkle.infra import DeviceGroup, remote_function, remote_class
 from twinkle.model import TransformersModel
 from twinkle.reward import MathReward
-from twinkle.sampler import VLLMSampler, TorchSampler
+from twinkle.sampler import vLLMSampler, TorchSampler
 from twinkle.data_format.sampling import SamplingParams
 from twinkle.weight_loader import NativeLoader
 from twinkle.advantage import GRPOAdvantage

@@ -238,8 +238,8 @@ def __init__(self, engine_args=None, lora_config=None, adapter_name=None, **kwargs):
             )
         else:
             if engine_args is None:
-                raise ValueError("engine_args is required for VLLMSampler.")
-            self.sampler = VLLMSampler(
+                raise ValueError("engine_args is required for vLLMSampler.")
+            self.sampler = vLLMSampler(
                 model_path,
                 engine_args=engine_args,
                 device_mesh=actor_device_mesh,

@@ -403,7 +403,7 @@ def train_local():
             device_mesh=actor_device_mesh,
         )
     else:
-        from twinkle.sampler import VLLMSampler
+        from twinkle.sampler import vLLMSampler
         engine_args = {
             'model': model_path,
             'enable_lora': True,

@@ -413,7 +413,7 @@ def train_local():
             'gpu_memory_utilization': 0.5,
             'trust_remote_code': True,
         }
-        sampler = VLLMSampler(
+        sampler = vLLMSampler(
             model_path,
             engine_args=engine_args,
             device_mesh=actor_device_mesh,
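lora_gpu.py drives its configuration entirely through `TWINKLE_*` environment variables with the defaults documented in its docstring. That pattern can be sketched with two small helpers (the helper names are hypothetical; the variable names and defaults come from the docstring above):

```python
import os

def env_int(name: str, default: int) -> int:
    """Read an integer setting from the environment, falling back to a default."""
    return int(os.environ.get(name, default))

def env_flag(name: str, default: int) -> bool:
    """Read a 0/1 flag from the environment."""
    return bool(env_int(name, default))

max_length = env_int("TWINKLE_MAX_LENGTH", 2048)
max_steps = env_int("TWINKLE_MAX_STEPS", 3)
use_ref_model = env_flag("TWINKLE_USE_REF_MODEL", 0)
use_torch_sampler = env_flag("TWINKLE_USE_TORCH_SAMPLER", 1)  # 0 selects vLLMSampler
mode = os.environ.get("TWINKLE_MODE", "local")  # 'local' or 'ray'

print(max_length, max_steps, use_torch_sampler, mode)
```

Keeping all knobs in the environment lets the same script be re-run under different sampler and cluster configurations, as the usage examples in the docstring show, without editing code.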

cookbook/legacy/grpo/lora_npu.py

Lines changed: 3 additions & 3 deletions

@@ -8,7 +8,7 @@
 from twinkle.infra import DeviceGroup, remote_function, remote_class
 from twinkle.model import TransformersModel
 from twinkle.reward import MathReward
-from twinkle.sampler import VLLMSampler, TorchSampler
+from twinkle.sampler import vLLMSampler, TorchSampler
 from twinkle.data_format.sampling import SamplingParams, SampleResponse
 from twinkle.weight_loader import NativeLoader
 from twinkle.advantage import compute_advantages

@@ -230,8 +230,8 @@ def __init__(self, engine_args=None, lora_config=None, adapter_name=None, **kwargs):
             )
         else:
             if engine_args is None:
-                raise ValueError("engine_args is required for VLLMSampler.")
-            self.sampler = VLLMSampler(
+                raise ValueError("engine_args is required for vLLMSampler.")
+            self.sampler = vLLMSampler(
                 model_path,
                 engine_args=engine_args,
                 device_mesh=actor_device_mesh,

cookbook/legacy/grpo/lora_pr.py

Lines changed: 2 additions & 2 deletions

@@ -12,7 +12,7 @@
 from twinkle.metric import CompletionRewardMetric
 from twinkle.model import TransformersModel
 from twinkle.processor import InputProcessor
-from twinkle.sampler import VLLMSampler
+from twinkle.sampler import vLLMSampler
 from twinkle.template import Template
 from twinkle import torch_util

@@ -47,7 +47,7 @@ def main():
     lora_config = LoraConfig(target_modules="all-linear", r=8, lora_alpha=32, lora_dropout=0.05)
     model = TransformersModel(model_id='ms://Qwen/Qwen2.5-3B-Instruct', device_mesh=model_mesh, remote_group='model')
     model.add_adapter_to_model('default', lora_config, gradient_accumulation_steps=4,)
-    sampler = VLLMSampler(
+    sampler = vLLMSampler(
         model_id='ms://Qwen/Qwen2.5-3B-Instruct',
         engine_args={
             'load_format': 'dummy',
