Commit 62e680c: fix

1 parent 109cf28 commit 62e680c
File tree: 5 files changed, +7 −7 lines changed


docs/source_en/Components/Model/TransformersModel.md
Lines changed: 2 additions & 2 deletions

```diff
@@ -15,7 +15,7 @@ class TransformersModel:
                  ddp_config: Dict[str, Any] = None,
                  fsdp_config: Dict[str, Any] = None,
                  grad_scaler_config: Dict[str, Any] = None,
-                 memory_efficient_init: bool = True,
+                 memory_efficient_init: bool = False,
                  **kwargs):
         ...
@@ -31,7 +31,7 @@ class TransformersModel:
 - ddp_config: DDP configuration when strategy is `accelerate`, see: [DDPKwargs](https://github.com/huggingface/accelerate/blob/main/src/accelerate/utils/dataclasses.py#L155)
 - fsdp_config: FSDP configuration when strategy is `accelerate`, see: [FSDPConfig](https://github.com/huggingface/accelerate/blob/main/src/accelerate/utils/dataclasses.py#L1566)
 - grad_scaler_config: PyTorch's grad_scaler initialization configuration, see: [PyTorch's GradScaler constructor](https://github.com/pytorch/pytorch/blob/main/torch/cuda/amp/grad_scaler.py#L25)
-- memory_efficient_init: Whether to enable memory-efficient model initialization for FSDP. When enabled, only rank 0 loads full weights and broadcasts sharded parameters to other ranks, reducing peak memory usage during initialization. Default `True`. Note: The optimization currently only applies to transformers <= 4.57.x; for transformers >= 5.0.x, it may lead to negative performance impact.
+- memory_efficient_init: Whether to enable memory-efficient model initialization for FSDP. When enabled, only rank 0 loads full weights and broadcasts sharded parameters to other ranks, reducing peak memory usage during initialization. Default `False`. Note: The optimization currently only applies to transformers <= 4.57.6; for transformers >= 5.0.0, it may lead to negative performance impact.
 - kwargs:
   - If you don't want to pass the model config field, you can put scattered configurations here. These parameters will be passed to `from_pretrained` or `from_config` later.
```
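The docs above describe `memory_efficient_init` as: only rank 0 loads the full weights, and every other rank receives just its shard. Here is a minimal single-process sketch of that idea, not twinkle's actual implementation: all names (`load_full_checkpoint`, `shard_for_rank`, the `channel` dict standing in for a `torch.distributed` broadcast) are hypothetical, and the "weights" are plain Python lists.

```python
# Conceptual sketch of memory_efficient_init=True: only rank 0 materializes
# the full checkpoint; other ranks only ever hold their own shard.
# Single-process simulation; real FSDP would use a collective broadcast.

def load_full_checkpoint():
    # Stand-in for reading the complete weights from disk (expensive).
    return {"layer.weight": list(range(8))}

def shard_for_rank(full, rank, world_size):
    # Simplified even 1-D sharding of each parameter across ranks.
    out = {}
    for name, values in full.items():
        n = len(values) // world_size
        out[name] = values[rank * n:(rank + 1) * n]
    return out

def memory_efficient_init(rank, world_size, channel):
    # Only rank 0 pays the peak-memory cost of the full checkpoint;
    # `channel` stands in for broadcasting each shard to its rank.
    if rank == 0:
        full = load_full_checkpoint()
        for r in range(world_size):
            channel[r] = shard_for_rank(full, r, world_size)
    return channel[rank]

channel = {}
shards = [memory_efficient_init(rank, 2, channel) for rank in range(2)]
print(shards[0]["layer.weight"])  # [0, 1, 2, 3]
print(shards[1]["layer.weight"])  # [4, 5, 6, 7]
```

With `memory_efficient_init=False` (the new default in this commit), every rank would instead call `load_full_checkpoint` itself, which is simpler and avoids the broadcast but multiplies peak memory by the world size.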

docs/source_zh/组件/模型/TransformersModel.md
Lines changed: 2 additions & 2 deletions

```diff
@@ -15,7 +15,7 @@ class TransformersModel:
                  ddp_config: Dict[str, Any] = None,
                  fsdp_config: Dict[str, Any] = None,
                  grad_scaler_config: Dict[str, Any] = None,
-                 memory_efficient_init: bool = True,
+                 memory_efficient_init: bool = False,
                  **kwargs):
         ...
@@ -31,7 +31,7 @@ class TransformersModel:
 - ddp_config: strategy为`accelerate`时的DDP配置,参见:[DDPKwargs](https://github.com/huggingface/accelerate/blob/main/src/accelerate/utils/dataclasses.py#L155)
 - fsdp_config: strategy为`accelerate`时的FSDP配置,参见:[FSDPConfig](https://github.com/huggingface/accelerate/blob/main/src/accelerate/utils/dataclasses.py#L1566)
 - grad_scaler_config: PyTorch的grad_scaler初始化配置,参见:[PyTorch的GradScaler构造](https://github.com/pytorch/pytorch/blob/main/torch/cuda/amp/grad_scaler.py#L25)
-- memory_efficient_init: 是否启用FSDP内存高效初始化。启用后仅rank 0加载完整权重,其余rank通过广播获取分片参数,降低初始化阶段的内存和显存峰值。默认`True`。注意:该优化目前仅适用于 transformers <= 4.57.x;对于 transformers >= 5.0.x,可能会导致负面性能影响。
+- memory_efficient_init: 是否启用FSDP内存高效初始化。启用后仅rank 0加载完整权重,其余rank通过广播获取分片参数,降低初始化阶段的内存和显存峰值。默认`False`。注意:该优化目前仅适用于 transformers <= 4.57.6;对于 transformers >= 5.0.0,可能会导致负面性能影响。
 - kwargs:
   - 如果你不希望传递模型config字段,可以把零星的配置从这里放置进去。后续这些参数会传递到`from_pretrained`或者`from_config`中。
```

src/twinkle/model/transformers/strategy/accelerate.py
Lines changed: 1 addition & 1 deletion

```diff
@@ -21,7 +21,7 @@ def __init__(
         mixed_precision: Literal['no', 'fp8', 'fp16', 'bf16'] = 'bf16',
         ddp_config: Dict[str, Any] = None,
         fsdp_config: Dict[str, Any] = None,
-        memory_efficient_init: bool = True,
+        memory_efficient_init: bool = False,
     ):
         from accelerate import Accelerator
```
2727

src/twinkle/model/transformers/strategy/native_fsdp.py
Lines changed: 1 addition & 1 deletion

```diff
@@ -19,7 +19,7 @@ def __init__(self,
                  device_mesh: Optional[DeviceMesh] = None,
                  mixed_precision: Literal['no', 'fp8', 'fp16', 'bf16'] = 'bf16',
                  fsdp_config: Dict[str, Any] = None,
-                 memory_efficient_init: bool = True,
+                 memory_efficient_init: bool = False,
                  enable_ep: bool = True,
                  ep_size: Optional[int] = None):
         self.device_mesh = device_mesh
```

src/twinkle/model/transformers/transformers.py
Lines changed: 1 addition & 1 deletion

```diff
@@ -189,7 +189,7 @@ def __init__(
         ddp_config: Dict[str, Any] = None,
         fsdp_config: Dict[str, Any] = None,
         grad_scaler_config: Dict[str, Any] = None,
-        memory_efficient_init: bool = True,
+        memory_efficient_init: bool = False,
         **kwargs):
         os.environ['TOKENIZERS_PARALLELISM'] = 'true'
         self._try_init_process_group()
```
