
Commit 109cf28

doc
1 parent 5fbd998 commit 109cf28

File tree

2 files changed: +4, -0 lines


docs/source_en/Components/Model/TransformersModel.md

Lines changed: 2 additions & 0 deletions
@@ -15,6 +15,7 @@ class TransformersModel:
         ddp_config: Dict[str, Any] = None,
         fsdp_config: Dict[str, Any] = None,
         grad_scaler_config: Dict[str, Any] = None,
+        memory_efficient_init: bool = True,
         **kwargs):
     ...

@@ -30,6 +31,7 @@ class TransformersModel:
 - ddp_config: DDP configuration when strategy is `accelerate`, see: [DDPKwargs](https://github.com/huggingface/accelerate/blob/main/src/accelerate/utils/dataclasses.py#L155)
 - fsdp_config: FSDP configuration when strategy is `accelerate`, see: [FSDPConfig](https://github.com/huggingface/accelerate/blob/main/src/accelerate/utils/dataclasses.py#L1566)
 - grad_scaler_config: PyTorch's grad_scaler initialization configuration, see: [PyTorch's GradScaler constructor](https://github.com/pytorch/pytorch/blob/main/torch/cuda/amp/grad_scaler.py#L25)
+- memory_efficient_init: Whether to enable memory-efficient model initialization for FSDP. When enabled, only rank 0 loads the full weights and broadcasts sharded parameters to the other ranks, reducing peak memory usage during initialization. Defaults to `True`. Note: this optimization currently applies only to transformers <= 4.57.x; with transformers >= 5.0.x it may degrade performance.
 - kwargs:
   - If you don't want to pass fields through the model config, you can put scattered configuration here; these parameters are later forwarded to `from_pretrained` or `from_config`.
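For background, the following is a minimal sketch of the initialization pattern `memory_efficient_init` describes: rank 0 materializes the full checkpoint while the other ranks build the model on the meta device, and FSDP's `sync_module_states` broadcasts rank 0's weights to everyone as shards. This illustrates the general technique with plain PyTorch and transformers; it is not this repository's implementation, and the checkpoint name is a placeholder.

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoConfig, AutoModelForCausalLM

# Assumes launch via torchrun, which sets the rendezvous env vars.
dist.init_process_group("nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)

model_id = "gpt2"  # placeholder checkpoint
if rank == 0:
    # Only rank 0 pays the memory cost of loading the full checkpoint.
    model = AutoModelForCausalLM.from_pretrained(model_id)
else:
    # Other ranks allocate no real storage: parameters live on the meta device.
    config = AutoConfig.from_pretrained(model_id)
    with torch.device("meta"):
        model = AutoModelForCausalLM.from_config(config)

model = FSDP(
    model,
    device_id=torch.cuda.current_device(),
    sync_module_states=True,  # broadcast rank 0's weights to all ranks
    # Give meta-device parameters real (uninitialized) storage before the sync.
    param_init_fn=lambda m: m.to_empty(
        device=torch.cuda.current_device(), recurse=False
    ),
)
```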

docs/source_zh/组件/模型/TransformersModel.md

Lines changed: 2 additions & 0 deletions
@@ -15,6 +15,7 @@ class TransformersModel:
         ddp_config: Dict[str, Any] = None,
         fsdp_config: Dict[str, Any] = None,
         grad_scaler_config: Dict[str, Any] = None,
+        memory_efficient_init: bool = True,
         **kwargs):
     ...

@@ -30,6 +31,7 @@ class TransformersModel:
 - ddp_config: DDP configuration when strategy is `accelerate`, see: [DDPKwargs](https://github.com/huggingface/accelerate/blob/main/src/accelerate/utils/dataclasses.py#L155)
 - fsdp_config: FSDP configuration when strategy is `accelerate`, see: [FSDPConfig](https://github.com/huggingface/accelerate/blob/main/src/accelerate/utils/dataclasses.py#L1566)
 - grad_scaler_config: PyTorch's grad_scaler initialization configuration, see: [PyTorch's GradScaler constructor](https://github.com/pytorch/pytorch/blob/main/torch/cuda/amp/grad_scaler.py#L25)
+- memory_efficient_init: Whether to enable memory-efficient FSDP initialization. When enabled, only rank 0 loads the full weights and the remaining ranks obtain sharded parameters via broadcast, lowering peak host and GPU memory during initialization. Defaults to `True`. Note: this optimization currently applies only to transformers <= 4.57.x; with transformers >= 5.0.x it may degrade performance.
 - kwargs:
   - If you don't want to pass fields through the model config, scattered configuration can go here; these parameters are later forwarded to `from_pretrained` or `from_config`.
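Since both documents describe the same constructor, a hypothetical call might look like the sketch below. The positional checkpoint argument, its form, and the placeholder values are assumptions for illustration; per the docs above, `grad_scaler_config` takes the keyword arguments of PyTorch's GradScaler constructor, and scattered kwargs such as `torch_dtype` are forwarded to `from_pretrained` or `from_config`.

```python
import torch

# Hypothetical usage based on the signature in this diff; argument names
# outside the diff and all values are placeholders, not confirmed API.
model = TransformersModel(
    "Qwen/Qwen2-7B",                 # placeholder checkpoint
    grad_scaler_config={             # forwarded to torch's GradScaler
        "init_scale": 2.0 ** 16,
        "growth_interval": 2000,
    },
    memory_efficient_init=True,      # rank 0 loads, others receive broadcast shards
    torch_dtype=torch.bfloat16,      # scattered kwarg -> from_pretrained
)
```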
