Description
Hi, thanks a lot for this project!
I’m trying to train a LoRA model using your FSDP config (configs/accelerate/fsdp.yaml) with num_processes=4 on a machine with 2 H100 GPUs (80 GB VRAM each).
I launch training with:
CUDA_VISIBLE_DEVICES=0,1 \
uv run accelerate launch --config_file configs/accelerate/fsdp.yaml \
scripts/train.py configs/ltx2_ti2v_lora.yaml
During training startup, I get the following error on rank 1:
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 30.00 MiB. GPU 0 has a total capacity of 79.10 GiB of which 15.19 MiB is free. Process 16746 has 20.84 GiB memory in use. Process 16747 has 20.33 GiB memory in use. Process 16748 has 18.94 GiB memory in use. Process 16749 has 18.94 GiB memory in use. Of the allocated memory 19.60 GiB is allocated by PyTorch, and 225.59 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
The error is raised here:
self._cached_validation_embeddings = self._load_text_encoder_and_cache_embeddings()
With num_processes=4, it seems that every process loads the text encoder onto GPU 0 (the four processes listed in the error above are all allocating memory on GPU 0).
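I think the fix might be to place the text encoder on each process's own device (or to cache the embeddings only once on the main process) instead of always using cuda:0. Just a minimal sketch of what I mean, using accelerate's PartialState; the T5 checkpoint below is only a placeholder, not necessarily the encoder this project actually uses:

from accelerate import PartialState
from transformers import T5EncoderModel  # placeholder stand-in for the real text encoder

state = PartialState()  # knows this process's rank and device

# Load the encoder onto this process's own device instead of hard-coding GPU 0
text_encoder = T5EncoderModel.from_pretrained("google/t5-v1_1-base").to(state.device)

Alternatively, the validation embeddings could be computed on the main process only and broadcast to the other ranks, so the text encoder is loaded just once.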
My config:
