https://github.com/ModelsLab/diffusers_plus_plus/blob/d1fd977c5ac0cef08c6f965213c20f4460f6c37a/examples/dreambooth/README_flux.md?plain=1#L226C40-L226C57

Also, the training doesn't fit on a single H100 with 95 GB of memory. Are there any optimizations we could make?