Thanks for sharing this amazing work!!!
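Is there any chance you could provide the training logs or plots?
I am fine-tuning the model with diffusionvl_qwen_finetune.sh on the same data, and I am getting very large gradient norms (roughly 2,500 to 5,000) in the first logged steps, while the learning rate is still warming up: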
{'loss': 13.6275, 'grad_norm': 4764.443141419522, 'learning_rate': 0.0, 'epoch': 0.0}
{'loss': 15.5931, 'grad_norm': 4215.6201747259565, 'learning_rate': 3.816793893129771e-08, 'epoch': 0.0}
{'loss': 16.0353, 'grad_norm': 4286.821692714313, 'learning_rate': 7.633587786259542e-08, 'epoch': 0.0}
{'loss': 14.9213, 'grad_norm': 4543.0723907595175, 'learning_rate': 1.1450381679389314e-07, 'epoch': 0.0}
{'loss': 14.1928, 'grad_norm': 4897.973683106089, 'learning_rate': 1.5267175572519085e-07, 'epoch': 0.0}
{'loss': 16.8709, 'grad_norm': 5067.07121542981, 'learning_rate': 1.9083969465648858e-07, 'epoch': 0.0}
{'loss': 13.7367, 'grad_norm': 4019.022290890797, 'learning_rate': 2.2900763358778629e-07, 'epoch': 0.0}
{'loss': 12.8087, 'grad_norm': 4551.950453152943, 'learning_rate': 2.67175572519084e-07, 'epoch': 0.0}
{'loss': 15.1466, 'grad_norm': 3418.496955354792, 'learning_rate': 3.053435114503817e-07, 'epoch': 0.0}
{'loss': 14.1777, 'grad_norm': 3730.573287628722, 'learning_rate': 3.4351145038167945e-07, 'epoch': 0.0}
{'loss': 13.5811, 'grad_norm': 3745.5851850794143, 'learning_rate': 3.8167938931297716e-07, 'epoch': 0.0}
{'loss': 15.4803, 'grad_norm': 4495.710111715893, 'learning_rate': 4.1984732824427486e-07, 'epoch': 0.0}
{'loss': 13.1366, 'grad_norm': 3749.8792776003506, 'learning_rate': 4.5801526717557257e-07, 'epoch': 0.0}
{'loss': 13.9805, 'grad_norm': 3007.6973452817824, 'learning_rate': 4.961832061068702e-07, 'epoch': 0.0}
{'loss': 15.1132, 'grad_norm': 3983.442996557239, 'learning_rate': 5.34351145038168e-07, 'epoch': 0.0}
{'loss': 12.5787, 'grad_norm': 2545.957209183806, 'learning_rate': 5.725190839694656e-07, 'epoch': 0.0}
{'loss': 11.7663, 'grad_norm': 3041.763299868712, 'learning_rate': 6.106870229007634e-07, 'epoch': 0.0}
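For easier comparison against reference logs, here is a minimal sketch that summarizes lines like the ones above. It assumes each log line is a Python-style dict literal with `grad_norm` and `learning_rate` keys, as in the output pasted here; `summarize` is a hypothetical helper name, not part of the repository.

```python
import ast
import statistics


def summarize(log_text):
    """Parse dict-style log lines and summarize grad_norm and the LR schedule."""
    records = []
    for line in log_text.splitlines():
        line = line.strip()
        # Only parse lines that look like a complete dict literal.
        if line.startswith("{") and line.endswith("}"):
            records.append(ast.literal_eval(line))

    norms = [r["grad_norm"] for r in records]
    lrs = [r["learning_rate"] for r in records]
    # Step-to-step learning-rate increments; constant values suggest linear warmup.
    lr_steps = [b - a for a, b in zip(lrs, lrs[1:])]

    return {
        "steps": len(records),
        "grad_norm_min": min(norms),
        "grad_norm_max": max(norms),
        "grad_norm_median": statistics.median(norms),
        "lr_step_mean": statistics.fmean(lr_steps) if lr_steps else 0.0,
    }


# First three lines of the log above, used as sample input.
sample = """
{'loss': 13.6275, 'grad_norm': 4764.443141419522, 'learning_rate': 0.0, 'epoch': 0.0}
{'loss': 15.5931, 'grad_norm': 4215.6201747259565, 'learning_rate': 3.816793893129771e-08, 'epoch': 0.0}
{'loss': 16.0353, 'grad_norm': 4286.821692714313, 'learning_rate': 7.633587786259542e-08, 'epoch': 0.0}
"""

summary = summarize(sample)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Running the same helper over a reference log would show whether norms of this size are also present at the start of the original run.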
I am wondering whether this is expected or a sign that something is wrong.
Thanks in advance