Description
~/# accelerate launch train_stage_2.py --config configs/train/stage2.yaml
The following values were not passed to accelerate launch and had defaults used instead:
--num_processes was set to a value of 1
--num_machines was set to a value of 1
--mixed_precision was set to a value of 'no'
--dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
10/12/2024 09:16:18 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: fp16
{'scaling_factor', 'force_upcast'} was not found in config. Values will be initialized to default values.
{'addition_time_embed_dim', 'time_embedding_type', 'num_class_embeds', 'encoder_hid_dim', 'encoder_hid_dim_type', 'addition_embed_type_num_heads', 'addition_embed_type', 'dual_cross_attention', 'dropout', 'resnet_out_scale_factor', 'attention_type', 'reverse_transformer_layers_per_block', 'projection_class_embeddings_input_dim', 'mid_block_type', 'conv_out_kernel', 'resnet_skip_time_act', 'use_linear_projection', 'class_embeddings_concat', 'time_embedding_dim', 'timestep_post_act', 'resnet_time_scale_shift', 'only_cross_attention', 'transformer_layers_per_block', 'class_embed_type', 'conv_in_kernel', 'time_cond_proj_dim', 'time_embedding_act_fn', 'mid_block_only_cross_attention', 'num_attention_heads', 'upcast_attention', 'cross_attention_norm'} was not found in config. Values will be initialized to default values.
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel:
['conv_norm_out.weight, conv_norm_out.bias, conv_out.weight, conv_out.bias']
10/12/2024 09:16:25 - INFO - src.models.unet_3d - loaded temporal unet's pretrained weights from pretrained_weights/stable-diffusion-v1-5/unet ...
{'dual_cross_attention', 'use_linear_projection', 'num_class_embeds', 'upcast_attention', 'mode', 'task_type', 'resnet_time_scale_shift', 'only_cross_attention', 'class_embed_type'} was not found in config. Values will be initialized to default values.
10/12/2024 09:16:38 - INFO - src.models.unet_3d - Load motion module params from pretrained_weights/mm_sd_v15_v2.ckpt
10/12/2024 09:16:39 - INFO - src.models.unet_3d - Loaded 453.20928M-parameter motion module
10/12/2024 09:16:44 - INFO - __main__ - Total trainable params 546
10/12/2024 09:16:45 - INFO - __main__ - ***** Running training *****
10/12/2024 09:16:45 - INFO - __main__ - Num examples = 7755
10/12/2024 09:16:45 - INFO - __main__ - Num Epochs = 2
10/12/2024 09:16:45 - INFO - __main__ - Instantaneous batch size per device = 1
10/12/2024 09:16:45 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 1
10/12/2024 09:16:45 - INFO - __main__ - Gradient Accumulation steps = 1
10/12/2024 09:16:45 - INFO - __main__ - Total optimization steps = 10000
Steps: 0%| | 0/10000 [00:00<?, ?it/s]
10/12/2024 09:16:50 - INFO - src.models.unet_3d - Forward upsample size to force interpolation output size.
Traceback (most recent call last):
File "/root/MusePose/train_stage_2.py", line 773, in <module>
main(config)
File "/root/MusePose/train_stage_2.py", line 602, in main
model_pred = net(
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/accelerate/utils/operations.py", line 825, in forward
return model_forward(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/accelerate/utils/operations.py", line 813, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/root/miniconda3/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/root/MusePose/train_stage_2.py", line 96, in forward
model_pred = self.denoising_unet(
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root//src/models/unet_3d.py", line 505, in forward
sample = sample + pose_cond_fea
RuntimeError: The size of tensor a (22) must match the size of tensor b (23) at non-singleton dimension 3
Steps: 0%| | 0/10000 [00:06<?, ?it/s]
Traceback (most recent call last):
File "/root/miniconda3/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/root/miniconda3/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 46, in main
args.func(args)
File "/root/miniconda3/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1075, in launch_command
simple_launcher(args)
File "/root/miniconda3/lib/python3.10/site-packages/accelerate/commands/launch.py", line 681, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/miniconda3/bin/python', 'train_stage_2.py', '--config', 'configs/train/stage2.yaml']' returned non-zero exit status 1.
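
One possible reading of the `22` vs `23` mismatch, offered only as a sketch: the two tensors added in `unet_3d.py` may come from branches that each halve the spatial size several times but round differently when the size is odd. The log does not confirm this. The three-stage count, the floor/ceil split, the example value 180, and the helper names `floor_halvings`, `ceil_halvings`, and `sizes_agree` are all assumptions made for illustration, not taken from the MusePose code.

```python
import math


def floor_halvings(pixels: int, stages: int = 3) -> int:
    """Size after `stages` halvings that each round down (hypothetical branch A)."""
    for _ in range(stages):
        pixels = pixels // 2
    return pixels


def ceil_halvings(pixels: int, stages: int = 3) -> int:
    """Size after `stages` halvings that each round up (hypothetical branch B)."""
    for _ in range(stages):
        pixels = math.ceil(pixels / 2)
    return pixels


def sizes_agree(pixels: int, stages: int = 3) -> bool:
    """True when both rounding conventions give the same final size."""
    return floor_halvings(pixels, stages) == ceil_halvings(pixels, stages)


if __name__ == "__main__":
    # 180 is an assumed example size, not a value read from stage2.yaml.
    for pixels in (176, 180, 184):
        print(pixels, floor_halvings(pixels), ceil_halvings(pixels), sizes_agree(pixels))
```

Under these assumptions, a size of 180 ends at 22 on one branch and 23 on the other, while 176 and 184 (both multiples of 8) agree. If that is what is happening here, checking whether the training height and width in `configs/train/stage2.yaml` are multiples of 8 would be a reasonable first step.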