Training Instability and Performance Issues on RGB-stacking with Reduced Hardware Constraints

Thank you for your time and for this great project!

I am trying to train/fine-tune CoTracker3Online on the RGB-stacking dataset (and Kubric) on a single node with 8x A100 (40GB) GPUs. However, I see significant fluctuations during training, and the final tracking performance falls short of the expected results.

I am using the following command:
`python train_on_kubric.py --batch_size 1 --num_steps 200000 \
 --ckpt_path ./ --model_name cotracker_three --save_freq 200 --sequence_len 32 \
 --eval_datasets tapvid_davis_first tapvid_stacking --traj_per_sample 384 \
 --sliding_window_len 16 --train_datasets kubric --save_every_n_epoch 1 \
 --evaluate_every_n_epoch 1 --model_stride 4 --dataset_root ${path_to_your_dataset} \
 --num_nodes 1 --num_virtual_tracks 64 --mixed_precision --corr_radius 3 \
 --wdecay 0.0005 --linear_layer_for_vis_conf --validate_at_start --add_huber_loss`
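For context, here is a rough sketch of how I am reasoning about the effective batch size under this setup. It assumes `--batch_size` is a per-GPU value and that all 8 GPUs take part in data-parallel training. The reference batch size is a placeholder I made up for illustration, not a number taken from the paper or the repo, and I do not know whether `train_on_kubric.py` supports gradient accumulation at all.

```python
def effective_batch_size(num_gpus: int, per_gpu_batch: int, grad_accum_steps: int = 1) -> int:
    """Samples contributing to one optimizer step under data parallelism."""
    return num_gpus * per_gpu_batch * grad_accum_steps


def accum_steps_to_match(target_batch: int, num_gpus: int, per_gpu_batch: int) -> int:
    """Smallest number of accumulation steps that reaches at least target_batch."""
    per_step = num_gpus * per_gpu_batch
    return -(-target_batch // per_step)  # ceiling division


# My setup: 8 GPUs, --batch_size 1 (assumed to be per GPU), no accumulation.
my_batch = effective_batch_size(num_gpus=8, per_gpu_batch=1)

# Hypothetical reference value, for illustration only.
HYPOTHETICAL_REFERENCE_BATCH = 32
needed_steps = accum_steps_to_match(HYPOTHETICAL_REFERENCE_BATCH, num_gpus=8, per_gpu_batch=1)

print(f"effective batch size: {my_batch}")
print(f"accumulation steps to reach {HYPOTHETICAL_REFERENCE_BATCH}: {needed_steps}")
```

If the reference configuration used a larger effective batch size than mine, that gap could be one source of the noisier training, but I have not verified this.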



