About the training GPU memory requirements #12

@SUSUGIT

Thank you for the excellent work! I ran out of GPU memory during SFT.
I'm running the code on 8 H200 GPUs, each with 140 GB of memory.

```shell
TOKENIZERS_PARALLELISM=false torchrun --standalone --nproc_per_node=8 train.py \
    --deepspeed ./scripts/deepspeed_zero2.json \
    --output_dir checkpoints/$run_name \
    --overwrite_output_dir True \
    --run_name $run_name \
    --save_on_each_node True \
    --do_train True \
    --eval_strategy no \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 64 \
    --learning_rate $learning_rate \
    --warmup_ratio 0.03 \
    --optim adamw_torch \
    --lr_scheduler_type cosine \
    --num_train_epochs 1 \
    --logging_steps 1000 \
    --save_steps 1000 \
    --bf16 True \
    --tf32 True \
    --gradient_checkpointing True \
    --pretrained_model_name_or_path /data/checkpoints/LiveCC-7B-Instruct \
    --annotation_paths datasets/live_whisperx_526k_with_seeks_filtered.jsonl \
    --dataloader_num_workers 16 \
    --freeze_modules visual \
    --use_liger_kernel True \
    --report_to tensorboard
```
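For context, here is a rough back-of-the-envelope estimate of per-GPU memory for this setup. It assumes ~7e9 parameters for LiveCC-7B-Instruct, bf16 weights, AdamW in fp32, and DeepSpeed ZeRO-2 semantics (weights replicated; gradients and optimizer states sharded across the 8 GPUs). It is only a sketch of the model-state footprint and ignores activations, CUDA context, and fragmentation, which are often what actually push a run over the limit:

```python
# Sketch of ZeRO-2 per-GPU model-state memory (assumptions noted above).
def zero2_per_gpu_gb(n_params: float, n_gpus: int) -> float:
    weights = 2 * n_params                   # bf16 weights, replicated on every GPU
    grads = 2 * n_params / n_gpus            # bf16 gradients, sharded
    optim = (4 + 4 + 4) * n_params / n_gpus  # fp32 master weights + Adam moments, sharded
    return (weights + grads + optim) / 1e9   # bytes -> GB

# Hypothetical numbers: ~7e9 params, 8 GPUs.
print(zero2_per_gpu_gb(7e9, 8))  # roughly 26 GB per GPU, before activations
```

By this estimate the model states alone fit comfortably in 140 GB, which suggests activation memory (long video/token sequences) is the more likely culprit.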

Could you please share what hardware configuration you used for SFT, and how much GPU memory is typically required?
Looking forward to your reply!
