
GPU memory usage too high #20

@Bruce410526


Hello,

Thank you very much for sharing your work. I am training the model with the following command:

```
python Train.py \
  --ngpu 1 \
  --train_keys ./data/data_splits/screen_model/train_keys.pkl \
  --val_keys ./data/data_splits/screen_model/val_keys.pkl \
  --test_keys ./data/data_splits/screen_model/test_keys.pkl \
  --epoch 5 \
  --batch_size 1 \
  --test
```

It fails with the following error:
Available GPU List
id utilization.gpu(%) memory.free(MiB)
0 2 19680
Select id #0 for you.
2025-02-24 08:17:42
Number of train data: 310052
Number of val data: 34867
Number of test data: 104206
number of parameters : 565317

0%| | 0/310052 [00:00<?, ?it/s]
0%| | 1/310052 [00:01<130:24:19, 1.51s/it]
0%| | 2/310052 [00:01<61:32:23, 1.40it/s]
...
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 23.68 GiB total capacity; 17.76 GiB already allocated; 12.88 MiB free; 18.95 GiB allowed; 18.84 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
terminate called without an active exception
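
As a rough check on the hint in the error message ("If reserved memory is >> allocated memory try setting max_split_size_mb"), the sketch below parses the sizes quoted in the message and compares them. The helper name `parse_oom` is hypothetical and only exists for this illustration; the numbers are copied from the log above.

```python
import re

# Error text copied from the log above (numbers unchanged).
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 23.68 GiB total capacity; "
    "17.76 GiB already allocated; 12.88 MiB free; 18.95 GiB allowed; "
    "18.84 GiB reserved in total by PyTorch)"
)

UNIT_TO_GIB = {"MiB": 1 / 1024, "GiB": 1.0}


def parse_oom(message):
    """Hypothetical helper: extract the sizes named in the message, in GiB."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "allowed": r"([\d.]+) (MiB|GiB) allowed",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            sizes[name] = float(match.group(1)) * UNIT_TO_GIB[match.group(2)]
    return sizes


sizes = parse_oom(OOM_MESSAGE)

# Memory cached by the allocator but not held by live tensors.
cached_but_unused = sizes["reserved"] - sizes["allocated"]
unused_fraction = cached_but_unused / sizes["reserved"]
# Share of the card that this process is allowed to use.
allowed_fraction = sizes["allowed"] / sizes["total"]

print(f"reserved - allocated = {cached_but_unused:.2f} GiB "
      f"({unused_fraction:.1%} of reserved)")
print(f"allowed / total = {allowed_fraction:.2f}")
```

Reserved memory here is only about 1.08 GiB above allocated memory, so it is not ">>" allocated. That suggests, without proving it, that most of the memory is held by live tensors and that `max_split_size_mb` alone is unlikely to resolve the failure. The "allowed" figure is roughly 80% of total capacity, which would be consistent with a per-process memory limit being set somewhere, although the log does not confirm this.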

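My GPU is a 24 GB 3090 and batch_size is set to 1, but as training goes on, the run fails with an out-of-memory error at around 3% progress. How should I configure things to solve this problem?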
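
Because the failure appears only at about 3% of the epoch (roughly 9,300 of 310,052 steps) and not on the first batch, one way to narrow it down is to log allocated GPU memory every few hundred steps and check whether it climbs steadily. The following is a minimal sketch under assumptions: `read_allocated_mib` and `growth_per_step` are hypothetical helper names, the training loop in `Train.py` is not shown in this issue, and the readings in the demo are made-up numbers used only to exercise the function.

```python
def read_allocated_mib():
    """Memory held by live CUDA tensors in MiB, or None without PyTorch/CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.memory_allocated() / 2**20


def growth_per_step(readings):
    """Least-squares slope of (step, MiB) pairs: average MiB gained per step."""
    n = len(readings)
    if n < 2:
        return 0.0
    mean_step = sum(step for step, _ in readings) / n
    mean_mib = sum(mib for _, mib in readings) / n
    numerator = sum((step - mean_step) * (mib - mean_mib) for step, mib in readings)
    denominator = sum((step - mean_step) ** 2 for step, _ in readings)
    return numerator / denominator if denominator else 0.0


# Hypothetical placement inside a training loop (names are placeholders):
#
#   readings = []
#   for step, batch in enumerate(loader):
#       ...  # forward pass, backward pass, optimizer step
#       if step % 100 == 0:
#           mib = read_allocated_mib()
#           if mib is not None:
#               readings.append((step, mib))

# Made-up readings, for illustration only: +200 MiB every 100 steps.
synthetic_readings = [(0, 1000.0), (100, 1200.0), (200, 1400.0)]
slope = growth_per_step(synthetic_readings)
print(f"{slope:.1f} MiB per step")
```

A clearly positive slope would point to memory accumulating across steps. A common cause in PyTorch training loops is keeping references to tensors that still carry their autograd graph, for example summing `loss` instead of `loss.item()`. A flat curve with occasional spikes would instead point to a few unusually large samples. Whether either applies to this repository cannot be determined from the issue alone.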

I look forward to your reply!
