Hello author!
Thank you very much for sharing this work. I trained the model with the following command:
```shell
python Train.py \
    --ngpu 1 \
    --train_keys ./data/data_splits/screen_model/train_keys.pkl \
    --val_keys ./data/data_splits/screen_model/val_keys.pkl \
    --test_keys ./data/data_splits/screen_model/test_keys.pkl \
    --epoch 5 \
    --batch_size 1 \
    --test
```
It fails with the following error:
```
Available GPU List
id utilization.gpu(%) memory.free(MiB)
0 2 19680
Select id #0 for you.
2025-02-24 08:17:42
Number of train data: 310052
Number of val data: 34867
Number of test data: 104206
number of parameters : 565317
0%| | 0/310052 [00:00<?, ?it/s]
0%| | 1/310052 [00:01<130:24:19, 1.51s/it]
0%| | 2/310052 [00:01<61:32:23, 1.40it/s]
...
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 23.68 GiB total capacity; 17.76 GiB already allocated; 12.88 MiB free; 18.95 GiB allowed; 18.84 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
terminate called without an active exception
```
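As one thing I have tried: the traceback itself suggests capping the allocator's split size when reserved memory greatly exceeds allocated memory (a fragmentation symptom). This is just the hint from the error message, not a fix specific to Train.py, and 128 MiB is an arbitrary starting value rather than a tuned one:

```shell
# Suggested by the traceback: cap the CUDA caching allocator's block
# split size to reduce fragmentation before launching training.
# 128 is a guess, not a tuned value.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
# ...then re-run the training command above.
```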
My GPU is a 24 GB RTX 3090 and batch_size is set to 1, yet memory usage keeps growing as training runs, and at around 3% progress it fails with the out-of-memory error above. Is there a setting that can fix this?
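For reference, since memory grows over iterations even at batch_size 1, my guess is that some tensor still attached to the autograd graph is being accumulated across steps. I have not read Train.py closely, so the snippet below is only a minimal, hypothetical reproduction of that pattern and its fix, not your actual code:

```python
import torch
import torch.nn as nn

# Hypothetical minimal model and loop (not from Train.py). Accumulating
# the raw loss tensor keeps every iteration's autograd graph alive, so
# GPU memory grows over time even with batch_size 1.
model = nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

total_loss = 0.0
for _ in range(3):
    x = torch.randn(1, 4)
    loss = (model(x) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    # BAD:  total_loss += loss       # graph-attached tensor, graphs pile up
    # GOOD: convert to a plain float so each step's graph can be freed
    total_loss += loss.item()
```

If Train.py logs a running loss (or stores per-batch outputs for the test set) without `.item()` or `.detach()`, that would match the slow growth I am seeing.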
Looking forward to your reply!