When I run training, I get "RuntimeError: CUDA out of memory." and I don't know what to modify to fix it.
I have two GPUs with 12 GB each (gpu0: 12G; gpu1: 12G).
My command is "bash experiments/scripts/train_faster_rcnn.sh 0 pascal_voc vgg16" or
"bash experiments/scripts/train_faster_rcnn.sh 0,1 pascal_voc vgg16". Both commands fail with "RuntimeError: CUDA out of memory".
I also modified vgg16.yml, changing TRAIN.BATCH_SIZE from 256 to 2, but the error is the same.
Running Logs:
+ set -e
+ export PYTHONUNBUFFERED=True
+ PYTHONUNBUFFERED=True
+ GPU_ID=0
+ DATASET=pascal_voc
+ NET=vgg16
+ array=($@)
+ len=3
+ EXTRA_ARGS=
+ EXTRA_ARGS_SLUG=
+ case ${DATASET} in
+ TRAIN_IMDB=voc_2007_trainval
+ TEST_IMDB=voc_2007_test
+ STEPSIZE='[50000]'
+ ITERS=100000
+ ANCHORS='[8,16,32]'
+ RATIOS='[0.5,1,2]'
++ date +%Y-%m-%d_%H-%M-%S
+ LOG=experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2021-09-11_15-11-21
+ exec
++ tee -a experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2021-09-11_15-11-21
tee: experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2021-09-11_15-11-21: No such file or directory
+ echo Logging output to experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2021-09-11_15-11-21
Logging output to experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2021-09-11_15-11-21
+ set +x
+ '[' '!' -f output/vgg16/voc_2007_trainval/default/vgg16_MELM_iter_100000.pth.index ']'
+ [[ ! -z '' ]]
+ CUDA_VISIBLE_DEVICES=0
+ python ./tools/trainval_net.py --weight data/imagenet_weights/vgg16.pth --imdb voc_2007_trainval --imdbval voc_2007_test --iters 100000 --cfg experiments/cfgs/vgg16.yml --net vgg16 --set ANCHOR_SCALES '[8,16,32]' ANCHOR_RATIOS '[0.5,1,2]' TRAIN.STEPSIZE '[50000]'
Called with args:
Namespace(cfg_file='experiments/cfgs/vgg16.yml', imdb_name='voc_2007_trainval', imdbval_name='voc_2007_test', max_iters=100000, net='vgg16', set_cfgs=['ANCHOR_SCALES', '[8,16,32]', 'ANCHOR_RATIOS', '[0.5,1,2]', 'TRAIN.STEPSIZE', '[50000]'], tag=None, weight='data/imagenet_weights/vgg16.pth')
/media/omnisky/28db8425-dc36-4700-92ef-0dd7e98ccd67/djt/CASD/tools/../lib/model/config.py:369: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
yaml_cfg = edict(yaml.load(f))
Loaded dataset voc_2007_trainval for training
Set proposal method: selective_search
Appending horizontally-flipped training examples...
voc_2007_trainval ss roidb loaded from /media/omnisky/28db8425-dc36-4700-92ef-0dd7e98ccd67/djt/CASD/data/cache/voc_2007_trainval_selective_search_roidb.pkl
done
Preparing training data...
done
10022 roidb entries
Output will be saved to /media/omnisky/28db8425-dc36-4700-92ef-0dd7e98ccd67/djt/CASD/output/vgg16_MELM/voc_2007_trainval/default
TensorFlow summaries will be saved to /media/omnisky/28db8425-dc36-4700-92ef-0dd7e98ccd67/djt/CASD/tensorboard/vgg16_MELM/voc_2007_trainval/default
Loaded dataset voc_2007_test for training
Set proposal method: selective_search
Preparing training data...
voc_2007_test ss roidb loaded from /media/omnisky/28db8425-dc36-4700-92ef-0dd7e98ccd67/djt/CASD/data/cache/voc_2007_test_selective_search_roidb.pkl
done
4952 validation roidb entries
Filtered 0 roidb entries: 10022 -> 10022
Filtered 0 roidb entries: 4952 -> 4952
Solving...
Loading initial model weights from data/imagenet_weights/vgg16.pth
Loaded.
Traceback (most recent call last):
File "./tools/trainval_net.py", line 135, in <module>
max_iters=args.max_iters)
File "/media/omnisky/28db8425-dc36-4700-92ef-0dd7e98ccd67/djt/CASD/tools/../lib/model/train_val.py", line 377, in train_net
sw.train_model(max_iters)
File "/media/omnisky/28db8425-dc36-4700-92ef-0dd7e98ccd67/djt/CASD/tools/../lib/model/train_val.py", line 291, in train_model
cls_det_loss, refine_loss_1, refine_loss_2, consistency_loss, total_loss = self.net.train_step(blobs,self.optimizer,iter)
File "/media/omnisky/28db8425-dc36-4700-92ef-0dd7e98ccd67/djt/CASD/tools/../lib/nets/network.py", line 634, in train_step
self.forward(blobs['data'], blobs['image_level_labels'], blobs['im_info'], blobs['gt_boxes'], blobs['ss_boxes'], step)
File "/media/omnisky/28db8425-dc36-4700-92ef-0dd7e98ccd67/djt/CASD/tools/../lib/nets/network.py", line 562, in forward
roi_labels_1, keep_inds_1, roi_labels_2, keep_inds_2, bbox_pred, rois = self._predict_train(ss_boxes_all, step)
File "/media/omnisky/28db8425-dc36-4700-92ef-0dd7e98ccd67/djt/CASD/tools/../lib/nets/network.py", line 508, in _predict_train
roi_labels_2, keep_inds_2, bbox_pred = self._region_classification_train(pool5_roi, fc7_roi,fc7_context, fc7_frame, step)
File "/media/omnisky/28db8425-dc36-4700-92ef-0dd7e98ccd67/djt/CASD/tools/../lib/nets/network.py", line 398, in _region_classification_train
mask_1 = self._inverted_attention(bbox_feats_new, gt, keep_inds_1_new, 1, step, fg_num_1_new, bg_num_1_new)
File "/media/omnisky/28db8425-dc36-4700-92ef-0dd7e98ccd67/djt/CASD/tools/../lib/nets/network.py", line 147, in _inverted_attention
pooled_feat_before_after = torch.cat((bbox_feats_new, bbox_feats_new * mask_all), dim=0)
RuntimeError: CUDA out of memory. Tried to allocate 766.00 MiB (GPU 0; 11.91 GiB total capacity; 9.59 GiB already allocated; 99.19 MiB free; 1.49 GiB cached)
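To make the numbers in that message easier to read, here is a minimal sketch that parses it and works out how far short the allocation was. It uses only the Python standard library; `parse_oom` and `tensor_mib` are hypothetical helper names, not part of PyTorch or this repository. The shape passed to `tensor_mib` is purely illustrative: it assumes a float32 tensor of per-region 512x7x7 features, which the log does not confirm.

```python
import re

# Error text copied from the traceback above.
OOM_MESSAGE = (
    "RuntimeError: CUDA out of memory. Tried to allocate 766.00 MiB "
    "(GPU 0; 11.91 GiB total capacity; 9.59 GiB already allocated; "
    "99.19 MiB free; 1.49 GiB cached)"
)

_UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def parse_oom(message):
    """Return the sizes quoted in a CUDA OOM message, all converted to MiB."""
    fields = {
        "requested": r"Tried to allocate ([\d.]+) (KiB|MiB|GiB)",
        "total": r"([\d.]+) (KiB|MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (KiB|MiB|GiB) already allocated",
        "free": r"([\d.]+) (KiB|MiB|GiB) free",
    }
    sizes = {}
    for name, pattern in fields.items():
        match = re.search(pattern, message)
        if match:
            sizes[name] = float(match.group(1)) * _UNIT_TO_MIB[match.group(2)]
    return sizes


def tensor_mib(shape, bytes_per_element=4):
    """Size in MiB of a dense tensor with the given shape (float32 by default)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element / 2**20


sizes = parse_oom(OOM_MESSAGE)
shortfall = sizes["requested"] - sizes["free"]
print(f"requested {sizes['requested']:.2f} MiB, "
      f"free {sizes['free']:.2f} MiB, short by {shortfall:.2f} MiB")

# Assumption: a hypothetical (rows, 512, 7, 7) float32 tensor, to show how
# the row count alone determines the size of such an allocation.
print(f"{tensor_mib((8004, 512, 7, 7)):.2f} MiB")
```

The failing line concatenates `bbox_feats_new` with a masked copy of itself along `dim=0`, so the result needs roughly twice the memory of `bbox_feats_new`, on top of the masked copy. If the row count there is driven by the number of selective-search boxes per image rather than by TRAIN.BATCH_SIZE, lowering TRAIN.BATCH_SIZE would not shrink this tensor. That is a guess from the traceback, not something the log confirms.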
I would appreciate it if you could help me.