OSError: ./data/weneed/mask_r50/epoch_12.pth is not a checkpoint file #19

@hhhhhhxl

I followed the steps and tried to train with 2 GPUs using the following command, but got this error:
./tools/dist_train.sh ./configs/bags/gs_mask_rcnn_r50_fpn_1x_lvis.py 2


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


[07/09 11:22:20] root WARNING: The model and loaded state dict do not match exactly

unexpected key in source state_dict: fc.weight, fc.bias

Train fc_cls only.
--Dist-train--IS:False--ISout:False
Dist-train --- Not using image sampling.
Train fc_cls only.
--Dist-train--IS:False--ISout:False
Dist-train --- Not using image sampling.
[07/09 11:22:39] mmcv.runner.runner INFO: load checkpoint from ./data/weneed/mask_r50/epoch_12.pth
Traceback (most recent call last):
File "./tools/train.py", line 169, in <module>
main()
File "./tools/train.py", line 165, in main
logger=logger)
File "/home/sy/Desktop/xlhuang/BalancedGroupSoftmax-master/mmdet/apis/train.py", line 58, in train_detector
_dist_train(model, dataset, cfg, validate=validate)
File "/home/sy/Desktop/xlhuang/BalancedGroupSoftmax-master/mmdet/apis/train.py", line 204, in _dist_train
runner.load_checkpoint(cfg.load_from)
File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/runner.py", line 234, in load_checkpoint
self.logger)
File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 171, in load_checkpoint
raise IOError('{} is not a checkpoint file'.format(filename))
OSError: ./data/weneed/mask_r50/epoch_12.pth is not a checkpoint file
[07/09 11:22:39] mmcv.runner.runner INFO: load checkpoint from ./data/weneed/mask_r50/epoch_12.pth
Traceback (most recent call last):
File "./tools/train.py", line 169, in <module>
main()
File "./tools/train.py", line 165, in main
logger=logger)
File "/home/sy/Desktop/xlhuang/BalancedGroupSoftmax-master/mmdet/apis/train.py", line 58, in train_detector
_dist_train(model, dataset, cfg, validate=validate)
File "/home/sy/Desktop/xlhuang/BalancedGroupSoftmax-master/mmdet/apis/train.py", line 204, in _dist_train
runner.load_checkpoint(cfg.load_from)
File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/runner.py", line 234, in load_checkpoint
self.logger)
File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 171, in load_checkpoint
raise IOError('{} is not a checkpoint file'.format(filename))
OSError: ./data/weneed/mask_r50/epoch_12.pth is not a checkpoint file
Traceback (most recent call last):
File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/distributed/launch.py", line 253, in <module>
main()
File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/distributed/launch.py", line 249, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/home/sy/anaconda3/envs/mmdet/bin/python', '-u', './tools/train.py', '--local_rank=1', './configs/bags/gs_mask_rcnn_r50_fpn_1x_lvis.py', '--launcher', 'pytorch']' returned non-zero exit status 1.
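
A note on reading this traceback: the error is raised while loading `cfg.load_from`, and in the mmcv versions I am aware of this message appears when `os.path.isfile()` returns false for the path, so it probably means no regular file exists at `./data/weneed/mask_r50/epoch_12.pth` relative to the directory the script was launched from, rather than that the file is corrupt. That reading is an assumption, not something the log confirms. A minimal sketch for checking the path before launching training follows; `describe_checkpoint_path` is a hypothetical helper written for this illustration and is not part of mmcv or this repository.

```python
import os
import os.path as osp


def describe_checkpoint_path(path, cwd=None):
    """Return a short diagnosis of whether `path` points at a regular file.

    Hypothetical helper for illustration only. Relative paths are resolved
    against `cwd` (default: the current working directory), which is how a
    relative `load_from` value would be resolved at run time.
    """
    base = cwd if cwd is not None else os.getcwd()
    full = osp.normpath(osp.join(base, path))

    if osp.isfile(full):
        return "ok"
    if osp.isdir(full):
        return "is a directory"
    if osp.islink(full):
        # A symlink that is neither a file nor a directory has a missing target.
        return "broken symlink"
    if not osp.isdir(osp.dirname(full)):
        return "parent directory missing"
    return "file missing"


if __name__ == "__main__":
    # Path taken from the log above; run this from the repository root.
    ckpt = "./data/weneed/mask_r50/epoch_12.pth"
    print(ckpt, "->", describe_checkpoint_path(ckpt))
```

If this reports anything other than `ok`, the checkpoint that `load_from` expects is likely absent from that location, or the command was started from a different working directory.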