
Caution: Please download index cache to data folder: https://drive.google.com/open?id=1cZI562MABLtAzM6YU4WmKPFFguuVr0lZ #47

@Minh-begintolovecoding


Hi, I have fixed the error: ImportError: cannot import name '_new_empty_tensor' from 'torchvision.ops'; my workaround is sketched just below.
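For reference, this is roughly the change I made. It is only a sketch, assuming that, as in DETR's misc.py, _new_empty_tensor is needed only for the interpolate fallback on old torchvision, so the private import can simply be version-guarded:

import torchvision
from packaging import version

# _new_empty_tensor was removed from torchvision.ops in newer releases;
# it is only needed for the interpolate fallback on torchvision < 0.7,
# so guard the private imports with a version check.
if version.parse(torchvision.__version__) < version.parse('0.7'):
    from torchvision.ops import _new_empty_tensor
    from torchvision.ops.misc import _output_size

Your model is cool and I want to fine-tune it with my custom dataset. I have organized the project like this: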
TransVG/
├── checkpoints/
├── data/
│   ├── train.pth
│   ├── val.pth
│   ├── test.pth
│   ├── corpus.pth
├── datasets/
│   ├── __init__.py
│   ├── data_loader.py
│   ├── transforms.py
├── docs/
│   ├── GETTING_STARTED.md
│   ├── framework.jpg
├── ln_data/
│   ├── images/
│   ├── annotations/
│   │   ├── annotation_coco.json
│   ├── download_data.sh
├── models/
│   ├── language_model/
│   ├── visual_model/
│   ├── __init__.py
│   ├── trans_vg.py
│   ├── vl_transformer.py
├── outputs/
├── utils/
│   ├── __init__.py
│   ├── box_utils.py
│   ├── eval_utils.py
│   ├── loss_utils.py
│   ├── misc.py
│   ├── transforms.py
│   ├── word_utils.py
├── README.md
├── engine.py
├── eval.py
├── requirements.txt
├── test.sh
├── train.py
├── train.sh
The dataset was imported into the ln_data folder, and I transformed annotation_coco.json into four files (train.pth, val.pth, test.pth, corpus.pth), which I put into the data folder.
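The conversion was along these lines. It is only a sketch: the (image_file, bbox, phrase) tuple layout and the 'caption' field are assumptions, so check datasets/data_loader.py for the exact sample format it expects.

import json
import torch

with open('ln_data/annotations/annotation_coco.json') as f:
    coco = json.load(f)

# Map image id -> file name
images = {img['id']: img['file_name'] for img in coco['images']}

samples = []
for ann in coco['annotations']:
    x, y, w, h = ann['bbox']           # COCO boxes are [x, y, w, h]
    bbox = [x, y, x + w, y + h]        # convert to [x1, y1, x2, y2]
    phrase = ann.get('caption', '')    # assumed referring-expression field
    samples.append((images[ann['image_id']], bbox, phrase))

# Assumed 80/10/10 split (shuffle first if your annotations are ordered);
# corpus.pth (the language corpus) is built separately and not shown here.
n = len(samples)
torch.save(samples[:int(0.8 * n)], 'data/train.pth')
torch.save(samples[int(0.8 * n):int(0.9 * n)], 'data/val.pth')
torch.save(samples[int(0.9 * n):], 'data/test.pth')

I also changed some code in train.py to make it suit my dataset: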
import argparse

def get_args_parser():
    parser = argparse.ArgumentParser('Set transformer detector', add_help=False)
    parser.add_argument('--lr', default=1e-4, type=float)
    parser.add_argument('--lr_bert', default=1e-5, type=float)
    parser.add_argument('--lr_visu_cnn', default=1e-5, type=float)
    parser.add_argument('--lr_visu_tra', default=1e-5, type=float)
    parser.add_argument('--batch_size', default=8, type=int)
    parser.add_argument('--weight_decay', default=1e-4, type=float)
    parser.add_argument('--epochs', default=90, type=int)
    parser.add_argument('--lr_power', default=0.9, type=float, help='lr poly power')
    parser.add_argument('--clip_max_norm', default=0., type=float,
                        help='gradient clipping max norm')
    parser.add_argument('--eval', dest='eval', default=False, action='store_true', help='if evaluation only')
    parser.add_argument('--optimizer', default='adamw', type=str)
    parser.add_argument('--lr_scheduler', default='step', type=str)
    parser.add_argument('--lr_drop', default=60, type=int)

    # Augmentation options
    parser.add_argument('--aug_blur', action='store_true',
                        help="If true, use gaussian blur augmentation")
    parser.add_argument('--aug_crop', action='store_true',
                        help="If true, use random crop augmentation")
    parser.add_argument('--aug_scale', action='store_true',
                        help="If true, use multi-scale augmentation")
    parser.add_argument('--aug_translate', action='store_true',
                        help="If true, use random translate augmentation")

    # Model parameters
    parser.add_argument('--model_name', type=str, default='TransVG',
                        help="Name of model to be exploited.")

    # DETR parameters
    # * Backbone
    parser.add_argument('--backbone', default='resnet50', type=str,
                        help="Name of the convolutional backbone to use")
    parser.add_argument('--dilation', action='store_true',
                        help="If true, we replace stride with dilation in the last convolutional block (DC5)")
    parser.add_argument('--position_embedding', default='sine', type=str, choices=('sine', 'learned'),
                        help="Type of positional embedding to use on top of the image features")
    # * Transformer
    parser.add_argument('--enc_layers', default=6, type=int,
                        help="Number of encoding layers in the transformer")
    parser.add_argument('--dec_layers', default=0, type=int,
                        help="Number of decoding layers in the transformer")
    parser.add_argument('--dim_feedforward', default=2048, type=int,
                        help="Intermediate size of the feedforward layers in the transformer blocks")
    parser.add_argument('--hidden_dim', default=256, type=int,
                        help="Size of the embeddings (dimension of the transformer)")
    parser.add_argument('--dropout', default=0.1, type=float,
                        help="Dropout applied in the transformer")
    parser.add_argument('--nheads', default=8, type=int,
                        help="Number of attention heads inside the transformer's attentions")
    parser.add_argument('--num_queries', default=100, type=int,
                        help="Number of query slots")
    parser.add_argument('--pre_norm', action='store_true')

    parser.add_argument('--imsize', default=640, type=int, help='image size')
    parser.add_argument('--emb_size', default=512, type=int,
                        help='fusion module embedding dimensions')

    # Transformers in two branches
    parser.add_argument('--bert_enc_num', default=12, type=int)
    parser.add_argument('--detr_enc_num', default=6, type=int)

    # Vision-Language Transformer
    parser.add_argument('--vl_dropout', default=0.1, type=float,
                        help="Dropout applied in the vision-language transformer")
    parser.add_argument('--vl_nheads', default=8, type=int,
                        help="Number of attention heads inside the vision-language transformer's attentions")
    parser.add_argument('--vl_hidden_dim', default=256, type=int,
                        help='Size of the embeddings (dimension of the vision-language transformer)')
    parser.add_argument('--vl_dim_feedforward', default=2048, type=int,
                        help="Intermediate size of the feedforward layers in the vision-language transformer blocks")
    parser.add_argument('--vl_enc_layers', default=6, type=int,
                        help='Number of encoders in the vision-language transformer')

    # Dataset parameters
    parser.add_argument('--data_root', type=str, default='ln_data/',
                        help='path to ReferIt splits data folder')
    parser.add_argument('--split_root', type=str, default='data',
                        help='location of pre-parsed dataset info')
    parser.add_argument('--dataset', default='referit', type=str,
                        help='referit/unc/unc+/gref/gref_umd')
    parser.add_argument('--max_query_len', default=20, type=int,
                        help='maximum time steps (lang length) per batch')

    # Output and runtime parameters
    parser.add_argument('--output_dir', default='./outputs',
                        help='path where to save, empty for no saving')
    parser.add_argument('--device', default='cuda',
                        help='device to use for training / testing')
    parser.add_argument('--seed', default=13, type=int)
    parser.add_argument('--resume', default='', help='resume from checkpoint')
    parser.add_argument('--detr_model', default='./saved_models/detr-r50.pth', type=str, help='detr model')
    parser.add_argument('--bert_model', default='bert-base-uncased', type=str, help='bert model')
    parser.add_argument('--light', dest='light', default=False, action='store_true', help='if use smaller model')
    parser.add_argument('--start_epoch', default=0, type=int, metavar='N',
                        help='start epoch')
    parser.add_argument('--num_workers', default=2, type=int)

    # Distributed training parameters
    parser.add_argument('--world_size', default=1, type=int,
                        help='number of distributed processes')
    parser.add_argument('--dist_url', default='env://', help='url used to set up distributed training')
    return parser
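For context, train.py then consumes this function in the usual DETR style (a sketch; the exact description string is an assumption):

parser = argparse.ArgumentParser('TransVG training script', parents=[get_args_parser()])
args = parser.parse_args()
# Note that args.dataset defaults to 'referit' unless --dataset is passed,
# so by default train.py looks for the ReferIt split files.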

But when I ran this command: !python train.py --epochs 100 --batch_size 16 --lr 5e-5 --output_dir outputs/

It produced the following output:
Not using distributed mode
git:
sha: c862427, status: has uncommited changes, branch: main

/usr/local/lib/python3.11/dist-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/usr/local/lib/python3.11/dist-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=None.
warnings.warn(msg)
100% 407873900/407873900 [00:09<00:00, 42600579.56B/s]
/usr/local/lib/python3.11/dist-packages/pytorch_pretrained_bert/modeling.py:603: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
state_dict = torch.load(weights_path, map_location='cpu')
number of params: 149523460
100% 231508/231508 [00:00<00:00, 1254137.79B/s]
Please download index cache to data folder:
https://drive.google.com/open?id=1cZI562MABLtAzM6YU4WmKPFFguuVr0lZ
