Description
def main(args):
    model, model_args = AutoModel.from_pretrained('/data/model/MathGLM-10B', args)
    model = model.eval()
Error: No such file or directory: '/data/model/MathGLM-10B/1/mp_rank_00_model_states.pt'
However, the model repository you provided contains no such file or directory.
Thanks.
args:Namespace(num_layers=48, hidden_size=2560, num_attention_heads=40, vocab_size=100, max_sequence_length=512, layernorm_order='pre', inner_hidden_size=None, hidden_size_per_attention_head=None, model_parallel_size=1, skip_init=True, use_gpu_initialization=True, num_multi_query_heads=0, layernorm_epsilon=1e-05, hidden_dropout=0.1, attention_dropout=0.1, drop_path=0.0, make_vocab_size_divisible_by=128, experiment_name='MyModel', train_iters=10000, batch_size=1, lr=0.0001, mode='inference', seed=1234, zero_stage=0, checkpoint_activations=False, checkpoint_num_layers=1, checkpoint_skip_layers=0, fp16=True, bf16=False, gradient_accumulation_steps=1, epochs=None, log_interval=50, summary_dir='', save_args=False, lr_decay_iters=None, lr_decay_style='linear', lr_decay_ratio=0.1, warmup=0.01, weight_decay=0.01, save=None, load=None, save_interval=5000, no_save_rng=False, no_load_rng=False, resume_dataloader=False, distributed_backend='nccl', local_rank=0, exit_interval=None, eval_batch_size=None, eval_iters=100, eval_interval=None, strict_eval=False, train_data=None, train_data_weights=None, iterable_dataset=False, valid_data=None, test_data=None, split='1000,1,1', num_workers=1, block_size=10000, prefetch_factor=4, tokenizer_type='fake', temperature=0.1, top_p=0.0, top_k=200, num_beams=1, length_penalty=0.0, no_repeat_ngram_size=0, min_tgt_length=0, out_seq_length=256, input_source='./input_test.txt', output_path='samples_result', with_id=False, max_inference_batch_size=8, device=0, deepspeed=False, deepspeed_config=None, deepscale=False, deepscale_config=None, deepspeed_mpi=False, cuda=True, rank=0, world_size=1, master_ip='localhost', master_port='43565', do_train=False
log:
[2023-09-23 11:42:52,093] [INFO] [RANK 0] building GLMModel model ...
[2023-09-23 11:42:53,382] [INFO] [RANK 0] > number of parameters on model parallel rank 0: 9879633920
[2023-09-23 11:42:53,407] [INFO] [RANK 0] global rank 0 is loading checkpoint /data/model/MathGLM-10B/1/mp_rank_00_model_states.pt
Traceback (most recent call last):
File "/data/MathGLM/test.py", line 38, in <module>
main(args)
File "/data/MathGLM/test.py", line 9, in main
model, model_args = AutoModel.from_pretrained('/data/model/MathGLM-10B', args)
File "/root/miniconda3/envs/mathglm/lib/python3.9/site-packages/sat/model/base_model.py", line 310, in from_pretrained
return cls.from_pretrained_base(name, args=args, home_path=home_path, url=url, prefix=prefix, build_only=build_only, overwrite_args=overwrite_args, **kwargs)
File "/root/miniconda3/envs/mathglm/lib/python3.9/site-packages/sat/model/base_model.py", line 304, in from_pretrained_base
load_checkpoint(model, args, load_path=model_path, prefix=prefix)
File "/root/miniconda3/envs/mathglm/lib/python3.9/site-packages/sat/training/model_io.py", line 222, in load_checkpoint
sd = torch.load(checkpoint_name, map_location='cpu')
File "/root/miniconda3/envs/mathglm/lib/python3.9/site-packages/torch/serialization.py", line 791, in load
with _open_file_like(f, 'rb') as opened_file:
File "/root/miniconda3/envs/mathglm/lib/python3.9/site-packages/torch/serialization.py", line 271, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/root/miniconda3/envs/mathglm/lib/python3.9/site-packages/torch/serialization.py", line 252, in __init__
super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/data/model/MathGLM-10B/1/mp_rank_00_model_states.pt'
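To narrow this down, it may help to compare the path the loader tried to open with what is actually on disk. The following is a minimal sketch, not part of sat or MathGLM: the helper names `expected_checkpoint` and `diagnose` are hypothetical, and the layout `<model_dir>/<iteration>/mp_rank_00_model_states.pt` is only inferred from the log line and traceback above. How the loader arrived at the `1` subdirectory is not shown in this report, so the iteration is left as a parameter.

```python
from pathlib import Path


def expected_checkpoint(model_dir, iteration="1", mp_rank=0):
    """Rebuild the path seen in the traceback (layout inferred from the log, an assumption)."""
    return Path(model_dir) / str(iteration) / f"mp_rank_{mp_rank:02d}_model_states.pt"


def diagnose(model_dir, iteration="1", mp_rank=0):
    """Return (expected_path, exists, found), where found lists every *.pt file under model_dir."""
    root = Path(model_dir)
    target = expected_checkpoint(root, iteration, mp_rank)
    # rglob walks subdirectories, so misplaced checkpoint files show up here.
    found = sorted(root.rglob("*.pt")) if root.is_dir() else []
    return target, target.is_file(), found


if __name__ == "__main__":
    target, exists, found = diagnose("/data/model/MathGLM-10B")
    print("expected:", target)
    print("exists:  ", exists)
    print("checkpoint files found under the model directory:")
    for path in found:
        print("  ", path)
```

If `exists` is false but `found` lists a `.pt` file elsewhere, the downloaded files are probably arranged differently from what the loader expects. If `found` is empty, the download itself is likely incomplete.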