
Error when running MathGLM-10B inference with the sat framework #5

@coldwater2000

Description

from sat import AutoModel

def main(args):
    # Load the pretrained MathGLM-10B weights and switch to eval mode for inference.
    model, model_args = AutoModel.from_pretrained('/data/model/MathGLM-10B', args)
    model = model.eval()

Loading fails with: No such file or directory: '/data/model/MathGLM-10B/1/mp_rank_00_model_states.pt'
...but no such file or directory exists in the model weights you provided.
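
If I understand sat's checkpoint layout correctly, from_pretrained reads a latest file under the model directory and then expects <iteration>/mp_rank_00_model_states.pt inside it. The snippet below is only a rough diagnostic under that assumption (the MODEL_DIR constant is mine), to show what my local copy actually contains:

import os

# Rough diagnostic, assuming sat expects
# <model_dir>/latest plus <model_dir>/<iteration>/mp_rank_00_model_states.pt.
MODEL_DIR = '/data/model/MathGLM-10B'  # local path, adjust as needed

latest_file = os.path.join(MODEL_DIR, 'latest')
if os.path.isfile(latest_file):
    with open(latest_file) as f:
        iteration = f.read().strip()
    expected = os.path.join(MODEL_DIR, iteration, 'mp_rank_00_model_states.pt')
    print('latest ->', iteration)
    print('expected checkpoint:', expected, '| exists:', os.path.isfile(expected))
else:
    print('no latest file; top-level contents:', os.listdir(MODEL_DIR))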

Thanks.

args:Namespace(num_layers=48, hidden_size=2560, num_attention_heads=40, vocab_size=100, max_sequence_length=512, layernorm_order='pre', inner_hidden_size=None, hidden_size_per_attention_head=None, model_parallel_size=1, skip_init=True, use_gpu_initialization=True, num_multi_query_heads=0, layernorm_epsilon=1e-05, hidden_dropout=0.1, attention_dropout=0.1, drop_path=0.0, make_vocab_size_divisible_by=128, experiment_name='MyModel', train_iters=10000, batch_size=1, lr=0.0001, mode='inference', seed=1234, zero_stage=0, checkpoint_activations=False, checkpoint_num_layers=1, checkpoint_skip_layers=0, fp16=True, bf16=False, gradient_accumulation_steps=1, epochs=None, log_interval=50, summary_dir='', save_args=False, lr_decay_iters=None, lr_decay_style='linear', lr_decay_ratio=0.1, warmup=0.01, weight_decay=0.01, save=None, load=None, save_interval=5000, no_save_rng=False, no_load_rng=False, resume_dataloader=False, distributed_backend='nccl', local_rank=0, exit_interval=None, eval_batch_size=None, eval_iters=100, eval_interval=None, strict_eval=False, train_data=None, train_data_weights=None, iterable_dataset=False, valid_data=None, test_data=None, split='1000,1,1', num_workers=1, block_size=10000, prefetch_factor=4, tokenizer_type='fake', temperature=0.1, top_p=0.0, top_k=200, num_beams=1, length_penalty=0.0, no_repeat_ngram_size=0, min_tgt_length=0, out_seq_length=256, input_source='./input_test.txt', output_path='samples_result', with_id=False, max_inference_batch_size=8, device=0, deepspeed=False, deepspeed_config=None, deepscale=False, deepscale_config=None, deepspeed_mpi=False, cuda=True, rank=0, world_size=1, master_ip='localhost', master_port='43565', do_train=False

log:

[2023-09-23 11:42:52,093] [INFO] [RANK 0] building GLMModel model ...
[2023-09-23 11:42:53,382] [INFO] [RANK 0] > number of parameters on model parallel rank 0: 9879633920
[2023-09-23 11:42:53,407] [INFO] [RANK 0] global rank 0 is loading checkpoint /data/model/MathGLM-10B/1/mp_rank_00_model_states.pt
Traceback (most recent call last):
  File "/data/MathGLM/test.py", line 38, in <module>
    main(args)
  File "/data/MathGLM/test.py", line 9, in main
    model, model_args = AutoModel.from_pretrained('/data/model/MathGLM-10B', args)
  File "/root/miniconda3/envs/mathglm/lib/python3.9/site-packages/sat/model/base_model.py", line 310, in from_pretrained
    return cls.from_pretrained_base(name, args=args, home_path=home_path, url=url, prefix=prefix, build_only=build_only, overwrite_args=overwrite_args, **kwargs)
  File "/root/miniconda3/envs/mathglm/lib/python3.9/site-packages/sat/model/base_model.py", line 304, in from_pretrained_base
    load_checkpoint(model, args, load_path=model_path, prefix=prefix)
  File "/root/miniconda3/envs/mathglm/lib/python3.9/site-packages/sat/training/model_io.py", line 222, in load_checkpoint
    sd = torch.load(checkpoint_name, map_location='cpu')
  File "/root/miniconda3/envs/mathglm/lib/python3.9/site-packages/torch/serialization.py", line 791, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/root/miniconda3/envs/mathglm/lib/python3.9/site-packages/torch/serialization.py", line 271, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/root/miniconda3/envs/mathglm/lib/python3.9/site-packages/torch/serialization.py", line 252, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/data/model/MathGLM-10B/1/mp_rank_00_model_states.pt'
