Got stuck when evaluating MMLU #1

@zhangliang-04

Description

Thanks for open-sourcing this! I'm trying to evaluate Llama-7b-hf on mmlu-fr. A warning of `Token indices sequence length is longer than the specified maximum sequence length for this model (5023 > 4096). Running this sequence through the model will result in indexing errors` occurs, and then the process appears to be stuck. Here is the call stack after a keyboard interrupt:

```
Token indices sequence length is longer than the specified maximum sequence length for this model (5023 > 4096). Running this sequence through the model will result in indexing errors
^CTraceback (most recent call last):
  File "/data2/zl/code/mlmm-evaluation/main.py", line 135, in <module>
    main()
  File "/data2/zl/code/mlmm-evaluation/main.py", line 108, in main
    results = evaluator.open_llm_evaluate(
  File "/data2/zl/code/mlmm-evaluation/lm_eval/utils.py", line 205, in _wrapper
    return fn(*args, **kwargs)
  File "/data2/zl/code/mlmm-evaluation/lm_eval/evaluator.py", line 79, in open_llm_evaluate
    results = evaluate(
  File "/data2/zl/code/mlmm-evaluation/lm_eval/utils.py", line 205, in _wrapper
    return fn(*args, **kwargs)
  File "/data2/zl/code/mlmm-evaluation/lm_eval/evaluator.py", line 262, in evaluate
    resps = getattr(lm, reqtype)([req.args for req in reqs])
  File "/data2/zl/code/mlmm-evaluation/lm_eval/base.py", line 181, in loglikelihood
    context_enc = self.tok_encode(context)
  File "/data2/zl/code/mlmm-evaluation/lm_eval/models/huggingface.py", line 361, in tok_encode
    return self.tokenizer.encode(string, add_special_tokens=self.add_special_tokens)
  File "/opt/conda/envs/lm_eval/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2569, in encode
    encoded_inputs = self.encode_plus(
  File "/opt/conda/envs/lm_eval/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2977, in encode_plus
    return self._encode_plus(
  File "/opt/conda/envs/lm_eval/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 576, in _encode_plus
    batched_output = self._batch_encode_plus(
  File "/opt/conda/envs/lm_eval/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 504, in _batch_encode_plus
    encodings = self._tokenizer.encode_batch(
KeyboardInterrupt
```

It seems the process is stuck in the batched tokenization. How should I deal with this?
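For context on what the warning means: the tokenizer itself does not truncate here, so a 5023-token prompt exceeds the model's 4096-token context window. Evaluation harnesses typically handle this by left-truncating the context while keeping the continuation (the scored answer tokens) intact. A minimal sketch of that idea, using a hypothetical helper name (`truncate_to_window`) and plain token-ID lists rather than the harness's actual internals:

```python
def truncate_to_window(context_ids, continuation_ids, max_length):
    """Left-truncate the context so context + continuation fits in
    the model's context window, keeping the continuation intact
    (the continuation tokens are the ones being scored)."""
    budget = max_length - len(continuation_ids)
    if budget < 0:
        raise ValueError("continuation alone exceeds the context window")
    # Slice from the right: the most recent context tokens are kept.
    kept_context = context_ids[max(len(context_ids) - budget, 0):]
    return kept_context + continuation_ids


# Example mirroring the numbers from the warning: a 5023-token prompt
# against a 4096-token window.
ids = truncate_to_window(list(range(5023)), [101, 102, 103], 4096)
print(len(ids))  # 4096
```

This is only an illustration of the mechanism, not the mlmm-evaluation code path; whether the harness applies such truncation before or after the slow `encode_batch` call is what would need checking in `lm_eval/base.py`.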
