Got stuck when evaluating MMLU #1
Thanks for open-sourcing this! I'm trying to evaluate Llama-7b-hf on mmlu-fr, but I get the warning Token indices sequence length is longer than the specified maximum sequence length for this model (5023 > 4096). Running this sequence through the model will result in indexing errors, and the process then appears to hang. Here is the call stack after a keyboard interrupt:
Token indices sequence length is longer than the specified maximum sequence length for this model (5023 > 4096). Running this sequence through the model will result in indexing errors
^CTraceback (most recent call last):
File "/data2/zl/code/mlmm-evaluation/main.py", line 135, in <module>
main()
File "/data2/zl/code/mlmm-evaluation/main.py", line 108, in main
results = evaluator.open_llm_evaluate(
File "/data2/zl/code/mlmm-evaluation/lm_eval/utils.py", line 205, in _wrapper
return fn(*args, **kwargs)
File "/data2/zl/code/mlmm-evaluation/lm_eval/evaluator.py", line 79, in open_llm_evaluate
results = evaluate(
File "/data2/zl/code/mlmm-evaluation/lm_eval/utils.py", line 205, in _wrapper
return fn(*args, **kwargs)
File "/data2/zl/code/mlmm-evaluation/lm_eval/evaluator.py", line 262, in evaluate
resps = getattr(lm, reqtype)([req.args for req in reqs])
File "/data2/zl/code/mlmm-evaluation/lm_eval/base.py", line 181, in loglikelihood
context_enc = self.tok_encode(context)
File "/data2/zl/code/mlmm-evaluation/lm_eval/models/huggingface.py", line 361, in tok_encode
return self.tokenizer.encode(string, add_special_tokens=self.add_special_tokens)
File "/opt/conda/envs/lm_eval/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2569, in encode
encoded_inputs = self.encode_plus(
File "/opt/conda/envs/lm_eval/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2977, in encode_plus
return self._encode_plus(
File "/opt/conda/envs/lm_eval/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 576, in _encode_plus
batched_output = self._batch_encode_plus(
File "/opt/conda/envs/lm_eval/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 504, in _batch_encode_plus
encodings = self._tokenizer.encode_batch(
KeyboardInterrupt
It seems the process is stuck in the batched tokenization step. How can I deal with this?
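One workaround (a sketch, not something the repo does by default) is to left-truncate the encoded context to the model's maximum length before scoring, since for loglikelihood requests the continuation being scored sits at the end of the context. The helper name `truncate_context` below is hypothetical:

```python
def truncate_context(token_ids, max_len):
    """Keep only the last max_len tokens of an encoded context.

    Left truncation (dropping the oldest tokens) is an assumption here:
    it preserves the tokens immediately preceding the continuation,
    which matter most for loglikelihood scoring.
    """
    if len(token_ids) <= max_len:
        return token_ids
    return token_ids[-max_len:]


# Example: a 10-token context truncated to a 4-token window
print(truncate_context(list(range(10)), 4))  # [6, 7, 8, 9]
```

In `transformers` itself you can often get a similar effect by passing `truncation=True` and `max_length=...` to the tokenizer, but whether that is appropriate here depends on how the harness splits context and continuation.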