Got stuck when evaluating MMLU #1

@zhangliang-04

Description

Thanks for open-sourcing this! I'm trying to evaluate Llama-7b-hf on mmlu-fr. A warning of `Token indices sequence length is longer than the specified maximum sequence length for this model (5023 > 4096). Running this sequence through the model will result in indexing errors` occurs, and then the process appears to be stuck. Here is the call stack after a keyboard interrupt:

```
Token indices sequence length is longer than the specified maximum sequence length for this model (5023 > 4096). Running this sequence through the model will result in indexing errors
^CTraceback (most recent call last):
  File "/data2/zl/code/mlmm-evaluation/main.py", line 135, in <module>
    main()
  File "/data2/zl/code/mlmm-evaluation/main.py", line 108, in main
    results = evaluator.open_llm_evaluate(
  File "/data2/zl/code/mlmm-evaluation/lm_eval/utils.py", line 205, in _wrapper
    return fn(*args, **kwargs)
  File "/data2/zl/code/mlmm-evaluation/lm_eval/evaluator.py", line 79, in open_llm_evaluate
    results = evaluate(
  File "/data2/zl/code/mlmm-evaluation/lm_eval/utils.py", line 205, in _wrapper
    return fn(*args, **kwargs)
  File "/data2/zl/code/mlmm-evaluation/lm_eval/evaluator.py", line 262, in evaluate
    resps = getattr(lm, reqtype)([req.args for req in reqs])
  File "/data2/zl/code/mlmm-evaluation/lm_eval/base.py", line 181, in loglikelihood
    context_enc = self.tok_encode(context)
  File "/data2/zl/code/mlmm-evaluation/lm_eval/models/huggingface.py", line 361, in tok_encode
    return self.tokenizer.encode(string, add_special_tokens=self.add_special_tokens)
  File "/opt/conda/envs/lm_eval/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2569, in encode
    encoded_inputs = self.encode_plus(
  File "/opt/conda/envs/lm_eval/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2977, in encode_plus
    return self._encode_plus(
  File "/opt/conda/envs/lm_eval/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 576, in _encode_plus
    batched_output = self._batch_encode_plus(
  File "/opt/conda/envs/lm_eval/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 504, in _batch_encode_plus
    encodings = self._tokenizer.encode_batch(
KeyboardInterrupt
```

It seems the process is stuck in the batched tokenization. How should I deal with this?
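For context on what the warning means: the tokenizer itself does not truncate here, so a 5023-token prompt exceeds the model's 4096-token context window. Evaluation harnesses typically handle this by left-truncating the context while keeping the continuation (the scored answer tokens) intact. A minimal sketch of that idea, using a hypothetical helper name (`truncate_to_window`) and plain token-ID lists rather than the harness's actual internals:

```python
def truncate_to_window(context_ids, continuation_ids, max_length):
    """Left-truncate the context so context + continuation fits in
    the model's context window, keeping the continuation intact
    (the continuation tokens are the ones being scored)."""
    budget = max_length - len(continuation_ids)
    if budget < 0:
        raise ValueError("continuation alone exceeds the context window")
    # Slice from the right: the most recent context tokens are kept.
    kept_context = context_ids[max(len(context_ids) - budget, 0):]
    return kept_context + continuation_ids


# Example mirroring the numbers from the warning: a 5023-token prompt
# against a 4096-token window.
ids = truncate_to_window(list(range(5023)), [101, 102, 103], 4096)
print(len(ids))  # 4096
```

This is only an illustration of the mechanism, not the mlmm-evaluation code path; whether the harness applies such truncation before or after the slow `encode_batch` call is what would need checking in `lm_eval/base.py`.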
