Open
Description
Many thanks for organizing such wonderful work! When running the training script, I encounter the following error. Have you run into the same issue? Thanks.
[rank0]: multiprocess.pool.RemoteTraceback:
[rank0]: """
[rank0]: Traceback (most recent call last):
[rank0]: File "/root/anaconda3/envs/llamafactory/lib/python3.11/site-packages/multiprocess/pool.py", line 125, in worker
[rank0]: result = (True, func(*args, **kwds))
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/llamafactory/lib/python3.11/site-packages/datasets/utils/py_utils.py", line 586, in _write_generator_to_queue
[rank0]: for i, result in enumerate(func(**kwargs)):
[rank0]: File "/root/anaconda3/envs/llamafactory/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 3689, in _map_single
[rank0]: writer.write_batch(batch, try_original_type=try_original_type)
[rank0]: File "/root/anaconda3/envs/llamafactory/lib/python3.11/site-packages/datasets/arrow_writer.py", line 630, in write_batch
[rank0]: pa_table = pa.Table.from_arrays(arrays, schema=schema)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "pyarrow/table.pxi", line 4851, in pyarrow.lib.Table.from_arrays
[rank0]: File "pyarrow/table.pxi", line 1603, in pyarrow.lib._sanitize_arrays
[rank0]: ValueError: Schema and number of arrays unequal
[rank0]: """
[rank0]: Traceback (most recent call last):
[rank0]: File "/mnt/tidal-alsh01/usr/dawo/qinshengqian/c1/OneThinker/LLaMA-Factory/src/llamafactory/data/loader.py", line 316, in _get_preprocessed_dataset
[rank0]: dataset = dataset.map(
[rank0]: ^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/llamafactory/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 560, in wrapper
[rank0]: out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/llamafactory/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 3309, in map
[rank0]: for rank, done, content in iflatmap_unordered(
[rank0]: File "/root/anaconda3/envs/llamafactory/lib/python3.11/site-packages/datasets/utils/py_utils.py", line 626, in iflatmap_unordered
[rank0]: [async_result.get(timeout=0.05) for async_result in async_results]
[rank0]: File "/root/anaconda3/envs/llamafactory/lib/python3.11/site-packages/datasets/utils/py_utils.py", line 626, in <listcomp>
[rank0]: [async_result.get(timeout=0.05) for async_result in async_results]
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/llamafactory/lib/python3.11/site-packages/multiprocess/pool.py", line 774, in get
[rank0]: raise self._value
[rank0]: ValueError: Schema and number of arrays unequal
[rank0]: During handling of the above exception, another exception occurred:
[rank0]: Traceback (most recent call last):
[rank0]: File "/mnt/tidal-alsh01/usr/dawo/qinshengqian/c1/OneThinker/LLaMA-Factory/src/llamafactory/launcher.py", line 57, in <module>
[rank0]: run_exp()
[rank0]: File "/mnt/tidal-alsh01/usr/dawo/qinshengqian/c1/OneThinker/LLaMA-Factory/src/llamafactory/launcher.py", line 41, in run_exp
[rank0]: return _run_exp() # use absolute import
[rank0]: ^^^^^^^^^^
[rank0]: File "/mnt/tidal-alsh01/usr/dawo/qinshengqian/c1/OneThinker/LLaMA-Factory/src/llamafactory/train/tuner.py", line 110, in run_exp
[rank0]: _training_function(config={"args": args, "callbacks": callbacks})
[rank0]: File "/mnt/tidal-alsh01/usr/dawo/qinshengqian/c1/OneThinker/LLaMA-Factory/src/llamafactory/train/tuner.py", line 72, in _training_function
[rank0]: run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
[rank0]: File "/mnt/tidal-alsh01/usr/dawo/qinshengqian/c1/OneThinker/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 51, in run_sft
[rank0]: dataset_module = get_dataset(template, model_args, data_args, training_args, stage="sft", **tokenizer_module)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/mnt/tidal-alsh01/usr/dawo/qinshengqian/c1/OneThinker/LLaMA-Factory/src/llamafactory/data/loader.py", line 391, in get_dataset
[rank0]: dataset = _get_preprocessed_dataset(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/mnt/tidal-alsh01/usr/dawo/qinshengqian/c1/OneThinker/LLaMA-Factory/src/llamafactory/data/loader.py", line 327, in _get_preprocessed_dataset
[rank0]: raise ValueError(f"e : {e}")
[rank0]: ValueError: e : Schema and number of arrays unequal
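One possible way to narrow this down, offered as a sketch rather than a confirmed diagnosis: the traceback shows the failure inside `writer.write_batch` during `dataset.map`, so a plausible cause is that the preprocessing function returns a different set of columns for some batches than for the first one. The helper below is hypothetical (`find_inconsistent_batches` and `toy_preprocess` are not part of LLaMA-Factory or `datasets`); it runs a preprocessing function over plain-dict batches in a single process and reports every batch whose output columns differ from the first batch.

```python
from typing import Any, Callable, Iterable

Batch = dict[str, list[Any]]


def find_inconsistent_batches(
    batches: Iterable[Batch],
    preprocess: Callable[[Batch], Batch],
) -> list[tuple[int, list[str]]]:
    """Return (batch_index, sorted output columns) for each batch whose
    output columns differ from those of the first batch."""
    expected = None
    mismatches = []
    for index, batch in enumerate(batches):
        columns = sorted(preprocess(batch).keys())
        if expected is None:
            expected = columns  # the first batch defines the reference columns
        elif columns != expected:
            mismatches.append((index, columns))
    return mismatches


def toy_preprocess(batch: Batch) -> Batch:
    """Hypothetical stand-in for the real preprocessing function.
    It only emits an "images" column when the batch contains an image,
    which is the kind of inconsistency this check is meant to surface."""
    out: Batch = {"input_ids": [[len(text)] for text in batch["text"]]}
    if any(batch.get("images", [])):
        out["images"] = batch["images"]
    return out


# Two toy batches: the second one has no images at all.
batches = [
    {"text": ["a", "bb"], "images": [["x.png"], []]},
    {"text": ["ccc"], "images": [[]]},
]
mismatches = find_inconsistent_batches(batches, toy_preprocess)
print(mismatches)
```

If a check like this flags batches on the real data, the samples in those batches are the ones to inspect. Running the preprocessing with a single worker can also make the original traceback easier to read, since the error is then raised in the main process instead of a pool worker.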