ValueError: Schema and number of arrays unequal during training #10

@hulianyuyy


Many thanks for your contribution in organizing such wonderful work! When running the training script, I encountered the following error. Have you run into the same issue? Thanks.

[rank0]: multiprocess.pool.RemoteTraceback:
[rank0]: """
[rank0]: Traceback (most recent call last):
[rank0]:   File "/root/anaconda3/envs/llamafactory/lib/python3.11/site-packages/multiprocess/pool.py", line 125, in worker
[rank0]:     result = (True, func(*args, **kwds))
[rank0]:                     ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/root/anaconda3/envs/llamafactory/lib/python3.11/site-packages/datasets/utils/py_utils.py", line 586, in _write_generator_to_queue
[rank0]:     for i, result in enumerate(func(**kwargs)):
[rank0]:   File "/root/anaconda3/envs/llamafactory/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 3689, in _map_single
[rank0]:     writer.write_batch(batch, try_original_type=try_original_type)
[rank0]:   File "/root/anaconda3/envs/llamafactory/lib/python3.11/site-packages/datasets/arrow_writer.py", line 630, in write_batch
[rank0]:     pa_table = pa.Table.from_arrays(arrays, schema=schema)
[rank0]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "pyarrow/table.pxi", line 4851, in pyarrow.lib.Table.from_arrays
[rank0]:   File "pyarrow/table.pxi", line 1603, in pyarrow.lib._sanitize_arrays
[rank0]: ValueError: Schema and number of arrays unequal
[rank0]: """

[rank0]: Traceback (most recent call last):
[rank0]:   File "/mnt/tidal-alsh01/usr/dawo/qinshengqian/c1/OneThinker/LLaMA-Factory/src/llamafactory/data/loader.py", line 316, in _get_preprocessed_dataset
[rank0]:     dataset = dataset.map(
[rank0]:               ^^^^^^^^^^^^
[rank0]:   File "/root/anaconda3/envs/llamafactory/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 560, in wrapper
[rank0]:     out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
[rank0]:                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/root/anaconda3/envs/llamafactory/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 3309, in map
[rank0]:     for rank, done, content in iflatmap_unordered(
[rank0]:   File "/root/anaconda3/envs/llamafactory/lib/python3.11/site-packages/datasets/utils/py_utils.py", line 626, in iflatmap_unordered
[rank0]:     [async_result.get(timeout=0.05) for async_result in async_results]
[rank0]:   File "/root/anaconda3/envs/llamafactory/lib/python3.11/site-packages/datasets/utils/py_utils.py", line 626, in <listcomp>
[rank0]:     [async_result.get(timeout=0.05) for async_result in async_results]
[rank0]:      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/root/anaconda3/envs/llamafactory/lib/python3.11/site-packages/multiprocess/pool.py", line 774, in get
[rank0]:     raise self._value
[rank0]: ValueError: Schema and number of arrays unequal

[rank0]: During handling of the above exception, another exception occurred:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/mnt/tidal-alsh01/usr/dawo/qinshengqian/c1/OneThinker/LLaMA-Factory/src/llamafactory/launcher.py", line 57, in <module>
[rank0]:     run_exp()
[rank0]:   File "/mnt/tidal-alsh01/usr/dawo/qinshengqian/c1/OneThinker/LLaMA-Factory/src/llamafactory/launcher.py", line 41, in run_exp
[rank0]:     return _run_exp()  # use absolute import
[rank0]:            ^^^^^^^^^^
[rank0]:   File "/mnt/tidal-alsh01/usr/dawo/qinshengqian/c1/OneThinker/LLaMA-Factory/src/llamafactory/train/tuner.py", line 110, in run_exp
[rank0]:     _training_function(config={"args": args, "callbacks": callbacks})
[rank0]:   File "/mnt/tidal-alsh01/usr/dawo/qinshengqian/c1/OneThinker/LLaMA-Factory/src/llamafactory/train/tuner.py", line 72, in _training_function
[rank0]:     run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
[rank0]:   File "/mnt/tidal-alsh01/usr/dawo/qinshengqian/c1/OneThinker/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 51, in run_sft
[rank0]:     dataset_module = get_dataset(template, model_args, data_args, training_args, stage="sft", **tokenizer_module)
[rank0]:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/mnt/tidal-alsh01/usr/dawo/qinshengqian/c1/OneThinker/LLaMA-Factory/src/llamafactory/data/loader.py", line 391, in get_dataset
[rank0]:     dataset = _get_preprocessed_dataset(
[rank0]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/mnt/tidal-alsh01/usr/dawo/qinshengqian/c1/OneThinker/LLaMA-Factory/src/llamafactory/data/loader.py", line 327, in _get_preprocessed_dataset
[rank0]:     raise ValueError(f"e : {e}")
[rank0]: ValueError: e : Schema and number of arrays unequal
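The innermost frames show where the error originates: `datasets`' `ArrowWriter.write_batch` calls `pa.Table.from_arrays(arrays, schema=schema)`, and pyarrow raises this `ValueError` when the number of arrays differs from the number of fields in the schema. One plausible trigger (an assumption, not confirmed for this report) is a batched preprocessing function passed to `dataset.map` that returns a different set of columns for some batch than for the first batch written, for example an empty dict when every example in a batch is skipped. Based on my reading of `datasets`, the writer keeps the schema fixed by its first write, so a later batch with fewer or extra columns no longer matches; this is worth checking against the installed version. The sketch below uses only the standard library, and `preprocess` and `find_inconsistent_batches` are hypothetical helpers, not part of LLaMA-Factory or `datasets`.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional, Set, Tuple

Batch = Dict[str, List[Any]]


def preprocess(batch: Batch) -> Batch:
    """Hypothetical batched preprocessing function (not LLaMA-Factory's code).

    Output keys are created lazily, so a batch in which every example is
    skipped returns {} instead of {"input_ids": [], "length": []}.
    """
    out: Dict[str, List[Any]] = defaultdict(list)
    for text in batch["text"]:
        if not text:  # hypothetical validity check: skip empty examples
            continue
        out["input_ids"].append([ord(c) for c in text])
        out["length"].append(len(text))
    return dict(out)


def find_inconsistent_batches(
    fn: Callable[[Batch], Batch], batches: List[Batch]
) -> List[Tuple[int, List[str]]]:
    """Return (batch index, differing column names) for every batch whose
    output columns differ from those produced for the first batch."""
    reference: Optional[Set[str]] = None
    mismatches: List[Tuple[int, List[str]]] = []
    for i, batch in enumerate(batches):
        keys = set(fn(batch))
        if reference is None:
            reference = keys  # the first batch fixes the expected columns
        elif keys != reference:
            # symmetric difference: columns missing from or added to this batch
            mismatches.append((i, sorted(keys ^ reference)))
    return mismatches


if __name__ == "__main__":
    batches = [{"text": ["hi", "yo"]}, {"text": ["", ""]}, {"text": ["ok", ""]}]
    print(find_inconsistent_batches(preprocess, batches))
    # [(1, ['input_ids', 'length'])]
```

If a check like this flags a batch, one way to avoid the mismatch would be to initialize every output key up front (e.g. `{"input_ids": [], "length": []}`) so that even a fully filtered batch returns the same columns as every other batch.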
