Data preprocessing ValueError: Unable to synchronously create group (name already exists) #18

@rvandewater

Hi,
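I am seeing this error for some tasks only; the other tasks were preprocessed correctly. Judging by the traceback, meds_to_remed calls result.create_group(sample[0]) with a name that already exists in the h5 file, so it seems the same group is being created twice.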

/sc/arion/projects/hpims-hpi/projects/foundation_models_ehr/cohorts/meds_debug/full_omop_25_04_29/MEDS_cohort/data/train/136.parquet:   0%|   | 0/683 [06:29<?, ?it/s]                                                              
multiprocessing.pool.RemoteTraceback:                                                                                                                
"""                                                                       
Traceback (most recent call last):                                                                                                                                                                                                  
  File "/sc/arion/work/vander09/conda/envs/genhpf/lib/python3.11/multiprocessing/pool.py", line 125, in worker                                                                                                                                                                                                                          
    result = (True, func(*args, **kwds))                                                                                                                                                                                            
                    ^^^^^^^^^^^^^^^^^^^                                                                                                              
  File "/sc/arion/work/vander09/conda/envs/genhpf/lib/python3.11/multiprocessing/pool.py", line 48, in mapstar                                                                                                                      
    return list(map(*args))                                               
           ^^^^^^^^^^^^^^^^                                                                                                                                                                                                         
  File "/sc/arion/work/vander09/conda/envs/genhpf/lib/python3.11/site-packages/genhpf/scripts/preprocess/preprocess_meds.py", line 663, in meds_to_remed                                                                                                                                                                                
    sample_result = result.create_group(sample[0])                                                                                                                                                                                  
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                   
  File "/sc/arion/work/vander09/conda/envs/genhpf/lib/python3.11/site-packages/h5py/_hl/group.py", line 71, in create_group                                                                                                                                                                                                             
    gid = h5g.create(self.id, name, lcpl=lcpl, gcpl=gcpl)                                                                                                                                                                           
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                           
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper                                                                                             
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper                                                                                             
  File "h5py/h5g.pyx", line 173, in h5py.h5g.create                                                                                                                 
ValueError: Unable to synchronously create group (name already exists)                                                                                              
"""                                                                               

The above exception was the direct cause of the following exception:                                                                                                

Traceback (most recent call last):                                                
  File "/sc/arion/work/vander09/conda/envs/genhpf/bin/genhpf-preprocess-meds", line 7, in <module>                                                                                                                                                                                                                                      
    sys.exit(main())                                                              
             ^^^^^^                                                               
  File "/sc/arion/work/vander09/conda/envs/genhpf/lib/python3.11/site-packages/genhpf/scripts/preprocess/preprocess_meds.py", line 397, in main                                                                                                                                                                                         
    length_per_subject_gathered = pool.map(meds_to_remed_partial, data_chunks)                                                                                      
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                      
  File "/sc/arion/work/vander09/conda/envs/genhpf/lib/python3.11/multiprocessing/pool.py", line 367, in map                                                                                                                                                                                                                             
    return self._map_async(func, iterable, mapstar, chunksize).get()                                                                                                
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                
  File "/sc/arion/work/vander09/conda/envs/genhpf/lib/python3.11/multiprocessing/pool.py", line 774, in get                                                                                                                                                                                                                             
    raise self._value                                                             
ValueError: Unable to synchronously create group (name already exists)                                                                                              
Error: genhpf-preprocess-meds failed for task readmission/general_hospital/30d in dataset MSHS. Skipping to the next task.                                                                                                                                                                                                              
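
One way to test the duplicate-name hypothesis is to look for repeated group names in the samples before they are written. The sketch below is only an illustration, not part of genhpf: find_duplicate_group_names is a hypothetical helper, and it assumes each sample is a sequence whose first element is the name passed to create_group, as sample[0] is in the traceback.

```python
from collections import Counter


def find_duplicate_group_names(samples):
    """Return {name: count} for every group name that occurs more than once.

    Hypothetical helper. `samples` is any iterable of sequences whose first
    element is the group name (assumed to match sample[0] in the traceback).
    Names are coerced to str because HDF5 group names are strings, so the
    integer 1001 and the string "1001" would collide in the file.
    """
    counts = Counter(str(sample[0]) for sample in samples)
    return {name: n for name, n in counts.items() if n > 1}


# Made-up example data: the name "1001" appears twice.
samples = [("1001", "a"), ("1002", "b"), ("1001", "c")]
print(find_duplicate_group_names(samples))  # {'1001': 2}
```

If this reports duplicates for the failing task, the input data likely contains repeated entries for the same name. h5py also provides Group.require_group, which opens an existing group instead of raising, but whether reusing the group is the right behavior here depends on what the duplicates mean.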
