
Support reading large variables (> INT_MAX per MPI process) with PnetCDF #668

@jayeshkrishna

Description

Reading a large variable with PnetCDF (PIO_IOTYPE_PNETCDF) fails when the size of the data read by an MPI process exceeds INT_MAX.

@bishtgautam reported failures when reading large variables in a standalone program that reads large land datasets (mksrf_soitex_0.0083x0.0083deg.10level.c250603.cdf5.nc; the variable being read, PCT_SAND, is a 2D variable with dimensions 213,085,469 x 10). In this run, pio_read_darray() fails with the error "Size of I/O request exceeds INT_MAX" (err=-237):

Attempting to make %sand and %clay .....
 Open soil texture file: /pscratch/sd/g/gbisht/mksrf_soitex_0.0083x0.0083deg.10level.c250603.cdf5.nc
 domain_read_dims_2d_pio read lon and lat dims from lon/lat
 domain_read_dims_2d_pio nlon        43200 nlat        21600
 domain_read_pio initialized domain
PIO: FATAL ERROR: Aborting... An error occured, Reading variable (PCT_SAND, varid=2) from file (/pscratch/sd/g/gbisht/mksrf_soitex_0.0083x0.0083deg.10level.c250603.cdf5.nc, ncid=27) failed with PIO_IOTYPE_PNETCDF iotype. The low level (PnetCDF) I/O library call failed to read the variable (Number of regions = 1, iodesc id = 572, Bytes to read on this process = 532713670). Size of I/O request exceeds INT_MAX (err=-237). Aborting since the error handler was set to PIO_INTERNAL_ERROR... (/global/cfs/cdirs/e3sm/gbisht/Projects/se-tasks/mksurfdat/scorpio/src/clib/pio_darray_int.cpp: 1328)
PIO: FATAL ERROR: Aborting... An error occured, Reading variable (PCT_SAND, varid=2) from file (/pscratch/sd/g/gbisht/mksrf_soitex_0.0083x0.0083deg.10level.c250603.cdf5.nc, ncid=27) failed with PIO_IOTYPE_PNETCDF iotype. The low level (PnetCDF) I/O library call failed to read the variable (Number of regions = 1, iodesc id = 572, Bytes to read on this process = 532713670). Size of I/O request exceeds INT_MAX (err=-237). Aborting since the error handler was set to PIO_INTERNAL_ERROR... (/global/cfs/cdirs/e3sm/gbisht/Projects/se-tasks/mksurfdat/scorpio/src/clib/pio_darray_int.cpp: 1328)
Obtained 10 stack frames.
src/mksurfdat_petsc() [0x53d137]
src/mksurfdat_petsc() [0x53d302]
src/mksurfdat_petsc() [0x53dba5]
src/mksurfdat_petsc() [0x57c04d]
src/mksurfdat_petsc() [0x578524]
src/mksurfdat_petsc() [0x538fbe]
src/mksurfdat_petsc() [0x50b489]
src/mksurfdat_petsc() [0x43509c]
src/mksurfdat_petsc() [0x454e36]
src/mksurfdat_petsc() [0x41e25a]
Obtained 10 stack frames.
src/mksurfdat_petsc() [0x53d137]
src/mksurfdat_petsc() [0x53d302]
src/mksurfdat_petsc() [0x53dba5]
src/mksurfdat_petsc() [0x57c04d]
src/mksurfdat_petsc() [0x578524]
PIO: FATAL ERROR: Aborting... An error occured, Reading variable (PCT_SAND, varid=2) from file (/pscratch/sd/g/gbisht/mksrf_soitex_0.0083x0.0083deg.10level.c250603.cdf5.nc, ncid=27) failed with PIO_IOTYPE_PNETCDF iotype. The low level (PnetCDF) I/O library call failed to read the variable (Number of regions = 1, iodesc id = 572, Bytes to read on this process = 532713670). Size of I/O request exceeds INT_MAX (err=-237). Aborting since the error handler was set to PIO_INTERNAL_ERROR... (/global/cfs/cdirs/e3sm/gbisht/Projects/se-tasks/mksurfdat/scorpio/src/clib/pio_darray_int.cpp: 1328)
MPICH ERROR [Rank 2] [job id 45439218.0] [Thu Nov 20 08:39:10 2025] [nid004220] - Abort(-1) (rank 2 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 2

src/mksurfdat_petsc() [0x538fbe]
src/mksurfdat_petsc() [0x50b489]
src/mksurfdat_petsc() [0x43509c]
src/mksurfdat_petsc() [0x454e36]
src/mksurfdat_petsc() [0x41e25a]
Obtained 10 stack frames.
src/mksurfdat_petsc() [0x53d137]
src/mksurfdat_petsc() [0x53d302]
src/mksurfdat_petsc() [0x53dba5]
src/mksurfdat_petsc() [0x57c04d]
src/mksurfdat_petsc() [0x578524]
src/mksurfdat_petsc() [0x538fbe]
src/mksurfdat_petsc() [0x50b489]
src/mksurfdat_petsc() [0x43509c]
src/mksurfdat_petsc() [0x454e36]
src/mksurfdat_petsc() [0x41e25a]
aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 2
MPICH ERROR [Rank 3] [job id 45439218.0] [Thu Nov 20 08:39:10 2025] [nid004221] - Abort(-1) (rank 3 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 3

MPICH ERROR [Rank 1] [job id 45439218.0] [Thu Nov 20 08:39:10 2025] [nid004219] - Abort(-1) (rank 1 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 1
