@amametjanov amametjanov commented Dec 8, 2025

Change default IO type from NETCDF4C to PNETCDF

Checklist

  • Linting
  • Building
    • CMake build does not produce any new warnings from changes in this PR
  • Testing
    • Add a comment to the PR titled Testing with the following:
      • Which machines the CTest unit tests have been run on, with confirmation that they are all passing.
      • Confirmation that the Polaris omega_pr test suite has passed, using the Polaris e3sm_submodules/Omega baseline
      • Document machine(s), compiler(s), and the build path(s) used for -p for both the baseline (Polaris e3sm_submodules/Omega) and the PR build
      • Indicate "All tests passed" or document failing tests
      • Document testing used to verify the changes including any tests that are added/modified/impacted.
      • Performance related PRs: Please include a relevant PACE experiment link documenting performance before and after.

amametjanov commented Dec 8, 2025

Testing:

  • aurora+oneapi-ifxgpu: 100% tests passed, 0 tests failed out of 35

xylar commented Dec 9, 2025

Thanks very much, @amametjanov! This looks promising.

xylar commented Dec 9, 2025

@amametjanov, I've got 3 tests for this in the queue, one on Chrysalis and 2 on Frontier, but wait times seem to be a bit long in both places. I'll keep you posted.

@amametjanov

To fix the IO_Test ctest, this PR needs scorpio PR #670, which expands support for CDF5 data types (like 64-bit ints).

xylar commented Dec 9, 2025

@amametjanov, maybe this will be fixed by E3SM-Project/scorpio#670 but what I'm seeing on Chrysalis with this branch is:

PIO: FATAL ERROR: Aborting... An error occured, Waiting on pending requests on file (output.nc, ncid=21) failed (Number of pending requests on file = 1, Number of variables with pending requests = 1, Number of request blocks = 1, Current block being waited on = 0, Number of requests in current block = 1).. NetCDF: Operation not allowed in define mode (err=-39). Aborting since the error handler was set to PIO_INTERNAL_ERROR... (/home/ac.xylar/e3sm_work/e3sm/azamat/mod-dflt-io-type/externals/scorpio/src/clib/pio_darray_int.cpp: 2192)

The polaris output is available at:

/lcrc/group/e3sm/ac.xylar/polaris_0.9/chrysalis/test_20251209/omega-pr-mod-dflt-io-type

It seems like this won't be a short-term fix for Omega if a scorpio fix is needed, because that would mean:

  • scorpio fix gets merged
  • scorpio release happens
  • scorpio gets updated in E3SM
  • E3SM/master gets merged into Omega/develop
  • This branch goes in

It feels like we should look into whether there's some alternative way to address #323 in the next week or two.


Commit: Update scorpio from v1.8.2 2025-Jul-14 to v1.9.0 2025-Nov-21. Also add fix for PnetCDF CDF5 types.

@amametjanov force-pushed the azamat/mod-dflt-io-type branch from 4d44424 to e68388f on December 21, 2025 23:03
@amametjanov marked this pull request as ready for review on December 21, 2025 23:26

@amametjanov

Xylar, please check with the updated head of this branch to see if it fixes the "NetCDF: Operation not allowed in define mode" errors in the polaris tests. All ctests are passing for me.

This branch updates the scorpio submodule ahead of E3SM/master's version, which is still on v1.8.2 2025-Jul-14. When scorpio gets updated in E3SM/master (to v1.9.0 or later), the E3SM/master merge into Omega/develop will subsume this branch's updates.

xylar commented Dec 22, 2025

Thanks, @amametjanov! I'll retest as soon as I can.

@xylar xylar left a comment

I tested the omega_pr suite using the fix in E3SM-Project/polaris#442, pointing to this branch for the Omega build.

I was able to run successfully with both Intel and Gnu on Chrysalis. I discovered that I can't log in to either Aurora or Frontier at the moment. I'm trying on Perlmutter (CPU and GPU) next.

In the meantime, two small questions/comments.

  Filename: ocn.hifreq.$Y-$M
  Mode: write
- IfExists: append
+ IfExists: replace

I don't feel qualified to decide if this is a change we want or not. It seems potentially outside the scope of this particular PR even if we do.

  case IfExists::Fail:
  PIOErr = PIOc_createfile(SysID, &FileID, &Format, Filename.c_str(),
- NC_NOCLOBBER | InMode);
+ NC_NOCLOBBER | PIO_64BIT_DATA | InMode);

Here and elsewhere, what are the implications of this change if someone deliberately chooses a different IO format than the new default at runtime? It seems like something we might not want to be hard-coding in this way but maybe I don't understand the implications.
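
One way to make the question above concrete is to derive the create mode from the IO type chosen at runtime instead of always OR-ing in the CDF5 flag. The snippet below is only a sketch under stated assumptions: the IoType enum, the two Mode constants, and the createMode helper are hypothetical stand-ins invented for illustration, not names or values from Omega, SCORPIO, or NetCDF. Whether the flag is ignored, harmless, or a problem for the NetCDF4 IO types is not settled in this thread, so restricting it to PnetCDF here is itself an assumption.

```cpp
// Stand-ins for illustration only. Real code would take the IO-type IDs and
// mode flags from the SCORPIO/NetCDF headers instead of defining them here.
enum class IoType { PnetCDF, NetCDF4C, NetCDF4P };

constexpr int ModeNoClobber = 0x1; // stand-in for NC_NOCLOBBER
constexpr int Mode64BitData = 0x2; // stand-in for PIO_64BIT_DATA (CDF5)

// Hypothetical helper: build the file-creation mode from the runtime IO type.
// The CDF5 flag is added only when the PnetCDF back end is selected
// (an assumption of this sketch, not a confirmed requirement).
constexpr int createMode(IoType Format, int InMode, bool FailIfExists) {
   int Mode = InMode;
   if (FailIfExists)
      Mode |= ModeNoClobber; // refuse to overwrite an existing file
   if (Format == IoType::PnetCDF)
      Mode |= Mode64BitData; // request the 64-bit-data (CDF5) format
   return Mode;
}
```

With a helper along these lines, the IfExists::Fail branch could pass createMode(...) as the final argument to PIOc_createfile, so that a user who selects a different IO format at runtime would not get the CDF5 flag by default.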

xylar commented Jan 5, 2026

On Perlmutter-CPU (both Intel and Gnu) and Perlmutter-GPU (Gnu-GPU), I'm seeing the same hanging behavior reported in E3SM-Project/polaris#396 that we had seen previously. Unfortunately, that behavior seems to be independent of this PIO problem.
