Conversation

@LinlinCui-NOAA
Collaborator

This PR makes two changes to run_graphcast.py:

  • Added a timing log for each step
  • Removed the conversion to bfloat16. bfloat16 should not be used for prediction: the reduced precision greatly increases run-to-run variance.

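For context on the second point: bfloat16 keeps only 7 explicit mantissa bits, so small perturbations in the model state can be lost entirely, which is one plausible source of the run-to-run variance mentioned above. A minimal sketch simulating bfloat16 truncation in pure NumPy (the helper name is hypothetical; real code would cast via JAX or the ml_dtypes package):

```python
import numpy as np

def truncate_to_bfloat16(x):
    """Simulate bfloat16 by zeroing the low 16 bits of float32 values.

    Hypothetical illustration only: bfloat16 is the top 16 bits of a
    float32, so this mimics its loss of mantissa precision.
    """
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

state = np.array([1.001, 250.3], dtype=np.float32)
print(truncate_to_bfloat16(state))  # small increments vanish: [1.0, 250.0]
```

Two states that differ only below the bfloat16 resolution become identical after the cast, so the trajectories diverge from rounding noise rather than from the inputs.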
Contributor

@aerorahul aerorahul left a comment


Looks good.
One could also introduce timing stats in rollout.chunked_predictions as well as converter.save_grib2 to get information on computation and I/O, respectively, through the model integration.
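One hedged way to wrap those calls without editing their bodies is a small timing decorator (a sketch; `timed` is a hypothetical helper, and the wrapped call names follow the suggestion above):

```python
import functools
import time

def timed(label):
    """Return a decorator that logs wall-clock time for each call.

    Hypothetical helper, not part of this PR.
    """
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            elapsed = time.perf_counter() - start
            print(f"Elapsed time for {label}: {elapsed} seconds", flush=True)
            return result
        return inner
    return wrap

# Hypothetical usage around the calls mentioned above:
# predictions = timed("rollout")(rollout.chunked_predictions)(...)
# timed("grib2 output")(converter.save_grib2)(predictions)
```

This keeps the computation/I/O split visible in the log without touching the wrapped functions themselves.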


@RussellManser-NCO RussellManser-NCO left a comment


The additional time logging looks good, but I have concerns about the change to float casting if this is intended for the current production code.

@RussellManser-NCO RussellManser-NCO dismissed their stale review December 10, 2025 18:40

Requested changes were made. Thank you.

@RussellManser-NCO

I will run a test for this on WCOSS this afternoon.

@RussellManser-NCO

The print statements are not being written to output while run_graphcast.py is executing. Could you please modify the shebang to the following?

#!/usr/bin/env -S python3 -u

@aerorahul
Contributor

Could also add flush=True to the print statements. However, it seems like there is something else at play.
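For completeness, the flush=True variant (a sketch of the pattern, not the PR's actual print statements):

```python
import time

# Flushing explicitly forces each timing line to appear immediately,
# without changing the shebang or the runtime environment.
start = time.perf_counter()
result = sum(range(1_000_000))  # stand-in for a model step
print(f"Elapsed time for step: {time.perf_counter() - start:.2f} seconds",
      flush=True)
```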

@aerorahul
Contributor

If you don't want to make code changes, add export PYTHONUNBUFFERED=1 to the runtime environment.
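A third in-code option (a sketch, not part of this PR) is to force line buffering on stdout at startup, which is similar in effect to `python3 -u` or PYTHONUNBUFFERED=1 for line-oriented output:

```python
import sys

# Reconfigure stdout so each completed line is flushed immediately.
# Requires Python 3.7+ (io.TextIOWrapper.reconfigure).
sys.stdout.reconfigure(line_buffering=True)
print("timing message appears immediately")
```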

@LinlinCui-NOAA
Collaborator Author

I can switch to absl.logging, which writes to output while the script is executing, e.g.:

[2025-12-10 20:13:39,365] absl INFO: Elapsed time for loading input: 41.77620339393616 seconds
[2025-12-10 20:13:48,159] absl INFO: Elapsed time for extracting inputs, targets, and forcings: 8.793529987335205 seconds
[2025-12-10 20:13:49,123] absl INFO: Elapsed time for normalization: 0.9637486934661865 seconds

@RussellManser-NCO Please let me know which one you prefer.

@RussellManser-NCO

absl.logging works. It's nice to have the timestamps and logging info.

@RussellManser-NCO

Unfortunately, output was still buffered even with the modified shebang. I also tried export PYTHONUNBUFFERED=1, which did not work either. The latest push does work.

@LinlinCui-NOAA
Collaborator Author

OK. Thanks for testing.

@LinlinCui-NOAA LinlinCui-NOAA merged commit a03a127 into production/mlglobal.v1 Dec 10, 2025