Skip to content

Conversation

@BenjaminTJohnson
Copy link
Contributor

@BenjaminTJohnson BenjaminTJohnson commented Jan 4, 2026

Summary
This PR addresses critical thread-safety and consistency issues in the CRTM Forward, Tangent-Linear (TL), and K-Matrix modules when running with OpenMP. Key changes include fixing a significant memory inefficiency in
parallel loops (AAvar allocation), removing unsafe SAVE attributes in shared coefficient modules, and ensuring correct handling of thread-local storage for predictors and optics structures.

Motivation
While enabling OpenMP parallelism for channel processing, several race conditions and memory inefficiencies were identified:

  • Memory Inefficiency: The AAvar internal structure, which is already an array indexed by thread ID (size n_channel_threads), was incorrectly declared as PRIVATE in OpenMP loops. This caused every thread to allocate
    its own full copy of the array, leading to excessive memory usage and potential stack overflows on high-core counts.
  • Race Conditions in Coefficients: Certain legacy modules (ODCAPS and NESDIS Emissivity) used pointer initializations (=> NULL()) or the SAVE attribute for local arrays. In Fortran, this implies static storage shared
    across all threads, causing race conditions when multiple threads attempted to assign or read these coefficients simultaneously.
  • Consistency: ensuring the Tangent-Linear (TL) and Adjoint (AD) models produce consistent results with the Forward model under parallel execution.

Technical Changes
OpenMP Memory Optimization

  • Modules Affected: CRTM_Forward_Module, CRTM_Tangent_Linear_Module, CRTM_K_Matrix_Module
  • Change: Removed AAvar from the !$OMP PRIVATE(...) clause.
  • Reasoning: AAvar is declared as TYPE(CRTM_AAvar_type) :: AAvar(n_channel_threads). Since each thread nt exclusively accesses AAvar(nt), the array itself must be SHARED so threads access their pre-allocated slice.
    Declaring it PRIVATE forced redundant allocations of the entire array for every thread.

Thread-Safety Fixes (Race Conditions)

  • ODCAPS Modules: (ODCAPS_AtmAbsorption.f90, ODCAPS_Predictor.f90)
    • Removed => NULL() from pointer declarations in subroutines. In Fortran, initialization in a type declaration statement implies the SAVE attribute, making the pointer static and shared across threads. Removing
      this ensures the pointer is automatic (stack-allocated) and thread-local.
  • NESDIS Emissivity Modules: (NESDIS_AMSU_SICEEM_Module.f90, etc.)
    • Removed explicit SAVE attributes for local coefficient arrays (e.g., coe). These arrays are populated at runtime; keeping them static caused data races where one thread could overwrite coefficients being used
      by another.

Administrative

  • Removed the conductor/ directory from version control to keep project planning artifacts local and clean up the repository structure.

Testing & Verification

  • Build: Successful compilation on WSL2 and S4 using ifort 2024 and 2025, and gfortran.
  • Unit Tests: Verified against existing ctest suite.
  • Performance: Expected reduction in memory footprint during parallel execution due to the AAvar fix.
  1. Checklist
  • Code compiles without errors.
  • OpenMP directives verified for correctness.
  • Thread-unsafe SAVE attributes removed.
  • Local planning files (conductor/) removed from the repository.

Resolves #158
Resolves #273

…wnward through emission and scattering TL/AD paths; implement downwelling TL/AD for ADA and SOI solvers; fix CloudScatter TL backscatter gating and VIS SfcOptics TL/AD stubs; ensure RTSolution/common output handles downwelling radiance; map K‑Matrix downwelling input into adjoint preprocessing
…_threads path, removing the temporary ifx workaround. Move the non-variable Atmosphere/Surface copies until after skip/input validation to avoid mutating outputs for skipped profiles.
…mutation while preserving validity checks and layer adds. Align AD/K stream selection and set RT algorithm ID in AD to match forward/TL/K option handling.
…rt/ifx bypass guards. Keep thread-loop directives active for consistent parallel execution.
…s writes in parallel loops. Use per-thread error flags with reductions and early aborts to prevent races during optics/predictor allocation.
…ections to avoid illegal loop exits. Keep channel threading while preventing concurrent AD routine execution
…chunk bounds, basing sensor channel counts on Process_Channel size. Remove profile-parallel OpenMP directives in forward/TL/adjoint paths and tidy

  thread index math. Improve CRTM_RTSolution_Inspect Stokes printing to avoid multi-line zero padding.
The `AAvar` array is allocated with size `n_channel_threads` and accessed using the thread index. Previously, it was declared as `PRIVATE` in OpenMP loops, causing each thread to allocate a private copy of the entire array, leading to memory inefficiency. This change removes `AAvar` from the `PRIVATE` clause, allowing it to be `SHARED` (default) so threads access their specific slice of the shared array.

Affected modules:
- CRTM_Forward_Module
- CRTM_Tangent_Linear_Module
- CRTM_K_Matrix_Module
…nitializations

- In ODCAPS modules: Remove `=> NULL()` pointer initialization in procedure declarations. In Fortran, this implies the `SAVE` attribute, causing the pointer to be shared across threads, leading to potential race conditions.
- In NESDIS Emissivity modules: Remove explicit `SAVE` attributes for local coefficient arrays (`coe`). These arrays are assigned at runtime; sharing them across threads creates race conditions during assignment.
@BenjaminTJohnson BenjaminTJohnson self-assigned this Jan 4, 2026
In the Forward module's parallel loops, threads were writing directly to the shared `Error_Status` variable. This created a race condition where failures could be overwritten or updates could clash.

This commit updates the error handling pattern to match `CRTM_K_Matrix_Module.f90`:
- Use a thread-local `Err_Thread` variable inside parallel loops.
- Use OpenMP reduction `REDUCTION(MAX:thread_error)` to safely aggregate errors.
- Check the aggregated `thread_error` after the parallel region to determine global success/failure.
Copy link
Contributor

@chengdang chengdang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the technical changes mentioned in this PR description, a large portion of this PR is actually a science update to include the TL and AD of downward ADA calculation. Are these updates merged into this PR on purpose for technical issues?

ENDDO
END IF

IF ( compute_down ) THEN
Copy link
Contributor

@chengdang chengdang Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work on the TL and AD for downward ADA calculations! I finished the TL last year (never got merged), but the development on AD was paused. Do we have a good way to test the physical values of these routines?

@BenjaminTJohnson
Copy link
Contributor Author

BenjaminTJohnson commented Jan 8, 2026

For some reason the Adjoint model output is substantially different from develop (K-Matrix is fine), I'm investigating. [I copied the same test suite from develop so there's no actual test difference.]. The other tests that differ are only (so far) different in the clear values. I expect that the clear sky values will be near 272

163:       CRTM Version: v3.2.0                               | 163:       CRTM Version: v3.1.2
163:    Surface Reflectivity          :  3.079629756670640E-0 | 163:    Surface Reflectivity          :  2.826954382250922E-0
163:    Up Radiance                   :  1.202269660773092E-0 | 163:    Up Radiance                   :  1.295982889306943E-0
163:    Down Radiance                 :  1.309402453688525E-0 | 163:    Down Radiance                 :  1.318932614912245E-0
163:    Radiance (clear)              :  1.080763706026805E-0 | 163:    Radiance (clear)              :  0.000000000000000E+0
163:    Brightness Temperature (clear):  2.077221843065165E+0 | 163:    Brightness Temperature (clear):  0.000000000000000E+0
163:    Radiance                      :  1.080763706026805E-0 | 163:    Radiance                      :  1.311174988122144E-0
163:    Brightness Temperature        :  2.077221843065165E+0 | 163:    Brightness Temperature        :  2.518855502027379E+0
163:    Stokes                        :  1.080763706026805E-0 | 163:    Stokes                        :  1.311174988122144E-0
163:    Surface Emissivity            :  2.468756184435915E+0 | 163:    Surface Emissivity            :  2.967755451982850E+0
163:    Radiance                      :  1.916719745857602E+0 | 163:    Radiance                      :  1.916718200839424E+0
163:    Surface Reflectivity          :  2.931528566725709E-0 | 163:    Surface Reflectivity          :  2.646834233500325E-0
163:    Up Radiance                   :  1.492529541240421E-0 | 163:    Up Radiance                   :  2.273871207241789E-0
163:    Down Radiance                 :  1.669054059027952E-0 | 163:    Down Radiance                 :  2.319100425356311E-0
163:    Radiance (clear)              :  1.886953141397885E-0 | 163:    Radiance (clear)              :  0.000000000000000E+0
163:    Brightness Temperature (clear):  2.085066579035844E+0 | 163:    Brightness Temperature (clear):  0.000000000000000E+0
163:    Radiance                      :  1.886953141397885E-0 | 163:    Radiance                      :  2.275371664518770E-0
163:    Brightness Temperature        :  2.085066579035844E+0 | 163:    Brightness Temperature        :  2.512717925719001E+0
163:    Stokes                        :  1.886953141397885E-0 | 163:    Stokes                        :  2.275371664518770E-0
163:    Surface Emissivity            :  2.533547739074870E+0 | 163:    Surface Emissivity            :  1.677097347736567E-0
163:    Radiance                      :  1.101007387886514E+0 | 163:    Radiance                      :  1.101005895164793E+0
163:    Surface Reflectivity          :  2.439126304188058E-0 | 163:    Surface Reflectivity          :  2.315046678969014E-0
163:    Up Radiance                   :  2.412797087485997E-0 | 163:    Up Radiance                   :  5.772831321470229E-0

I think I have a fix for this shortly.

…eading.

- Updated Forward, K-Matrix, Adjoint, and Tangent-Linear OpenMP tests to use 'iasi-ng_metop-sg-a1' with stride 2 channel sampling (~8,460 channels) for better performance scaling analysis.
- Modified tests to default to OMP_NUM_THREADS=1 if the environment variable is unset, ensuring consistent baseline behavior.
- Removed explicit OMP_NUM_THREADS overrides in test/CMakeLists.txt to allow runtime environment control.
- Added timing benchmark script test/run_timing_iasi_ng.py.
@BenjaminTJohnson
Copy link
Contributor Author

Update: Fixed a thread‑dependent NLTE adjoint issue by initializing the per‑channel NLTE predictor in CRTM_K_Matrix_Module, which stabilizes K‑matrix outputs for NLTE sensors. Added a per‑channel zero of the Zeeman predictor to prevent cross‑channel contamination when Zeeman paths are skipped. OpenMP now explicitly defaults to 1 thread when OMP_NUM_THREADS is unset (forward/TL/K), while still honoring the environment when it is set. Also adjusted the ctest environment so OMP_NUM_THREADS is picked up from the shell (rather than falling back to a default), making thread counts explicit during test runs.

@BenjaminTJohnson
Copy link
Contributor Author

lots of failures with ifx 2024.x, I think we really can't promise support for ifx 2024 given the known issues. ifort 202x seems to work. ifx 2025.3 all tests pass for various OMP_NUM_THREADS. flang not tested yet.

@BenjaminTJohnson
Copy link
Contributor Author

@chengdang I think this is ready for a thorough review from you. Thanks.

@BenjaminTJohnson
Copy link
Contributor Author

all UFO ctests pass (surprisingly) for me, using GCC.

@BenjaminTJohnson
Copy link
Contributor Author

all fv3-jedi ctests pass using GCC.

Can't test ifx yet for UFO/JEDI: S4 compute nodes down and I don't have the ifx stack rebuilt locally yet.

Looks pretty solid to me. One more thing I'm going to check is debug vs. release -- I expect numerical differences, but nothing outside of single precision boundary.

… it currently seems to be a couple of commits off.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants