Fix TL/AD/K-Matrix Consistency and OpenMP Thread-Safety #277

BenjaminTJohnson · 2026-01-04T23:38:00Z

Summary
This PR addresses critical thread-safety and consistency issues in the CRTM Forward, Tangent-Linear (TL), and K-Matrix modules when running with OpenMP. Key changes include fixing a significant memory inefficiency in
parallel loops (AAvar allocation), removing unsafe SAVE attributes in shared coefficient modules, and ensuring correct handling of thread-local storage for predictors and optics structures.

Motivation
While enabling OpenMP parallelism for channel processing, several race conditions and memory inefficiencies were identified:

Memory Inefficiency: The AAvar internal structure, which is already an array indexed by thread ID (size n_channel_threads), was incorrectly declared as PRIVATE in OpenMP loops. This caused every thread to allocate
its own full copy of the array, leading to excessive memory usage and potential stack overflows on high-core counts.
Race Conditions in Coefficients: Certain legacy modules (ODCAPS and NESDIS Emissivity) used pointer initializations (=> NULL()) or the SAVE attribute for local arrays. In Fortran, this implies static storage shared
across all threads, causing race conditions when multiple threads attempted to assign or read these coefficients simultaneously.
Consistency: ensuring the Tangent-Linear (TL) and Adjoint (AD) models produce consistent results with the Forward model under parallel execution.

Technical Changes
OpenMP Memory Optimization

Modules Affected: CRTM_Forward_Module, CRTM_Tangent_Linear_Module, CRTM_K_Matrix_Module
Change: Removed AAvar from the !$OMP PRIVATE(...) clause.
Reasoning: AAvar is declared as TYPE(CRTM_AAvar_type) :: AAvar(n_channel_threads). Since each thread nt exclusively accesses AAvar(nt), the array itself must be SHARED so threads access their pre-allocated slice.
Declaring it PRIVATE forced redundant allocations of the entire array for every thread.

Thread-Safety Fixes (Race Conditions)

ODCAPS Modules: (ODCAPS_AtmAbsorption.f90, ODCAPS_Predictor.f90)
- Removed => NULL() from pointer declarations in subroutines. In Fortran, initialization in a type declaration statement implies the SAVE attribute, making the pointer static and shared across threads. Removing
  this ensures the pointer is automatic (stack-allocated) and thread-local.
NESDIS Emissivity Modules: (NESDIS_AMSU_SICEEM_Module.f90, etc.)
- Removed explicit SAVE attributes for local coefficient arrays (e.g., coe). These arrays are populated at runtime; keeping them static caused data races where one thread could overwrite coefficients being used
  by another.

Administrative

Removed the conductor/ directory from version control to keep project planning artifacts local and clean up the repository structure.

Testing & Verification

Build: Successful compilation on WSL2 and S4 using ifort 2024 and 2025, and gfortran.
Unit Tests: Verified against existing ctest suite.
Performance: Expected reduction in memory footprint during parallel execution due to the AAvar fix.

Checklist

Code compiles without errors.
OpenMP directives verified for correctness.
Thread-unsafe SAVE attributes removed.
Local planning files (conductor/) removed from the repository.

Resolves #158
Resolves #273

…wnward through emission and scattering TL/AD paths; implement downwelling TL/AD for ADA and SOI solvers; fix CloudScatter TL backscatter gating and VIS SfcOptics TL/AD stubs; ensure RTSolution/common output handles downwelling radiance; map K‑Matrix downwelling input into adjoint preprocessing

…_threads path, removing the temporary ifx workaround. Move the non-variable Atmosphere/Surface copies until after skip/input validation to avoid mutating outputs for skipped profiles.

…mutation while preserving validity checks and layer adds. Align AD/K stream selection and set RT algorithm ID in AD to match forward/TL/K option handling.

…rt/ifx bypass guards. Keep thread-loop directives active for consistent parallel execution.

…s writes in parallel loops. Use per-thread error flags with reductions and early aborts to prevent races during optics/predictor allocation.

…ections to avoid illegal loop exits. Keep channel threading while preventing concurrent AD routine execution

…ssues, removes profile optimization

…chunk bounds, basing sensor channel counts on Process_Channel size. Remove profile-parallel OpenMP directives in forward/TL/adjoint paths and tidy thread index math. Improve CRTM_RTSolution_Inspect Stokes printing to avoid multi-line zero padding.

The `AAvar` array is allocated with size `n_channel_threads` and accessed using the thread index. Previously, it was declared as `PRIVATE` in OpenMP loops, causing each thread to allocate a private copy of the entire array, leading to memory inefficiency. This change removes `AAvar` from the `PRIVATE` clause, allowing it to be `SHARED` (default) so threads access their specific slice of the shared array. Affected modules: - CRTM_Forward_Module - CRTM_Tangent_Linear_Module - CRTM_K_Matrix_Module

…nitializations - In ODCAPS modules: Remove `=> NULL()` pointer initialization in procedure declarations. In Fortran, this implies the `SAVE` attribute, causing the pointer to be shared across threads, leading to potential race conditions. - In NESDIS Emissivity modules: Remove explicit `SAVE` attributes for local coefficient arrays (`coe`). These arrays are assigned at runtime; sharing them across threads creates race conditions during assignment.

…d to .gitignore

In the Forward module's parallel loops, threads were writing directly to the shared `Error_Status` variable. This created a race condition where failures could be overwritten or updates could clash. This commit updates the error handling pattern to match `CRTM_K_Matrix_Module.f90`: - Use a thread-local `Err_Thread` variable inside parallel loops. - Use OpenMP reduction `REDUCTION(MAX:thread_error)` to safely aggregate errors. - Check the aggregated `thread_error` after the parallel region to determine global success/failure.

chengdang

Besides the technical changes mentioned in this PR description, a large portion of this PR is actually a science update to include the TL and AD of downward ADA calculation. Are these updates merged into this PR on purpose for technical issues?

chengdang · 2026-01-07T06:19:39Z

src/RTSolution/ADA/ADA_Module.f90

      ENDDO
    END IF

+    IF ( compute_down ) THEN


Great work on the TL and AD for downward ADA calculations! I finished the TL last year (never got merged), but the development on AD was paused. Do we have a good way to test the physical values of these routines?

BenjaminTJohnson · 2026-01-08T15:40:04Z

For some reason the Adjoint model output is substantially different from develop (K-Matrix is fine), I'm investigating. [I copied the same test suite from develop so there's no actual test difference.]. The other tests that differ are only (so far) different in the clear values. I expect that the clear sky values will be near 272

163:       CRTM Version: v3.2.0                               | 163:       CRTM Version: v3.1.2
163:    Surface Reflectivity          :  3.079629756670640E-0 | 163:    Surface Reflectivity          :  2.826954382250922E-0
163:    Up Radiance                   :  1.202269660773092E-0 | 163:    Up Radiance                   :  1.295982889306943E-0
163:    Down Radiance                 :  1.309402453688525E-0 | 163:    Down Radiance                 :  1.318932614912245E-0
163:    Radiance (clear)              :  1.080763706026805E-0 | 163:    Radiance (clear)              :  0.000000000000000E+0
163:    Brightness Temperature (clear):  2.077221843065165E+0 | 163:    Brightness Temperature (clear):  0.000000000000000E+0
163:    Radiance                      :  1.080763706026805E-0 | 163:    Radiance                      :  1.311174988122144E-0
163:    Brightness Temperature        :  2.077221843065165E+0 | 163:    Brightness Temperature        :  2.518855502027379E+0
163:    Stokes                        :  1.080763706026805E-0 | 163:    Stokes                        :  1.311174988122144E-0
163:    Surface Emissivity            :  2.468756184435915E+0 | 163:    Surface Emissivity            :  2.967755451982850E+0
163:    Radiance                      :  1.916719745857602E+0 | 163:    Radiance                      :  1.916718200839424E+0
163:    Surface Reflectivity          :  2.931528566725709E-0 | 163:    Surface Reflectivity          :  2.646834233500325E-0
163:    Up Radiance                   :  1.492529541240421E-0 | 163:    Up Radiance                   :  2.273871207241789E-0
163:    Down Radiance                 :  1.669054059027952E-0 | 163:    Down Radiance                 :  2.319100425356311E-0
163:    Radiance (clear)              :  1.886953141397885E-0 | 163:    Radiance (clear)              :  0.000000000000000E+0
163:    Brightness Temperature (clear):  2.085066579035844E+0 | 163:    Brightness Temperature (clear):  0.000000000000000E+0
163:    Radiance                      :  1.886953141397885E-0 | 163:    Radiance                      :  2.275371664518770E-0
163:    Brightness Temperature        :  2.085066579035844E+0 | 163:    Brightness Temperature        :  2.512717925719001E+0
163:    Stokes                        :  1.886953141397885E-0 | 163:    Stokes                        :  2.275371664518770E-0
163:    Surface Emissivity            :  2.533547739074870E+0 | 163:    Surface Emissivity            :  1.677097347736567E-0
163:    Radiance                      :  1.101007387886514E+0 | 163:    Radiance                      :  1.101005895164793E+0
163:    Surface Reflectivity          :  2.439126304188058E-0 | 163:    Surface Reflectivity          :  2.315046678969014E-0
163:    Up Radiance                   :  2.412797087485997E-0 | 163:    Up Radiance                   :  5.772831321470229E-0

I think I have a fix for this shortly.

…eading. - Updated Forward, K-Matrix, Adjoint, and Tangent-Linear OpenMP tests to use 'iasi-ng_metop-sg-a1' with stride 2 channel sampling (~8,460 channels) for better performance scaling analysis. - Modified tests to default to OMP_NUM_THREADS=1 if the environment variable is unset, ensuring consistent baseline behavior. - Removed explicit OMP_NUM_THREADS overrides in test/CMakeLists.txt to allow runtime environment control. - Added timing benchmark script test/run_timing_iasi_ng.py.

BenjaminTJohnson · 2026-01-09T19:43:27Z

Update: Fixed a thread‑dependent NLTE adjoint issue by initializing the per‑channel NLTE predictor in CRTM_K_Matrix_Module, which stabilizes K‑matrix outputs for NLTE sensors. Added a per‑channel zero of the Zeeman predictor to prevent cross‑channel contamination when Zeeman paths are skipped. OpenMP now explicitly defaults to 1 thread when OMP_NUM_THREADS is unset (forward/TL/K), while still honoring the environment when it is set. Also adjusted the ctest environment so OMP_NUM_THREADS is picked up from the shell (rather than falling back to a default), making thread counts explicit during test runs.

BenjaminTJohnson · 2026-01-09T22:21:43Z

lots of failures with ifx 2024.x, I think we really can't promise support for ifx 2024 given the known issues. ifort 202x seems to work. ifx 2025.3 all tests pass for various OMP_NUM_THREADS. flang not tested yet.

BenjaminTJohnson · 2026-01-09T22:22:41Z

@chengdang I think this is ready for a thorough review from you. Thanks.

BenjaminTJohnson · 2026-01-10T01:34:36Z

all UFO ctests pass (surprisingly) for me, using GCC.

BenjaminTJohnson · 2026-01-10T02:01:47Z

all fv3-jedi ctests pass using GCC.

Can't test ifx yet for UFO/JEDI: S4 compute nodes down and I don't have the ifx stack rebuilt locally yet.

Looks pretty solid to me. One more thing I'm going to check is debug vs. release -- I expect numerical differences, but nothing outside of single precision boundary.

… it currently seems to be a couple of commits off.

BenjaminTJohnson added 21 commits December 30, 2025 12:15

initial commit

1e5457a

added Obs_4_downward_P tests to test_Simple TL/AD ctests

7bda473

incrementing version numbers to start preparing for v3.2.0 release

6591ef1

normalize Options overrides and guarding across Forward, TL, and AD

359083a

Enable K-matrix channel threading by restoring the computed n_channel…

2c60472

…_threads path, removing the temporary ifx workaround. Move the non-variable Atmosphere/Surface copies until after skip/input validation to avoid mutating outputs for skipped profiles.

test to check that we're handling options correctly.

23b9a63

Normalize cloud-fraction handling across forward/TL/K to avoid input …

66bcf46

…mutation while preserving validity checks and layer adds. Align AD/K stream selection and set RT algorithm ID in AD to match forward/TL/K option handling.

reenabled use OMP_LIB in TL Module

4e59817

Restore K-matrix channel-level OpenMP parallelism by removing the ifo…

ab2c72a

…rt/ifx bypass guards. Keep thread-loop directives active for consistent parallel execution.

attempt to address sigbus issues in fv3-jedi

886ad35

Make K-matrix OpenMP setup thread-safe by removing shared Error_Statu…

508534e

…s writes in parallel loops. Use per-thread error flags with reductions and early aborts to prevent races during optics/predictor allocation.

Serialize K-matrix adjoint calls with fine-grained OpenMP critical s…

a495677

…ections to avoid illegal loop exits. Keep channel threading while preventing concurrent AD routine execution

reverting somewhat to see if we can simplify the K_Matrix OMP directives

138bbd2

reverting to openMP over channel only to see if it resolves fv3jedi i…

534cbe6

…ssues, removes profile optimization

welp, back to removing all openMP directives in CRTM_K_Matrix...

c34c19b

conductor(setup): Add conductor setup files

1e89db4

Remove conductor/ directory from version control but keep locally; ad…

103c499

…d to .gitignore

BenjaminTJohnson requested a review from chengdang January 4, 2026 23:38

BenjaminTJohnson self-assigned this Jan 4, 2026

chengdang reviewed Jan 7, 2026

View reviewed changes

BenjaminTJohnson added 4 commits January 9, 2026 10:28

Initialize NLTE adjoint predictor per channel

23f37aa

Zero Zeeman predictor per channel

115be86

Default OpenMP threads to 1 when unset

616fded

minor update to the sensors used in ctests, sped things up a bit.

da22e8d

updates / refresh of README.md, also hopefully triggers correct CI --…

1d3f929

… it currently seems to be a couple of commits off.

This was referenced Jan 10, 2026

Add diagnostic diff reporting in regression tests + remove ifort array temporary warning #274

Draft

test_forward_Downwelling_Radiance* fails with Intel ifx when OMP_NUM_THREADS > 1 (internal formatted write overflow) #273

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix TL/AD/K-Matrix Consistency and OpenMP Thread-Safety #277

Fix TL/AD/K-Matrix Consistency and OpenMP Thread-Safety #277

Uh oh!

BenjaminTJohnson commented Jan 4, 2026 •

edited

Loading

Uh oh!

chengdang left a comment

Uh oh!

chengdang Jan 7, 2026 •

edited

Loading

Uh oh!

BenjaminTJohnson commented Jan 8, 2026 •

edited

Loading

Uh oh!

BenjaminTJohnson commented Jan 9, 2026

Uh oh!

BenjaminTJohnson commented Jan 9, 2026

Uh oh!

BenjaminTJohnson commented Jan 9, 2026

Uh oh!

BenjaminTJohnson commented Jan 10, 2026

Uh oh!

BenjaminTJohnson commented Jan 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Fix TL/AD/K-Matrix Consistency and OpenMP Thread-Safety #277

Are you sure you want to change the base?

Fix TL/AD/K-Matrix Consistency and OpenMP Thread-Safety #277

Uh oh!

Conversation

BenjaminTJohnson commented Jan 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chengdang left a comment

Choose a reason for hiding this comment

Uh oh!

chengdang Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

BenjaminTJohnson commented Jan 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

BenjaminTJohnson commented Jan 9, 2026

Uh oh!

BenjaminTJohnson commented Jan 9, 2026

Uh oh!

BenjaminTJohnson commented Jan 9, 2026

Uh oh!

BenjaminTJohnson commented Jan 10, 2026

Uh oh!

BenjaminTJohnson commented Jan 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

BenjaminTJohnson commented Jan 4, 2026 •

edited

Loading

chengdang Jan 7, 2026 •

edited

Loading

BenjaminTJohnson commented Jan 8, 2026 •

edited

Loading