Fix TL/AD/K-Matrix Consistency and OpenMP Thread-Safety #277
base: develop
…wnward through emission and scattering TL/AD paths; implement downwelling TL/AD for ADA and SOI solvers; fix CloudScatter TL backscatter gating and VIS SfcOptics TL/AD stubs; ensure RTSolution/common output handles downwelling radiance; map K‑Matrix downwelling input into adjoint preprocessing
…_threads path, removing the temporary ifx workaround. Move the non-variable Atmosphere/Surface copies until after skip/input validation to avoid mutating outputs for skipped profiles.
…mutation while preserving validity checks and layer adds. Align AD/K stream selection and set RT algorithm ID in AD to match forward/TL/K option handling.
…rt/ifx bypass guards. Keep thread-loop directives active for consistent parallel execution.
…s writes in parallel loops. Use per-thread error flags with reductions and early aborts to prevent races during optics/predictor allocation.
…ections to avoid illegal loop exits. Keep channel threading while preventing concurrent AD routine execution
…ssues, removes profile optimization
…chunk bounds, basing sensor channel counts on Process_Channel size. Remove profile-parallel OpenMP directives in forward/TL/adjoint paths and tidy thread index math. Improve CRTM_RTSolution_Inspect Stokes printing to avoid multi-line zero padding.
The `AAvar` array is allocated with size `n_channel_threads` and accessed using the thread index. Previously, it was declared as `PRIVATE` in OpenMP loops, causing each thread to allocate a private copy of the entire array, leading to memory inefficiency. This change removes `AAvar` from the `PRIVATE` clause, allowing it to be `SHARED` (default) so threads access their specific slice of the shared array.

Affected modules:
- CRTM_Forward_Module
- CRTM_Tangent_Linear_Module
- CRTM_K_Matrix_Module
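The shared-slice pattern described in this commit can be sketched as follows. This is a minimal illustration, not the CRTM code: `work_type`, `scratch`, `total`, `nt`, and the sizes are hypothetical stand-ins, and only the names `AAvar` and `n_channel_threads` come from the commit message. The `!$` sentinel lets the program also compile and run serially when OpenMP is disabled.

```fortran
program shared_workspace_demo
  !$ use omp_lib, only: omp_get_thread_num
  implicit none
  integer, parameter :: n_channel_threads = 4   ! hypothetical thread count
  integer, parameter :: n_channels = 16         ! hypothetical channel count
  type :: work_type                             ! stand-in for the real AAvar element type
    real :: scratch(8)
  end type work_type
  type(work_type), allocatable :: AAvar(:)
  real :: total(n_channels)
  integer :: l, nt

  ! One workspace element per thread, allocated once for the whole loop.
  allocate(AAvar(n_channel_threads))

  ! AAvar is deliberately NOT in the PRIVATE clause: it stays shared, and
  ! each thread only touches element nt, so nothing is copied per thread.
  !$omp parallel do num_threads(n_channel_threads) private(nt) shared(AAvar, total)
  do l = 1, n_channels
    nt = 1                                      ! serial fallback
    !$ nt = omp_get_thread_num() + 1            ! 1-based slice for this thread
    AAvar(nt)%scratch = real(l)
    total(l) = sum(AAvar(nt)%scratch)
  end do
  !$omp end parallel do

  print *, total(1), total(n_channels)
  deallocate(AAvar)
end program shared_workspace_demo
```

Because every thread writes only to its own `AAvar(nt)` and to distinct `total(l)` entries, sharing the array does not introduce a race in this sketch.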
…nitializations
- In ODCAPS modules: Remove `=> NULL()` pointer initialization from declarations inside procedures. In Fortran, initializing a variable in its declaration implies the `SAVE` attribute, causing the pointer to be shared across threads, leading to potential race conditions.
- In NESDIS Emissivity modules: Remove explicit `SAVE` attributes for local coefficient arrays (`coe`). These arrays are assigned at runtime; sharing them across threads creates race conditions during assignment.
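The implied-`SAVE` behaviour behind this commit can be seen in a small serial sketch. The module and routine names (`implied_save_demo`, `with_init`, `without_init`, `work`) are hypothetical and are not taken from the ODCAPS sources; the sketch only shows that a pointer initialized in its declaration keeps its state between calls, which is what makes it shared state under threading.

```fortran
module implied_save_demo
  implicit none
contains
  subroutine with_init(was_associated)
    logical, intent(out) :: was_associated
    ! Initialization in the declaration implies SAVE: there is a single
    ! copy of "work" that persists across calls (and across threads).
    real, pointer :: work(:) => null()
    was_associated = associated(work)
    if (.not. associated(work)) allocate(work(4))
  end subroutine with_init

  subroutine without_init(was_associated)
    logical, intent(out) :: was_associated
    ! No initializer: "work" is an ordinary local, set up on every call.
    real, pointer :: work(:)
    nullify(work)
    was_associated = associated(work)
    allocate(work(4))
    deallocate(work)
  end subroutine without_init
end module implied_save_demo

program show_implied_save
  use implied_save_demo
  implicit none
  logical :: first, second

  call with_init(first)
  call with_init(second)
  print *, 'with initializer:   ', first, second   ! second call reports .true.

  call without_init(first)
  call without_init(second)
  print *, 'without initializer:', first, second   ! both calls report .false.
end program show_implied_save
```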
In the Forward module's parallel loops, threads were writing directly to the shared `Error_Status` variable. This created a race condition where failures could be overwritten or updates could clash. This commit updates the error handling pattern to match `CRTM_K_Matrix_Module.f90`:
- Use a thread-local `Err_Thread` variable inside parallel loops.
- Use OpenMP reduction `REDUCTION(MAX:thread_error)` to safely aggregate errors.
- Check the aggregated `thread_error` after the parallel region to determine global success/failure.
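A minimal sketch of the three steps above, assuming integer status codes where a larger value is more severe. The names `Err_Thread`, `thread_error`, and `Error_Status` follow the commit message; `process_channel`, the status values, and the failing channel are hypothetical and only serve the illustration.

```fortran
program error_reduction_demo
  implicit none
  integer, parameter :: SUCCESS = 0, FAILURE = 3   ! illustrative status codes
  integer, parameter :: n_channels = 16            ! hypothetical channel count
  integer :: l, Err_Thread, thread_error, Error_Status

  thread_error = SUCCESS

  ! Each thread records its status in a private Err_Thread; the MAX
  ! reduction merges the per-thread results without a shared write.
  !$omp parallel do private(Err_Thread) reduction(max:thread_error)
  do l = 1, n_channels
    call process_channel(l, Err_Thread)
    thread_error = max(thread_error, Err_Thread)
  end do
  !$omp end parallel do

  ! The shared status is assigned once, after the parallel region.
  Error_Status = thread_error
  if (Error_Status /= SUCCESS) then
    print *, 'at least one channel failed, status =', Error_Status
  else
    print *, 'all channels succeeded'
  end if

contains

  subroutine process_channel(channel, status)
    integer, intent(in)  :: channel
    integer, intent(out) :: status
    status = SUCCESS
    if (channel == 7) status = FAILURE   ! hypothetical failure for the demo
  end subroutine process_channel

end program error_reduction_demo
```

Using `MAX` means the most severe status wins regardless of which thread finishes last, which is the property a direct write to `Error_Status` cannot guarantee.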
chengdang
left a comment
Besides the technical changes mentioned in this PR description, a large portion of this PR is actually a science update to include the TL and AD of the downward ADA calculation. Were these updates merged into this PR on purpose, for technical reasons?
    ENDDO
    END IF

    IF ( compute_down ) THEN
Great work on the TL and AD for downward ADA calculations! I finished the TL last year (never got merged), but the development on AD was paused. Do we have a good way to test the physical values of these routines?
For some reason the Adjoint model output is substantially different from develop (K-Matrix is fine); I'm investigating. [I copied the same test suite from …] I think I will have a fix for this shortly.
…eading.
- Updated Forward, K-Matrix, Adjoint, and Tangent-Linear OpenMP tests to use 'iasi-ng_metop-sg-a1' with stride 2 channel sampling (~8,460 channels) for better performance scaling analysis.
- Modified tests to default to OMP_NUM_THREADS=1 if the environment variable is unset, ensuring consistent baseline behavior.
- Removed explicit OMP_NUM_THREADS overrides in test/CMakeLists.txt to allow runtime environment control.
- Added timing benchmark script test/run_timing_iasi_ng.py.
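One way a test could fall back to a single thread when `OMP_NUM_THREADS` is unset is sketched below. This is an assumption about the approach, not the code in the CRTM tests: the program and variable names are hypothetical, and only the standard `get_environment_variable` intrinsic and the OpenMP routine `omp_set_num_threads` are relied on.

```fortran
program default_threads_demo
  !$ use omp_lib, only: omp_set_num_threads
  implicit none
  character(len=32) :: env_value
  integer :: env_length, env_status, read_status, n_threads

  ! Baseline used when OMP_NUM_THREADS is unset or cannot be parsed.
  n_threads = 1

  ! env_status is 0 only when the variable exists and fits in env_value.
  call get_environment_variable('OMP_NUM_THREADS', env_value, env_length, env_status)
  if (env_status == 0 .and. env_length > 0) then
    read(env_value, *, iostat=read_status) n_threads
    if (read_status /= 0 .or. n_threads < 1) n_threads = 1
  end if

  ! Only compiled when OpenMP is enabled; otherwise the run is serial anyway.
  !$ call omp_set_num_threads(n_threads)
  print *, 'running with', n_threads, 'thread(s)'
end program default_threads_demo
```

Setting the count explicitly matters because, when the variable is unset, an OpenMP runtime typically picks its own default rather than one thread.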
Update: Fixed a thread‑dependent NLTE adjoint issue by initializing the per‑channel NLTE predictor in …
lots of failures with …
@chengdang I think this is ready for a thorough review from you. Thanks.
all UFO ctests pass (surprisingly) for me, using GCC.
all … Can't test … Looks pretty solid to me. One more thing I'm going to check is debug vs. release -- I expect numerical differences, but nothing beyond single-precision tolerance.
… it currently seems to be a couple of commits off.
Summary
This PR addresses critical thread-safety and consistency issues in the CRTM Forward, Tangent-Linear (TL), and K-Matrix modules when running with OpenMP. Key changes include fixing a significant memory inefficiency in parallel loops (`AAvar` allocation), removing unsafe `SAVE` attributes in shared coefficient modules, and ensuring correct handling of thread-local storage for predictors and optics structures.
Motivation
While enabling OpenMP parallelism for channel processing, several race conditions and memory inefficiencies were identified:
- … its own full copy of the array, leading to excessive memory usage and potential stack overflows at high core counts.
- … across all threads, causing race conditions when multiple threads attempted to assign or read these coefficients simultaneously.
Technical Changes
OpenMP Memory Optimization
Declaring it PRIVATE forced redundant allocations of the entire array for every thread.
Thread-Safety Fixes (Race Conditions)
- … this ensures the pointer is automatic (stack-allocated) and thread-local.
- … by another.
Administrative
Testing & Verification
Resolves #158
Resolves #273