AMD CDNA3 (MI300X/gfx942) does not have a hardware tanh instruction like NVIDIA's PTX tanh.approx. This implements approx_tanh for ROCm using:

- For f32 (and f16/bf16 via casting): Triton's __triton_hip_fast_tanhf, which uses a fast exp-based formula: tanh(x) = (exp(2x) - 1) / (exp(2x) + 1)
- For f64: OCML's __ocml_tanh_f64 (AMD's Open Compute Math Library)

Changes:

- Add f64 support to approx_tanh function
- Add ROCm platform detection in _elementwise_inline_asm_lowering
- Add _approx_tanh_rocm_lowering function for ROCm-specific lowering
- Add test_approx_tanh test with f16/bf16/f32/f64 support

See: triton-lang/triton#7780

(cherry picked from commit 39ceb95)
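For reviewers who want to sanity-check the exp-based identity quoted above, here is a minimal plain-Python sketch. It is not the Triton or OCML implementation: the function name `tanh_via_exp` and the input clamp at ±20 are assumptions made for illustration, and the real `__triton_hip_fast_tanhf` may handle range and precision differently.

```python
import math


def tanh_via_exp(x: float) -> float:
    """Evaluate tanh(x) = (exp(2x) - 1) / (exp(2x) + 1).

    Hypothetical helper for illustration only; not the code path
    used by the ROCm lowering in this PR.
    """
    # Assumption: clamp the input so exp(2x) cannot overflow.
    # For |x| >= 20, tanh(x) already rounds to +/-1 in double precision.
    x = max(-20.0, min(20.0, x))
    e = math.exp(2.0 * x)
    return (e - 1.0) / (e + 1.0)


if __name__ == "__main__":
    # Compare against the standard library on a few sample points.
    for v in (-3.0, -0.5, 0.0, 0.5, 3.0, 100.0):
        print(v, tanh_via_exp(v), math.tanh(v))
```

The formula loses some relative accuracy for inputs very close to zero because of cancellation in `exp(2x) - 1`, which is one reason an approximate variant is kept separate from the exact tanh.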
- Remove verbose comment in _elementwise_inline_asm_lowering
- Inline dtype_to_ir_type helper, use mlir.dtype_to_ir_type directly
- Move ir and arith_dialect imports to top-level
- Add TypeError for float64 on non-ROCm platforms
- Simplify _approx_tanh_rocm_lowering with needs_cast pattern
- Move test_approx_tanh from ops_test.py to triton_pallas_test.py
- Fix triton_pallas_test setUp to allow ROCm devices

(cherry picked from commit 600fbd3)
The test is already skipped on CUDA (b/442353988) due to HLO debug metadata (source column numbers) being embedded in compiled output, causing semantically identical compilations to produce different as_text() results. The same issue occurs on ROCm. (cherry picked from commit 70b2b99)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> (cherry picked from commit 5ec8419)
PR to prepare for 0.9.1 release.