[BUG] XOR checksum in buffer debug tool is blind to paired-flip corruption patterns #39850
Description
Summary
The buffer debug XOR checksum kernel (buffer_debug_xor_checksum_kernel_cuda.cu.cc) uses plain XOR reduction to compute buffer checksums. Plain XOR is blind to "sum-preserving" corruption patterns where an even number of elements are flipped with the same bitmask.
Reproduction
Original buffer:  [A, B, C, D]      XOR = A^B^C^D
Corrupted buffer: [A^K, B^K, C, D]  XOR = (A^K)^(B^K)^C^D = A^B^C^D  ← identical!
Any even number of elements flipped with the same bitmask K will produce an identical checksum. This is a fundamental algebraic property of XOR: K ^ K = 0.
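The cancellation can be reproduced on the host with a few lines of C++. This is a minimal sketch, not the XLA kernel: `XorChecksum`, `FlipAt`, and `CorruptionIsMissed` are hypothetical helper names introduced here, and the buffer is assumed to be a flat array of 32-bit words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Plain XOR reduction over 32-bit words. Host-side stand-in for the kernel's
// reduction (an assumption for illustration, not the actual XLA code).
inline uint32_t XorChecksum(const std::vector<uint32_t>& buf) {
  uint32_t acc = 0;
  for (uint32_t word : buf) acc ^= word;
  return acc;
}

// Returns a copy of `buf` with the bits in `mask` flipped at each index,
// simulating corruption of those elements.
inline std::vector<uint32_t> FlipAt(std::vector<uint32_t> buf,
                                    const std::vector<size_t>& indices,
                                    uint32_t mask) {
  for (size_t i : indices) {
    assert(i < buf.size());
    buf[i] ^= mask;
  }
  return buf;
}

// True when corrupting `indices` with `mask` leaves the checksum unchanged,
// i.e. the corruption would go unnoticed.
inline bool CorruptionIsMissed(const std::vector<uint32_t>& buf,
                               const std::vector<size_t>& indices,
                               uint32_t mask) {
  return XorChecksum(FlipAt(buf, indices, mask)) == XorChecksum(buf);
}
```

For a four-element buffer and any nonzero K, `CorruptionIsMissed(buf, {0, 1}, K)` is true, while `CorruptionIsMissed(buf, {0}, K)` is false: a single flip is caught, a paired flip is not.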
Impact
- False negatives in checksum_mismatch_report.py: The tool reports "consistent checksums" when buffer contents have actually been corrupted.
- Masked hardware errors: Systematic DRAM bit-flip patterns (e.g., a stuck row affecting multiple elements with the same bit pattern) can go completely undetected.
- Silent numerical errors: Corrupted buffers propagate incorrect values through model training/inference without any diagnostic signal.
This affects the debug tool's reliability, not security — the checksum is gated behind --xla_gpu_experimental_enable_checksum_tracing_on_thunks and is not on any production code path.
Suggested Fix
Replace plain XOR with a position-dependent hash. For example, bit-rotate each 32-bit word by a position-dependent amount before XOR accumulation:
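// rotation factor 7 is coprime with 32, cycling through all 32 rotation amounts
scratch[tid] ^= RotateLeft32(input[i], (i * 7u) % 32u);
This breaks the cancellation symmetry for most corruption patterns (ROTL(K, r_a) ^ ROTL(K, r_b) != 0 when the rotation amounts r_a and r_b at the two positions differ and K is not unchanged by a rotation of r_a - r_b bits, as K = 0xFFFFFFFF would be) while preserving: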
- The same uint32_t checksum output (no struct/proto changes)
- The same parallel reduction structure
- Negligible performance overhead (one extra rotate instruction per element)
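A host-side sketch of the proposed rotate-then-XOR accumulation is given below, to make the behaviour easy to check outside the kernel. The body of `RotateLeft32` and the name `RotXorChecksum` are assumptions made for illustration; the issue names `RotateLeft32` but does not define it, and the real kernel accumulates into `scratch[tid]` in parallel rather than in a serial loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Rotates `x` left by `r` bits. Masking with 31 keeps the shift amounts in
// range, and the r == 0 case avoids an undefined shift by 32.
inline uint32_t RotateLeft32(uint32_t x, uint32_t r) {
  r &= 31u;
  return r == 0 ? x : (x << r) | (x >> (32u - r));
}

// Position-dependent XOR: word i is rotated by (i * 7) % 32 bits before it is
// folded into the accumulator (serial stand-in for the parallel reduction).
inline uint32_t RotXorChecksum(const std::vector<uint32_t>& buf) {
  uint32_t acc = 0;
  for (size_t i = 0; i < buf.size(); ++i) {
    acc ^= RotateLeft32(buf[i], static_cast<uint32_t>((i * 7u) % 32u));
  }
  return acc;
}
```

With this scheme, flipping elements 0 and 1 with the same single-bit mask changes the checksum, because the mask is rotated by 0 bits at one position and 7 bits at the other. Two residual blind spots are worth keeping in mind when evaluating the fix: the rotation amount repeats every 32 elements, so the same mask applied at positions i and i + 32 still cancels, and a rotation-invariant mask such as 0xFFFFFFFF cancels at any pair of positions.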
Affected Files
- xla/stream_executor/cuda/buffer_debug_xor_checksum_kernel_cuda.cu.cc
- xla/stream_executor/gpu/buffer_debug_xor_checksum_kernel.h
- xla/backends/gpu/runtime/buffers_checksum_thunk.cc
- xla/tools/buffer_debug_log/checksum_mismatch_report.py (consumer of checksums)