[BUG] XOR checksum in buffer debug tool is blind to paired-flip corruption patterns #39850

## Summary

The buffer debug XOR checksum kernel (`buffer_debug_xor_checksum_kernel_cuda.cu.cc`) uses a plain XOR reduction to compute buffer checksums. A plain XOR reduction cannot detect "XOR-preserving" corruption, where an even number of elements have the same bitmask XORed into them: the flips cancel in the reduction and the checksum is unchanged.

## Reproduction

```
Original buffer: [A, B, C, D]       XOR = A^B^C^D
Corrupted buffer: [A^K, B^K, C, D]  XOR = (A^K)^(B^K)^C^D = A^B^C^D  ← identical!
```

Any even number of elements flipped with the same bitmask K will produce an identical checksum. This is a fundamental algebraic property of XOR: `K ^ K = 0`.
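The cancellation can be checked on the host with a small model of the reduction. This is a minimal sketch, not the CUDA kernel itself: `XorChecksum`, `FlipElements`, and `DemonstrateBlindSpot` are hypothetical helper names, and the buffer contents and mask are arbitrary illustrative values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side model of a plain XOR reduction over 32-bit words.
// Assumption: this mirrors the kernel's arithmetic, not its parallel layout.
uint32_t XorChecksum(const std::vector<uint32_t>& words) {
  uint32_t checksum = 0;
  for (uint32_t word : words) checksum ^= word;
  return checksum;
}

// Returns a copy of `words` with `mask` XORed into each listed element.
std::vector<uint32_t> FlipElements(std::vector<uint32_t> words,
                                   const std::vector<std::size_t>& indices,
                                   uint32_t mask) {
  for (std::size_t index : indices) words[index] ^= mask;
  return words;
}

void DemonstrateBlindSpot() {
  const std::vector<uint32_t> original = {0x12345678u, 0x9ABCDEF0u,
                                          0x0F0F0F0Fu, 0xDEADBEEFu};
  const uint32_t mask = 0x00010000u;  // hypothetical single-bit flip

  // Two elements flipped with the same mask: the data differs,
  // but the checksum does not.
  const std::vector<uint32_t> paired = FlipElements(original, {0, 1}, mask);
  assert(paired != original);
  assert(XorChecksum(paired) == XorChecksum(original));

  // A single flipped element is still detected.
  const std::vector<uint32_t> single = FlipElements(original, {2}, mask);
  assert(XorChecksum(single) != XorChecksum(original));
}
```

Calling `DemonstrateBlindSpot()` passes all three assertions: the paired flip is invisible to the checksum, while the single flip is caught.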

## Impact

- **False negatives in `checksum_mismatch_report.py`**: The tool reports "consistent checksums" when buffer contents have actually been corrupted.
- **Masked hardware errors**: Systematic DRAM bit-flip patterns (e.g., a stuck row affecting multiple elements with the same bit pattern) can go completely undetected.
- **Silent numerical errors**: Corrupted buffers propagate incorrect values through model training/inference without any diagnostic signal.

This affects the debug tool's reliability, not security: the checksum is gated behind `--xla_gpu_experimental_enable_checksum_tracing_on_thunks` and is not on any production code path.

## Suggested Fix

Replace plain XOR with a position-dependent hash. For example, bit-rotate each 32-bit word by a position-dependent amount before XOR accumulation:

```cuda
// rotation factor 7 is coprime with 32, cycling through all 32 rotation amounts
scratch[tid] ^= RotateLeft32(input[i], (i * 7u) % 32u);
```

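This breaks the cancellation symmetry for most masks: `ROTL(K, r_a) ^ ROTL(K, r_b) != 0` whenever the rotation amounts `r_a` and `r_b` differ and `K` is not invariant under a rotation by `r_a - r_b`. Two cases still cancel: elements whose indices differ by a multiple of 32 share a rotation amount, and rotation-symmetric masks such as `0xFFFFFFFF` are unchanged by any rotation. The change preserves: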
- The same `uint32_t` checksum output (no struct/proto changes)
- The same parallel reduction structure
- Low per-element overhead (one rotate plus a small amount of index arithmetic), expected to be negligible but not measured here
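The behavior of the rotated variant, including the cases it still misses, can be sketched on the host. This is an illustrative model under stated assumptions, not the proposed kernel change: `RotateLeft32` here is a hypothetical stand-in for whatever rotate helper the kernel would use, `RotatedXorChecksum` and `DemonstrateRotatedChecksum` are made-up names, and the data values are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Rotates `value` left by `amount` bits (amount is reduced modulo 32).
// Hypothetical helper; the masking avoids an undefined shift by 32.
constexpr uint32_t RotateLeft32(uint32_t value, uint32_t amount) {
  return (value << (amount & 31u)) | (value >> ((32u - amount) & 31u));
}

// Host-side model of the suggested position-dependent XOR reduction.
uint32_t RotatedXorChecksum(const std::vector<uint32_t>& words) {
  uint32_t checksum = 0;
  for (std::size_t i = 0; i < words.size(); ++i) {
    // 7 is coprime with 32, so 32 consecutive indices use 32 distinct amounts.
    const uint32_t rotation = static_cast<uint32_t>((i * 7u) % 32u);
    checksum ^= RotateLeft32(words[i], rotation);
  }
  return checksum;
}

void DemonstrateRotatedChecksum() {
  const std::vector<uint32_t> original = {0x12345678u, 0x9ABCDEF0u,
                                          0x0F0F0F0Fu, 0xDEADBEEFu};

  // Paired flip on adjacent elements (rotations 0 and 7): now detected.
  std::vector<uint32_t> paired = original;
  paired[0] ^= 0x00010000u;
  paired[1] ^= 0x00010000u;
  assert(RotatedXorChecksum(paired) != RotatedXorChecksum(original));

  // Remaining blind spot 1: a rotation-invariant mask still cancels.
  std::vector<uint32_t> inverted = original;
  inverted[0] ^= 0xFFFFFFFFu;
  inverted[1] ^= 0xFFFFFFFFu;
  assert(RotatedXorChecksum(inverted) == RotatedXorChecksum(original));

  // Remaining blind spot 2: indices 32 apart share a rotation amount.
  const std::vector<uint32_t> large(64, 0xA5A5A5A5u);
  std::vector<uint32_t> aliased = large;
  aliased[3] ^= 0x00010000u;
  aliased[35] ^= 0x00010000u;
  assert(RotatedXorChecksum(aliased) == RotatedXorChecksum(large));
}
```

The first assertion shows the paired flip from the reproduction being caught. The other two show patterns that a rotate-then-XOR scheme still cannot see, which may matter when judging whether this scheme is sufficient for the hardware fault patterns listed under Impact.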

## Affected Files

- `xla/stream_executor/cuda/buffer_debug_xor_checksum_kernel_cuda.cu.cc`
- `xla/stream_executor/gpu/buffer_debug_xor_checksum_kernel.h`
- `xla/backends/gpu/runtime/buffers_checksum_thunk.cc`
- `xla/tools/buffer_debug_log/checksum_mismatch_report.py` (consumer of checksums)
