Add utilities for allocating and managing memory-aligned tensors to
improve memory access efficiency.
The utilities should integrate cleanly with existing kernels.
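
As a rough illustration of what such a utility might look like on the host side, the sketch below allocates a float buffer whose base address is a multiple of a chosen alignment. Every name in it (kAlignment, round_up, AlignedFree, AlignedFloatBuffer, allocate_aligned_floats, is_aligned) is hypothetical and not taken from the existing code, and the 64-byte alignment is an assumed cache-line size. It relies on C++17 std::aligned_alloc, which some toolchains (notably MSVC) do not provide; a device-side allocator would need its own path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <new>

// Assumed cache-line size in bytes; adjust for the target hardware.
constexpr std::size_t kAlignment = 64;

// Round n up to the next multiple of a. Requires a to be a power of two.
constexpr std::size_t round_up(std::size_t n, std::size_t a) {
    return (n + a - 1) & ~(a - 1);
}

// Check whether pointer p sits on an a-byte boundary.
inline bool is_aligned(const void* p, std::size_t a) {
    return reinterpret_cast<std::uintptr_t>(p) % a == 0;
}

// Deleter so that std::unique_ptr releases the buffer with std::free,
// which is the matching call for std::aligned_alloc.
struct AlignedFree {
    void operator()(void* p) const noexcept { std::free(p); }
};

using AlignedFloatBuffer = std::unique_ptr<float[], AlignedFree>;

// Allocate storage for `count` floats on a kAlignment-byte boundary.
// The byte size is padded because std::aligned_alloc expects a size
// that is a multiple of the alignment.
inline AlignedFloatBuffer allocate_aligned_floats(std::size_t count) {
    std::size_t bytes = round_up(count * sizeof(float), kAlignment);
    if (bytes == 0) bytes = kAlignment;  // avoid a zero-size request
    void* raw = std::aligned_alloc(kAlignment, bytes);
    if (raw == nullptr) throw std::bad_alloc();
    assert(is_aligned(raw, kAlignment));
    return AlignedFloatBuffer(static_cast<float*>(raw));
}
```

A caller would write something like `auto buf = allocate_aligned_floats(n);` and hand `buf.get()` to a kernel. Ownership stays with the smart pointer, so no explicit free is needed.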

Planned Benchmarks
- Memory throughput
- Cache line utilization
- Alignment impact on kernels
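
Before measuring, it may help to have an analytical expectation to compare the cache line utilization numbers against. The sketch below is not a benchmark. It only counts how many cache lines a contiguous access overlaps and what fraction of the fetched bytes is actually used. The function names (lines_touched, line_utilization) are hypothetical, and the default 64-byte line size is an assumption; it is a parameter because different memory systems fetch at different granularities.

```cpp
#include <cstddef>
#include <cstdint>

// Number of cache lines overlapped by the byte range [addr, addr + bytes).
// line_size must be positive; 64 is an assumed default.
constexpr std::size_t lines_touched(std::uintptr_t addr, std::size_t bytes,
                                    std::size_t line_size = 64) {
    if (bytes == 0) return 0;
    const std::uintptr_t first = addr / line_size;
    const std::uintptr_t last = (addr + bytes - 1) / line_size;
    return static_cast<std::size_t>(last - first + 1);
}

// Fraction of the fetched bytes that the access actually uses:
// bytes requested divided by (lines touched * line size).
constexpr double line_utilization(std::uintptr_t addr, std::size_t bytes,
                                  std::size_t line_size = 64) {
    const std::size_t lines = lines_touched(addr, bytes, line_size);
    return lines == 0
               ? 0.0
               : static_cast<double>(bytes) /
                     static_cast<double>(lines * line_size);
}
```

For example, a 128-byte read starting at offset 0 overlaps two 64-byte lines and uses all of both, while the same read starting at offset 4 overlaps three lines and uses only two thirds of the bytes fetched. A gap of that kind is what the alignment benchmark would be expected to expose.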

Learning Objectives
- GPU memory alignment
- Cache behavior
- Tensor layout design