More tasks (FFT, matrix multiply, sorting networks)

## Summary

Expand the benchmark task suite with three new GPU kernel implementations: Fast Fourier Transform (FFT), matrix multiplication, and sorting networks.

## Motivation

The current five tasks demonstrate a range from bandwidth-bound to compute-bound workloads, but they're all element-wise operations with no inter-element dependencies. FFT, matrix multiply, and sorting networks introduce fundamentally different parallel patterns — butterfly networks, tiled shared-memory access, and compare-and-swap — that stress different aspects of GPU architecture and reveal new performance characteristics.

## Acceptance Criteria

- [ ] Implement `FftTask` — a radix-2 Cooley-Tukey FFT on a power-of-two input array
- [ ] Implement `MatrixMultiplyTask` — dense matrix multiplication (e.g. 1024×1024 × 1024×1024)
- [ ] Implement `SortingNetworkTask` — a bitonic sort or odd-even merge sort on the GPU
- [ ] All three tasks implement `IBenchmarkTask` and integrate with the existing harness
- [ ] Each task includes proper host↔device memory transfers (no shortcuts)
- [ ] Results should be validated for correctness against a CPU reference implementation
- [ ] Performance should be benchmarked and reported alongside existing tasks

## Technical Notes

- FFT requires shared memory and butterfly-pattern access — may need ILGPU shared memory primitives
- Matrix multiply benefits hugely from tiled/blocked algorithms using shared memory
- Sorting networks are interesting because they have a fixed comparison pattern, making them ideal for GPU parallelism despite sorting being traditionally sequential
- Batch size semantics differ per task: for FFT it's the transform length, for matrix multiply it's the matrix dimension, for sorting it's the array length
- Consider adding these under `src/GpuStreamingDemo.Core/Tasks/` following existing patterns

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

More tasks (FFT, matrix multiply, sorting networks) #11

Summary

Motivation

Acceptance Criteria

Technical Notes

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

More tasks (FFT, matrix multiply, sorting networks) #11

Description

Summary

Motivation

Acceptance Criteria

Technical Notes

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions