@steveant

Summary

Adds GPU acceleration using Numba CUDA, achieving up to a 45x speedup over 20-core CPU multiprocessing (530x over single-core).

Performance

Tested on NVIDIA GB10 (Blackwell, Compute Capability 12.1):

Scene      Resolution  CPU (20 cores)  GPU    Speedup
twoballs   960x540     0.82s           0.25s  3x
manyballs  1920x1080   14.3s           0.32s  45x

Changes

  • gpu_engine.py - Numba CUDA kernels with structure-of-arrays memory layout
  • main_gpu.py - GPU entry point
  • pyproject.toml - Modern Python packaging with dependencies
  • examples/manyballs.py - Larger benchmark scene
  • README.md - GPU documentation
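To illustrate what a structure-of-arrays layout means here, the sketch below converts a list of per-sphere records into one contiguous array per field. The field names (`cx`, `cy`, `cz`, `radius`) and the helper `to_soa` are hypothetical and are not taken from gpu_engine.py; the sketch uses only the standard library rather than device arrays.

```python
from array import array

# Array-of-structures: one record per sphere (hypothetical field names).
spheres_aos = [
    {"cx": 0.0, "cy": 0.0, "cz": 3.0, "radius": 1.0},
    {"cx": 2.0, "cy": 0.5, "cz": 4.0, "radius": 0.5},
]

def to_soa(spheres):
    """Return one contiguous float32 array per field instead of one record per sphere."""
    fields = ("cx", "cy", "cz", "radius")
    return {name: array("f", (s[name] for s in spheres)) for name in fields}

soa = to_soa(spheres_aos)
# Values of the same field are now adjacent in memory:
# soa["radius"][0] is stored right next to soa["radius"][1].
```

With this layout, neighbouring GPU threads that read the same field of neighbouring elements touch adjacent addresses, which is the access pattern that memory coalescing rewards.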

Key optimizations

  1. Binary PPM output (P6) - Original text I/O was 96% of runtime
  2. Iterative ray tracing - Replaced recursion for CUDA compatibility
  3. fastmath compilation - Faster floating-point operations, at the cost of strict IEEE 754 semantics
  4. Coalesced memory access - Structure-of-arrays layout
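As a rough illustration of the first item, the sketch below builds the same image in binary (P6) and plain-text form. It assumes the original text output was the P3 variant of PPM; the function names are hypothetical and not taken from this PR.

```python
def ppm_p6_bytes(width, height, pixels):
    """Encode row-major (r, g, b) tuples in 0..255 as a binary PPM (P6)."""
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    body = bytearray()
    for r, g, b in pixels:
        body += bytes((r, g, b))  # exactly 3 bytes per pixel, no formatting
    if len(body) != width * height * 3:
        raise ValueError("pixel count does not match width * height")
    return header + bytes(body)

def ppm_p3_text(width, height, pixels):
    """Encode the same pixels as plain-text PPM (P3), one pixel per line."""
    lines = ["P3", f"{width} {height}", "255"]
    lines += [f"{r} {g} {b}" for r, g, b in pixels]
    return "\n".join(lines) + "\n"

pixels = [(255, 0, 0), (0, 0, 255)]
binary = ppm_p6_bytes(2, 1, pixels)
text = ppm_p3_text(2, 1, pixels)
```

The binary form skips per-channel integer-to-string conversion and writes a fixed three bytes per pixel, which is where the I/O savings would come from.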

The clean architecture of the original made GPU porting straightforward.
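The recursion-to-iteration change listed under the key optimizations can be sketched in plain Python. This is a toy model rather than the kernel in this PR: each bounce is reduced to a hypothetical `(local_color, reflectivity)` pair, and the loop carries a running weight in place of the call stack.

```python
def shade_recursive(hits, depth=0):
    """Recursive form: local colour plus the reflected contribution of the next bounce."""
    if depth >= len(hits):
        return 0.0
    local, reflectivity = hits[depth]
    return local + reflectivity * shade_recursive(hits, depth + 1)

def shade_iterative(hits, max_depth):
    """Iterative form: accumulate colour while tracking how much each bounce still contributes."""
    color = 0.0
    weight = 1.0  # product of the reflectivities along the path so far
    for depth in range(max_depth):
        if depth >= len(hits) or weight == 0.0:
            break
        local, reflectivity = hits[depth]
        color += weight * local
        weight *= reflectivity
    return color

# One scalar colour channel per bounce, for illustration only.
hits = [(0.5, 0.5), (0.4, 0.25), (0.8, 0.0)]
recursive_result = shade_recursive(hits)
iterative_result = shade_iterative(hits, max_depth=8)
```

Both forms compute the same sum up to floating-point rounding; the iterative one needs only two scalars of state per ray, which suits a GPU thread.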
