Description
Spent around 3 days trying to get this to work on Windows, here are my findings.
Environment
- OS: Windows 11 23H2 (Build 22631.5909)
- GPU: NVIDIA GeForce RTX 4090
- NVIDIA Driver: 581.29 (reports CUDA 13.0 capability)
- Python: 3.9.23
- Numba: Tested with 0.57.1, 0.58.1, 0.60.0
- NumPy: Tested with 1.24.3, 1.26.4, 2.0.1
- PyTorch: 2.5.1+cu121 (installed but not causing the issue)
Problem Description
The script fails with an access violation when Numba attempts to use CUDA, specifically when calling cuda.to_device(). The error occurs at the CUDA driver API level in cuCtxGetDevice.
Error Traceback
OSError: exception: access violation reading 0x0000000000000000
File "handheld_super_resolution/super_resolution.py", line 108, in main
cuda_ref_img = cuda.to_device(ref_img)
File "numba/cuda/cudadrv/devices.py", line 232, in _require_cuda_context
return fn(*args, **kws)
[...]
File "numba/cuda/cudadrv/driver.py", line 505, in __enter__
driver.cuCtxGetDevice(byref(hdevice))
File "numba/cuda/cudadrv/driver.py", line 326, in safe_cuda_api_call
retcode = libfn(*args)
Key Findings
- CUDA context creation works in isolation: Running cuda.current_context() directly succeeds and prints the GPU name correctly
- The failure happens specifically in cuda.to_device(): The crash occurs when Numba tries to use the context for actual operations
- Diagnostic output shows nvvm.dll loading issue: While nvrtc and cudart load successfully, nvvm.dll fails to load via simple DLL name lookup
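The failure pattern can be isolated in a few lines; a minimal sketch (hypothetical standalone script, the array is just a placeholder for ref_img):
```python
# Minimal repro sketch of the behaviour described above (not taken from the project code).
import numpy as np
from numba import cuda

ctx = cuda.current_context()            # succeeds: context creation works
print(cuda.get_current_device().name)   # prints b'NVIDIA GeForce RTX 4090'

ref_img = np.zeros((64, 64), dtype=np.float32)
d_img = cuda.to_device(ref_img)         # crashes here: access violation in cuCtxGetDevice
```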
Diagnostic Output
Finding nvvm from CUDA_HOME
Located at nvvm.dll
Trying to open library... ERROR: failed to open nvvm:
Could not find module 'nvvm.dll' (or one of its dependencies).
Finding nvrtc from CUDA_HOME
Located at C:\ProgramData\miniconda3\envs\handheld\Library\bin\nvrtc64_130_0.dll
Trying to open library... ok
Finding cudart from CUDA_HOME
Located at C:\ProgramData\miniconda3\envs\handheld\Library\bin\cudart64_13.dll
Trying to open library... ok
[Simple context test]
Context created OK
Device: b'NVIDIA GeForce RTX 4090'
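The diagnostic output above matches what Numba's built-in library check prints; it can be reproduced with something like the following (assuming the standard numba.cuda.cudadrv.libs API):
```python
# Sketch of how the diagnostic above can be generated.
from numba import cuda
from numba.cuda.cudadrv import libs

libs.test()                              # prints the "Finding nvvm/nvrtc/cudart ..." lines
print("[Simple context test]")
cuda.current_context()                   # "Context created OK"
print("Device:", cuda.get_current_device().name)
```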
What I Tried (All Failed)
CUDA Toolkit Versions
- ❌ CUDA 11.8 stack (cudatoolkit + PyTorch CUDA 11.8)
- ❌ CUDA 12.1 stack (cuda-nvrtc 12.1 + PyTorch CUDA 12.1)
- ❌ CUDA 13.0 stack (cuda-nvrtc 13.0.88 + cuda-nvvm 13.0.88 + cuda-cudart 13.0.96)
Numba Versions
- ❌ Numba 0.57.1 + llvmlite 0.40.1
- ❌ Numba 0.58.1 + llvmlite 0.41.1
- ❌ Numba 0.60.0 + llvmlite 0.43.0
NumPy Versions
- ❌ NumPy 1.24.3
- ❌ NumPy 1.26.4
- ❌ NumPy 2.0.1
Environment Fixes Attempted
- ❌ Manual PATH configuration to include nvvm DLL directories
- ❌ Using os.add_dll_directory() to add DLL search paths (roughly as sketched after this list)
- ❌ Installing NVIDIA's cuda-python binding (failed with import errors)
- ❌ Setting NUMBA_CUDA_USE_NVIDIA_BINDING=1 (import error: cannot import 'cuda' from 'cuda')
- ❌ Explicitly initializing the CUDA context before any operations
- ❌ Using device.reset() to create a fresh primary context
- ❌ Removing PyTorch CUDA initialization conflicts
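For the PATH / os.add_dll_directory() items, the attempt looked roughly like the sketch below (paths are the ones from this environment; it did not fix the crash):
```python
# Rough sketch of the attempted DLL-search-path workaround (not a verified fix;
# in this environment it did not resolve the to_device() crash).
import os

env_root = r"C:\ProgramData\miniconda3\envs\handheld"
handles = []
for sub in (r"Library\bin", r"Library\bin\x64", r"Library\nvvm\bin\x64"):
    candidate = os.path.join(env_root, sub)
    if os.path.isdir(candidate):
        handles.append(os.add_dll_directory(candidate))   # extend Windows DLL search path
        os.environ["PATH"] = candidate + os.pathsep + os.environ["PATH"]

from numba import cuda
cuda.current_context()   # still succeeds; cuda.to_device() still crashes afterwards
```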
CPU Simulation Mode
- ❌ Setting NUMBA_ENABLE_CUDASIM=1 to run without a GPU does not work with this codebase. The code uses cuda.as_cuda_array(), which doesn't exist in Numba's CPU simulator, causing:
AttributeError: module 'numba.cuda' has no attribute 'as_cuda_array'
Seems like significant refactoring would be required to support CPU simulation mode.
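A quick way to confirm the simulator limitation, as a hedged sketch (the env var has to be set before numba is imported):
```python
# Check whether the CUDA simulator exposes the array-interop helper the codebase needs.
import os
os.environ["NUMBA_ENABLE_CUDASIM"] = "1"   # must be set before importing numba

from numba import cuda

# Prints False under the simulator, which is why the cuda.as_cuda_array() calls
# raise AttributeError.
print(hasattr(cuda, "as_cuda_array"))
```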
Root Cause Analysis
The issue appears to be a fundamental incompatibility between Numba's ctypes-based CUDA driver binding and modern CUDA 13.0 drivers on Windows 11. Specifically:
- Simple context queries work (like cuda.current_context())
- Complex operations fail (like cuda.to_device(), which requires memory allocation and JIT compilation)
- The nvvm.dll loading failure suggests Numba cannot properly locate or load the NVVM library needed for JIT compilation of CUDA kernels
- Windows-specific PATH issues with CUDA 13, where DLLs are in Library\bin\x64 and Library\nvvm\bin\x64 subdirectories that Numba doesn't search (see the sketch below)
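To confirm the layout point, a small hypothetical check that lists where the conda packages actually put the NVVM/NVRTC DLLs (assumes the environment is activated so CONDA_PREFIX is set):
```python
# List nvvm/nvrtc DLL locations inside the active conda environment so they can
# be compared against the directories Numba actually searches.
import os
from pathlib import Path

prefix = Path(os.environ["CONDA_PREFIX"])
for pattern in ("nvvm*.dll", "nvrtc*.dll"):
    for dll in prefix.rglob(pattern):
        print(dll)
```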
Workarounds
⚠️ Downgrading NVIDIA Driver
Driver versions 531.x or 528.x from 2023 (CUDA 12.1/12.0 era) might stand a chance, but that's obviously asking for trouble as it's very likely to cause problems with other applications.
✅ WSL2
Running in Windows Subsystem for Linux 2 with Ubuntu works! After one more day of debugging, here's the complete solution for RTX 4090 (Ada Lovelace, CC 8.9) on WSL2.
Prerequisites
- Windows 11 with WSL2 enabled
- NVIDIA GPU driver 531.x or newer (must support CUDA 13.0)
- Miniconda installed in WSL2
Step-by-Step Setup
- Create a clean conda environment with modern numba-cuda:
conda create -n handheld-wsl -y -c conda-forge -c pytorch \
python=3.9 numpy=1.26.4 \
numba-cuda "cuda-version=13" \
rawpy exifread scipy scikit-image opencv colour-demosaicing matplotlib tqdm
- Activate and install PyTorch with CUDA support:
conda activate handheld-wsl
pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118
- Install CUDA libraries:
conda install -y -c nvidia cuda-nvvm cuda-nvrtc
- Configure environment for RTX 4090 (Ada Lovelace, CC 8.9):
Create activation script:
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
cat > $CONDA_PREFIX/etc/conda/activate.d/handheld.sh << 'EOF'
#!/usr/bin/env bash
export NUMBA_FORCE_CUDA_CC=8.6
export LD_LIBRARY_PATH="$CONDA_PREFIX/lib:$LD_LIBRARY_PATH"
EOF
chmod +x $CONDA_PREFIX/etc/conda/activate.d/handheld.sh
- Clean the Numba cache:
rm -rf ~/.numba
- Run the project:
conda activate handheld-wsl
python run_handheld.py --impath test_burst --outpath output.png
Why This Works
- Modern numba-cuda package: Uses NVIDIA's CUDA Python bindings instead of deprecated ctypes
- NUMBA_FORCE_CUDA_CC=8.6: Forces compilation for Ampere (CC 8.6), which runs on Ada (CC 8.9) via driver JIT
- This is NVIDIA's official recommended approach (Ada Compatibility Guide)
- PTX compiled for 8.x is forward-compatible to higher 8.x/9.x devices
- Clean CUDA 13 stack: No mixing of CUDA 11/12/13 components
- PyTorch with CUDA 11.8: Separate CUDA runtime for PyTorch operations
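As an optional sanity check after activation (not part of the original setup), something like this confirms the GPU is visible and shows the compute capability the driver JIT has to bridge:
```python
# Sanity-check sketch: confirm device detection and report compute capability.
import os
from numba import cuda

cuda.detect()                                            # lists detected CUDA devices
dev = cuda.get_current_device()
print("Compute capability:", dev.compute_capability)     # (8, 9) on an RTX 4090
print("NUMBA_FORCE_CUDA_CC =", os.environ.get("NUMBA_FORCE_CUDA_CC"))
```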
Automated Setup
Use the attached scripts: setup_env.sh and run.sh
bash setup_env.sh # One-time setup
bash run.sh --impath test_burst --outpath output.png
Key Insight: The Wrong Environment Variable
I initially tried NUMBA_CUDA_DEFAULT_PTX_CC=8.6, which doesn't work. That variable only affects cuda.compile_ptx(), not @cuda.jit-decorated functions.
The correct variable is: NUMBA_FORCE_CUDA_CC=8.6
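To illustrate the difference, a toy sketch (the kernel is made up; NUMBA_FORCE_CUDA_CC typically has to be set before numba is imported to affect @cuda.jit):
```python
# Toy example: NUMBA_FORCE_CUDA_CC influences @cuda.jit kernels, whereas
# NUMBA_CUDA_DEFAULT_PTX_CC only changes the default for cuda.compile_ptx().
import os
os.environ["NUMBA_FORCE_CUDA_CC"] = "8.6"    # set before importing numba

import numpy as np
from numba import cuda

@cuda.jit
def add_one(arr):
    i = cuda.grid(1)
    if i < arr.size:
        arr[i] += 1.0

d_arr = cuda.to_device(np.zeros(32, dtype=np.float32))
add_one[1, 32](d_arr)                        # compiled for CC 8.6, JIT'd to 8.9 by the driver
print(d_arr.copy_to_host()[:4])              # [1. 1. 1. 1.]
```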
Request
This appears to be a Numba issue with Windows + CUDA 13.0 driver compatibility. Specifically:
- The ctypes binding to cuCtxGetDevice is getting a NULL pointer
- NVVM library loading fails on Windows with conda-installed CUDA 13 components
- The official NVIDIA cuda-python binding doesn't integrate properly with Numba 0.60 on Windows
Would be great if Numba's Windows + CUDA 13 support could be improved. Better error messages could also help diagnose DLL loading issues.
Additional Context
Initially, when I set up the conda environment on Windows and tried to run on the provided DNG bursts, I'd get:
OSError: [WinError 182] The operating system cannot run %1. Error loading "C:\ProgramData\miniconda3\envs\handheld\lib\site-packages\torch\lib\shm.dll" or one of its dependencies.
This PyTorch 1.13.1 import error was resolved by upgrading to PyTorch 2.5.1.
Related Issues
- Python 3.11 numba/numba#8304 (CUDA 12 Windows compatibility)
- [BUG] Windows: numba-cuda can’t find NVVM/NVRTC with conda-forge cuda-version=13 layout (DLLs in Library\bin\x64); libs.test() reports missing nvvm.dll/nvrtc.dll NVIDIA/numba-cuda#452 (Windows path issues with nvvm/nvrtc)