
Numba CUDA Access Violation on Windows 11 with Modern NVIDIA Drivers #48

@Shadetail

Description

I spent around three days trying to get this to work on Windows; here are my findings.

Environment

  • OS: Windows 11 23H2 (Build 22631.5909)
  • GPU: NVIDIA GeForce RTX 4090
  • NVIDIA Driver: 581.29 (reports CUDA 13.0 capability)
  • Python: 3.9.23
  • Numba: Tested with 0.57.1, 0.58.1, 0.60.0
  • NumPy: Tested with 1.24.3, 1.26.4, 2.0.1
  • PyTorch: 2.5.1+cu121 (installed but not causing the issue)

Problem Description

The script fails with an access violation when Numba attempts to use CUDA, specifically when calling cuda.to_device(). The error occurs at the CUDA driver API level in cuCtxGetDevice.
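
For reference, a minimal sketch of the failing call, stripped of the project code (the array contents are arbitrary placeholders for a burst frame):

import numpy as np
from numba import cuda

# Placeholder data standing in for the reference image from the burst
ref_img = np.random.rand(512, 512).astype(np.float32)
# On the affected Windows setup this line raises the access violation;
# on a working install it simply copies the array to the GPU.
cuda_ref_img = cuda.to_device(ref_img)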

Error Traceback

OSError: exception: access violation reading 0x0000000000000000
  File "handheld_super_resolution/super_resolution.py", line 108, in main
    cuda_ref_img = cuda.to_device(ref_img)
  File "numba/cuda/cudadrv/devices.py", line 232, in _require_cuda_context
    return fn(*args, **kws)
  [...]
  File "numba/cuda/cudadrv/driver.py", line 505, in __enter__
    driver.cuCtxGetDevice(byref(hdevice))
  File "numba/cuda/cudadrv/driver.py", line 326, in safe_cuda_api_call
    retcode = libfn(*args)

Key Findings

  1. CUDA context creation works in isolation: Running cuda.current_context() directly succeeds and prints the GPU name correctly (see the sketch after this list)
  2. The failure happens specifically in cuda.to_device(): The crash occurs when Numba tries to use the context for actual operations
  3. Diagnostic output shows nvvm.dll loading issue: While nvrtc and cudart load successfully, nvvm.dll fails to load via simple DLL name lookup
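
A sketch of the isolated context query from finding 1, which succeeds on the same machine (note that this only initializes the context and never touches NVVM or device memory):

from numba import cuda

ctx = cuda.current_context()      # context creation works in isolation
dev = cuda.get_current_device()
print(dev.name)                   # b'NVIDIA GeForce RTX 4090'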

Diagnostic Output

Finding nvvm from CUDA_HOME
    Located at nvvm.dll
    Trying to open library...       ERROR: failed to open nvvm:
Could not find module 'nvvm.dll' (or one of its dependencies).

Finding nvrtc from CUDA_HOME
    Located at C:\ProgramData\miniconda3\envs\handheld\Library\bin\nvrtc64_130_0.dll
    Trying to open library...       ok

Finding cudart from CUDA_HOME
    Located at C:\ProgramData\miniconda3\envs\handheld\Library\bin\cudart64_13.dll
    Trying to open library...       ok

[Simple context test]
Context created OK
Device: b'NVIDIA GeForce RTX 4090'
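
Output in this format appears to come from Numba's internal CUDA library self-test (the same report is included in numba -s). A sketch of invoking it directly, assuming the internal module path is unchanged in your Numba version:

# libs.test() is an internal helper, not public API; its location and output
# format may differ between Numba versions.
from numba.cuda.cudadrv import libs

libs.test()   # prints the "Finding ... / Located at ... / Trying to open library..." report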

What I Tried (All Failed)

CUDA Toolkit Versions

  • ❌ CUDA 11.8 stack (cudatoolkit + PyTorch CUDA 11.8)
  • ❌ CUDA 12.1 stack (cuda-nvrtc 12.1 + PyTorch CUDA 12.1)
  • ❌ CUDA 13.0 stack (cuda-nvrtc 13.0.88 + cuda-nvvm 13.0.88 + cuda-cudart 13.0.96)

Numba Versions

  • ❌ Numba 0.57.1 + llvmlite 0.40.1
  • ❌ Numba 0.58.1 + llvmlite 0.41.1
  • ❌ Numba 0.60.0 + llvmlite 0.43.0

NumPy Versions

  • ❌ NumPy 1.24.3
  • ❌ NumPy 1.26.4
  • ❌ NumPy 2.0.1

Environment Fixes Attempted

  • ❌ Manual PATH configuration to include nvvm DLL directories
  • ❌ Using os.add_dll_directory() to add DLL search paths (sketch after this list)
  • ❌ Installing NVIDIA's cuda-python binding (failed with import errors)
  • ❌ Setting NUMBA_CUDA_USE_NVIDIA_BINDING=1 (import error: cannot import 'cuda' from 'cuda')
  • ❌ Explicitly initializing CUDA context before any operations
  • ❌ Using device.reset() to create fresh primary context
  • ❌ Removing PyTorch CUDA initialization conflicts
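
For completeness, a sketch of the os.add_dll_directory() attempt from the list above. The paths are this conda environment's Library directory plus the CUDA 13 subdirectories mentioned under Root Cause Analysis; this did not fix the crash here:

import os

env_lib = r"C:\ProgramData\miniconda3\envs\handheld\Library"
for sub in ("bin", r"bin\x64", r"nvvm\bin\x64"):
    path = os.path.join(env_lib, sub)
    if os.path.isdir(path):
        os.add_dll_directory(path)   # extend the Windows DLL search path
        os.environ["PATH"] = path + os.pathsep + os.environ["PATH"]

from numba import cuda  # imported only after the search path is set up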

CPU Simulation Mode

  • ❌ Setting NUMBA_ENABLE_CUDASIM=1 to run without a GPU does not work with this codebase. The code uses cuda.as_cuda_array(), which doesn't exist in Numba's CPU simulator, causing:
AttributeError: module 'numba.cuda' has no attribute 'as_cuda_array'

Seems like significant refactoring would be required to support CPU simulation mode.
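
For reference, the kind of call the simulator cannot provide (a sketch; cuda.as_cuda_array() wraps any object exposing __cuda_array_interface__, e.g. a CUDA PyTorch tensor, without copying):

import torch
from numba import cuda

t = torch.zeros(16, device="cuda")
d_view = cuda.as_cuda_array(t)   # AttributeError under NUMBA_ENABLE_CUDASIM=1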

Root Cause Analysis

The issue appears to be a fundamental incompatibility between Numba's ctypes-based CUDA driver binding and modern CUDA 13.0 drivers on Windows 11. Specifically:

  1. Simple context queries work (like cuda.current_context())
  2. Complex operations fail (like cuda.to_device() which requires memory allocation and JIT compilation)
  3. The nvvm.dll loading failure suggests Numba cannot properly locate or load the NVVM library needed for JIT compilation of CUDA kernels
  4. Windows-specific PATH issues with CUDA 13 where DLLs are in Library\bin\x64 and Library\nvvm\bin\x64 subdirectories that Numba doesn't search

Workarounds

⚠️ Downgrading NVIDIA Driver

Driver versions 531.x or 528.x from 2023 (the CUDA 12.1/12.0 era) might work, but downgrading the driver is asking for trouble: it is very likely to cause problems with other applications.

✅ WSL2

Running in Windows Subsystem for Linux 2 with Ubuntu works! After one more day of debugging, here's the complete solution for RTX 4090 (Ada Lovelace, CC 8.9) on WSL2.

Prerequisites

  • Windows 11 with WSL2 enabled
  • NVIDIA GPU driver 531.x or newer with WSL2 CUDA support (I used 581.29, which reports CUDA 13.0 and matches the CUDA 13 stack installed below)
  • Miniconda installed in WSL2

Step-by-Step Setup

  1. Create a clean conda environment with modern numba-cuda:
conda create -n handheld-wsl -y -c conda-forge -c pytorch \
  python=3.9 numpy=1.26.4 \
  numba-cuda "cuda-version=13" \
  rawpy exifread scipy scikit-image opencv colour-demosaicing matplotlib tqdm
  2. Activate and install PyTorch with CUDA support:
conda activate handheld-wsl
pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118
  3. Install CUDA libraries:
conda install -y -c nvidia cuda-nvvm cuda-nvrtc
  4. Configure environment for RTX 4090 (Ada Lovelace CC 8.9):

Create activation script:

mkdir -p $CONDA_PREFIX/etc/conda/activate.d
cat > $CONDA_PREFIX/etc/conda/activate.d/handheld.sh << 'EOF'
#!/usr/bin/env bash
export NUMBA_FORCE_CUDA_CC=8.6
export LD_LIBRARY_PATH="$CONDA_PREFIX/lib:$LD_LIBRARY_PATH"
EOF
chmod +x $CONDA_PREFIX/etc/conda/activate.d/handheld.sh
  5. Clean Numba cache:
rm -rf ~/.numba
  6. Run the project:
conda activate handheld-wsl
python run_handheld.py --impath test_burst --outpath output.png

Why This Works

  • Modern numba-cuda package: Uses NVIDIA's CUDA Python bindings instead of the deprecated ctypes-based driver binding
  • NUMBA_FORCE_CUDA_CC=8.6: Forces compilation for Ampere (CC 8.6), which runs on Ada (CC 8.9) via driver JIT
    • This is NVIDIA's official recommended approach (Ada Compatibility Guide)
    • PTX compiled for 8.x is forward-compatible to higher 8.x/9.x devices
  • Clean CUDA 13 stack: No mixing of CUDA 11/12/13 components
  • PyTorch with CUDA 11.8: Separate CUDA runtime for PyTorch operations
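
To sanity-check the finished environment, a minimal verification sketch (run inside the activated handheld-wsl env):

from numba import cuda

cuda.detect()                             # lists detected devices and whether they are supported
dev = cuda.get_current_device()
print(dev.name, dev.compute_capability)   # the RTX 4090 reports compute capability (8, 9)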

Automated Setup

Use the attached scripts: setup_env.sh and run.sh

bash setup_env.sh    # One-time setup
bash run.sh --impath test_burst --outpath output.png

Key Insight: The Wrong Environment Variable

I initially tried NUMBA_CUDA_DEFAULT_PTX_CC=8.6 which doesn't work. That variable only affects cuda.compile_ptx(), not @cuda.jit decorated functions.

The correct variable is: NUMBA_FORCE_CUDA_CC=8.6
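
A sketch of the difference, with the variable set before numba is imported (the add_one kernel is just a placeholder):

import os
os.environ["NUMBA_FORCE_CUDA_CC"] = "8.6"   # read by Numba's config at import time

from numba import cuda

@cuda.jit
def add_one(x):
    i = cuda.grid(1)
    if i < x.size:
        x[i] += 1.0   # compiled for CC 8.6; the driver JITs the PTX for the 8.9 device

# NUMBA_CUDA_DEFAULT_PTX_CC, by contrast, only changes the default target of
# cuda.compile_ptx() and has no effect on @cuda.jit kernels like add_one.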


Request

This appears to be a Numba issue with Windows + CUDA 13.0 driver compatibility. Specifically:

  • The ctypes binding to cuCtxGetDevice is getting a NULL pointer
  • NVVM library loading fails on Windows with conda-installed CUDA 13 components
  • The official NVIDIA cuda-python binding doesn't integrate properly with Numba 0.60 on Windows

Would be great if Numba's Windows + CUDA 13 support could be improved. Better error messages could also help diagnose DLL loading issues.

Additional Context

Initially, when I set up the conda environment on Windows and tried to run it on the provided DNG bursts, I got:

OSError: [WinError 182] The operating system cannot run %1. Error loading "C:\ProgramData\miniconda3\envs\handheld\lib\site-packages\torch\lib\shm.dll" or one of its dependencies.

This PyTorch 1.13.1 import error was resolved by upgrading to PyTorch 2.5.1.
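
The upgrade was along the lines of the following (the cu121 index URL matches the 2.5.1+cu121 build listed under Environment; treat it as an assumption for other setups):

pip install --upgrade torch==2.5.1 --index-url https://download.pytorch.org/whl/cu121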
