Description
Spent around 3 days trying to get this to work on Windows, here are my findings.
Environment
- OS: Windows 11 23H2 (Build 22631.5909)
- GPU: NVIDIA GeForce RTX 4090
- NVIDIA Driver: 581.29 (reports CUDA 13.0 capability)
- Python: 3.9.23
- Numba: Tested with 0.57.1, 0.58.1, 0.60.0
- NumPy: Tested with 1.24.3, 1.26.4, 2.0.1
- PyTorch: 2.5.1+cu121 (installed but not causing the issue)
Problem Description
The script fails with an access violation when Numba attempts to use CUDA, specifically when calling cuda.to_device(). The error occurs at the CUDA driver API level in cuCtxGetDevice.
Error Traceback
OSError: exception: access violation reading 0x0000000000000000
File "handheld_super_resolution/super_resolution.py", line 108, in main
cuda_ref_img = cuda.to_device(ref_img)
File "numba/cuda/cudadrv/devices.py", line 232, in _require_cuda_context
return fn(*args, **kws)
[...]
File "numba/cuda/cudadrv/driver.py", line 505, in __enter__
driver.cuCtxGetDevice(byref(hdevice))
File "numba/cuda/cudadrv/driver.py", line 326, in safe_cuda_api_call
retcode = libfn(*args)
Key Findings
- CUDA context creation works in isolation: Running cuda.current_context() directly succeeds and prints the GPU name correctly
- The failure happens specifically in cuda.to_device(): The crash occurs when Numba tries to use the context for actual operations
- Diagnostic output shows nvvm.dll loading issue: While nvrtc and cudart load successfully, nvvm.dll fails to load via simple DLL name lookup
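The failure pattern can be isolated in a few lines; a minimal sketch (hypothetical standalone script, the array is just a placeholder for ref_img):
```python
# Minimal repro sketch of the behaviour described above (not taken from the project code).
import numpy as np
from numba import cuda

ctx = cuda.current_context()            # succeeds: context creation works
print(cuda.get_current_device().name)   # prints b'NVIDIA GeForce RTX 4090'

ref_img = np.zeros((64, 64), dtype=np.float32)
d_img = cuda.to_device(ref_img)         # crashes here: access violation in cuCtxGetDevice
```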
Diagnostic Output
Finding nvvm from CUDA_HOME
Located at nvvm.dll
Trying to open library... ERROR: failed to open nvvm:
Could not find module 'nvvm.dll' (or one of its dependencies).
Finding nvrtc from CUDA_HOME
Located at C:\ProgramData\miniconda3\envs\handheld\Library\bin\nvrtc64_130_0.dll
Trying to open library... ok
Finding cudart from CUDA_HOME
Located at C:\ProgramData\miniconda3\envs\handheld\Library\bin\cudart64_13.dll
Trying to open library... ok
[Simple context test]
Context created OK
Device: b'NVIDIA GeForce RTX 4090'
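The diagnostic output above matches what Numba's built-in library check prints; it can be reproduced with something like the following (assuming the standard numba.cuda.cudadrv.libs API):
```python
# Sketch of how the diagnostic above can be generated.
from numba import cuda
from numba.cuda.cudadrv import libs

libs.test()                              # prints the "Finding nvvm/nvrtc/cudart ..." lines
print("[Simple context test]")
cuda.current_context()                   # "Context created OK"
print("Device:", cuda.get_current_device().name)
```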
What I Tried (All Failed)
CUDA Toolkit Versions
- ❌ CUDA 11.8 stack (cudatoolkit + PyTorch CUDA 11.8)
- ❌ CUDA 12.1 stack (cuda-nvrtc 12.1 + PyTorch CUDA 12.1)
- ❌ CUDA 13.0 stack (cuda-nvrtc 13.0.88 + cuda-nvvm 13.0.88 + cuda-cudart 13.0.96)
Numba Versions
- ❌ Numba 0.57.1 + llvmlite 0.40.1
- ❌ Numba 0.58.1 + llvmlite 0.41.1
- ❌ Numba 0.60.0 + llvmlite 0.43.0
NumPy Versions
- ❌ NumPy 1.24.3
- ❌ NumPy 1.26.4
- ❌ NumPy 2.0.1
Environment Fixes Attempted
- ❌ Manual PATH configuration to include nvvm DLL directories
- ❌ Using os.add_dll_directory() to add DLL search paths (roughly as sketched after this list)
- ❌ Installing NVIDIA's cuda-python binding (failed with import errors)
- ❌ Setting NUMBA_CUDA_USE_NVIDIA_BINDING=1 (import error: cannot import 'cuda' from 'cuda')
- ❌ Explicitly initializing the CUDA context before any operations
- ❌ Using device.reset() to create a fresh primary context
- ❌ Removing PyTorch CUDA initialization conflicts
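For the PATH / os.add_dll_directory() items, the attempt looked roughly like the sketch below (paths are the ones from this environment; it did not fix the crash):
```python
# Rough sketch of the attempted DLL-search-path workaround (not a verified fix;
# in this environment it did not resolve the to_device() crash).
import os

env_root = r"C:\ProgramData\miniconda3\envs\handheld"
handles = []
for sub in (r"Library\bin", r"Library\bin\x64", r"Library\nvvm\bin\x64"):
    candidate = os.path.join(env_root, sub)
    if os.path.isdir(candidate):
        handles.append(os.add_dll_directory(candidate))   # extend Windows DLL search path
        os.environ["PATH"] = candidate + os.pathsep + os.environ["PATH"]

from numba import cuda
cuda.current_context()   # still succeeds; cuda.to_device() still crashes afterwards
```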
CPU Simulation Mode
- ❌ Setting NUMBA_ENABLE_CUDASIM=1 to run without a GPU does not work with this codebase. The code uses cuda.as_cuda_array(), which doesn't exist in Numba's CPU simulator, causing:
AttributeError: module 'numba.cuda' has no attribute 'as_cuda_array'
Seems like significant refactoring would be required to support CPU simulation mode.
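A quick way to confirm the simulator limitation, as a hedged sketch (the env var has to be set before numba is imported):
```python
# Check whether the CUDA simulator exposes the array-interop helper the codebase needs.
import os
os.environ["NUMBA_ENABLE_CUDASIM"] = "1"   # must be set before importing numba

from numba import cuda

# Prints False under the simulator, which is why the cuda.as_cuda_array() calls
# raise AttributeError.
print(hasattr(cuda, "as_cuda_array"))
```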
Root Cause Analysis
The issue appears to be a fundamental incompatibility between Numba's ctypes-based CUDA driver binding and modern CUDA 13.0 drivers on Windows 11. Specifically:
- Simple context queries work (like cuda.current_context())
- Complex operations fail (like cuda.to_device(), which requires memory allocation and JIT compilation)
- The nvvm.dll loading failure suggests Numba cannot properly locate or load the NVVM library needed for JIT compilation of CUDA kernels
- Windows-specific PATH issues with CUDA 13, where DLLs are in Library\bin\x64 and Library\nvvm\bin\x64 subdirectories that Numba doesn't search (see the sketch below)
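To confirm the layout point, a small hypothetical check that lists where the conda packages actually put the NVVM/NVRTC DLLs (assumes the environment is activated so CONDA_PREFIX is set):
```python
# List nvvm/nvrtc DLL locations inside the active conda environment so they can
# be compared against the directories Numba actually searches.
import os
from pathlib import Path

prefix = Path(os.environ["CONDA_PREFIX"])
for pattern in ("nvvm*.dll", "nvrtc*.dll"):
    for dll in prefix.rglob(pattern):
        print(dll)
```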
Workarounds
⚠️ Downgrading NVIDIA Driver
Driver versions 531.x or 528.x from 2023 (CUDA 12.1/12.0 era) might stand a chance, but that's obviously asking for trouble as it's very likely to cause problems with other applications.
✅ WSL2
Running in Windows Subsystem for Linux 2 with Ubuntu works! After one more day of debugging, here's the complete solution for RTX 4090 (Ada Lovelace, CC 8.9) on WSL2.
Prerequisites
- Windows 11 with WSL2 enabled
- NVIDIA GPU driver 531.x or newer (must support CUDA 13.0)
- Miniconda installed in WSL2
Step-by-Step Setup
- Create a clean conda environment with modern numba-cuda:
conda create -n handheld-wsl -y -c conda-forge -c pytorch \
python=3.9 numpy=1.26.4 \
numba-cuda "cuda-version=13" \
rawpy exifread scipy scikit-image opencv colour-demosaicing matplotlib tqdm
- Activate and install PyTorch with CUDA support:
conda activate handheld-wsl
pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118
- Install CUDA libraries:
conda install -y -c nvidia cuda-nvvm cuda-nvrtc
- Configure environment for RTX 4090 (Ada Lovelace, CC 8.9):
Create activation script:
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
cat > $CONDA_PREFIX/etc/conda/activate.d/handheld.sh << 'EOF'
#!/usr/bin/env bash
export NUMBA_FORCE_CUDA_CC=8.6
export LD_LIBRARY_PATH="$CONDA_PREFIX/lib:$LD_LIBRARY_PATH"
EOF
chmod +x $CONDA_PREFIX/etc/conda/activate.d/handheld.sh
- Clean the Numba cache:
rm -rf ~/.numba
- Run the project:
conda activate handheld-wsl
python run_handheld.py --impath test_burst --outpath output.png
Why This Works
- Modern numba-cuda package: Uses NVIDIA's CUDA Python bindings instead of deprecated ctypes
- NUMBA_FORCE_CUDA_CC=8.6: Forces compilation for Ampere (CC 8.6), which runs on Ada (CC 8.9) via driver JIT
- This is NVIDIA's official recommended approach (Ada Compatibility Guide)
- PTX compiled for 8.x is forward-compatible to higher 8.x/9.x devices
- Clean CUDA 13 stack: No mixing of CUDA 11/12/13 components
- PyTorch with CUDA 11.8: Separate CUDA runtime for PyTorch operations
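As an optional sanity check after activation (not part of the original setup), something like this confirms the GPU is visible and shows the compute capability the driver JIT has to bridge:
```python
# Sanity-check sketch: confirm device detection and report compute capability.
import os
from numba import cuda

cuda.detect()                                            # lists detected CUDA devices
dev = cuda.get_current_device()
print("Compute capability:", dev.compute_capability)     # (8, 9) on an RTX 4090
print("NUMBA_FORCE_CUDA_CC =", os.environ.get("NUMBA_FORCE_CUDA_CC"))
```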
Automated Setup
Use the attached scripts: setup_env.sh and run.sh
bash setup_env.sh # One-time setup
bash run.sh --impath test_burst --outpath output.png
Key Insight: The Wrong Environment Variable
I initially tried NUMBA_CUDA_DEFAULT_PTX_CC=8.6, which doesn't work. That variable only affects cuda.compile_ptx(), not @cuda.jit-decorated functions.
The correct variable is: NUMBA_FORCE_CUDA_CC=8.6
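To illustrate the difference, a toy sketch (the kernel is made up; NUMBA_FORCE_CUDA_CC typically has to be set before numba is imported to affect @cuda.jit):
```python
# Toy example: NUMBA_FORCE_CUDA_CC influences @cuda.jit kernels, whereas
# NUMBA_CUDA_DEFAULT_PTX_CC only changes the default for cuda.compile_ptx().
import os
os.environ["NUMBA_FORCE_CUDA_CC"] = "8.6"    # set before importing numba

import numpy as np
from numba import cuda

@cuda.jit
def add_one(arr):
    i = cuda.grid(1)
    if i < arr.size:
        arr[i] += 1.0

d_arr = cuda.to_device(np.zeros(32, dtype=np.float32))
add_one[1, 32](d_arr)                        # compiled for CC 8.6, JIT'd to 8.9 by the driver
print(d_arr.copy_to_host()[:4])              # [1. 1. 1. 1.]
```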
Request
This appears to be a Numba issue with Windows + CUDA 13.0 driver compatibility. Specifically:
- The ctypes binding to cuCtxGetDevice is getting a NULL pointer
- NVVM library loading fails on Windows with conda-installed CUDA 13 components
- The official NVIDIA cuda-python binding doesn't integrate properly with Numba 0.60 on Windows
Would be great if Numba's Windows + CUDA 13 support could be improved. Better error messages could also help diagnose DLL loading issues.
Additional Context
Initially, when I set up the conda environment on Windows and tried to run on the provided DNG bursts, I'd get:
OSError: [WinError 182] The operating system cannot run %1. Error loading "C:\ProgramData\miniconda3\envs\handheld\lib\site-packages\torch\lib\shm.dll" or one of its dependencies.
This PyTorch 1.13.1 import error was resolved by upgrading to PyTorch 2.5.1.
Related Issues
- Python 3.11 numba/numba#8304 (CUDA 12 Windows compatibility)
- [BUG] Windows: numba-cuda can’t find NVVM/NVRTC with conda-forge cuda-version=13 layout (DLLs in Library\bin\x64); libs.test() reports missing nvvm.dll/nvrtc.dll NVIDIA/numba-cuda#452 (Windows path issues with nvvm/nvrtc)