bend run-cu "Failed to launch kernels: invalid argument" on WSL2 (Ubuntu 22.04, GTX 1660 Ti, CUDA 12.4) #752

@siddantharvind

Reproducing the behavior

Problem:
When attempting to run a parallelizable Bend program (e.g., parallel_sum.bend) using the CUDA interpreter (bend run-cu), the command consistently fails with the error: "Failed to launch kernels. Error code: invalid argument. Errors: HVM output had no result (An error likely occurred)".

Steps to Reproduce:

  1. System Setup:

    • Windows 11 Host with NVIDIA GeForce GTX 1660 Ti.
    • WSL2 running Ubuntu 22.04.
    • NVIDIA Driver Version on Windows: 552.44 (CUDA Version: 12.4).
    • CUDA Toolkit 12.4.1 installed in WSL2 via NVIDIA's official local .deb method.
    • hvm and bend-lang installed via cargo install, then uninstalled and reinstalled several times (running cargo clean in between) to make sure they build against CUDA 12.4.
  2. Prepare parallel_sum.bend:

    • Create a file named parallel_sum.bend with the following content (summing 1 to 10 for basic correctness verification):
      def Sum(start, target):
        if start == target:
          return start
        else:
          half = (start + target) / 2
          left = Sum(start, half)
          right = Sum(half + 1, target)
          return left + right
      
      def main():
        return Sum(1, 10)
      
  3. Run the command in WSL2 terminal (from the directory containing parallel_sum.bend):

    bend run-cu parallel_sum.bend -s

Expected Behavior:
The program should execute on the GPU, calculate the sum correctly (Result: 55), and display high MIPS and very low execution time, without any kernel launch errors.

Actual Behavior:
The command consistently outputs:
Failed to launch kernels. Error code: invalid argument.
Errors:
HVM output had no result (An error likely occurred)

System Settings

* Operating System (Host): Windows 11
* WSL2 Distribution: Ubuntu 22.04 LTS
* GPU: NVIDIA GeForce GTX 1660 Ti
* NVIDIA Windows Driver Version (from `nvidia-smi` on Windows):
    
    NVIDIA-SMI 552.44          Driver Version: 552.44          CUDA Version: 12.4
    
* CUDA Toolkit Version (from `nvcc --version` in WSL2):
    
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2024 NVIDIA Corporation
    Built on Thu_Mar_28_02:18:24_PDT_2024
    Cuda compilation tools, release 12.4, V12.4.131
    Build cuda_12.4.r12.4/compiler.34097967_0
    
* Bend Version (from `bend --version`): bend-lang 0.2.38
* HVM Version (from `hvm --version`): hvm 2.0.22
* GCC Version (from `gcc --version`): 
     gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
     Copyright (C) 2021 Free Software Foundation, Inc.
     This is free software; see the source for copying conditions.  There is NO
     warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

* CUDA entry in the `PATH` environment variable in WSL2:
    `/usr/local/cuda-12.4/bin`
* `LD_LIBRARY_PATH` environment variable in WSL2:
     `/usr/local/cuda-12.4/lib64`
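
The details above could be collected in one pass with a small script along these lines. This is only a sketch: `report_tool` and `collect_env` are hypothetical helper names (not part of Bend or HVM), and the `compute_cap` query field assumes a reasonably recent `nvidia-smi`.

```shell
#!/bin/sh
# Sketch: gather the version details listed above in one pass.
# report_tool and collect_env are hypothetical helpers, not part of Bend or HVM.

# Print a header, then the tool's output, or a marker if the tool is missing.
report_tool() {
  tool="$1"
  shift
  echo "== $tool $* =="
  if command -v "$tool" >/dev/null 2>&1; then
    "$tool" "$@" 2>&1
  else
    echo "not found on PATH"
  fi
}

collect_env() {
  # Assumption: the installed nvidia-smi supports the compute_cap query field.
  report_tool nvidia-smi --query-gpu=name,driver_version,compute_cap --format=csv,noheader
  report_tool nvcc --version
  report_tool bend --version
  report_tool hvm --version
  report_tool gcc --version
  echo "PATH=$PATH"
  echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-<unset>}"
}

collect_env
```

Running it inside WSL2 and attaching the output keeps the report reproducible if any of the versions change later.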

### Additional context

* **Successful `deviceQuery`:** NVIDIA's own `deviceQuery` sample (from CUDA Samples v12.4) compiles and runs successfully within the same WSL2 environment, returning `Result = PASS`. This indicates that the underlying CUDA installation and GPU access via WSL2 are functional and not the direct cause of the issue.
    * Path to `deviceQuery` for reference: `/usr/local/cuda-12.4/samples/1_Utilities/deviceQuery`
* **Correct CPU Interpreter Results:**
    * `bend run-rs parallel_sum.bend -s` runs to completion and returns `Result: 55` (the slower of the two CPU backends).
    * `bend run-c parallel_sum.bend -s` runs to completion and returns `Result: 55` (faster than `run-rs`).
    * This confirms that Bend parses and evaluates this program correctly on both CPU interpreters, ruling out a parsing or arithmetic error in the program itself. The issue is isolated to the `run-cu` backend.
* **Troubleshooting Steps Taken:**
    * Uninstalled and reinstalled CUDA Toolkit 12.4.1 (local .deb installer) several times, following NVIDIA's official instructions.
    * Ran `cargo clean`, `rm -rf ~/.cargo/registry`, and `rm -rf ~/.cargo/git` before reinstalling `hvm` and `bend-lang` to ensure clean builds against CUDA 12.4.
    * Confirmed CUDA `PATH` and `LD_LIBRARY_PATH` variables are correctly set and pointing to `cuda-12.4`.
    * Created a symbolic link from `/usr/local/cuda` to `/usr/local/cuda-12.4` to match typical `Makefile` expectations.
    * Ensured WSL2 GPU passthrough is active (`nvidia-smi` works inside WSL2).
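
The backend comparison described above could be scripted roughly as follows, so that the output and exit status of all three backends end up in one log. This is a sketch: `run_backend`, `compare_backends`, and the log file name are hypothetical, and nothing is assumed about which exit status `bend` returns on a failed kernel launch; the script only records it.

```shell
#!/bin/sh
# Sketch: run one Bend file under each backend and record output and exit status.
# run_backend and compare_backends are hypothetical helpers.

run_backend() {
  backend="$1"
  file="$2"
  echo "== bend $backend $file -s =="
  if ! command -v bend >/dev/null 2>&1; then
    echo "bend not found on PATH; skipping"
    return 127
  fi
  bend "$backend" "$file" -s 2>&1
  status=$?
  echo "exit status: $status"
  return "$status"
}

compare_backends() {
  # Keep going even if one backend fails, so all three results are logged.
  for b in run-rs run-c run-cu; do
    run_backend "$b" "$1" || true
  done
}

# Only runs when the test program is present in the current directory.
if [ -f parallel_sum.bend ]; then
  compare_backends parallel_sum.bend | tee backend_comparison.log
fi
```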

This issue seems specific to `bend run-cu`'s interaction with the CUDA runtime in this WSL2 environment, despite a seemingly healthy underlying CUDA installation. Any guidance or potential debug flags would be greatly appreciated.
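
One further isolation step that could be tried is building the program as a standalone CUDA binary and running it directly, outside `bend run-cu`. The sketch below assumes that `bend gen-cu FILE` prints CUDA source to stdout (as the Bend README describes) and that a plain `nvcc` invocation is enough to compile it; `build_standalone` and the output names are hypothetical.

```shell
#!/bin/sh
# Sketch: build the program as a standalone CUDA binary so the failure can be
# reproduced outside `bend run-cu`. build_standalone is a hypothetical helper.
# Assumption: `bend gen-cu FILE` prints CUDA source to stdout.

build_standalone() {
  src="$1"
  out="${2:-standalone}"
  # Bail out early if either required tool is missing.
  for tool in bend nvcc; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool not found on PATH; skipping"
      return 127
    fi
  done
  bend gen-cu "$src" > "$out.cu" || return 1
  nvcc "$out.cu" -o "$out" || return 1
  echo "built ./$out"
}

# Only runs when the test program is present in the current directory.
if [ -f parallel_sum.bend ]; then
  build_standalone parallel_sum.bend parallel_sum && ./parallel_sum
fi
```

If the standalone binary fails with the same "invalid argument" message, that would point at the generated kernel launch itself rather than at how `bend` invokes `hvm`.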
