Cannot run CUDA-enabled version  #27

@willkill07

**Issue:** execution of kripke.exe on the current development version (1.2.5-dev) results in an illegal memory access

Tagged release 1.2.4 does not exhibit this behavior. I have not bisected to find the culprit, but I suspect the problem is somewhere in RAJA.

Build environment:

  • GCC 8.4
  • CUDA Toolkit 11.4
  • AMD CPU (Threadripper 3960X)
  • NVIDIA A6000 GPU (compute capability of 8.6)
  • no warnings when building

host-config file:

set(CMAKE_BUILD_TYPE "Release" CACHE STRING "")

set(CMAKE_CXX_FLAGS "" CACHE STRING "")
set(CMAKE_CXX_FLAGS_RELEASE "-O3 -ffast-math" CACHE STRING "")
set(CMAKE_CXX_FLAGS_RELWITHDEBINFO "-O3 -g -ffast-math" CACHE STRING "")
set(CMAKE_CXX_FLAGS_DEBUG "-O0 -g" CACHE STRING "")

set(ENABLE_CHAI On CACHE BOOL "")
set(ENABLE_CUDA On CACHE BOOL "")
set(CUDA_ARCH "sm_86" CACHE STRING "")
set(ENABLE_OPENMP Off CACHE BOOL "")
set(ENABLE_MPI Off CACHE BOOL "")
set(ENABLE_MPI_WRAPPER Off CACHE BOOL "")

set(CMAKE_CUDA_FLAGS "-restrict -gencode=arch=compute_86,code=sm_86" CACHE STRING "")
set(CMAKE_CUDA_FLAGS_RELEASE "-O3 --expt-extended-lambda" CACHE STRING "")
set(CMAKE_CUDA_FLAGS_RELWITHDEBINFO "-O3 -lineinfo --expt-extended-lambda" CACHE STRING "")
set(CMAKE_CUDA_FLAGS_DEBUG "-O0 -g -G --expt-extended-lambda" CACHE STRING "")
set(CMAKE_CUDA_HOST_COMPILER "${CMAKE_CXX_COMPILER}" CACHE STRING "")

Output:

~/kripke$ ./build/bin/kripke.exe

   _  __       _         _
  | |/ /      (_)       | |
  | ' /  _ __  _  _ __  | | __ ___
  |  <  | '__|| || '_ \ | |/ // _ \
  | . \ | |   | || |_) ||   <|  __/
  |_|\_\|_|   |_|| .__/ |_|\_\\___|
                 | |
                 |_|        Version 1.2.5-dev

LLNL-CODE-775068

Copyright (c) 2014-2019, Lawrence Livermore National Security, LLC

Kripke is released under the BSD 3-Clause License, please see the
LICENSE file for the full license

This work was produced under the auspices of the U.S. Department of
Energy by Lawrence Livermore National Laboratory under Contract
DE-AC52-07NA27344.

Author: Adam J. Kunen <kunen1@llnl.gov>

Compilation Options:
  Architecture:           CUDA
  Compiler:               /usr/bin/c++
  Compiler Flags:         "     -Wall -Wextra  "
  Linker Flags:           " "
  CHAI Enabled:           Yes
  CUDA Enabled:           Yes
    NVCC:                 /usr/local/cuda/bin/nvcc
    NVCC Flags:           "-restrict -gencode=arch=compute_86,code=sm_86 -O3 --expt-extended-lambda"
  MPI Enabled:            No
  OpenMP Enabled:         No
  Caliper Enabled:        No

Input Parameters
================

  Problem Size:
    Zones:                 16 x 16 x 16  (4096 total)
    Groups:                32
    Legendre Order:        4
    Quadrature Set:        Dummy S2 with 96 points

  Physical Properties:
    Total X-Sec:           sigt=[0.100000, 0.000100, 0.100000]
    Scattering X-Sec:      sigs=[0.050000, 0.000050, 0.050000]

  Solver Options:
    Number iterations:     10

  MPI Decomposition Options:
    Total MPI tasks:       1
    Spatial decomp:        1 x 1 x 1 MPI tasks
    Block solve method:    Sweep

  Per-Task Options:
    DirSets/Directions:    8 sets, 12 directions/set
    GroupSet/Groups:       2 sets, 16 groups/set
    Zone Sets:             1 x 1 x 1
    Architecture:          CUDA
    Data Layout:           DGZ

Generating Problem
==================

  Decomposition Space:   Procs:      Subdomains (local/global):
  ---------------------  ----------  --------------------------
  (P) Energy:            1           2 / 2
  (Q) Direction:         1           8 / 8
  (R) Space:             1           1 / 1
  (Rx,Ry,Rz) R in XYZ:   1x1x1       1x1x1 / 1x1x1
  (PQR) TOTAL:           1           16 / 16

  Material Volumes=[8.789062e+03, 1.177734e+05, 2.753438e+06]

  Memory breakdown of Field variables:
  Field Variable            Num Elements    Megabytes
  --------------            ------------    ---------
  data/sigs                        15360        0.117
  dx                                  16        0.000
  dy                                  16        0.000
  dz                                  16        0.000
  ell                               2400        0.018
  ell_plus                          2400        0.018
  i_plane                         786432        6.000
  j_plane                         786432        6.000
  k_plane                         786432        6.000
  mixelem_to_fraction               4352        0.033
  phi                            3276800       25.000
  phi_out                        3276800       25.000
  psi                           12582912       96.000
  quadrature/w                        96        0.001
  quadrature/xcos                     96        0.001
  quadrature/ycos                     96        0.001
  quadrature/zcos                     96        0.001
  rhs                           12582912       96.000
  sigt_zonal                      131072        1.000
  volume                            4096        0.031
  --------                  ------------    ---------
  TOTAL                         34238832      261.222

  Generation Complete!

Steady State Solve
==================

CUDAassert: an illegal memory access was encountered /home/williamk/kripke/tpl/raja/include/RAJA/policy/cuda/MemUtils_CUDA.hpp 183
terminate called after throwing an instance of 'std::runtime_error'
  what():  CUDAassert
Aborted (core dumped)
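
As a sanity check on the memory breakdown in the output above, the following Python sketch recomputes a few of the table entries from the input parameters. It rests on assumptions that are not stated in the log: that every field element is an 8-byte double, that "Megabytes" means MiB (2**20 bytes), that Legendre order 4 yields (4 + 1)**2 = 25 moments, and that the A6000 has 48 GB of device memory. The helper `megabytes` is a name made up for this illustration.

```python
# Recompute selected rows of the "Memory breakdown of Field variables" table.
# Assumptions (not stated in the log): 8-byte elements, "Megabytes" = MiB.
BYTES_PER_ELEMENT = 8
MIB = 2**20

zones = 16 * 16 * 16        # "Zones: 16 x 16 x 16 (4096 total)"
groups = 32                 # "Groups: 32"
directions = 96             # "Quadrature Set: Dummy S2 with 96 points"
moments = (4 + 1) ** 2      # assumed moment count for Legendre order 4


def megabytes(num_elements):
    """Convert an element count to MiB under the assumptions above."""
    return num_elements * BYTES_PER_ELEMENT / MIB


psi_elements = zones * groups * directions   # angular flux
phi_elements = zones * groups * moments      # flux moments
total_elements = 34238832                    # TOTAL row, copied from the log

print(psi_elements, megabytes(psi_elements))    # 12582912 96.0
print(phi_elements, megabytes(phi_elements))    # 3276800 25.0
print(round(megabytes(total_elements), 3))      # 261.222

# Assumed device capacity: 48 GB, treated here as 48 * 1024 MiB.
device_mib = 48 * 1024
print(f"{megabytes(total_elements) / device_mib:.2%}")   # 0.53%
```

Under these assumptions the recomputed values match the psi, phi, and TOTAL rows, and the whole problem occupies well under 1% of device memory. That suggests, but does not prove, that the failure is a bad access rather than memory exhaustion.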
