Description
I am trying to reconstruct a scan consisting of roughly 27.6k diffraction patterns with multiple probe modes.
While 1, 2, 3, and 4 modes all work, I cannot run the reconstruction with 5 or more modes.
Reducing the crop size from the initial 128 down to 64, 32, or even 16 does NOT allow me to use more modes.
The problem scales with the number of GPUs I am using on the machine:
- using 2 GPUs, as described above, 1, 2, 3, or 4 modes run fine and 5+ fails
- using 1 GPU, I can run using
The reconstruction loads the data and creates the pods, but fails as soon as it should perform the very first iteration.
The error appears after writing the initial state (iteration 0) to a .ptyr file:
================================================================================
WARNING ptypy - Unable to import optimised cufft version - using cufft with separte callbacks instead
---------------------------------- Autosaving ----------------------------------
Generating copies of probe, object and parameters and runtime
Saving to ----.ptyr
Traceback (most recent call last):
  File "----", line 269, in <module>
    P.run()
  File "----/ptypy/ptypy/core/ptycho.py", line 784, in run
    self.run(engine=engine)
  File "----/ptypy/ptypy/core/ptycho.py", line 713, in run
    engine.iterate()
  File "----/ptypy/ptypy/engines/base.py", line 233, in iterate
    self.error = self.engine_iterate(niter_contiguous)
  File "----/ptypy/ptypy/accelerate/cuda_cupy/engines/projectional_cupy_stream.py", line 239, in engine_iterate
    PROP.fw(aux, aux)
  File "----/ptypy/ptypy/accelerate/cuda_cupy/kernels.py", line 82, in _fw
    self._fft1.ft(x, y)
  File "----/ptypy/ptypy/accelerate/cuda_cupy/cufft.py", line 175, in _ft
    self._prefilt(x, y)
  File "----/ptypy/ptypy/accelerate/cuda_cupy/cufft.py", line 152, in _prefilt
    self.pre_fft_knl(grid=self.grid,
                     block=self.block,
                     ...<3 lines>...
                     np.int32(self.arr_shape[0]),
                     np.int32(self.arr_shape[1])))
  File "cupy/_core/raw.pyx", line 93, in cupy._core.raw.RawKernel.__call__
  File "cupy/cuda/function.pyx", line 223, in cupy.cuda.function.Function.__call__
  File "cupy/cuda/function.pyx", line 205, in cupy.cuda.function._launch
  File "cupy_backends/cuda/api/driver.pyx", line 273, in cupy_backends.cuda.api.driver.launchKernel
  File "cupy_backends/cuda/api/driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_INVALID_VALUE: invalid argument

(The second MPI rank prints the same traceback; the two copies were interleaved in the original output.)
Is there some kind of minimum size a pod block needs in GPU memory?
Or is there a maximum number of PODs that can be addressed on a single GPU?
I know that running on a node with even more GPUs, or across nodes, would work around this issue. But it is frustrating that simply using a smaller crop size does not seem to reduce the problem at all, i.e. it does not let me run scans with more pods on the same hardware.
In case it matters: with the scan I have at hand, 55,200 PODs per GPU runs fine, but 69,000 PODs per GPU fails. That smells like 2**16 = 65536 being the limit.
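For what it's worth, a CUDA_ERROR_INVALID_VALUE raised directly from launchKernel, combined with the ~65k threshold, is consistent with the documented CUDA grid-size limits: gridDim.y and gridDim.z are capped at 65,535, while only gridDim.x may go up to 2**31 - 1. If the pre-FFT kernel maps the number of patterns/pods onto grid.y or grid.z (an assumption about cufft.py that would need checking), any launch beyond 65,535 in that dimension fails regardless of the crop size, which would explain why shrinking the arrays doesn't help. A minimal pure-Python sketch (hypothetical helper, not ptypy code) checking a launch configuration against these limits:

```python
# Documented CUDA launch limits for compute capability >= 3.0:
# gridDim.x <= 2**31 - 1, gridDim.y/z <= 65535,
# and at most 1024 threads per block.
GRID_LIMITS = (2**31 - 1, 65535, 65535)
BLOCK_LIMITS = (1024, 1024, 64)
MAX_THREADS_PER_BLOCK = 1024

def check_launch(grid, block):
    """Return the reasons why cuLaunchKernel would reject (grid, block)."""
    problems = []
    for dim, (g, lim) in enumerate(zip(grid, GRID_LIMITS)):
        if not 1 <= g <= lim:
            problems.append(f"grid dim {dim} = {g} outside [1, {lim}]")
    for dim, (b, lim) in enumerate(zip(block, BLOCK_LIMITS)):
        if not 1 <= b <= lim:
            problems.append(f"block dim {dim} = {b} outside [1, {lim}]")
    threads = block[0] * block[1] * block[2]
    if threads > MAX_THREADS_PER_BLOCK:
        problems.append(f"{threads} threads per block > {MAX_THREADS_PER_BLOCK}")
    return problems

# 69,000 pods placed in grid.y exceeds the 65,535 cap -> invalid launch;
# 55,200 is fine. The block (i.e. the crop size) plays no role here.
print(check_launch((1, 69000, 1), (32, 32, 1)))  # one problem reported
print(check_launch((1, 55200, 1), (32, 32, 1)))  # []
```

Note that the hard cap would then be 65,535 rather than 65,536, which is still bracketed by the observed 55,200-works / 69,000-fails behaviour.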