Description
I am trying to reconstruct a scan consisting of roughly 27.6k diffraction patterns with multiple probe modes.
While 1, 2, 3, and 4 modes all work, I cannot run the reconstruction with 5 or more modes.
Reducing the crop size from the initial 128 down to 64, 32, or even 16 does NOT allow me to use more modes.
The problem scales with the number of GPUs I am using on the machine:
- using 2 GPUs, as described above, 1, 2, 3, or 4 modes run fine and 5+ fails
- using 1 GPU, I can run using
The reconstruction loads the data and creates the pods, but fails as soon as it should perform the very first iteration.
The error appears after writing the initial state (iteration 0) to a .ptyr file:
================================================================================
WARNING ptypy - Unable to import optimised cufft version - using cufft with separte callbacks instead
---------------------------------- Autosaving ----------------------------------
Generating copies of probe, object and parameters and runtime
Saving to ----.ptyr
Traceback (most recent call last):
  File "----", line 269, in <module>
    P.run()
  File "----/ptypy/ptypy/core/ptycho.py", line 784, in run
    self.run(engine=engine)
  File "----/ptypy/ptypy/core/ptycho.py", line 713, in run
    engine.iterate()
  File "----/ptypy/ptypy/engines/base.py", line 233, in iterate
    self.error = self.engine_iterate(niter_contiguous)
  File "----/ptypy/ptypy/accelerate/cuda_cupy/engines/projectional_cupy_stream.py", line 239, in engine_iterate
    PROP.fw(aux, aux)
  File "----/ptypy/ptypy/accelerate/cuda_cupy/kernels.py", line 82, in _fw
    self._fft1.ft(x, y)
  File "----/ptypy/ptypy/accelerate/cuda_cupy/cufft.py", line 175, in _ft
    self._prefilt(x, y)
  File "----/ptypy/ptypy/accelerate/cuda_cupy/cufft.py", line 152, in _prefilt
    self.pre_fft_knl(grid=self.grid,
                     block=self.block,
                     ...<3 lines>...
                     np.int32(self.arr_shape[0]),
                     np.int32(self.arr_shape[1])))
  File "cupy/_core/raw.pyx", line 93, in cupy._core.raw.RawKernel.__call__
  File "cupy/cuda/function.pyx", line 223, in cupy.cuda.function.Function.__call__
  File "cupy/cuda/function.pyx", line 205, in cupy.cuda.function._launch
  File "cupy_backends/cuda/api/driver.pyx", line 273, in cupy_backends.cuda.api.driver.launchKernel
  File "cupy_backends/cuda/api/driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_INVALID_VALUE: invalid argument

(The second MPI rank prints the same traceback; the two copies were interleaved in the original output.)
Is there some kind of minimum size a pod block needs in GPU memory?
Or is there a maximum number of PODs that can be addressed on a single GPU?
I know that running on a node with even more GPUs, or across nodes, would work around this issue. But it is frustrating that simply using a smaller crop size does not seem to reduce the problem at all, i.e. it does not let me run scans with more pods on the same hardware.
In case it matters: with the scan I have at hand, 55,200 PODs per GPU runs fine, but 69,000 PODs per GPU fails. That smells like 2**16 = 65536 being the limit.
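For what it's worth, a CUDA_ERROR_INVALID_VALUE raised directly from launchKernel, combined with the ~65k threshold, is consistent with the documented CUDA grid-size limits: gridDim.y and gridDim.z are capped at 65,535, while only gridDim.x may go up to 2**31 - 1. If the pre-FFT kernel maps the number of patterns/pods onto grid.y or grid.z (an assumption about cufft.py that would need checking), any launch beyond 65,535 in that dimension fails regardless of the crop size, which would explain why shrinking the arrays doesn't help. A minimal pure-Python sketch (hypothetical helper, not ptypy code) checking a launch configuration against these limits:

```python
# Documented CUDA launch limits for compute capability >= 3.0:
# gridDim.x <= 2**31 - 1, gridDim.y/z <= 65535,
# and at most 1024 threads per block.
GRID_LIMITS = (2**31 - 1, 65535, 65535)
BLOCK_LIMITS = (1024, 1024, 64)
MAX_THREADS_PER_BLOCK = 1024

def check_launch(grid, block):
    """Return the reasons why cuLaunchKernel would reject (grid, block)."""
    problems = []
    for dim, (g, lim) in enumerate(zip(grid, GRID_LIMITS)):
        if not 1 <= g <= lim:
            problems.append(f"grid dim {dim} = {g} outside [1, {lim}]")
    for dim, (b, lim) in enumerate(zip(block, BLOCK_LIMITS)):
        if not 1 <= b <= lim:
            problems.append(f"block dim {dim} = {b} outside [1, {lim}]")
    threads = block[0] * block[1] * block[2]
    if threads > MAX_THREADS_PER_BLOCK:
        problems.append(f"{threads} threads per block > {MAX_THREADS_PER_BLOCK}")
    return problems

# 69,000 pods placed in grid.y exceeds the 65,535 cap -> invalid launch;
# 55,200 is fine. The block (i.e. the crop size) plays no role here.
print(check_launch((1, 69000, 1), (32, 32, 1)))  # one problem reported
print(check_launch((1, 55200, 1), (32, 32, 1)))  # []
```

Note that the hard cap would then be 65,535 rather than 65,536, which is still bracketed by the observed 55,200-works / 69,000-fails behaviour.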