This repository was archived by the owner on Oct 31, 2023. It is now read-only.

Training on A6000 with CUDA 11.7 #21

@DarylWM


Hello. Thank you for sharing this code; I think it's really interesting work and I'm keen to try it on my dataset. To get started, I'd like to train the example on an A6000 using CUDA 11.7. I tried the versions specified in environment.yml, but I needed to bump the CUDA version to get A6000 support. With those newer versions I get this error:

experiments/experiment_1/
backing up... done.
train.py:1301: DeprecationWarning: Starting with ImageIO v3 the behavior of this function will switch to that of iio.v3.imread. To keep the current behavior (and make this warning disappear) use `import imageio.v2 as imageio` or call `imageio.v2.imread` directly.
  return imageio.imread(f, ignoregamma=True) if f[-4:] == ".png" else imageio.imread(f)
Loaded llff (86, 384, 512, 3) (120, 3, 5) [384.         512.         256.60952759] data/example_sequence/
DEFINING BOUNDS
NEAR FAR 0.0021997066447511314 1.0024441480636597
Found ckpts []
start: 0 args.N_iters: 200000
C:\Users\dwil6816\Anaconda3\envs\nrnerf\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\TensorShape.cpp:3191.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
get rays
done, concats
(86, 384, 512, 4, 3)
TRAIN views are [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
 72 73 74 75 76 77 78 79 80 81 82 83 84 85]
TEST views are []
VAL views are []
Begin
  0%|                                                                                       | 0/200000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 2016, in <module>
    main_function(args)
  File "train.py", line 1566, in main_function
    losses = parallel_training(
  File "C:\Users\dwil6816\Anaconda3\envs\nrnerf\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\dwil6816\Anaconda3\envs\nrnerf\lib\site-packages\torch\nn\parallel\data_parallel.py", line 169, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "C:\Users\dwil6816\Anaconda3\envs\nrnerf\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "train.py", line 188, in forward
    imageid_to_timestepid[batch_pixel_indices[:, 0]], :
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
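For context (this is just my understanding, not anything from the repository): the error seems to mean that the tensor being indexed lives on the CPU while the index tensor lives on the GPU. A minimal sketch of the mismatch and the usual fix, with made-up names and values:

```python
import torch

# Illustrative sketch of the device mismatch (these names are NOT from
# train.py): a lookup table created on the CPU, indexed with a tensor
# that may live on the GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

lookup = torch.arange(10)                        # created on the CPU by default
indices = torch.tensor([1, 2, 3], device=device)  # may be on the GPU

# Indexing a CPU tensor with CUDA indices raises the RuntimeError above;
# moving both tensors onto the same device avoids it.
lookup = lookup.to(device)
result = lookup[indices]
print(result.tolist())  # [1, 2, 3]
```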

Here are the Torch versions I'm using:

torch               1.13.1+cu117
torchaudio          0.13.1+cu117
torchvision         0.14.1+cu117

I'm new to PyTorch, so I'm wondering: is there a global fix, or do I need to go through the code and check where each tensor is allocated?
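To make the question concrete, here's the pattern I'd expect a fix to take: define the device once and move the lookup table onto it where it is created. The variable names below come from the traceback, but the shapes and values are invented for illustration.

```python
import torch

# Define the device once; everything that participates in indexing
# should end up on it.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-ins for the tensors in train.py's forward():
imageid_to_timestepid = torch.arange(5).to(device)
batch_pixel_indices = torch.zeros(4, 2, dtype=torch.long, device=device)

# This indexing now succeeds because both tensors share a device.
timesteps = imageid_to_timestepid[batch_pixel_indices[:, 0]]
print(timesteps.shape)
```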
