Frequent Error #7

@tchaton

Config:

python 3.6.4
torch 1.2.0

This error is triggered frequently and stops the training.

Traceback (most recent call last):
File "learning/main.py", line 607, in <module>
main()
File "learning/main.py", line 455, in main
train_metrics, _ = train()
File "learning/main.py", line 296, in train
outputs = model.ecc(embeddings[0], clouds_data[4:6])
File "/home/thomas/.pyenv/versions/spg3.6.4/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/thomas/HELIX/superpoint-graph-job/superpointgraph2/learning/../learning/graphnet.py", line 145, in forward
input = module(input)
File "/home/thomas/.pyenv/versions/spg3.6.4/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/thomas/HELIX/superpoint-graph-job/superpointgraph2/learning/../learning/modules.py", line 88, in forward
input = ecc.GraphConvFunction(nc, nc, idxn, idxe, degs, degs_gpu, self._edge_mem_limit)(hx, weights)
File "/home/thomas/HELIX/superpoint-graph-job/superpointgraph2/learning/../learning/ecc/GraphConvModule.py", line 67, in forward
cuda_kernels.conv_aggregate_fw(output.narrow(0,startd,numd), products.view(-1,self._out_channels), self._degs_gpu.narrow(0,startd,numd))
File "/home/thomas/HELIX/superpoint-graph-job/superpointgraph2/learning/../learning/ecc/cuda_kernels.py", line 123, in conv_aggregate_fw
csdegs = torch.cumsum(degs,0)
RuntimeError: scan failed to synchronize: an illegal memory access was encountered

Traceback (most recent call last):
File "cupy/cuda/driver.pyx", line 193, in cupy.cuda.driver.moduleUnload
File "cupy/cuda/driver.pyx", line 82, in cupy.cuda.driver.check_status
TypeError: 'NoneType' object is not callable
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
File "cupy/cuda/driver.pyx", line 193, in cupy.cuda.driver.moduleUnload
File "cupy/cuda/driver.pyx", line 82, in cupy.cuda.driver.check_status
TypeError: 'NoneT
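
CUDA reports kernel failures asynchronously, so the `torch.cumsum` call at the bottom of the first traceback may only be the place where an earlier fault surfaced, possibly in the custom kernel launched from `conv_aggregate_fw`. One way to narrow this down is to set the CUDA environment variable `CUDA_LAUNCH_BLOCKING=1`, which makes each kernel launch synchronous so the Python traceback points closer to the launch that actually failed. The sketch below is only a debugging aid, not a fix. The helper name `force_synchronous_cuda_launches` is hypothetical, and it assumes the variable is set before `torch` or `cupy` initialise CUDA.

```python
import os


def force_synchronous_cuda_launches():
    """Ask the CUDA runtime to execute every kernel launch synchronously.

    Hypothetical helper for debugging only. It must run before torch or
    cupy initialise CUDA; once a CUDA context exists, changing the
    variable has no effect. Expect training to run noticeably slower.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return os.environ["CUDA_LAUNCH_BLOCKING"]


# Assumption: this is called at the very top of the entry script
# (learning/main.py in the traceback above), before `import torch`.
force_synchronous_cuda_launches()
```

The same effect can be had without touching the code by exporting the variable in the shell that launches `learning/main.py`. With synchronous launches enabled, a new traceback that ends inside `conv_aggregate_fw` rather than at `torch.cumsum` would suggest the inputs passed to that kernel are worth checking.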
