cifar100 with resnet  #113

@apeterswu

Hi,

I am trying to run the ResNet-32 model on the CIFAR-100 dataset. The only change I made to "Deep_Residual_Learning_CIFAR-10.py" is the training data, but it fails with the following error:

Starting training...
Traceback (most recent call last):
  File "/home/changchen/anaconda3/lib/python3.6/site-packages/theano/compile/function_module.py", line 903, in __call__
    self.fn() if output_subset is None else\
RuntimeError: error getting worksize: CUDNN_STATUS_BAD_PARAM

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "resnet.py", line 390, in <module>
    main(**kwargs)
  File "resnet.py", line 319, in main
    train_err += train_fn(inputs, targets)
  File "/home/changchen/anaconda3/lib/python3.6/site-packages/theano/compile/function_module.py", line 917, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/home/changchen/anaconda3/lib/python3.6/site-packages/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/home/changchen/anaconda3/lib/python3.6/site-packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/home/changchen/anaconda3/lib/python3.6/site-packages/theano/compile/function_module.py", line 903, in __call__
    self.fn() if output_subset is None else\
RuntimeError: error getting worksize: CUDNN_STATUS_BAD_PARAM
Apply node that caused the error: GpuDnnConv{algo='small', inplace=True, num_groups=1}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty{dtype='float32', context_name=None}.0, GpuDnnConvDesc{border_mode='half', subsample=(1, 1), dilation=(1, 1), conv_mode='cross', precision='float32', num_groups=1}.0, Constant{1.0}, Constant{0.0})
Toposort index: 399
Inputs types: [GpuArrayType<None>(float32, 4D), GpuArrayType<None>(float32, 4D), GpuArrayType<None>(float32, 4D), <theano.gof.type.CDataType object at 0x7fa464893a20>, Scalar(float32), Scalar(float32)]
Inputs shapes: [(128, 3, 32, 32), (16, 3, 3, 3), (128, 16, 32, 32), 'No shapes', (), ()]
Inputs strides: [(12288, 4096, 128, 4), (108, 36, 12, 4), (65536, 4096, 128, 4), 'No strides', (), ()]
Inputs values: ['not shown', 'not shown', 'not shown', <capsule object NULL at 0x7fa3997c61e0>, 1.0, 0.0]
Outputs clients: [[GpuElemwise{sub,no_inplace}(GpuDnnConv{algo='small', inplace=True, num_groups=1}.0, InplaceGpuDimShuffle{x,0,x,x}.0), GpuContiguous(GpuDnnConv{algo='small', inplace=True, num_groups=1}.0), GpuElemwise{sub,no_inplace}(GpuDnnConv{algo='small', inplace=True, num_groups=1}.0, GpuElemwise{Composite{(((i0 / i1) / i2) / i3)}}[]<gpuarray>.0)]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
  File "resnet.py", line 390, in <module>
    main(**kwargs)
  File "resnet.py", line 267, in main
    prediction = lasagne.layers.get_output(network)
  File "/home/changchen/anaconda3/lib/python3.6/site-packages/lasagne/layers/helper.py", line 197, in get_output
    all_outputs[layer] = layer.get_output_for(layer_inputs, **kwargs)
  File "/home/changchen/anaconda3/lib/python3.6/site-packages/lasagne/layers/conv.py", line 352, in get_output_for
    conved = self.convolve(input, **kwargs)
  File "/home/changchen/anaconda3/lib/python3.6/site-packages/lasagne/layers/conv.py", line 650, in convolve
    **extra_kwargs)

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

It seems something went wrong in the convolution operation. Could you please give me some advice? Thanks a lot.
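
As a sanity check on the traceback above, the values reported under "Inputs shapes" and "Inputs strides" can be compared against what a stride-1 convolution with border_mode='half' should produce. The sketch below is plain Python and does not call Theano or cuDNN. The helper names (`conv_output_size`, `half_mode_output_shape`, `contiguous_strides`) are hypothetical and introduced only for illustration, and it assumes that 'half' mode pads each side by `filter_size // 2` and that float32 items are 4 bytes.

```python
def conv_output_size(input_size, filter_size, stride=1, pad=0):
    """Spatial output size of a convolution without dilation."""
    return (input_size + 2 * pad - filter_size) // stride + 1


def half_mode_output_shape(input_shape, filter_shape, subsample=(1, 1)):
    """Expected output shape for border_mode='half' (hypothetical helper).

    input_shape is (batch, channels, rows, cols);
    filter_shape is (num_filters, channels, rows, cols).
    """
    batch, in_channels, rows, cols = input_shape
    num_filters, filter_channels, f_rows, f_cols = filter_shape
    if in_channels != filter_channels:
        raise ValueError("input and filter channel counts differ")
    # Assumption: 'half' mode pads each side by filter_size // 2.
    out_rows = conv_output_size(rows, f_rows, subsample[0], f_rows // 2)
    out_cols = conv_output_size(cols, f_cols, subsample[1], f_cols // 2)
    return (batch, num_filters, out_rows, out_cols)


def contiguous_strides(shape, itemsize=4):
    """Byte strides of a C-contiguous array (itemsize=4 for float32)."""
    strides = []
    step = itemsize
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


# Shapes copied from the "Inputs shapes" line of the traceback.
inputs, filters, output = (128, 3, 32, 32), (16, 3, 3, 3), (128, 16, 32, 32)

print(half_mode_output_shape(inputs, filters) == output)
print(contiguous_strides(inputs))
print(contiguous_strides(filters))
print(contiguous_strides(output))
```

Under these assumptions the expected output shape equals the reported (128, 16, 32, 32), and the computed strides match the reported ones for all three arrays, so the shapes alone do not obviously explain CUDNN_STATUS_BAD_PARAM.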
