Multiple GPUs is broken

Hi Yala! 

Great package. Just letting you know, though, that computation on multiple GPUs is broken for two reasons:

1. The `model.py` file is missing this import:
``import torch.nn as nn``
That's an easy fix.

2. You have some class-attribute dependencies that are single-thread bound.
https://github.com/pytorch/pytorch/issues/8637

I'm not sure exactly what they are, but here is my error message, which matches the one in the issue I linked to above:

```
Traceback (most recent call last):
  File "scripts/main.py", line 35, in <module>
    epoch_stats, model, gen = train.train_model(train_data, dev_data, model, gen, args)
  File "/auto/rcf-proj/ef/spangher/newspaper-pages/text_nn/rationale_net/learn/train.py", line 59, in train_model
    args=args)
  File "/auto/rcf-proj/ef/spangher/newspaper-pages/text_nn/rationale_net/learn/train.py", line 198, in run_epoch
    mask, z = gen(x_indx)
  File "/home/rcf-40/spangher/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/rcf-40/spangher/.local/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/rcf-40/spangher/.local/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/rcf-40/spangher/.local/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/home/rcf-40/spangher/.local/lib/python3.7/site-packages/torch/_utils.py", line 369, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 1 on device 1.
Original Traceback (most recent call last):
  File "/home/rcf-40/spangher/.local/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/home/rcf-40/spangher/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/auto/rcf-proj/ef/spangher/newspaper-pages/text_nn/rationale_net/models/generator.py", line 55, in forward
    activ = self.cnn(x)
  File "/home/rcf-40/spangher/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/auto/rcf-proj/ef/spangher/newspaper-pages/text_nn/rationale_net/models/cnn.py", line 55, in forward
    activ = self._conv(x)
  File "/auto/rcf-proj/ef/spangher/newspaper-pages/text_nn/rationale_net/models/cnn.py", line 41, in _conv
    next_activ.append( conv(padded_activ) )
  File "/home/rcf-40/spangher/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/rcf-40/spangher/.local/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 200, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_convolution)
```
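
A guess at the cause, which I have not verified against `cnn.py`: this kind of device mismatch under `nn.DataParallel` commonly shows up when sub-layers are stored in a plain Python list instead of an `nn.ModuleList`. Layers in a plain list are not registered with the parent module, so they are not copied to each replica and their weights stay on the original device. Below is a minimal sketch of the difference; the class names, dimensions, and filter sizes are hypothetical and not taken from the package.

```python
import torch
import torch.nn as nn


class PlainListCNN(nn.Module):
    """Hypothetical layer: convs kept in a plain Python list (not registered)."""

    def __init__(self, in_dim=8, out_dim=4, filters=(3, 4, 5)):
        super().__init__()
        # A plain list hides these layers from parameters(), .to(), and
        # DataParallel replication, so their weights stay on one device.
        self.convs = [nn.Conv1d(in_dim, out_dim, k) for k in filters]

    def forward(self, x):
        # Max-pool each conv output over the length dimension, then concatenate.
        return torch.cat([conv(x).max(dim=2)[0] for conv in self.convs], dim=1)


class ModuleListCNN(nn.Module):
    """Same hypothetical layer, with convs registered through nn.ModuleList."""

    def __init__(self, in_dim=8, out_dim=4, filters=(3, 4, 5)):
        super().__init__()
        # nn.ModuleList registers each conv as a submodule, so it is moved
        # and replicated together with the parent module.
        self.convs = nn.ModuleList(
            [nn.Conv1d(in_dim, out_dim, k) for k in filters]
        )

    def forward(self, x):
        return torch.cat([conv(x).max(dim=2)[0] for conv in self.convs], dim=1)


# Count the parameter tensors each variant exposes to PyTorch.
print(len(list(PlainListCNN().parameters())))   # 0
print(len(list(ModuleListCNN().parameters())))  # 6 (3 weights + 3 biases)
```

If `cnn.py` does build its conv layers this way, switching the container to `nn.ModuleList` may be enough to make the replicas on device 1 find their weights.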

Alex
