WARNING:root:NaN or Inf found in input tensor. #54

@VladC12

Description

GPU: GTX 1060 6 GB

```
❯ python train.py -c config.json -p train_config.output_directory=outdir
train_config.output_directory=outdir
output_directory=outdir
{'train_config': {'output_directory': 'outdir', 'epochs': 10000000, 'learning_rate': 0.0001, 'weight_decay': 1e-06, 'sigma': 1.0, 'iters_per_checkpoint': 5000, 'batch_size': 1, 'seed': 1234, 'checkpoint_path': '', 'ignore_layers': [], 'include_layers': ['speaker', 'encoder', 'embedding'], 'warmstart_checkpoint_path': '', 'with_tensorboard': True, 'fp16_run': False}, 'data_config': {'training_files': 'filelists/ljs_audiopaths_text_sid_train_filelist.txt', 'validation_files': 'filelists/ljs_audiopaths_text_sid_val_filelist.txt', 'text_cleaners': ['flowtron_cleaners'], 'p_arpabet': 0.5, 'cmudict_path': 'data/cmudict_dictionary', 'sampling_rate': 22050, 'filter_length': 1024, 'hop_length': 256, 'win_length': 1024, 'mel_fmin': 0.0, 'mel_fmax': 8000.0, 'max_wav_value': 32768.0}, 'dist_config': {'dist_backend': 'nccl', 'dist_url': 'tcp://localhost:54321'}, 'model_config': {'n_speakers': 1, 'n_speaker_dim': 128, 'n_text': 185, 'n_text_dim': 512, 'n_flows': 2, 'n_mel_channels': 80, 'n_attn_channels': 640, 'n_hidden': 1024, 'n_lstm_layers': 2, 'mel_encoder_n_hidden': 512, 'n_components': 0, 'mean_scale': 0.0, 'fixed_gaussian': True, 'dummy_speaker_embedding': False, 'use_gate_layer': True}}

got rank 0 and world size 1 ...
Flowtron(
(speaker_embedding): Embedding(1, 128)
(embedding): Embedding(185, 512)
(flows): ModuleList(
(0): AR_Step(
(conv): Conv1d(1024, 160, kernel_size=(1,), stride=(1,))
(lstm): LSTM(1664, 1024, num_layers=2)
(attention_lstm): LSTM(80, 1024)
(attention_layer): Attention(
(softmax): Softmax(dim=2)
(query): LinearNorm(
(linear_layer): Linear(in_features=1024, out_features=640, bias=False)
)
(key): LinearNorm(
(linear_layer): Linear(in_features=640, out_features=640, bias=False)
)
(value): LinearNorm(
(linear_layer): Linear(in_features=640, out_features=640, bias=False)
)
(v): LinearNorm(
(linear_layer): Linear(in_features=640, out_features=1, bias=False)
)
)
(dense_layer): DenseLayer(
(layers): ModuleList(
(0): LinearNorm(
(linear_layer): Linear(in_features=1024, out_features=1024, bias=True)
)
(1): LinearNorm(
(linear_layer): Linear(in_features=1024, out_features=1024, bias=True)
)
)
)
)
(1): AR_Back_Step(
(ar_step): AR_Step(
(conv): Conv1d(1024, 160, kernel_size=(1,), stride=(1,))
(lstm): LSTM(1664, 1024, num_layers=2)
(attention_lstm): LSTM(80, 1024)
(attention_layer): Attention(
(softmax): Softmax(dim=2)
(query): LinearNorm(
(linear_layer): Linear(in_features=1024, out_features=640, bias=False)
)
(key): LinearNorm(
(linear_layer): Linear(in_features=640, out_features=640, bias=False)
)
(value): LinearNorm(
(linear_layer): Linear(in_features=640, out_features=640, bias=False)
)
(v): LinearNorm(
(linear_layer): Linear(in_features=640, out_features=1, bias=False)
)
)
(dense_layer): DenseLayer(
(layers): ModuleList(
(0): LinearNorm(
(linear_layer): Linear(in_features=1024, out_features=1024, bias=True)
)
(1): LinearNorm(
(linear_layer): Linear(in_features=1024, out_features=1024, bias=True)
)
)
)
(gate_layer): LinearNorm(
(linear_layer): Linear(in_features=1664, out_features=1, bias=True)
)
)
)
)
(encoder): Encoder(
(convolutions): ModuleList(
(0): Sequential(
(0): ConvNorm(
(conv): Conv1d(512, 512, kernel_size=(5,), stride=(1,), padding=(2,))
)
(1): InstanceNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
)
(1): Sequential(
(0): ConvNorm(
(conv): Conv1d(512, 512, kernel_size=(5,), stride=(1,), padding=(2,))
)
(1): InstanceNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
)
(2): Sequential(
(0): ConvNorm(
(conv): Conv1d(512, 512, kernel_size=(5,), stride=(1,), padding=(2,))
)
(1): InstanceNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
)
)
(lstm): LSTM(512, 256, batch_first=True, bidirectional=True)
)
)
Number of speakers : 1
output directory outdir
Epoch: 0
C:\AI_Research_Project\flowtron\data.py:40: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at ..\torch\csrc\utils\tensor_numpy.cpp:141.)
return torch.from_numpy(data).float(), sampling_rate
C:\AI_Research_Project\flowtron\flowtron.py:373: UserWarning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead. (Triggered internally at ..\aten\src\ATen\native\cuda\LegacyDefinitions.cpp:19.)
self.score_mask_value)
0: nan
WARNING:root:NaN or Inf found in input tensor.
C:\AI_Research_Project\flowtron\data.py:40: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at ..\torch\csrc\utils\tensor_numpy.cpp:141.)
return torch.from_numpy(data).float(), sampling_rate
Mean None
LogVar None
Prob None
Validation loss 0: nan
WARNING:root:NaN or Inf found in input tensor.
Saving model and optimizer state at iteration 0 to outdir/model_0
1: nan
WARNING:root:NaN or Inf found in input tensor.
2: nan
```
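
As a side note, both UserWarnings in the log come with their own one-line fixes. Below is a small self-contained sketch of the two patterns they are warning about (plain PyTorch calls, not flowtron's actual code):

```python
import numpy as np
import torch

# First warning: torch.from_numpy() on a read-only NumPy array. Copying the
# array first gives the tensor its own writeable memory (cf. data.py line 40).
data = np.zeros(4, dtype=np.float32)
data.setflags(write=False)                     # simulate the read-only buffer
audio = torch.from_numpy(data.copy()).float()  # no warning, safe to modify

# Second warning: masked_fill_ with a torch.uint8 mask is deprecated. Casting
# the mask to bool removes the warning (cf. flowtron.py line 373).
scores = torch.randn(1, 3, 5)
mask = torch.zeros(1, 3, 5, dtype=torch.uint8)
mask[..., -1] = 1
scores.masked_fill_(mask.bool(), float("-inf"))
```

Both are most likely unrelated to the NaN itself; they just clutter the log.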

And the `NaN or Inf` warning keeps repeating for every iteration.
EDIT: It just finished doing its thing.

After `319: nan` it crashed:

```
319: nan
WARNING:root:NaN or Inf found in input tensor.
Traceback (most recent call last):
  File "train.py", line 300, in <module>
    train(n_gpus, rank, **train_config)
  File "train.py", line 238, in train
    loss.backward()
  File "C:\Users\vladc\anaconda3\lib\site-packages\torch\tensor.py", line 185, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\Users\vladc\anaconda3\lib\site-packages\torch\autograd\__init__.py", line 127, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: transform: failed to synchronize: cudaErrorLaunchFailure: unspecified launch failure
```
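
The crash itself looks like a downstream symptom of backpropagating a loss that is already NaN, so the useful step is to find where the NaN first appears. Below is a generic sketch of how the training step could be instrumented; the loop and call signatures are placeholders rather than flowtron's actual train.py code, and only `loss.backward()` corresponds to the line in the traceback:

```python
import os
import torch

# Report asynchronous CUDA errors at the kernel that failed, rather than at a
# later sync point (must be set before the first CUDA call in the process).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Ask autograd to point at the forward op that produced a NaN/Inf gradient.
torch.autograd.set_detect_anomaly(True)

def guarded_step(model, batch, optimizer, criterion):
    """One training step that refuses to backprop a non-finite loss."""
    optimizer.zero_grad()
    output = model(*batch)      # placeholder call signature
    loss = criterion(output)    # placeholder: flowtron's criterion differs
    if not torch.isfinite(loss):
        print(f"non-finite loss ({loss.item()}), skipping this step")
        return None
    loss.backward()
    optimizer.step()
    return loss.item()
```

Anomaly detection slows training down noticeably, so it is only worth enabling while hunting for the first non-finite value.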
