HDFS Train is failing #51

@srinivaskiranj

HDFS training is failing because fixed_window returns an empty array for a min_len of 10. This appears to be because the hdfs/train file contains only a single sequence number per line, so every line is shorter than min_len and the number of samples is 0.

head ../output/hdfs/train
6
13
1
9
3
5
3
3
3
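
A minimal sketch of why these lines would yield zero samples. It assumes that fixed_window discards any line with fewer than min_len whitespace-separated tokens, which is what the output suggests. `count_usable_lines` is a hypothetical helper for illustration, not a function from the repository.

```python
def count_usable_lines(lines, min_len=10):
    """Count lines that have at least min_len whitespace-separated tokens.

    Assumption: the real windowing code skips shorter lines entirely.
    """
    usable = 0
    for line in lines:
        tokens = line.split()
        if len(tokens) >= min_len:
            usable += 1
    return usable


# The nine lines shown by `head` above: one token per line.
sample = ["6", "13", "1", "9", "3", "5", "3", "3", "3"]
print(count_usable_lines(sample, min_len=10))
```

With one token per line, no line reaches a length of 10, so nothing is left to build sequences from.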
========================================
 10%|████████████████▋                                                                                                                                                      | 213/2131 [00:00<00:00, 657385.40it/s]
 => Total available sequences: 0
 => Using 21 samples for validation
Traceback (most recent call last):
  File "logbert-main/HDFS/logbert.py", line 103, in <module>
    Trainer(options).train()
  File "logbert-main/HDFS/../bert_pytorch/train_log.py", line 62, in train
    logkey_train, logkey_valid, time_train, time_valid = generate_train_valid(self.output_path + "train", window_size=self.window_size,
  File "logbert-main/HDFS/../bert_pytorch/dataset/sample.py", line 99, in generate_train_valid
    logkey_trainset, logkey_validset, time_trainset, time_validset = train_test_split(logkey_seq_pairs,
  File ".local/lib/python3.10/site-packages/sklearn/utils/_param_validation.py", line 216, in wrapper
    return func(*args, **kwargs)
  File ".local/lib/python3.10/site-packages/sklearn/model_selection/_split.py", line 2851, in train_test_split
    n_train, n_test = _validate_shuffle_split(
  File ".local/lib/python3.10/site-packages/sklearn/model_selection/_split.py", line 2426, in _validate_shuffle_split
    raise ValueError(
ValueError: test_size=21 should be either positive and smaller than the number of samples 0 or a float in the (0, 1) range
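
The ValueError above is raised by train_test_split because it is asked for 21 validation samples out of 0. A check before the split could report the real cause more clearly. This is only a sketch: `check_split_sizes` is a hypothetical helper, and where it would be called in sample.py is an assumption.

```python
def check_split_sizes(n_samples, n_valid):
    """Raise a descriptive error when a validation split cannot be formed.

    Hypothetical helper: n_samples is the number of generated sequences,
    n_valid is the absolute number of samples requested for validation.
    """
    if n_samples == 0:
        raise ValueError(
            "no training sequences were generated; "
            "check the train file format and min_len"
        )
    if not 0 < n_valid < n_samples:
        raise ValueError(
            f"validation size {n_valid} must be positive and smaller than "
            f"the number of samples {n_samples}"
        )


# The situation from the log above: 0 sequences, 21 requested for validation.
try:
    check_split_sizes(0, 21)
except ValueError as err:
    print(err)
```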
