
Training on large datasets: memory overhead #21

@margaritageleta

Description


Hello, I wanted to train the biLSTM model on human chromosome data (a 15 GB training set). In terms of hardware, I have 240 GB of RAM. The parser runs fine:

Screenshot 1

However, when I execute the training, I get a memory error: it tries to allocate 849 GB, far more than I have available:

Screenshot 2

I wonder how you trained on chromosome 1.

How many samples and how many SNPs did you use? What was the tensor size?
Also, what is the expansion factor from the raw data to the processed data fed to the model? Going from 15 GB to 849 GB seems excessive.
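For context, here is the back-of-envelope arithmetic I am using: genotypes stored as one byte each grow 8x when materialized as a dense float64 tensor, before any one-hot encoding or windowing, so even that alone does not explain 849 GB. A minimal sketch (the sample/SNP counts and dtypes are hypothetical, not values from this repo):

```python
import numpy as np

def tensor_gb(n_samples, n_snps, dtype="float64", channels=1):
    """Rough memory footprint of a dense (samples, snps, channels) tensor, in GB."""
    itemsize = np.dtype(dtype).itemsize  # bytes per element for this dtype
    return n_samples * n_snps * channels * itemsize / 1e9

# Hypothetical shape: 3,000 samples x 5,000,000 SNPs.
# Stored as one byte per genotype, that is ~15 GB of raw data.
raw_gb = tensor_gb(3_000, 5_000_000, dtype="int8")
# Re-encoded as a dense float64 tensor, the same data is 8x larger.
dense_gb = tensor_gb(3_000, 5_000_000, dtype="float64")
print(raw_gb, dense_gb)  # 15.0 120.0
```

Even with this 8x inflation I only get ~120 GB, so I am trying to understand where the remaining factor of ~7 (one-hot channels? sequence windows? copies during preprocessing?) comes from.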

Thank you.
