
Training on large datasets: memory overhead #21

@margaritageleta

Description


Hello, I wanted to train the biLSTM model on human chromosome data (a 15 GB training set). In terms of hardware, I have 240 GB of RAM. The parser runs fine:

Screenshot 1

However, when I execute the training, I get a memory error: it tries to allocate 849 GB, far more than I have available:

Screenshot 2

I wonder how you trained on chromosome 1.

How many samples and how many SNPs did you use? What was the tensor size?
Also, what is the expansion factor from the raw data to the processed data fed to the model? Going from 15 GB to 849 GB seems excessive.
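For context, here is the back-of-envelope arithmetic I am using: genotypes stored as one byte each grow 8x when materialized as a dense float64 tensor, before any one-hot encoding or windowing, so even that alone does not explain 849 GB. A minimal sketch (the sample/SNP counts and dtypes are hypothetical, not values from this repo):

```python
import numpy as np

def tensor_gb(n_samples, n_snps, dtype="float64", channels=1):
    """Rough memory footprint of a dense (samples, snps, channels) tensor, in GB."""
    itemsize = np.dtype(dtype).itemsize  # bytes per element for this dtype
    return n_samples * n_snps * channels * itemsize / 1e9

# Hypothetical shape: 3,000 samples x 5,000,000 SNPs.
# Stored as one byte per genotype, that is ~15 GB of raw data.
raw_gb = tensor_gb(3_000, 5_000_000, dtype="int8")
# Re-encoded as a dense float64 tensor, the same data is 8x larger.
dense_gb = tensor_gb(3_000, 5_000_000, dtype="float64")
print(raw_gb, dense_gb)  # 15.0 120.0
```

Even with this 8x inflation I only get ~120 GB, so I am trying to understand where the remaining factor of ~7 (one-hot channels? sequence windows? copies during preprocessing?) comes from.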

Thank you.
