CUDA out of memory #12

@antss8

Description

CUDA out of memory. Tried to allocate 14.03 GiB. GPU 0 has a total capacity of 79.14 GiB of which 8.49 GiB is free. Process 17884 has 70.64 GiB memory in use. Of the allocated memory 70.14 GiB is allocated by PyTorch, and 18.26 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

After I passed a 16000 × 61000 matrix to rd.RegDiffusionTrainer and called rd_trainer.train(), I got the error message above.
Does RegDiffusion really need this much memory for training? The A800 GPU we used already has a large amount of memory (79.14 GiB total, according to the error).
Is it possible to distribute the computation so that less data has to be loaded into GPU memory at once?
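
For a rough sense of scale, here is a back-of-the-envelope estimate of how large the dense arrays involved could be. It assumes float32 values and assumes that training holds at least one dense gene-by-gene matrix on the GPU; that second point is my assumption, not something I have confirmed in the RegDiffusion source. The helper name `gib` is just for illustration.

```python
def gib(n_rows, n_cols, bytes_per_value=4):
    """Size in GiB of a dense n_rows x n_cols array (float32 by default)."""
    return n_rows * n_cols * bytes_per_value / 2**30

# Approximate shape from the report: 16000 cells x 61000 genes.
n_cells, n_genes = 16_000, 61_000

input_gib = gib(n_cells, n_genes)          # the expression matrix itself
gene_by_gene_gib = gib(n_genes, n_genes)   # one dense gene-by-gene matrix (assumed)

print(f"input matrix:       {input_gib:.2f} GiB")        # 3.64 GiB
print(f"gene x gene matrix: {gene_by_gene_gib:.2f} GiB")  # 13.86 GiB
```

Under these assumptions a single gene-by-gene matrix is already close to the 14.03 GiB allocation that failed, and a few copies of it (for example gradients or optimizer state) would account for most of the 70 GiB in use. The size grows with the square of the gene count, so the number of genes matters far more than the number of cells.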

Here is my code:
import numpy as np
import pandas as pd
import scanpy as sc
import regdiffusion as rd
import loompy as lp
from pyscenic.rss import regulon_specificity_scores
from pyscenic.plotting import plot_rss
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme(style="white")

wd = '.'
data_path = f'{wd}/subT.loom'
cisdb_path = f'{wd}/TF-motif.DB/Gh_cistarget_database/Gh.regions_vs_motifs.rankings.feather'

adata = sc.read_loom(data_path, sparse=True)
cisdb = pd.read_feather(cisdb_path)
adata = adata[:, adata.var_names.isin(cisdb.columns)]
x = adata.X.toarray()
rd_trainer = rd.RegDiffusionTrainer(x)
rd_trainer.train()
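
One possible workaround, sketched below, is to reduce the number of genes before building the trainer, for example by keeping only the highest-variance columns of `x`. This is a minimal NumPy sketch, not part of regdiffusion: `top_variance_columns` and `n_keep` are hypothetical names, the demo data is synthetic, and whether dropping genes is acceptable depends on the analysis.

```python
import numpy as np

def top_variance_columns(x, n_keep):
    """Return the sorted indices of the n_keep highest-variance columns of x."""
    variances = x.var(axis=0)
    n_keep = min(n_keep, x.shape[1])
    # argpartition places the n_keep largest variances in the last n_keep slots
    top = np.argpartition(variances, -n_keep)[-n_keep:]
    return np.sort(top)

# Synthetic stand-in for the cells x genes matrix (not real data).
rng = np.random.default_rng(0)
demo = rng.poisson(1.0, size=(200, 50)).astype(np.float32)
demo[:, [3, 7]] *= 10  # make two columns clearly more variable

keep = top_variance_columns(demo, n_keep=10)
x_small = demo[:, keep]
print(x_small.shape)  # (200, 10)
```

With the real data, the same indices would also need to be applied to the gene names (for example `adata.var_names[keep]`) so that the reduced matrix and its labels stay aligned.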
