Prevent dataframe fragmentation #3

@dansteiert

Description of feature

Hi,

I am calculating scores for a set of genesets and noticed this warning creeping up:

python3.12/site-packages/pyucell/knn.py:71: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling frame.insert many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use newframe = frame.copy()
adata.obs[f"{col}{suffix}"] = smoothed

Since the tool first incurs a larger overhead (ranking all the genes) and then calculates the statistics fairly quickly, it might be worth collecting the results in a list of Series first, concatenating them in one step, and only then merging them into the adata.obs dataframe.
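To illustrate the idea, here is a minimal sketch in plain pandas. It is not the actual code in knn.py: the helper name `add_columns_at_once`, the toy `obs` frame, and the `_kNN` suffix are assumptions made up for the example. It only shows the pattern of building all new columns first and attaching them with a single `pd.concat(axis=1)`, as the warning suggests.

```python
import numpy as np
import pandas as pd


def add_columns_at_once(obs: pd.DataFrame, new_cols: dict) -> pd.DataFrame:
    """Attach many new columns with one concat instead of repeated inserts.

    `new_cols` maps column name -> array-like with one value per row of `obs`.
    (Hypothetical helper, not part of pyucell.)
    """
    # Build a single block holding all results, indexed like obs.
    block = pd.DataFrame(new_cols, index=obs.index)
    # Drop columns that are about to be overwritten, so a rerun
    # replaces them instead of creating duplicate column names.
    kept = obs.drop(columns=[c for c in block.columns if c in obs.columns])
    # One concat -> one consolidated frame, no per-column insert.
    return pd.concat([kept, block], axis=1)


# Toy stand-in for adata.obs and for the smoothed scores.
obs = pd.DataFrame({"cell_type": ["a", "b", "c"]}, index=["c1", "c2", "c3"])
rng = np.random.default_rng(0)
# The "_kNN" suffix is an assumed placeholder for f"{col}{suffix}".
smoothed_by_col = {f"sig{i}_kNN": rng.random(3) for i in range(200)}

obs = add_columns_at_once(obs, smoothed_by_col)
```

With an AnnData object, the result would presumably be assigned back in one go (`adata.obs = add_columns_at_once(adata.obs, smoothed_by_col)`) rather than column by column.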

On a related note, for some users (e.g. me) it would be nice to be able to store the rankings outside the function, to avoid repeating that overhead when calculating additional genesets (but this is probably a good tradeoff between memory/space and time).
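Roughly what I have in mind, as a sketch only: rank once, keep the ranks around, and reuse them for every further geneset. The function names `rank_genes` and `mean_rank_score` are hypothetical and not part of pyucell, and the mean-rank score is just a toy stand-in, not the UCell statistic.

```python
import pandas as pd


def rank_genes(expr: pd.DataFrame) -> pd.DataFrame:
    """Rank genes within each cell (rank 1 = highest expression).

    `expr` is cells x genes. This is the expensive step to do only once.
    """
    return expr.rank(axis=1, ascending=False, method="average")


def mean_rank_score(ranks: pd.DataFrame, genes: list) -> pd.Series:
    """Toy score: mean rank of the geneset per cell (NOT the UCell statistic)."""
    # Ignore genes that are missing from the ranked matrix.
    present = [g for g in genes if g in ranks.columns]
    return ranks[present].mean(axis=1)


expr = pd.DataFrame(
    [[5.0, 0.0, 2.0], [1.0, 3.0, 0.0]],
    index=["c1", "c2"],
    columns=["G1", "G2", "G3"],
)

ranks = rank_genes(expr)  # computed once, could be stored by the user

# Later calls reuse the stored ranks instead of ranking again.
scores_a = mean_rank_score(ranks, ["G1", "G3"])
scores_b = mean_rank_score(ranks, ["G2"])
```

The stored ranks take about as much memory as the expression matrix itself in dense form, which is the memory/time tradeoff mentioned above.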

Cheers and thank you for this tool!

Labels: enhancement (New feature or request)
