Description of feature
Hi,
I am calculating scores for a set of gene sets and noticed this warning creeping up:
python3.12/site-packages/pyucell/knn.py:71: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling frame.insert many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use newframe = frame.copy()
adata.obs[f"{col}{suffix}"] = smoothed
Since the tool is written to do the expensive work up front (ranking all the genes) and then calculate the statistics fairly quickly, it might be worth collecting the results in a list of Series first, concatenating them, and only then merging the result into the adata.obs DataFrame in a single step.
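To illustrate the idea, here is a minimal sketch of the collect-then-concatenate pattern. It does not use pyucell's actual internals: `obs` is a plain DataFrame standing in for adata.obs, and `smoothed_by_col` and the `sig{j}_kNN` column names are made up for the example.

```python
import numpy as np
import pandas as pd

# Stand-in for adata.obs (assumption: the real one comes from an AnnData object).
obs = pd.DataFrame(index=[f"cell{i}" for i in range(5)])

# Hypothetical per-signature results, keyed by the target column name.
# In the real loop these would be the `smoothed` values computed per column.
rng = np.random.default_rng(0)
smoothed_by_col = {
    f"sig{j}_kNN": pd.Series(rng.random(len(obs)), index=obs.index)
    for j in range(200)
}

# Build one block from all Series at once instead of inserting column by column.
new_cols = pd.concat(smoothed_by_col, axis=1)

# Drop any columns that would be overwritten, then attach the block in one go.
obs = pd.concat(
    [obs.drop(columns=new_cols.columns, errors="ignore"), new_cols],
    axis=1,
)
# With AnnData, the result would presumably be assigned back: adata.obs = obs
```

Because all 200 columns are added in one concat, pandas should not emit the fragmentation warning that repeated `obs[name] = values` assignments trigger.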
On a related note: for some users (e.g. me) it would be nice to be able to store the rankings outside the function, to avoid repeating that overhead when calculating additional gene sets (but this is probably a good tradeoff between memory/space and time).
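Roughly what I have in mind, as a sketch only: `expr`, `ranks` and `score_from_ranks` are hypothetical names, and the mean-rank score is just a placeholder, not the statistic pyucell actually computes.

```python
import numpy as np
import pandas as pd

# Toy expression matrix (cells x genes), purely illustrative.
rng = np.random.default_rng(0)
genes = [f"g{i}" for i in range(50)]
cells = [f"cell{i}" for i in range(4)]
expr = pd.DataFrame(rng.random((4, 50)), index=cells, columns=genes)

# Expensive step, done once and kept by the caller:
# rank genes within each cell (1 = highest expression).
ranks = expr.rank(axis=1, ascending=False)

def score_from_ranks(ranks, gene_set):
    """Placeholder score: mean rank of the gene set per cell (NOT the UCell statistic)."""
    present = [g for g in gene_set if g in ranks.columns]
    return ranks[present].mean(axis=1)

# The same stored ranks are reused for gene sets added later.
first = score_from_ranks(ranks, ["g1", "g2", "g3"])
later = score_from_ranks(ranks, ["g10", "g11"])
```

The cost is that the rank matrix has to stay in memory (or on disk) between calls.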
Cheers and thank you for this tool!