Conversation
Thanks for this update, very clearly written! Discussion points:

Comments:
For your second comment point, my interpretation from that conversation was that the clustering would be separate, but we would still pseudobulk within the function. By all means, we can take that out and put it into the wrapper iterative LSI function instead! Do you think it would make sense to have variance and dispersion as a parameter of the same function, rather than as separate functions? Overall I agree with your other points. Will reflect here soon!
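For reference, a hypothetical sketch of what a single feature-selection function with a method parameter might look like. The name `select_features` and its arguments are assumptions for illustration, not existing BPCells API:

```r
# Hypothetical sketch: one feature-selection helper with a `method`
# parameter instead of separate variance/dispersion functions.
# `select_features` and its signature are illustrative, not real API.
select_features <- function(mat, method = c("variance", "dispersion"),
                            n_features = 2000) {
  method <- match.arg(method)
  means <- rowMeans(mat)
  vars <- apply(mat, 1, var)
  score <- switch(method,
    variance = vars,
    # Dispersion here taken as the variance-to-mean ratio
    dispersion = vars / pmax(means, .Machine$double.eps)
  )
  # Return indices of the top-scoring features
  order(score, decreasing = TRUE)[seq_len(min(n_features, length(score)))]
}
```

Usage would then be e.g. `idx <- select_features(counts, method = "dispersion", n_features = 500)`, keeping one entry point while still letting the wrapper choose the selection criterion.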
I was originally thinking clustering + pseudobulk calculation could happen in iterative_lsi. Then we still have a parameter in iterative_lsi to let the user configure how feature selection happens (which could be e.g.
Pretty much as we discussed during the call!
I think the biggest point of contention is the normalization structure. Normalizations like TF-IDF and z-score have statistics that are fit to data. However, the biggest problem is that we want something that can interoperate with BPCells operations while also returning the calculated information (mean, variance, IDF). Should it follow the same styling as the S3 class we are creating for LSI, with cell.embeddings/feature.loadings? I propose just having a boolean param, with the default returning an IterableMatrix, and the other option returning a class that we can project with.
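To make the boolean-param proposal concrete, here is a hypothetical sketch for a TF-IDF normalization: by default it returns just the normalized matrix, and with `return_model = TRUE` it returns a small S3 object carrying the fitted statistics (here the IDF vector) so it can later be projected onto new data. All names (`normalize_tfidf`, `return_model`, the `tfidf_model` class) are assumptions for illustration, not existing BPCells API:

```r
# Hypothetical sketch of a normalization that can optionally return
# a fitted model object, mirroring the cell.embeddings/feature.loadings
# style discussed for the LSI S3 class. Names are illustrative only.
normalize_tfidf <- function(mat, return_model = FALSE) {
  # idf: log-scaled inverse of how many cells each feature appears in
  idf <- log(1 + ncol(mat) / pmax(rowSums(mat > 0), 1))
  # tf: column-normalize so each cell's counts sum to 1
  tf <- sweep(mat, 2, pmax(colSums(mat), 1), "/")
  normalized <- tf * idf  # scales each feature row by its IDF
  if (!return_model) return(normalized)
  structure(list(matrix = normalized, idf = idf), class = "tfidf_model")
}

# Apply a previously fitted model to new data (projection)
project.tfidf_model <- function(model, new_mat) {
  tf <- sweep(new_mat, 2, pmax(colSums(new_mat), 1), "/")
  tf * model$idf
}
```

The same shape would work for z-score (store mean/variance instead of IDF), and the default path keeps the lazy IterableMatrix workflow untouched for users who never need to project.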