
consistency in expression estimation #7

@robinredX

Description

From an email exchange; this could be useful in case of differences between the current expression estimation script and the development version -

CPM Normalization Strategy:

The manuscript (Page 19, Methods) mentions: "instead of performing CPM normalization of simulated bulks, we normalize each test bulk sample to sum to the mean of sums of simulated samples," which suggests CPM normalization might be avoided. However, in the run_dissect_expr function, it appears that standard CPM normalization is applied (sc.pp.normalize_total(target_sum=1e6)). Could you kindly confirm which normalization strategy is the intended one for the final results reported in the paper?
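For reference, here is my understanding of the two strategies as a minimal NumPy sketch (the function names and the exact per-sample scaling are my own reading, not code from the repository):

```python
import numpy as np

def cpm_normalize(bulk):
    """Standard CPM, as sc.pp.normalize_total(target_sum=1e6) would do:
    scale each sample so its counts sum to one million."""
    return bulk / bulk.sum(axis=1, keepdims=True) * 1e6

def normalize_to_simulated_mean(test_bulk, simulated_bulk):
    """Strategy described in the manuscript: scale each test bulk sample
    to sum to the mean of the simulated samples' total counts."""
    target = simulated_bulk.sum(axis=1).mean()
    return test_bulk / test_bulk.sum(axis=1, keepdims=True) * target
```

The two differ only in the target library size (a fixed 1e6 versus a data-dependent mean), but that constant propagates into any downstream quantity that is not scale-invariant.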

Consistency Loss and Data Mixing:

The paper describes generating mixed samples ($B^{mix}$) using a mixing parameter $\beta$. In my reading of the run_dissect_expr function, it wasn't immediately clear where this explicit mixing occurs, as the iterators for real and simulated data seem to be processed somewhat separately. I might be missing the specific lines where the mixing happens—could you point me to that logic?
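For concreteness, this is the mixing step as I read it from the paper, sketched in NumPy (the per-sample sampling of $\beta$ from a uniform distribution is my assumption; the paper may specify a different range or scheme):

```python
import numpy as np

def mix_batches(real_bulk, sim_bulk, rng):
    """Convex combination per the paper's description:
    B_mix = beta * B_real + (1 - beta) * B_sim,
    with one beta drawn per sample (broadcast over genes)."""
    beta = rng.uniform(0.0, 1.0, size=(real_bulk.shape[0], 1))
    return beta * real_bulk + (1.0 - beta) * sim_bulk
```

If the two iterators are consumed independently, it is not obvious to me where a combination like this is formed, hence the question.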

Loss Function Scaling:

I noticed in the code that the input is divided by the number of cell types (/ labels.shape[1]) prior to the loss computation. As I couldn't find a mention of this scaling factor in the manuscript, could you explain the motivation behind this step?
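To illustrate why the motivation matters: for an MSE-style loss (a toy sketch; the actual loss in the code may be different), dividing both tensors by the number of cell types K rescales the loss by 1/K**2, which changes the effective weight of this term relative to any other loss terms:

```python
import numpy as np

def mse(pred, target):
    return np.mean((pred - target) ** 2)

K = 4  # hypothetical number of cell types, i.e. labels.shape[1]
rng = np.random.default_rng(0)
pred = rng.random((8, K))
target = rng.random((8, K))

# Dividing inputs by K before an MSE loss shrinks the loss by 1/K**2.
scaled = mse(pred / K, target / K)
unscaled = mse(pred, target)
assert np.isclose(scaled, unscaled / K**2)
```

So if this division is intentional, it would be helpful to know whether it is meant as a loss-weighting choice or as a normalization of the inputs themselves.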

Metadata

Labels: bug (Something isn't working), documentation (Improvements or additions to documentation), question (Further information is requested)
