Comment on the section: 3.2.4. Data distribution check

Hi,

Thanks for developing this wonderful analysis workflow.

I'm running the pipeline outlined in the notebook [LME_Classification.ipynb](https://github.com/BostonGene/LME/blob/main/LME_Classification.ipynb). This section has the following two plots:

<img width="827" alt="image" src="https://github.com/BostonGene/LME/assets/35916509/d82456c9-bc24-46b4-815a-c1ede838a023">


These plots show the distribution of the mean expression of all genes. The notebook interprets them as follows:

_In the plot on the right, the expression values start off very low and then rise before dropping down. This pattern suggests potential RNA degradation, which can compromise the reliability and accuracy of downstream analyses. In contrast, the distribution plot on the left shows good-quality gene expression data. Deviations from such distributions may indicate gene degradation, should be carefully investigated and, if necessary, corrected to ensure high-quality data._
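To make the quantity in these plots concrete, here is a minimal sketch of how a per-gene mean expression distribution can be computed. It assumes a genes × samples NumPy array of non-negative expression values (e.g. TPM) and summarizes each gene as the mean of log2(x + 1) across samples. Both the layout and the transform are assumptions for illustration, not the LME notebook's own code.

```python
import numpy as np


def mean_log_expression(expr: np.ndarray) -> np.ndarray:
    """Per-gene mean of log2(x + 1) across samples.

    Assumes `expr` is a genes x samples array of non-negative values
    (e.g. TPM); this layout is an assumption, not the notebook's API.
    """
    if expr.ndim != 2:
        raise ValueError("expected a 2-D genes x samples array")
    if np.any(expr < 0):
        raise ValueError("expression values must be non-negative")
    return np.log2(expr + 1.0).mean(axis=1)


def expression_histogram(values: np.ndarray, bins: int = 50):
    """Bin the per-gene means; returns (counts, bin_edges)."""
    return np.histogram(values, bins=bins)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data only: 1000 genes x 20 samples from a log-normal distribution
    expr = rng.lognormal(mean=1.0, sigma=2.0, size=(1000, 20))
    means = mean_log_expression(expr)
    counts, edges = expression_histogram(means, bins=20)
    # Crude text histogram: one '#' per 10 genes in each bin
    for c, lo, hi in zip(counts, edges[:-1], edges[1:]):
        print(f"{lo:6.2f}-{hi:6.2f} {'#' * (c // 10)}")
```

The shape of this histogram depends heavily on which genes are included and on the transform, so two such plots are only comparable if they were built from the same gene set in the same way.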

This is what my distribution looks like:

<img width="705" alt="image" src="https://github.com/BostonGene/LME/assets/35916509/289cd0b6-5741-4417-be6e-abc25456364e">

However, I don't understand why this should be problematic. A common pre-processing step in any RNA-seq analysis is to exclude lowly expressed genes, which do not contain enough information for robust statistical analysis. This is the plot in my R Markdown notebook that I use to choose the expression threshold for excluding genes:

<img width="670" alt="image" src="https://github.com/BostonGene/LME/assets/35916509/f59dd273-afc3-4c2c-a360-5aaff211096a">


which looks like the plot on the left. So, after filtering, I'm left only with highly expressed, reliable genes. What does that have to do with RNA degradation?
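For reference, a typical low-expression filter of the kind described above can be sketched as follows. This is a simple "expressed above a threshold in at least N samples" rule; the function name `filter_low_expression` and the defaults `threshold=1.0` and `min_samples=3` are hypothetical and chosen only for illustration. It assumes a genes × samples NumPy array. Such a filter removes most genes in the low-expression tail before any histogram is drawn, which is worth keeping in mind when comparing a post-filtering plot with one computed over all genes.

```python
import numpy as np


def filter_low_expression(expr: np.ndarray, threshold: float = 1.0,
                          min_samples: int = 3):
    """Keep genes with expression >= threshold in at least min_samples samples.

    `expr` is assumed to be a genes x samples array. The defaults are
    hypothetical illustration values, not taken from the LME notebook.
    Returns the filtered array and the boolean mask of kept genes.
    """
    keep = (expr >= threshold).sum(axis=1) >= min_samples
    return expr[keep], keep


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy data only: 300 mostly silent genes plus 700 expressed genes
    silent = rng.poisson(0.2, size=(300, 10)).astype(float)
    expressed = rng.lognormal(mean=3.0, sigma=1.0, size=(700, 10))
    expr = np.vstack([silent, expressed])

    filtered, keep = filter_low_expression(expr)
    before = np.log2(expr + 1.0).mean(axis=1)
    after = np.log2(filtered + 1.0).mean(axis=1)
    print(f"kept {keep.sum()} of {keep.size} genes")
    print(f"median mean log2(x+1): all={np.median(before):.2f}, "
          f"filtered={np.median(after):.2f}")
```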

If you could explain this, it would be super useful.

Thanks!

Ramon



