format of expression matrix

Hi, 

I am trying to compute the GSEA scores using the following code (similar to the given example code).

Read signatures
`gmt = read_gene_sets('./signatures/gene_signatures.gmt')  # GMT format like in MSigDB`

Read expressions
`counts = pd.read_csv("../../Data/RNAseq/TCGA_tpm_LUAD.txt", sep="\t")`
`counts_transformed = np.log2(counts + 1)`

Calc signature scores
`signature_scores = ssgsea_formula(counts_transformed, gmt)`

Scale signatures
`signature_scores = median_scale(signature_scores)`

Should the counts matrix (DataFrame) have rows = genes and columns = samples? If I pass it in that orientation, the `ssgsea_scores()` function does not work.

This is from the ssgsea_formula() function:
 `ranks = data.T.rank(method=rank_method, na_option='bottom')`

1. `data` -> rows = genes, columns = samples
2. `data.T` -> rows = samples, columns = genes
3. `data.T.rank` -> `ranks.index` = samples, since `rank()` uses `axis=0` by default.
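
The three steps above can be reproduced on a toy matrix. This is only a sketch: the gene and sample names and the values are made up, and `method="max"` is an arbitrary choice standing in for `rank_method`. It shows that `data.T.rank(...)` ranks down each column of the transposed frame, so a genes x samples input ends up ranking each gene across samples, while a samples x genes input ranks genes within each sample.

```python
import pandas as pd

# Hypothetical toy matrix: rows = genes, columns = samples
genes_by_samples = pd.DataFrame(
    {"sample_1": [5.0, 1.0, 3.0], "sample_2": [2.0, 4.0, 6.0]},
    index=["GENE_A", "GENE_B", "GENE_C"],
)

# Same call as in the quoted line; "max" is an assumed value for rank_method
ranks_wrong = genes_by_samples.T.rank(method="max", na_option="bottom")
print(list(ranks_wrong.index))    # ['sample_1', 'sample_2']
print(ranks_wrong["GENE_A"].tolist())  # [2.0, 1.0]: GENE_A ranked across samples

# Input with rows = samples, columns = genes
samples_by_genes = genes_by_samples.T
ranks_ok = samples_by_genes.T.rank(method="max", na_option="bottom")
print(list(ranks_ok.index))       # ['GENE_A', 'GENE_B', 'GENE_C']
print(ranks_ok["sample_1"].tolist())   # [3.0, 1.0, 2.0]: genes ranked within sample_1
```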

So is it correct to say that the input to `ssgsea_formula()` should be `counts_transformed` with rows = samples and columns = genes (or, alternatively, that the `.T` inside `ssgsea_formula()` should be removed)?
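
Until the expected orientation is confirmed, one way to guard against passing the matrix the wrong way round is to check where the gene names actually sit. The helper below is a hypothetical sketch, not part of the package: `orient_samples_by_genes` and its arguments are names invented here, and it assumes the function needs rows = samples and columns = genes, as the reasoning above suggests.

```python
import pandas as pd


def orient_samples_by_genes(expr: pd.DataFrame, known_genes) -> pd.DataFrame:
    """Return expr with rows = samples and columns = genes.

    known_genes is any iterable of gene names, for example the union
    of all genes in the signatures. Hypothetical helper for illustration.
    """
    known = set(known_genes)
    in_index = len(known.intersection(expr.index))
    in_columns = len(known.intersection(expr.columns))
    if in_index == 0 and in_columns == 0:
        raise ValueError("no known gene names found in index or columns")
    # Transpose only when the gene names are mostly on the rows
    return expr.T if in_index > in_columns else expr


# Toy usage with made-up names: input is genes x samples
toy = pd.DataFrame(
    {"sample_1": [5.0, 1.0], "sample_2": [2.0, 4.0]},
    index=["GENE_A", "GENE_B"],
)
oriented = orient_samples_by_genes(toy, ["GENE_A", "GENE_B"])
print(list(oriented.index))    # ['sample_1', 'sample_2']
print(list(oriented.columns))  # ['GENE_A', 'GENE_B']
```

If the gene identifiers are stored in the first column of the text file, reading it with `index_col=0` in `pd.read_csv` keeps them in the index and out of the `np.log2` transform.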

versions of the packages I'm using: `pandas==1.4.2 numpy==1.22.3`
