About

This repository contains instructions for reproducing the results of the cutaneous T cell lymphoma (CTCL) case study reported in "Spatial cell graph analysis reveals skin tissue organization characteristic for cutaneous T cell lymphoma".

Data

Overview

As described in our preprint, our analyses used a total of 69 skin tissue samples (21 CTCL, 23 atopic dermatitis [AD], 25 psoriasis [PSO]) obtained from 27 treated patients (8 CTCL, 7 AD, 12 PSO). Each sample comprises at least 35 images, one per protein channel, each with a resolution of 2018 × 2018 pixels.

Availability

The complete data set is publicly available on Zenodo. To reduce memory requirements, each image has been down-scaled from its original resolution of 2018 × 2018 pixels to 512 × 512 pixels.
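This README does not say how the images were down-scaled. Purely as an illustration, the sketch below resamples one 2018 × 2018 channel to 512 × 512 with `scipy.ndimage.zoom`. The helper name `downscale_channel` and the choice of bilinear (order=1) interpolation are assumptions, not the procedure used for the published data.

```python
import numpy as np
from scipy import ndimage


def downscale_channel(image, target=512):
    """Resample a square 2D channel image to target x target pixels.

    Hypothetical helper: bilinear (order=1) spline interpolation is an
    assumption, not necessarily the procedure used for the published data.
    """
    image = np.asarray(image, dtype=np.float32)
    if image.ndim != 2 or image.shape[0] != image.shape[1]:
        raise ValueError("expected a square 2D image")
    factor = target / image.shape[0]
    return ndimage.zoom(image, factor, order=1)


if __name__ == "__main__":
    # Synthetic stand-in for one protein channel at the original resolution.
    channel = np.random.default_rng(0).random((2018, 2018))
    print(downscale_channel(channel).shape)  # (512, 512)
```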

Description

The data are organized as follows:

File data_description.xlsx: Lists the PatientID, SampleNumber and Condition of each sample; the file is color-coded by condition for readability.
File additional_metadata.xlsx: Provides additional metadata for the samples listed in data_description.xlsx.
Directories /AD, /PSO and /CTCL: Contain the samples listed in data_description.xlsx and additional_metadata.xlsx, organized as /Condition/PatientID_/SampleNumber_/. Each sample folder contains at least 35 .tif images, each corresponding to one (protein channel, dye) combination.
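A minimal standard-library sketch for indexing the .tif channel images by condition, patient folder and sample folder, assuming a local copy of the data with the three-level layout described above. The root path, the function names `index_channel_images` and `incomplete_samples`, and the use of 35 as a completeness check are illustrative assumptions.

```python
from collections import defaultdict
from pathlib import Path


def index_channel_images(root):
    """Map (condition, patient_dir, sample_dir) -> sorted list of .tif paths."""
    index = defaultdict(list)
    for condition in ("AD", "PSO", "CTCL"):
        base = Path(root, condition)
        if not base.is_dir():
            continue  # condition folder not present in this copy
        # Assumed layout: <root>/<Condition>/<patient dir>/<sample dir>/*.tif
        for tif in base.glob("*/*/*.tif"):
            key = (condition, tif.parent.parent.name, tif.parent.name)
            index[key].append(tif)
    return {key: sorted(paths) for key, paths in index.items()}


def incomplete_samples(index, min_images=35):
    """Return the sample keys that have fewer channel images than expected."""
    return [key for key, paths in index.items() if len(paths) < min_images]
```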

Installation

Create and activate the conda environment as follows (a requirements.txt file is also provided):

conda create --name ctcl_case_study python=3.10
conda activate ctcl_case_study
pip install scipy==1.10.1 numpy==1.23.5 squidpy==1.3.0 pandas==1.5.3 scikit-learn==1.2.2

Note: The scripts also use the math and statistics modules; these are part of the Python standard library and need no installation.
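To confirm that the activated environment matches the pinned versions, a small check along the following lines may help. It uses `importlib.metadata` from the standard library; the function name `version_mismatches` is hypothetical, and the pins simply mirror the pip command above.

```python
from importlib import metadata

# Versions pinned in the pip install command above.
PINNED = {
    "scipy": "1.10.1",
    "numpy": "1.23.5",
    "squidpy": "1.3.0",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
}


def version_mismatches(pins=PINNED, get_version=metadata.version):
    """Return {package: installed version or None} for every unmet pin."""
    mismatches = {}
    for package, wanted in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = installed
    return mismatches


if __name__ == "__main__":
    problems = version_mismatches()
    print("environment OK" if not problems else f"mismatches: {problems}")
```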

HPA-based cell type assignment

Steps involved:

Step-1: Fit a Gaussian mixture model (GMM).

Step-2: Identify the list of good split genes.

Step-3: Calculate the spread per cell type for each gene in {good_split_genes}.

Step-4: Sort the genes in {good_split_genes} in descending order of spread, per cell type.

Step-5: Pick the pair {gene g, cell type C} that maximizes the spread.

Step-6: Assign the cells that are positive for g (g+) to cell type C.

Step-7: Repeat from Step-1.
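The exact criteria are defined in the paper. As a hedged illustration of Steps 1 and 2 only, the sketch below fits one- and two-component Gaussian mixtures (scikit-learn's `GaussianMixture`) to each gene's per-cell intensities and calls a gene a "good split" when the two-component model has the lower BIC and its components are well separated (Ashman's D above 2). Both criteria, the threshold, and the function names are assumptions of this sketch, not the authors' definition.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def is_good_split(values, min_separation=2.0, seed=0):
    """Hypothetical good-split test for one gene's per-cell intensities."""
    x = np.asarray(values, dtype=float).reshape(-1, 1)
    one = GaussianMixture(n_components=1, random_state=seed).fit(x)
    two = GaussianMixture(n_components=2, random_state=seed).fit(x)
    if two.bic(x) >= one.bic(x):
        return False  # a single Gaussian explains the data at least as well
    mu = two.means_.ravel()
    var = two.covariances_.reshape(2)  # 'full' covariances have shape (2, 1, 1)
    # Ashman's D: values above 2 indicate clearly separated components.
    d = np.sqrt(2.0) * abs(mu[0] - mu[1]) / np.sqrt(var[0] + var[1])
    return bool(d > min_separation)


def good_split_genes(expression, genes, **kwargs):
    """expression: (n_cells, n_genes) array; returns the genes passing the test."""
    return [g for j, g in enumerate(genes) if is_good_split(expression[:, j], **kwargs)]
```

A greedy loop over Steps 3 to 7 would then need the paper's definition of "spread", which is not reproduced here.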

Once the algorithm terminates, the following steps are performed:

i. Assign all remaining unassigned cells to the 'Unknown' cell type.

ii. Map the assigned cell types back to the original cells.

iii. Save the sample-wise cell type assignments to a .csv file and to an AnnData .h5ad file.
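Post-processing steps i to iii amount to a left join plus a default label. A minimal pandas sketch, assuming hypothetical column names `sample_id`, `cell_id` and `cell_type`; the AnnData .h5ad export (for example via the anndata package) is omitted here.

```python
import pandas as pd


def finalize_assignments(cells, assigned, out_csv=None):
    """Attach cell types to the original cells and label the rest 'Unknown'.

    cells:    one row per segmented cell, with hypothetical columns
              'sample_id' and 'cell_id'.
    assigned: rows for cells that received a type, with 'sample_id',
              'cell_id' and 'cell_type'.
    """
    # Step ii: map assigned types back onto every original cell.
    merged = cells.merge(
        assigned[["sample_id", "cell_id", "cell_type"]],
        on=["sample_id", "cell_id"],
        how="left",
    )
    # Step i: cells that were never assigned become 'Unknown'.
    merged["cell_type"] = merged["cell_type"].fillna("Unknown")
    if out_csv is not None:
        merged.to_csv(out_csv, index=False)  # Step iii, CSV part only
    return merged
```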

Note: For a detailed explanation of the cell type assignment algorithm, please refer to the paper.

Running the code:

Navigate to /scripts/hpa_based_cell_type_assignment/ and run cell_type_assignment.py.

Output:

The sample-wise cell type assignments are saved as /scripts/hpa_based_cell_type_assignment/result/sample-wise celltypes (HPA-based clustering).csv and /scripts/hpa_based_cell_type_assignment/result/celltype_assigned_anndata.h5ad.

To additionally save the results as one .h5ad file per sample, uncomment and run the last section of cell_type_assignment.py, titled Generate separate .h5ad files (lines 77-86). This writes one file per sample to /scripts/hpa_based_cell_type_assignment/result/<sample_id>.h5ad. The section is commented out by default to avoid generating redundant files: the per-sample .h5ad files are already provided in the /data directory, and all subsequent SHouT heterogeneity score analyses were performed on them.

Note: Samples 291 and 294 have been removed from the /data directory because they contain too few segmented cells to compute SHouT heterogeneity scores.
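For readers applying the same kind of exclusion to their own data, a minimal pandas sketch of a cell-count filter follows. The column name `sample_id`, the function name, and the `min_cells` threshold are all hypothetical; this README does not state the cutoff that led to excluding samples 291 and 294.

```python
import pandas as pd


def drop_sparse_samples(cells, min_cells):
    """Split a per-cell table into kept rows and the IDs of dropped samples.

    cells has one row per segmented cell and a 'sample_id' column (assumed
    name); min_cells is a hypothetical threshold.
    """
    # Number of cells in the sample each row belongs to.
    per_sample = cells["sample_id"].map(cells["sample_id"].value_counts())
    kept = cells[per_sample >= min_cells]
    dropped = sorted(set(cells["sample_id"]) - set(kept["sample_id"]))
    return kept, dropped
```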

Generating and saving SHouT heterogeneity scores

Computing the actual SHouT scores

Navigate to /scripts/shout_score_generation/ and run compute_shout_scores.py.

The new AnnData objects, now including the SHouT scores, are saved in the /results folder as .h5ad files named after the original sample numbers.
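The SHouT scores themselves are defined in the paper and implemented in the SHouT submodule. Only to illustrate what a local heterogeneity score over a radius r can look like, the sketch below computes, for every cell, the Shannon entropy (in bits) of the cell-type labels among all cells within r hops in a spatial neighbor graph. The adjacency-list input, the function names, and the log base are assumptions; this is not SHouT's implementation.

```python
import math
from collections import Counter, deque


def r_hop_neighbourhood(adjacency, start, r):
    """All nodes reachable from start in at most r hops, including start."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == r:
            continue  # do not expand beyond radius r
        for nb in adjacency[node]:
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    return seen


def local_entropy(adjacency, labels, r):
    """Shannon entropy of cell-type labels in each cell's r-hop neighbourhood."""
    scores = []
    for cell in range(len(labels)):
        counts = Counter(labels[n] for n in r_hop_neighbourhood(adjacency, cell, r))
        total = sum(counts.values())
        scores.append(-sum((c / total) * math.log2(c / total) for c in counts.values()))
    return scores
```

For example, on the path graph 0-1-2-3 with labels A, A, B, B and r=1, the end cells score 0 and the middle cells score about 0.918 bits.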

Saving SHouT scores as .csv files

Navigate to /scripts/shout_score_generation/ and run results_to_csv.py.

Results are saved in /results/sample_results.csv for global (sample-wise) heterogeneity scores, and /results/cell_results.csv for local (individual cell-wise) heterogeneity scores.

Statistical testing (Mann-Whitney U-test)

Navigate to /scripts/shout_score_generation/ and run statistical_tests.py.

The resulting p-values are saved in /results/p_values_global.csv for local (individual cell-wise) and global (sample-wise) heterogeneity scores with r={1,2,3,4,5}, between all pairs of conditions, and in /results/p_values_cell_type.csv for local (individual cell-wise) heterogeneity scores with r={1,2,3,4,5} per cell type, between all pairs of conditions.
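The pairwise testing can be sketched with `scipy.stats.mannwhitneyu` and `itertools.combinations`. The input format (a dict mapping condition to scores for one radius), the function name, and the two-sided alternative are assumptions of this sketch rather than details taken from statistical_tests.py.

```python
from itertools import combinations

import numpy as np
from scipy.stats import mannwhitneyu


def pairwise_mannwhitney(scores_by_condition):
    """Two-sided Mann-Whitney U p-value for every pair of conditions."""
    pvalues = {}
    for a, b in combinations(sorted(scores_by_condition), 2):
        x = np.asarray(scores_by_condition[a], dtype=float)
        y = np.asarray(scores_by_condition[b], dtype=float)
        pvalues[(a, b)] = mannwhitneyu(x, y, alternative="two-sided").pvalue
    return pvalues
```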

Robustness testing

Shuffled labels

Statistical testing (Mann-Whitney U-test)

Navigate to /scripts/shout_score_generation/ and run save_pvalue_statistics_shuffled_labels.py.

Results are saved in /results/pvalue_statistics_shuffled_labels.csv for local (individual cell-wise) heterogeneity scores with r={1,2,3,4,5}, between all pairs of conditions, after randomly shuffling the condition labels.
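A label-shuffling control can be sketched as follows: permuting the condition labels breaks any real association between score and condition, so the resulting p-values show what the test yields by chance. The function name, `n_iter`, `seed`, and the two-sided alternative are hypothetical choices, not parameters of save_pvalue_statistics_shuffled_labels.py.

```python
from itertools import combinations

import numpy as np
from scipy.stats import mannwhitneyu


def shuffled_label_pvalues(scores, labels, n_iter=100, seed=0):
    """Mann-Whitney p-values per condition pair after permuting condition labels.

    scores, labels: one entry per observation. n_iter and seed are hypothetical.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    conditions = sorted(set(labels.tolist()))
    results = {pair: [] for pair in combinations(conditions, 2)}
    for _ in range(n_iter):
        permuted = rng.permutation(labels)  # group sizes stay the same
        for a, b in results:
            p = mannwhitneyu(scores[permuted == a], scores[permuted == b],
                             alternative="two-sided").pvalue
            results[(a, b)].append(p)
    return {pair: np.asarray(ps) for pair, ps in results.items()}
```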

Subsampled patients

Statistical testing (Mann-Whitney U-test)

Navigate to /scripts/shout_score_generation/ and run statistical_tests_subsampled_patientwise.py.

Results for local (individual cell-wise) heterogeneity scores are saved in /results/p_values_cell_type_subsampled_patientwise.csv, and results for global (sample-wise) heterogeneity scores in /results/p_values_global_subsampled_patientwise.csv. In both cases, the tests cover r={1,2,3,4,5} and all pairs of conditions; 15 patients per condition are randomly subsampled at a time while keeping the actual condition labels, and this subsampling is repeated for 100 iterations.
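A sketch of patient-level subsampling with fixed condition labels follows. It assumes a hypothetical long-format table with columns `patient`, `condition` and `score`; drawing is done without replacement and capped at the number of patients available per condition, which is an assumption of this sketch. The function name and the two-sided alternative are likewise illustrative.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu


def subsampled_pvalues(df, n_patients=15, n_iter=100, seed=0):
    """Pairwise Mann-Whitney p-values over repeated patient-level subsamples.

    df: one row per observation with hypothetical columns 'patient',
    'condition' and 'score'. Labels are never shuffled; only the set of
    patients entering each iteration changes.
    """
    rng = np.random.default_rng(seed)
    conditions = sorted(df["condition"].unique())
    patients = {c: df.loc[df["condition"] == c, "patient"].unique() for c in conditions}
    out = {pair: [] for pair in combinations(conditions, 2)}
    for _ in range(n_iter):
        # Draw patients without replacement, capped at the number available.
        drawn = {c: rng.choice(p, size=min(n_patients, len(p)), replace=False)
                 for c, p in patients.items()}
        for a, b in out:
            xa = df.loc[(df["condition"] == a) & df["patient"].isin(drawn[a]), "score"]
            xb = df.loc[(df["condition"] == b) & df["patient"].isin(drawn[b]), "score"]
            out[(a, b)].append(mannwhitneyu(xa, xb, alternative="two-sided").pvalue)
    return {pair: np.asarray(ps) for pair, ps in out.items()}


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    demo = pd.DataFrame({
        "patient": np.repeat([f"p{i}" for i in range(60)], 3),
        "condition": np.repeat(["AD", "PSO", "CTCL"], 60),
        "score": rng.normal(size=180),
    })
    result = subsampled_pvalues(demo, n_patients=15, n_iter=5)
    print({pair: ps.round(3).tolist() for pair, ps in result.items()})
```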

Scalability testing

Runtimes with varying radii

To record the runtimes of computing SHouT heterogeneity scores on all 69 samples with radii r = {1, 5, 10, 20, 40, 80, 100}, navigate to /scripts/plots_for_paper/fig6_scalability_plot/ and run per_sample_runtimes_with_varying_radius.py. The per-sample runtimes for each radius are saved as /results/data_all_cells.csv.
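The runtime measurement boils down to timing one score computation per (sample, radius) pair. A minimal standard-library sketch, where `score_fn` is a hypothetical stand-in for the SHouT computation and the CSV column names are assumptions:

```python
import csv
import time


def time_over_radii(samples, score_fn, radii=(1, 5, 10, 20, 40, 80, 100), out_csv=None):
    """Wall-clock runtime of score_fn(sample, r) for every sample and radius.

    samples:  dict mapping a sample ID to whatever score_fn expects.
    score_fn: hypothetical stand-in for the SHouT score computation.
    """
    rows = []
    for sample_id, sample in samples.items():
        for r in radii:
            start = time.perf_counter()
            score_fn(sample, r)
            rows.append({"sample_id": sample_id, "radius": r,
                         "seconds": time.perf_counter() - start})
    if out_csv is not None:
        with open(out_csv, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=["sample_id", "radius", "seconds"])
            writer.writeheader()
            writer.writerows(rows)
    return rows
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock suited to measuring elapsed time.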

Reproducing figures

Reproducing results shown in Fig 2

Navigate to /scripts/plots_for_paper/fig2_database/ and run fig2_database.py. Plot saved as fig2_database.pdf.

Reproducing results shown in Fig 3

Navigate to /scripts/plots_for_paper/fig3_subplot_mosaic/ and run fig3_subplot_mosaic.py. Plot saved as fig3_subplot_mosaic.pdf.

Reproducing results shown in Fig 4

Navigate to /scripts/plots_for_paper/fig4_shuffled_labels/ and run fig4_shuffled_labels.py. Plot saved as fig4_shuffled_labels.pdf.

Reproducing results shown in Fig 5

Navigate to /scripts/plots_for_paper/fig5_subsampled_patients/ and run fig5_subsampled_patients.py. Plot saved as fig5_subsampled_patients.pdf.

Reproducing results shown in Fig 6

Navigate to /scripts/plots_for_paper/fig6_scalability_plot/ and run fig6_scalability_plot.py. Plot saved as fig6_scalability_plot.pdf.

Reproducing results shown in Supplementary Fig 2

Navigate to /scripts/plots_for_paper/supfig2_cell_abundance_analyses/ and run supfig2_cell_abundance_analyses.py. Plot saved as supfig2_cell_abundance_analyses.pdf.

Reproducing results shown in Supplementary Fig 3 - 5

Navigate to /scripts/plots_for_paper/supfigs3,4,5_SHouT_scores_radius_5/ and run generate_supfig_1_2_and_3_violinplots.py. Plots saved as SHouT_score_1_violinplots.pdf, SHouT_score_2_violinplots.pdf and SHouT_score_3_violinplots.pdf respectively.

Reproducing results shown in Supplementary Fig 6

Navigate to /scripts/plots_for_paper/supfig6_SHouT_scores_all_radii/ and run supfig6_SHouT_scores_all_radii.py. Plot saved as supfig6_SHouT_scores_all_radii.pdf.

Reproducing results shown in Supplementary Fig 7

Navigate to /scripts/plots_for_paper/supfig7_centrality_scores/ and run generate_supfig7_centrality_scores.py. Plot saved as supfig7_centrality_scores.pdf.
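To regenerate all figures in one go, a small runner can execute each plotting script from its own folder, mirroring the "navigate to ... and run ..." instructions above. The folder and script names are copied from this section; the runner itself, its `dry_run` option, and the assumption that it is started from the repository root are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# (folder under scripts/plots_for_paper, script) pairs listed above.
FIGURE_SCRIPTS = [
    ("fig2_database", "fig2_database.py"),
    ("fig3_subplot_mosaic", "fig3_subplot_mosaic.py"),
    ("fig4_shuffled_labels", "fig4_shuffled_labels.py"),
    ("fig5_subsampled_patients", "fig5_subsampled_patients.py"),
    ("fig6_scalability_plot", "fig6_scalability_plot.py"),
    ("supfig2_cell_abundance_analyses", "supfig2_cell_abundance_analyses.py"),
    ("supfigs3,4,5_SHouT_scores_radius_5", "generate_supfig_1_2_and_3_violinplots.py"),
    ("supfig6_SHouT_scores_all_radii", "supfig6_SHouT_scores_all_radii.py"),
    ("supfig7_centrality_scores", "generate_supfig7_centrality_scores.py"),
]


def run_figure_scripts(repo_root=".", dry_run=False):
    """Run every figure script from its own folder; return the commands used."""
    base = Path(repo_root, "scripts", "plots_for_paper")
    commands = []
    for folder, script in FIGURE_SCRIPTS:
        cwd = base / folder
        cmd = [sys.executable, script]
        commands.append((str(cwd), cmd))
        if not dry_run:
            # Fail fast if any script exits with a non-zero status.
            subprocess.run(cmd, cwd=cwd, check=True)
    return commands


if __name__ == "__main__":
    for cwd, cmd in run_figure_scripts(dry_run=True):
        print(cwd, cmd[1])
```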

Citing the work

Please cite our work as follows:

Sarkar, S., Möller, A., Hartebrodt, A., Erdmann, M., Ostalecki, C., Baur, A. & Blumenthal, D. B. Spatial cell graph analysis reveals skin tissue organization characteristic for cutaneous T cell lymphoma. bioRxiv 2024.05.17.594629 (2024). doi:10.1101/2024.05.17.594629
