This repository contains the information needed to reproduce the results of the cutaneous T cell lymphoma (CTCL) case study reported in "Spatial cell graph analysis reveals skin tissue organization characteristic for cutaneous T cell lymphoma".
As described in our preprint, the data used for our analyses comprise 69 skin tissue samples (21 CTCL, 23 atopic dermatitis (AD), 25 psoriasis (PSO)) obtained from 27 treated patients (8 CTCL, 7 AD, 12 PSO). Each sample contains at least 35 protein channel images, each with a resolution of 2018 X 2018 pixels.
The complete data are publicly available on Zenodo. To conserve memory, each image has been down-scaled from its original resolution of 2018 X 2018 pixels to 512 X 512 pixels.
The data are organized as follows:
- File data_description.xlsx: Provides the PatientID, SampleNumber and Condition of each sample; the file is color coded per condition for improved readability.
- File additional_metadata.xlsx: Provides additional information on the samples specified in data_description.xlsx.
- Directories /AD, /PSO and /CTCL: Folders containing the respective samples specified in data_description.xlsx and additional_metadata.xlsx. Subdirectories follow the pattern /Condition/PatientID_/SampleNumber_/. Each sample contains at least 35 .tif images, each corresponding to a (protein channel, dye) combination.
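The layout described above can be sanity-checked with a short script. The following is a minimal sketch, not part of the repository: it assumes only that sample folders sit two levels below each condition folder (patient, then sample), and the function names `count_tif_per_sample` and `samples_below_minimum` are hypothetical.

```python
from pathlib import Path
from typing import Dict, List

CONDITIONS = ("AD", "PSO", "CTCL")


def count_tif_per_sample(data_root: str) -> Dict[str, int]:
    """Map each sample directory (path relative to data_root) to its number of .tif files."""
    root = Path(data_root)
    counts: Dict[str, int] = {}
    for condition in CONDITIONS:
        condition_dir = root / condition
        if not condition_dir.is_dir():
            continue
        # Assumption: two levels below each condition folder (patient, then sample).
        for sample_dir in sorted(condition_dir.glob("*/*")):
            if sample_dir.is_dir():
                n_tif = sum(1 for p in sample_dir.iterdir() if p.suffix.lower() == ".tif")
                counts[sample_dir.relative_to(root).as_posix()] = n_tif
    return counts


def samples_below_minimum(counts: Dict[str, int], minimum: int = 35) -> List[str]:
    """Return the samples that contain fewer than `minimum` .tif images."""
    return sorted(name for name, n in counts.items() if n < minimum)
```

Calling `samples_below_minimum(count_tif_per_sample("."))` from the folder that holds /AD, /PSO and /CTCL should return an empty list if every sample has at least 35 images.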
Create and activate the conda environment as follows (a requirements.txt is also provided):
conda create --name ctcl_case_study
conda activate ctcl_case_study
pip install scipy==1.10.1 numpy==1.23.5 squidpy==1.3.0 pandas==1.5.3 scikit-learn==1.2.2
Note: The standard-library modules math and statistics are also used; they ship with Python and require no installation.
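To confirm that the active environment matches the pinned versions, a small check along the following lines may help. This is a sketch that uses only the standard library; the helper name `find_mismatches` is hypothetical and not part of the repository.

```python
from importlib import metadata
from typing import Dict, Optional

# Versions pinned in the pip command of this README.
PINNED = {
    "scipy": "1.10.1",
    "numpy": "1.23.5",
    "squidpy": "1.3.0",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
}


def find_mismatches(pins: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return {package: installed version, or None if missing} for every unmet pin."""
    mismatches: Dict[str, Optional[str]] = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    for name, installed in find_mismatches(PINNED).items():
        print(f"{name}: expected {PINNED[name]}, found {installed or 'not installed'}")
```

The script prints nothing when all five packages are installed at the pinned versions.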
Algorithm:
Step-1: Fit GMM model.
Step-2: Find list of good split genes.
Step-3: Calculate spread per celltype per gene in set {good_split_genes}.
Step-4: Arrange all genes in set {good_split_genes} in descending order, per celltype.
Step-5: Pick the pair {gene g, celltype C} that maximizes spread.
Step-6: Assign celltype C to the g+ cells (cells positive for gene g).
Step-7: Repeat from Step-1.
Once the algorithm is complete, do the following:
i. Assign unassigned cells to 'Unknown' type.
ii. Map back assigned celltypes to the original cells.
iii. Save the sample-wise cell type assignment results to a .csv file and to an AnnData .h5ad file.
Note: For a detailed explanation of the cell type assignment algorithm, please refer to the paper.
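The selection in Step-4 and Step-5 and the post-processing in i. to iii. can be sketched as follows. This is an illustration under stated assumptions, not the code in cell_type_assignment.py: the spread values are taken as already computed (their definition is given in the paper), the gene and cell type names in the usage example are made-up placeholders, the CSV column names are hypothetical, and writing the .h5ad file is not covered.

```python
import csv
from typing import Dict, Iterable, List, Tuple

Spread = Dict[str, Dict[str, float]]  # celltype -> gene -> precomputed spread


def rank_genes_per_celltype(spread: Spread) -> Dict[str, List[str]]:
    """Step-4: for each cell type, order its genes by descending spread."""
    return {
        celltype: sorted(genes, key=genes.get, reverse=True)
        for celltype, genes in spread.items()
    }


def pick_best_pair(spread: Spread) -> Tuple[str, str]:
    """Step-5: return the (gene, celltype) pair with the largest spread overall."""
    ranked = rank_genes_per_celltype(spread)
    # Only the top-ranked gene of each cell type can be the overall maximum.
    candidates = [(spread[c][genes[0]], genes[0], c) for c, genes in ranked.items() if genes]
    if not candidates:
        raise ValueError("no (gene, celltype) candidates left")
    _, gene, celltype = max(candidates)
    return gene, celltype


def finalize_assignments(cell_ids: Iterable[str], assigned: Dict[str, str]) -> List[Tuple[str, str]]:
    """Steps i and ii: map labels back to all cells, using 'Unknown' for unassigned ones."""
    return [(cell_id, assigned.get(cell_id, "Unknown")) for cell_id in cell_ids]


def write_assignments_csv(rows: Iterable[Tuple[str, str]], path: str) -> None:
    """Step iii (CSV part only): one row per cell; the column names are hypothetical."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["cell_id", "celltype"])
        writer.writerows(rows)
```

For example, with `spread = {"T cell": {"CD3": 2.5, "CD4": 1.0}, "B cell": {"CD20": 1.8, "CD3": 0.2}}`, `pick_best_pair(spread)` selects CD3 for the T cell type, since 2.5 is the largest value in the table.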
Navigate to /scripts/hpa_based_cell_type_assignment/ and run cell_type_assignment.py.
The sample-wise cell type assignment results are saved as /scripts/hpa_based_cell_type_assignment/result/sample-wise celltypes (HPA-based clustering).csv and /scripts/hpa_based_cell_type_assignment/result/celltype_assigned_anndata.h5ad.
To also save the results as one .h5ad file per sample, uncomment and run the last section of cell_type_assignment.py, titled Generate separate .h5ad files (lines 77-86). The files are then saved as /scripts/hpa_based_cell_type_assignment/result/<sample_id>.h5ad. Because these files are already provided under the /data directory, on which all subsequent SHouT heterogeneity score-based analyses were performed, this code is commented out by default to avoid generating redundant .h5ad files.
Note: Two samples, 291 and 294, have been removed from the /data directory because they contain too few segmented cells to compute SHouT heterogeneity scores on.
Navigate to /scripts/shout_score_generation/ and run compute_shout_scores.py.
New AnnData objects with SHouT scores are saved in the /results folder as .h5ad files named after the original sample number.
Navigate to /scripts/shout_score_generation/ and run results_to_csv.py.
Results are saved in /results/sample_results.csv for global (sample-wise) heterogeneity scores, and /results/cell_results.csv for local (individual cell-wise) heterogeneity scores.
Navigate to /scripts/shout_score_generation/ and run statistical_tests.py.
Results are saved in /results/p_values_global.csv for local (individual cell-wise) and global (sample-wise) heterogeneity scores with r={1,2,3,4,5} between all pairs of conditions; and /results/p_values_cell_type.csv for local (individual cell-wise) heterogeneity scores with r={1,2,3,4,5} and per celltype, between all pairs of conditions.
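To make the size of the comparison grid concrete, the sketch below enumerates every (condition pair, radius) combination named above. It does not reproduce statistical_tests.py and says nothing about which statistical test the script applies; the function name `comparison_grid` is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

CONDITIONS = ("AD", "PSO", "CTCL")
RADII = (1, 2, 3, 4, 5)


def comparison_grid(
    conditions: Sequence[str] = CONDITIONS,
    radii: Sequence[int] = RADII,
) -> List[Tuple[str, str, int]]:
    """List every unordered pair of conditions together with every radius r."""
    return [(a, b, r) for a, b in combinations(conditions, 2) for r in radii]


if __name__ == "__main__":
    grid = comparison_grid()
    print(len(grid))  # 3 condition pairs x 5 radii = 15
```

For the per-celltype results, each of these 15 combinations is further repeated once per cell type.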
Navigate to /scripts/shout_score_generation/ and run save_pvalue_statistics_shuffled_labels.py.
Results are saved in /results/pvalue_statistics_shuffled_labels.csv for local (individual cell-wise) heterogeneity scores with r={1,2,3,4,5} between all pairs of conditions after having randomized the condition labels.
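Randomizing the condition labels can be illustrated with the standard library alone. The sketch below permutes the labels across samples while leaving the number of samples per condition unchanged; whether the repository script shuffles at the sample level or the patient level is an assumption here, and the function name and sample IDs are hypothetical.

```python
import random
from typing import Dict, Optional


def shuffle_condition_labels(labels: Dict[str, str], seed: Optional[int] = None) -> Dict[str, str]:
    """Return a new {sample_id: condition} mapping with the conditions randomly permuted."""
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    sample_ids = list(labels)
    shuffled = [labels[sample_id] for sample_id in sample_ids]
    rng.shuffle(shuffled)
    return dict(zip(sample_ids, shuffled))


if __name__ == "__main__":
    original = {"s1": "AD", "s2": "AD", "s3": "PSO", "s4": "CTCL"}
    print(shuffle_condition_labels(original, seed=0))
```

Because only the assignment of labels to samples changes, any difference between conditions that survives the shuffle cannot be attributed to the true condition.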
Navigate to /scripts/shout_score_generation/ and run statistical_tests_subsampled_patientwise.py.
Results for local (individual cell-wise) heterogeneity scores are saved in /results/p_values_cell_type_subsampled_patientwise.csv with r={1,2,3,4,5} between all pairs of conditions after having randomly subsampled 15 patients per condition at a time, while maintaining the actual condition labels, and repeating this subsampling process for 100 iterations.
Results for global (sample-wise) heterogeneity scores are saved in /results/p_values_global_subsampled_patientwise.csv with r={1,2,3,4,5} between all pairs of conditions after having randomly subsampled 15 patients per condition at a time, while maintaining the actual condition labels, and repeating this subsampling process for 100 iterations.
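The steps from cell type assignment through the subsampled tests can be chained with a small driver. This is a sketch, not a script shipped with the repository: it assumes the steps should run in the order listed in this README and that each script expects to be started from inside its own directory. The name `run_pipeline` is hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# (directory relative to the repository root, script), in the order given in this README.
PIPELINE = [
    ("scripts/hpa_based_cell_type_assignment", "cell_type_assignment.py"),
    ("scripts/shout_score_generation", "compute_shout_scores.py"),
    ("scripts/shout_score_generation", "results_to_csv.py"),
    ("scripts/shout_score_generation", "statistical_tests.py"),
    ("scripts/shout_score_generation", "save_pvalue_statistics_shuffled_labels.py"),
    ("scripts/shout_score_generation", "statistical_tests_subsampled_patientwise.py"),
]


def run_pipeline(repo_root: str, dry_run: bool = True) -> List[str]:
    """Run each script from its own directory; with dry_run=True only list the commands."""
    planned = []
    for directory, script in PIPELINE:
        workdir = Path(repo_root) / directory
        planned.append(f"cd {workdir.as_posix()} && python {script}")
        if not dry_run:
            # check=True stops the pipeline at the first failing script.
            subprocess.run([sys.executable, script], cwd=workdir, check=True)
    return planned


if __name__ == "__main__":
    for command in run_pipeline("."):
        print(command)
```

Running it as is only prints the planned commands; pass `dry_run=False` to execute them.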
To record the runtimes of computing SHouT heterogeneity scores on all 69 samples with radii r = {1, 5, 10, 20, 40, 80, 100}, navigate to /scripts/plots_for_paper/fig6_scalability_plot/ and run per_sample_runtimes_with_varying_radius.py. The runtimes per sample, for each of r = {1, 5, 10, 20, 40, 80, 100}, are saved as /results/data_all_cells.csv.
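The general pattern of timing one computation per radius can be sketched as follows. This is not the code of per_sample_runtimes_with_varying_radius.py: `score_fn` is a hypothetical stand-in for the SHouT score computation, and the row keys do not claim to match the columns of data_all_cells.csv.

```python
import time
from typing import Callable, Dict, List, Sequence

RADII = (1, 5, 10, 20, 40, 80, 100)


def time_per_radius(
    sample_id: str,
    score_fn: Callable[[str, int], object],
    radii: Sequence[int] = RADII,
) -> List[Dict[str, object]]:
    """Call score_fn(sample_id, r) once per radius and record the wall-clock time."""
    rows = []
    for r in radii:
        start = time.perf_counter()
        score_fn(sample_id, r)
        elapsed = time.perf_counter() - start
        rows.append({"sample": sample_id, "radius": r, "seconds": elapsed})
    return rows


if __name__ == "__main__":
    # Dummy workload in place of the real score computation.
    for row in time_per_radius("example_sample", lambda sample, r: sum(range(r * 1000))):
        print(row)
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock intended for measuring short durations.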
Navigate to /scripts/plots_for_paper/fig2_database/ and run fig2_database.py. Plot saved as fig2_database.pdf.
Navigate to /scripts/plots_for_paper/fig3_subplot_mosaic/ and run fig3_subplot_mosaic.py. Plot saved as fig3_subplot_mosaic.pdf.
Navigate to /scripts/plots_for_paper/fig4_shuffled_labels/ and run fig4_shuffled_labels.py. Plot saved as fig4_shuffled_labels.pdf.
Navigate to /scripts/plots_for_paper/fig5_subsampled_patients/ and run fig5_subsampled_patients.py. Plot saved as fig5_subsampled_patients.pdf.
Navigate to /scripts/plots_for_paper/fig6_scalability_plot/ and run fig6_scalability_plot.py. Plot saved as fig6_scalability_plot.pdf.
Navigate to /scripts/plots_for_paper/supfig2_cell_abundance_analyses/ and run supfig2_cell_abundance_analyses.py. Plot saved as supfig2_cell_abundance_analyses.pdf.
Navigate to /scripts/plots_for_paper/supfigs3,4,5_SHouT_scores_radius_5/ and run generate_supfig_1_2_and_3_violinplots.py. Plots saved as SHouT_score_1_violinplots.pdf, SHouT_score_2_violinplots.pdf and SHouT_score_3_violinplots.pdf respectively.
Navigate to /scripts/plots_for_paper/supfig6_SHouT_scores_all_radii/ and run supfig6_SHouT_scores_all_radii.py. Plot saved as supfig6_SHouT_scores_all_radii.pdf.
Navigate to /scripts/plots_for_paper/supfig7_centrality_scores/ and run generate_supfig7_centrality_scores.py. Plot saved as supfig7_centrality_scores.pdf.
Please cite our work as follows:
- Sarkar, S., Möller, A., Hartebrodt, A., Erdmann, M., Ostalecki, C., Baur, A. & Blumenthal, D. B. Spatial cell graph analysis reveals skin tissue organization characteristic for cutaneous T cell lymphoma. bioRxiv 2024.05.17.594629 (2024). doi:10.1101/2024.05.17.594629