In this study, we introduce STco, a multi-modal deep learning method built on a contrastive learning framework. STco learns a joint embedding space from H&E images, spot gene expression data, and spot positional information. Specifically, each image is passed through an image encoder to extract visual features, while the spot's gene expression data together with its positional encoding is fed to the Spot Encoder to produce fused features that incorporate spatial information. Contrastive learning is then applied to the visual and fused features, maximizing the cosine similarity of embeddings for truly paired images and gene expression profiles while minimizing the similarity for mismatched pairs.

To predict gene expression from an image, the test image is fed into the image encoder to extract its visual features. The cosine similarity is then computed between these visual features and the fused features of the reference spots, and the expression profiles of the most similar spots are used to form the prediction.
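To make the training objective concrete, here is a minimal sketch of the symmetric (CLIP-style) contrastive loss described above, assuming each encoder outputs a batch of d-dimensional embeddings where the i-th image and i-th spot form a true pair. The function name, tensor names, and temperature value are illustrative assumptions, not STco's actual implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb: torch.Tensor,
                     spot_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of (image, spot) pairs.

    img_emb:  (B, d) embeddings from the image encoder (assumed).
    spot_emb: (B, d) fused embeddings from the spot encoder (assumed).
    """
    # L2-normalize so dot products equal cosine similarities.
    img_emb = F.normalize(img_emb, dim=-1)
    spot_emb = F.normalize(spot_emb, dim=-1)

    # Pairwise cosine similarities scaled by temperature: (B, B).
    logits = img_emb @ spot_emb.t() / temperature

    # The i-th image is truly paired with the i-th spot, so the
    # diagonal entries are the positives.
    targets = torch.arange(img_emb.size(0), device=img_emb.device)

    # Cross-entropy in both directions pulls matched pairs together
    # and pushes mismatched pairs apart.
    loss_i2s = F.cross_entropy(logits, targets)
    loss_s2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2s + loss_s2i) / 2
```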
Required packages:
- Python >= 3.9
- PyTorch >= 2.1.0
- scanpy >= 1.8
Datasets:
- human HER2-positive breast tumor ST data: https://github.com/almaan/her2st/
- human cutaneous squamous cell carcinoma 10x Visium data (GSE144240)
Usage:
- Run the script download.sh in the data folder to download the datasets.
- Run train.py to train the model.
- Run predict.py to predict gene expression for test images; a sketch of this retrieval step is shown below.
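The prediction stage is described above as nearest-neighbor retrieval in the shared embedding space. Below is a hypothetical sketch of that step, assuming the expression profiles of the k most similar reference spots are averaged; the function name, parameters, and top-k aggregation are placeholders, not the exact code in predict.py.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def predict_expression(test_img_emb: torch.Tensor,  # (d,) test image embedding
                       ref_spot_emb: torch.Tensor,  # (N, d) reference spot embeddings
                       ref_expr: torch.Tensor,      # (N, G) reference expression matrix
                       k: int = 50) -> torch.Tensor:
    """Predict a (G,) expression vector for one test image (assumed scheme)."""
    # Cosine similarity between the test image and every reference spot: (N,).
    sims = F.cosine_similarity(test_img_emb.unsqueeze(0), ref_spot_emb, dim=-1)

    # Average the expression of the k most similar reference spots.
    topk = sims.topk(k)
    return ref_expr[topk.indices].mean(dim=0)
```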
