This is a deep learning-based protein subcellular localization pipline for predicting protein subcellular location patterns in single cells from immunofluorescence images. The publication about this source code is 'Automatic recognition of protein subcellular location patterns in single cells from immunofluorescence images based on deep learning'
Kaggle training set can be accessed by link. Run the code in this folder setp by step to obtain the IF data in the HPA database and obtain the index file (.csv) of the data, all data are stored in the folder HPA_data, and all processed index files are stored in the folder data_csv.
Multi-instance learning model based on IF images.Run main.py to get image-based model. Multiple parameters for model training are saved in the configuration file ./configs/config.yaml. Before running the code, add the absolute path of the directory of the cell images to the variable data:data_root in the configuration file. Modify the parameter model:name in the configuration file to get different MIL models. After running part 3, the value 'imagelabel' of variable data:celllabel in configuration file can be modified to 'pseudolabel' to strengthen the MIL model.
Run codes in ./Clustering step by step to get the pseudo-labels by clustering method, and run codes in ./Heuristic step by step to get the pseudo-labels by heuristic method. Run ./S1_CombinedPseudo.py to get the cell labels for cell-based model (part4)
Cell-based model based on cell images and pseudo-labels (part3) .Run main.py to get cell-based model. Multiple parameters for model training are saved in the configuration file ./configs/config.yaml. Before running the code, add the absolute path of the directory of the cell images to the variable data:data_root in the configuration file.
Run S1-validation-ensemble.py to test the performance of ensemble model on manual test set. Multiple parameters for model validation are saved in the configuration file ./configs/config.yaml. Before running the code, add the absolute path of the directory of the test images to data:root_data and path of model's weight to MILmodel:pth_path and CellModel:pth_path in the configuration file. Validation of the ensemble model on the Kaggle test set is available on the Kaggle platform.