We have backed up the sRNA data provided by the BSRD database in BSRD_data. If you want to use it, please cite the original paper first: Li L, Huang D, Cheung M K, et al. BSRD: a repository for bacterial small regulatory RNA[J]. Nucleic acids research, 2013, 41(D1): D233-D238.
This repository contains the code used in our sRNAdeep method. To use or refer to sRNAdeep, please cite: Qian W, Sun J, Liu T, et al. sRNAdeep: a novel tool for bacterial sRNA prediction based on DistilBERT encoding mode and deep learning algorithms[J]. BMC genomics, 2024, 25(1): 1021.
Please download the model from the Kaggle platform: link
Use python 00-do_blastn.py fasta_folder to filter the test set for matches to the training set.
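As a rough illustration of what such a search can look like, the sketch below builds one blastn command per FASTA file in a folder. It is not the content of 00-do_blastn.py: the function names, the `*.fasta` file pattern, the output naming, the tabular output format (`-outfmt 6`) and the e-value cutoff are all assumptions made for this example.

```python
from pathlib import Path


def build_blastn_cmd(query_fasta, train_fasta, out_path, evalue="1e-5"):
    """Return a blastn command that compares a query FASTA against the training set.

    Assumption: training sequences are passed with -subject (no BLAST database),
    and hits are written in tabular format (-outfmt 6).
    """
    return [
        "blastn",
        "-query", str(query_fasta),
        "-subject", str(train_fasta),
        "-outfmt", "6",
        "-evalue", str(evalue),
        "-out", str(out_path),
    ]


def plan_blastn_runs(fasta_folder, train_fasta, out_dir):
    """Build one command per *.fasta file found in fasta_folder (hypothetical layout)."""
    commands = []
    for fasta in sorted(Path(fasta_folder).glob("*.fasta")):
        out_path = Path(out_dir) / (fasta.stem + ".blastn.tsv")
        commands.append(build_blastn_cmd(fasta, train_fasta, out_path))
    return commands
```

Each command can then be executed with `subprocess.run(cmd, check=True)`, provided blastn is installed and on the PATH.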
Converts the sequences in CSV files to FASTA format.
This includes (1) filtering out sequences from the blastn results that match the training set; (2) calculating the GC content (%) of the FASTA sequences; and (3) converting the FASTA files into the input format required by sRNAdeep.
This file covers the construction, training, and prediction steps of the sRNAdeep model. To run it, replace path_of_data and test_path with the paths to the training-set and test-set files produced by the FASTA conversion in step 3 of 01-data_process.ipynb. The required dependencies are listed below:
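A minimal sketch of a CSV-to-FASTA conversion is shown below, using only the standard library. The column names `id` and `sequence` and the 60-character line width are assumptions for this example; adjust them to the actual CSV headers.

```python
import csv
import io


def csv_to_fasta(csv_text, id_col="id", seq_col="sequence", width=60):
    """Convert CSV text with an ID column and a sequence column to FASTA text.

    The column names are hypothetical defaults, not taken from the repository.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        seq = row[seq_col].strip().upper()
        if not seq:
            continue  # skip rows without a sequence
        lines.append(">" + row[id_col].strip())
        # wrap the sequence to a fixed line width
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)
```

For a file on disk, read it with `open(path, newline="").read()`, pass the text to `csv_to_fasta`, and write the result to a `.fasta` file.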
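The first two steps can be sketched as follows. This is an illustrative example rather than the notebook's code: it assumes the blastn hits are in tab-separated format with the query ID in the first column, and all function names are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping record ID to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""  # ID is the first word of the header
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(chunks) for key, chunks in records.items()}


def gc_percent(seq):
    """Return the percentage of G and C bases in a sequence (0.0 if empty)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def hit_ids(blast_tabular_text):
    """Collect query IDs from tabular blastn output (assumed: first column)."""
    ids = set()
    for line in blast_tabular_text.splitlines():
        if line.strip() and not line.startswith("#"):
            ids.add(line.split("\t")[0])
    return ids


def filter_records(records, hits):
    """Keep only the records whose ID has no blastn hit."""
    return {key: seq for key, seq in records.items() if key not in hits}
```

Dropping every sequence with any hit is the simplest policy; a stricter or looser cutoff (for example on identity or e-value) would need extra columns from the blastn output.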
- pandas 2.1.4
- numpy 1.24.3
- matplotlib 3.7.4
- scikit-learn 1.2.2
- torch 2.0.0
- transformers 4.36.0
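To check an environment against the versions listed above, a small helper along these lines may be useful. It is a sketch: the function name `check_versions` is hypothetical, and requiring an exact version match is an assumption (nearby versions may also work).

```python
from importlib import metadata

# Versions copied from the dependency list above.
PINNED = {
    "pandas": "2.1.4",
    "numpy": "1.24.3",
    "matplotlib": "3.7.4",
    "scikit-learn": "1.2.2",
    "torch": "2.0.0",
    "transformers": "4.36.0",
}


def check_versions(pinned=PINNED, lookup=metadata.version):
    """Return {package: (wanted, installed_or_None, matches)} for each pinned package."""
    report = {}
    for name, wanted in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed
        report[name] = (wanted, found, found == wanted)
    return report
```

Calling `check_versions()` and printing the entries whose last field is `False` shows which packages are missing or installed at a different version.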
This is for R. The BP, CC, MF, and KEGG annotations were downloaded from DAVID.