Releases: morrislab/mRNABench
v1.2.2
v1.2.1
NeurIPS Datasets and Benchmarks submission version.
Since last release:
Added datasets:
- GO Cellular Component
- GO Biological Process
- mRNA localization (Fazal et al.)
- PERSIST-Seq paired MRL and HL dataset (Leppek et al.)
Model Changes:
- Added NaiveMamba model (unpretrained Mamba)
- Added context window to HyenaDNA
- Updated datasets to use Morrislab HF links
Temporarily removed essentiality datasets due to data quality issues.
Updated license to AGPL for compatibility with multimolecule. Removed AIDO.RNA due to license compatibility issues.
v1.2.0
Changes:
EmbeddingModel
- Removed the option to specify nucleotide overlap when chunking sequences; overlap was not found to have much effect.
- Added Evo2, Evo1
- Added CodonBERT (Not fully tested)
- Added Naive Baseline
- Added DNABERT-S
- Added chunking to RiNALMo, set sequence length to 8192
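The non-overlapping chunking mentioned in the EmbeddingModel changes can be pictured roughly as follows. This is a minimal sketch and not the mRNABench implementation: `chunk_sequence` and `max_len` are hypothetical names chosen for illustration, and the sketch does not cover how per-chunk embeddings are combined afterwards.

```python
def chunk_sequence(seq, max_len):
    """Split seq into consecutive, non-overlapping chunks of at most max_len characters.

    Hypothetical helper for illustration only; not part of the mRNABench API.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    # Step through the sequence in strides of max_len, so no nucleotide
    # appears in more than one chunk.
    return [seq[i:i + max_len] for i in range(0, len(seq), max_len)]


# A 20,000 nt sequence with a window of 8192 gives two full chunks and a remainder.
sequence = "ACGU" * 5000
chunks = chunk_sequence(sequence, 8192)
print([len(c) for c in chunks])  # [8192, 8192, 3616]
```

Because the stride equals the window size, concatenating the chunks recovers the original sequence exactly.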
Added datasets:
- MRL-Sample (https://pubmed.ncbi.nlm.nih.gov/31267113/)
- Added compositional splitting to MRL-Sample
- RNA subcellular localization (https://www.sciencedirect.com/science/article/pii/S1097276524005112?via%3Dihub)
- eCLIP (https://www.encodeproject.org/eclip/)
- Variant Effect Prediction (https://huggingface.co/datasets/songlab/TraitGym)
- Uploaded lncRNA and PCG Essentiality tasks.
- Added compositional columns to MRL-Sample
Other data changes:
- Chromosome information added to existing datasets
- Changed dataset columns to have "target_" prefix
- Dataset code changed to pull directly from HF Hub.
- Data now stored in parquet
- BenchmarkDataset API changed to only accept single species.
- Data catalogue format changed
- Added model weight cache specification for models which do not directly call HF Hub.
Added splitters:
- Splitting by k-mer similarity (using k-means clustering).
- Chromosome splitting.
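Both splitters aim to keep closely related sequences from landing on both sides of a train/test split. The sketch below illustrates the general idea using only the standard library. It is not the mRNABench DataSplitter code: every function name and parameter here is hypothetical, the ACGT alphabet is an assumption, and the plain Lloyd's k-means is a stand-in for whatever clustering the package actually uses.

```python
import random
from itertools import product


def chromosome_split(chromosomes, test_chromosomes):
    """Hold out every sequence whose chromosome is in test_chromosomes."""
    held_out = set(test_chromosomes)
    train = [i for i, c in enumerate(chromosomes) if c not in held_out]
    test = [i for i, c in enumerate(chromosomes) if c in held_out]
    return train, test


def kmer_profile(seq, k=3, alphabet="ACGT"):
    """Normalized k-mer frequency vector; k-mers with other characters are skipped."""
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0.0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, n_clusters)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [
            min(range(n_clusters), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def kmer_cluster_split(seqs, n_clusters=4, test_frac=0.25, k=3, seed=0):
    """Move whole clusters into test until about test_frac of sequences are held out."""
    labels = kmeans([kmer_profile(s, k) for s in seqs], n_clusters, seed=seed)
    order = list(range(n_clusters))
    random.Random(seed).shuffle(order)
    test_clusters, n_test = set(), 0
    for c in order:
        size = labels.count(c)
        if size and n_test < test_frac * len(seqs):
            test_clusters.add(c)
            n_test += size
    train = [i for i, lab in enumerate(labels) if lab not in test_clusters]
    test = [i for i, lab in enumerate(labels) if lab in test_clusters]
    return train, test, labels
```

Since clusters are assigned to the test set as whole units, the realized test fraction only approximates `test_frac`, and a single dominant cluster can leave the training set small or empty. That trade-off is inherent to splitting by group rather than by individual sequence.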
Code changes:
- Added dataframe subset functionality in BenchmarkDataset
- Refactored Linear Probing code to split up class responsibilities.
- Initialization now uses Builder pattern due to relatively complicated entries.
- Added unit testing.
- Added ability to split dataset directly from loaded BenchmarkDataset.
- DataSplitter API changed to allow passing of keyword args.
- Changed get_output_filepath -> get_embedding_filepath in Embedder module to disambiguate name.
- Removed sequence chunking overlap.
v1.1.2
Integrated RNA foundation models from multimolecule package.
v1.0.1
mRNABench initial release.