Releases: morrislab/mRNABench

v1.2.2

13 Jul 20:00
5f49b50

Pinned dependencies, minor bugfixes, updated README.

v1.2.1

16 May 08:33
3c40f8c

NeurIPS Datasets and Benchmarks submission version.

Since last release:

Added datasets:

  • GO Cellular Component
  • GO Biological Process
  • mRNA localization (Fazal et al.)
  • PERSIST-Seq paired MRL and HL dataset (Leppek et al.)

Model Changes:

  • Added NaiveMamba model (unpretrained Mamba)
  • Added context window to HyenaDNA
  • Updated datasets to use Morrislab HF links

Temporarily removed essentiality datasets due to data quality issues.

Updated license to AGPL for compatibility with multimolecule. Removed AIDO.RNA due to license compatibility issues.

v1.2.0

27 Apr 23:43
7190b14

Changes:

EmbeddingModel

  • Removed the option to specify nucleotide overlap when chunking sequences; it was not found to have much effect.
  • Added Evo2, Evo1
  • Added CodonBERT (not fully tested)
  • Added Naive Baseline
  • Added DNABERT-S
  • Added chunking to RiNALMo, set sequence length to 8192
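
As an illustration of the chunking behaviour described above, here is a minimal sketch of splitting a long sequence into consecutive, non-overlapping windows. The function name `chunk_sequence` and its `max_len` parameter are hypothetical and not taken from the mRNABench code; the 8192 default mirrors the RiNALMo sequence length mentioned above. How per-chunk embeddings are combined afterwards is not stated in these notes and is left out.

```python
def chunk_sequence(seq, max_len=8192):
    """Split seq into consecutive, non-overlapping chunks of at most max_len.

    Hypothetical helper for illustration only; not the mRNABench API.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    # Step through the sequence in strides of max_len, so chunks share no bases.
    return [seq[i:i + max_len] for i in range(0, len(seq), max_len)]


# A 20,000 nt sequence yields two full windows and one shorter remainder.
long_seq = "ACGU" * 5000
chunks = chunk_sequence(long_seq)
chunk_lengths = [len(c) for c in chunks]
```

Because there is no overlap, concatenating the chunks reproduces the original sequence exactly.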

Added datasets:

Other data changes:

  • Chromosome information added to existing datasets
  • Changed dataset columns to have "target_" prefix
  • Dataset code changed to pull directly from HF Hub.
  • Data now stored in parquet
  • BenchmarkDataset API changed to only accept single species.
  • Data catalogue format changed
  • Added model weight cache specification for models which do not directly call HF Hub.

Added splitters:

  • Splitting by k-mer similarity (using k-means clustering).
  • Chromosome splitting.
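
The two splitting strategies above share one idea: assign each sequence to a group (a chromosome, or a k-means cluster of k-mer profiles) and keep whole groups on one side of the split. The sketch below shows that idea in plain Python. All function names, the choice of k = 3, the normalized-frequency features, and the cluster count are assumptions made for illustration; the release notes do not specify them, and this is not the mRNABench DataSplitter API.

```python
from collections import Counter
from itertools import product
import random


def kmer_profile(seq, k=3):
    """Normalized k-mer frequency vector over the ACGT alphabet (assumed)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # k-mers containing other characters are ignored in the profile.
    total = sum(counts[m] for m in kmers) or 1
    return [counts[m] / total for m in kmers]


def kmeans_labels(vectors, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per vector."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, n_clusters)]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(n_clusters),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))
            for v in vectors
        ]
        # Update step: move each center to the mean of its members.
        for c in range(n_clusters):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def split_by_group(groups, test_groups):
    """Return (train_idx, test_idx) so that no group spans both sides."""
    test_groups = set(test_groups)
    train = [i for i, g in enumerate(groups) if g not in test_groups]
    test = [i for i, g in enumerate(groups) if g in test_groups]
    return train, test


# Chromosome splitting: hold out every sequence located on chr2.
chroms = ["chr1", "chr1", "chr2", "chrX"]
train_idx, test_idx = split_by_group(chroms, test_groups={"chr2"})

# k-mer similarity splitting: cluster the profiles, then hold out one cluster.
seqs = ["AAAAAAAAAA", "AAAAAAAAAT", "GCGCGCGCGC", "GCGCGCGCGG"]
labels = kmeans_labels([kmer_profile(s) for s in seqs], n_clusters=2)
km_train, km_test = split_by_group(labels, test_groups={labels[0]})
```

Holding out whole groups rather than random rows is what keeps similar sequences (same chromosome, or similar k-mer composition) from appearing in both train and test.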

Code changes:

  • Added dataframe subset functionality in BenchmarkDataset
  • Refactored Linear Probing code to split up class responsibilities.
  • Initialization now uses Builder pattern due to relatively complicated entries.
  • Added unit testing.
  • Added ability to split dataset directly from loaded BenchmarkDataset.
  • DataSplitter API changed to allow passing of keyword args.
  • Renamed get_output_filepath -> get_embedding_filepath in the Embedder module to disambiguate the name.
  • Removed sequence chunking overlap.

v1.1.2

28 Feb 05:20

Integrated RNA foundation models from multimolecule package.

v1.0.1

23 Jan 23:40

mRNABench initial release.