CARMANIA

Context-Aware Regularization with Markovian Integration for Attention-Based Nucleotide Analysis

CARMANIA is a self-supervised genomic language model framework that augments next-token prediction with a transition-matrix regularization loss. The added loss aligns the model's predicted nucleotide transitions with empirical bigram (2-mer) statistics, which improves biological sequence modeling and supports better long-range dependency modeling and functional interpretation.
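To make the "empirical bigram (2-mer) statistics" concrete, here is a minimal sketch of how a row-normalized transition matrix can be counted from a DNA sequence. The function names `bigram_transition_matrix` and `transition_gap` are hypothetical, and the mean squared difference is only a stand-in for illustration; it is not the regularization loss used by CARMANIA, whose exact form is given in the paper.

```python
ALPHABET = "ACGT"

def bigram_transition_matrix(seq, alphabet=ALPHABET):
    """Row-normalized counts of adjacent nucleotide pairs (2-mers).

    Entry [i][j] estimates P(next base = alphabet[j] | current base = alphabet[i]).
    """
    index = {base: i for i, base in enumerate(alphabet)}
    n = len(alphabet)
    counts = [[0] * n for _ in range(n)]
    for a, b in zip(seq, seq[1:]):
        if a in index and b in index:  # skip ambiguous bases such as N
            counts[index[a]][index[b]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

def transition_gap(predicted, empirical):
    """Hypothetical stand-in: mean squared difference between two matrices."""
    n = len(empirical)
    return sum(
        (predicted[i][j] - empirical[i][j]) ** 2
        for i in range(n) for j in range(n)
    ) / (n * n)

empirical = bigram_transition_matrix("ACGTAGGCTA")
uniform = [[0.25] * 4 for _ in range(4)]  # a model with no transition preference
gap = transition_gap(uniform, empirical)
```

A regularizer of this general kind penalizes the gap between the transitions implied by the model's predictions and the matrix counted from the data; here `uniform` plays the role of an untrained model's predictions.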


🧠 Pretrained Models

Pretrained models are available on the Hugging Face Hub, including MsAlEhR/carmania-160k-seqlen-human, which is used in the Quick Start below.


🚀 Quick Start

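from transformers import AutoModel, AutoTokenizer
import torch

# Load the pretrained model; trust_remote_code is required because the
# architecture is defined in the model repository.
model = AutoModel.from_pretrained(
    "MsAlEhR/carmania-160k-seqlen-human",
    trust_remote_code=True,
    torch_dtype=torch.float16,   # fixed dtype (or autocast)
).to("cuda")
model.eval()  # inference mode (disables dropout)

# Load the matching tokenizer with the long context length.
tokenizer = AutoTokenizer.from_pretrained(
    "MsAlEhR/carmania-160k-seqlen-human",
    trust_remote_code=True,
    model_max_length=160000,
)

# Tokenize a DNA sequence and move the tensors to the GPU.
inputs = tokenizer("ACGTAGGCTA", return_tensors="pt").to("cuda")

# Run a forward pass without tracking gradients.
with torch.no_grad():
    outputs = model(**inputs)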

🧪 Sequence-Guided Generation

An experimental notebook explores CARMANIA-driven sequence optimization guided by Enformer scores.
This lightweight module perturbs input DNA sequences and uses Enformer's predicted regulatory signals as a scoring function, iteratively generating variants with higher predicted activity.

📄 Notebook:
carmania_enformer_guided_generation.ipynb
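The perturb-and-score loop described above can be sketched as a simple greedy search. This is an illustration under stated assumptions, not the notebook's implementation: random point mutations stand in for CARMANIA-driven perturbation, and `toy_score` (GC fraction) is a hypothetical placeholder for an Enformer-based score. All function names are made up for this example.

```python
import random

BASES = "ACGT"

def mutate(seq, rng):
    """Return a copy of seq with one position replaced by a different base."""
    pos = rng.randrange(len(seq))
    new_base = rng.choice([b for b in BASES if b != seq[pos]])
    return seq[:pos] + new_base + seq[pos + 1:]

def toy_score(seq):
    """Placeholder scorer (GC fraction); stands in for an Enformer-based score."""
    return sum(base in "GC" for base in seq) / len(seq)

def optimize(seq, score_fn, steps=200, seed=0):
    """Greedy hill climbing: keep a mutant only if it scores strictly higher."""
    rng = random.Random(seed)
    best, best_score = seq, score_fn(seq)
    for _ in range(steps):
        candidate = mutate(best, rng)
        candidate_score = score_fn(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score

start = "ACGTAGGCTA"
best_seq, best_score = optimize(start, toy_score)
```

Because `optimize` takes the scorer as an argument, swapping `toy_score` for a function that wraps a real regulatory-activity predictor leaves the search loop unchanged.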

Citation

@article{refahi2025context,
  title= {Context-Aware Regularization with Markovian Integration for Attention-Based Nucleotide Analysis},
  author= {Refahi, Mohammadsaleh and Abavisani, Mahdi and Sokhansanj, Bahrad A. and Brown, James R. and Rosen, Gail},
  journal= {arXiv preprint arXiv:2507.09378},
  year= {2025}
}

About

Context-Aware Regularization with Markovian Integration for Attention-Based Nucleotide Analysis [NeurIPS2025]
