a hierarchical and cell-type-specific genome organization generator
HiCGen is a deep learning framework for predicting multiscale 3D genome organization (1 kb to 128 kb resolution) using DNA sequences and genomic features. Built on Swin-Transformer, HiCGen enables cross-cell-type predictions and in silico perturbation analysis to study structural consequences of genetic/epigenetic changes.
Paper: Formal Publication | bioRxiv Preprint | Demo Data: Data link
- Multiscale Prediction: Generate hierarchical contact maps (1 kb to 128 kb resolutions) from sequence and epigenetic signals.
- Cross-Cell Generalization: Predict chromatin architecture for unseen cell types using cell-specific ATAC-seq/ChIP-seq profiles.
- Perturbation Analysis: Simulate structural changes caused by enhancer/promoter activation/silencing or CTCF boundary editing.
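To make the perturbation idea concrete, here is a minimal sketch of one way an in silico silencing experiment can be posed: zero out an epigenetic signal track over a chosen interval, feed the edited track to the model, and compare the predicted maps. The function name `silence_region`, the list-based track, and the toy values are illustrative assumptions and not part of HiCGen's API.

```python
def silence_region(signal, start, end):
    """Return a copy of a per-bin signal track with bins [start, end) set to 0."""
    if not 0 <= start <= end <= len(signal):
        raise ValueError("interval must lie within the track")
    edited = list(signal)  # copy so the original track stays untouched
    edited[start:end] = [0.0] * (end - start)
    return edited


# Toy ATAC-seq-like track: one value per genomic bin (hypothetical numbers).
atac = [0.1, 0.2, 3.5, 4.0, 0.3, 0.1]
atac_silenced = silence_region(atac, 2, 4)  # silence the peak in bins 2 and 3
print(atac_silenced)
```

Keeping the original track unchanged makes it easy to predict from both versions and inspect the difference between the two contact maps.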
- Python 3.9+
- PyTorch 2.0+
- CUDA 11.7+ (GPU recommended)
- PyTorch Lightning
- cooler, cooltools
- kipoiseq, pyBigWig
- Clone this repository:
git clone https://github.com/JWei2015/HiCGen.git
cd HiCGen
- Install dependencies via conda:
conda create -n hicgen python=3.9
conda activate hicgen
conda env update -f requirements.txt
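After installation, a quick sanity check can confirm that the listed dependencies are visible to the interpreter. The sketch below uses only the standard library. The import names in `REQUIRED` are assumptions about how each package is imported (for example, PyTorch Lightning may be installed as `pytorch_lightning` or as `lightning`), so adjust them to match your environment.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the packages listed in the requirements.
REQUIRED = ["torch", "pytorch_lightning", "cooler", "cooltools", "kipoiseq", "pyBigWig"]

if sys.version_info < (3, 9):
    print("Python 3.9+ is required, found", sys.version.split()[0])

missing = missing_modules(REQUIRED)
print("Missing modules:", ", ".join(missing) if missing else "none")
```

The check only looks the modules up and does not import them, so it runs quickly and has no CUDA side effects.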
- Input Formats:
  - DNA Sequence: genomic sequences from the GRCh38/hg38 reference genome, in FASTA format (hg38.fa).
  - Epigenetic Signals: preprocessed ATAC-seq/ChIP-seq signal tracks in .bw (BigWig) format.
  - Hi-C Data: normalized and zoomified (multi-resolution) contact matrices in .mcool format.
- Data preprocessing: see Paper: Link
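A "zoomified" .mcool file holds the same contact map at several resolutions, where each coarser level is obtained by summing the counts of adjacent bins. The sketch below illustrates one such coarsening step on a toy map in pure Python. It assumes raw counts stored as upper-triangle pixels `{(i, j): count}` with `i <= j`, and a doubling ladder of resolutions, which is an assumption rather than something this README states. Real pipelines would use cooler for this, and normalization weights are computed per resolution rather than summed.

```python
from collections import defaultdict


def coarsen_pixels(pixels, factor=2):
    """Merge upper-triangle pixels {(i, j): count} into bins `factor` times larger."""
    coarse = defaultdict(int)
    for (i, j), count in pixels.items():
        # Integer division maps each fine bin to the coarse bin that contains it.
        coarse[(i // factor, j // factor)] += count
    return dict(coarse)


# Toy 4-bin map at 1 kb resolution (upper triangle only, hypothetical counts).
pixels_1kb = {(0, 0): 5, (0, 1): 3, (1, 1): 4, (1, 2): 2, (2, 3): 6, (3, 3): 1}
pixels_2kb = coarsen_pixels(pixels_1kb)
print(pixels_2kb)

# Assumed doubling ladder from 1 kb to 128 kb: 1, 2, 4, ..., 128.
resolutions_kb = [2 ** k for k in range(8)]
print(resolutions_kb)
```

Working on upper-triangle pixels avoids double-counting the symmetric off-diagonal entries when they fall into a diagonal bin of the coarser map.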
- HiCGen supports a command-line interface for training and inference. To train on a new cell type, run the command below in a terminal:
python train.py --celltype IMR90 --fold fold1 --pred-mode SwinT4M
- Here the --celltype parameter specifies the filename that contains the genomic features and contact maps of the training cell type. The --fold parameter specifies the training/validation/test sets defined in the fold.txt file. Two values of --pred-mode are currently supported: SwinT4M and SwinT32M. SwinT32M should be trained starting from the checkpoint of a pre-trained SwinT4M model.
- To run a prediction, execute the command below in a terminal:
python prediction.py --celltype IMR90 --chr chr15 --pos 59100000 --res 1024 --model checkpoints/models/tmp.ckpt
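When running many predictions or training runs, it can help to assemble the commands programmatically. The sketch below builds argument lists using only the flags shown in this README. The helper names `train_command` and `predict_command` are hypothetical, and nothing is executed unless you pass the resulting list to `subprocess.run` from the repository root.

```python
import shlex

PRED_MODES = ("SwinT4M", "SwinT32M")  # the two modes named in this README


def train_command(celltype, fold, pred_mode):
    """Build the argument list for a train.py run."""
    if pred_mode not in PRED_MODES:
        raise ValueError(f"pred_mode must be one of {PRED_MODES}")
    return ["python", "train.py", "--celltype", celltype,
            "--fold", fold, "--pred-mode", pred_mode]


def predict_command(celltype, chrom, pos, res, model):
    """Build the argument list for a prediction.py run."""
    return ["python", "prediction.py", "--celltype", celltype,
            "--chr", chrom, "--pos", str(pos), "--res", str(res),
            "--model", model]


cmd = predict_command("IMR90", "chr15", 59100000, 1024,
                      "checkpoints/models/tmp.ckpt")
print(shlex.join(cmd))
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Passing a list instead of a single string avoids shell quoting problems if a checkpoint path contains spaces.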
