HiCGen

A hierarchical and cell-type-specific genome organization generator

HiCGen is a deep learning framework for predicting multiscale 3D genome organization (1 kb to 128 kb resolution) using DNA sequences and genomic features. Built on Swin-Transformer, HiCGen enables cross-cell-type predictions and in silico perturbation analysis to study structural consequences of genetic/epigenetic changes.

Paper: Formal Publication | bioRxiv Preprint | Demo Data: Data link

Key Features

Multiscale Prediction: Generate hierarchical contact maps (1 kb to 128 kb resolutions) from sequence and epigenetic signals.
Cross-Cell Generalization: Predict chromatin architecture for unseen cell types using cell-specific ATAC-seq/ChIP-seq profiles.
Perturbation Analysis: Simulate structural changes caused by enhancer/promoter activation/silencing or CTCF boundary editing.

Installation

Dependencies

Python 3.9+
PyTorch 2.0+
CUDA 11.7+ (GPU recommended)
PyTorch Lightning
cooler, cooltools
kipoiseq, pyBigWig

Setup

Clone this repository:

git clone https://github.com/JWei2015/HiCGen.git
cd HiCGen

Install dependencies via conda:

conda create -n hicgen python=3.9
conda activate hicgen
conda env update -f requirements.txt

Usage

Data Preparation

Input Formats:

DNA Sequence: the GRCh38/hg38 reference genome in FASTA format (hg38.fa).
Epigenetic Signals: preprocessed ATAC-seq/ChIP-seq in .bw (BigWig) format.
Hi-C Data: normalized and zoomified contact matrices in .mcool format.
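
As a quick sanity check before training, the three input kinds listed above can be verified by file suffix. The sketch below is illustrative only: the dictionary keys (`sequence`, `epigenetic`, `hic`), the function name `check_inputs`, and the example file names are hypothetical and not part of HiCGen. It looks at file names only and does not open or validate file contents.

```python
from pathlib import Path

# Expected suffix for each input kind, taken from the formats listed above.
# The keys are hypothetical labels chosen for this sketch.
EXPECTED_SUFFIX = {
    "sequence": ".fa",      # reference genome (hg38.fa)
    "epigenetic": ".bw",    # ATAC-seq/ChIP-seq BigWig track
    "hic": ".mcool",        # multi-resolution Hi-C contact matrices
}


def check_inputs(paths):
    """Return a list of problems; an empty list means every suffix matches."""
    problems = []
    for kind, suffix in EXPECTED_SUFFIX.items():
        if kind not in paths:
            problems.append(f"missing input: {kind}")
            continue
        name = Path(paths[kind]).name
        if Path(name).suffix != suffix:
            problems.append(f"{kind}: expected a {suffix} file, got {name}")
    return problems


# Example call with hypothetical file names.
issues = check_inputs(
    {"sequence": "hg38.fa", "epigenetic": "IMR90_atac.bw", "hic": "IMR90.mcool"}
)
```

A compressed reference such as `hg38.fa.gz` would be reported as a problem here, because the check compares only the final suffix.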

Data preprocessing: see Paper: Link

Training process

HiCGen provides a command-line interface for training and inference. To train on a new cell type, run the command below in a terminal:
```
python train.py --celltype IMR90 --fold fold1 --pred-mode SwinT4M 
```
Here, the --celltype parameter specifies the name of the file that contains the genomic features and contact maps of the training cell type. The --fold parameter selects the training/validation/test split defined in the fold.txt file. Two values of --pred-mode are currently supported: SwinT4M and SwinT32M. SwinT32M should be trained starting from the checkpoint of a pre-trained SwinT4M model.
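
The two-stage order described above (SwinT4M first, then SwinT32M) can be scripted by assembling the same command line programmatically. This is a minimal sketch, not part of HiCGen: `train_command` is a hypothetical helper, and it uses only the three flags shown above. How the SwinT32M run locates the SwinT4M checkpoint is not specified here, so the sketch passes no checkpoint argument.

```python
import shlex

# The two prediction modes named in this README.
PRED_MODES = ("SwinT4M", "SwinT32M")


def train_command(celltype, fold, pred_mode):
    """Build the argument list for one train.py run (hypothetical helper)."""
    if pred_mode not in PRED_MODES:
        raise ValueError(f"unknown --pred-mode: {pred_mode!r}")
    return [
        "python", "train.py",
        "--celltype", celltype,
        "--fold", fold,
        "--pred-mode", pred_mode,
    ]


# Stage 1 trains SwinT4M; stage 2 trains SwinT32M afterwards.
stages = [train_command("IMR90", "fold1", mode) for mode in PRED_MODES]

# Render each stage as a shell-safe string for logging or review.
rendered = [shlex.join(cmd) for cmd in stages]
```

Each entry of `stages` can be passed to `subprocess.run(cmd, check=True)` from the repository root, so that the second stage starts only if the first one exits successfully.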

Prediction

To run a prediction, execute the command below in a terminal:

python prediction.py --celltype IMR90 --chr chr15 --pos 59100000 --res 1024 --model checkpoints/models/tmp.ckpt
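
To predict several regions in one go, the command above can be generated for a series of positions. The sketch below is an assumption-laden illustration: `prediction_commands` is a hypothetical helper, the 2 Mb step is an arbitrary example value, and it assumes that --pos can simply be varied between runs. Only the flags shown in the command above are used.

```python
def prediction_commands(celltype, chrom, start, stop, step, res, model):
    """Build one prediction.py argument list per position in [start, stop)."""
    commands = []
    for pos in range(start, stop, step):
        commands.append([
            "python", "prediction.py",
            "--celltype", celltype,
            "--chr", chrom,
            "--pos", str(pos),
            "--res", str(res),
            "--model", model,
        ])
    return commands


# Three positions on chr15, starting at the position used in the example above.
# The step of 2,000,000 bp is an arbitrary choice for this sketch.
batch = prediction_commands(
    celltype="IMR90",
    chrom="chr15",
    start=59_100_000,
    stop=59_100_000 + 3 * 2_000_000,
    step=2_000_000,
    res=1024,
    model="checkpoints/models/tmp.ckpt",
)
```

The first entry of `batch` reproduces the single command shown above; the remaining entries differ only in the value of --pos.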
