NeRV-3D-DC: A Nonlinear Dimensionality Reduction Visualization Method for 3D Chromosome Structure Reconstruction with High-Resolution Hi-C Data
Requirements: Python 3.8.10, numpy, pandas, matplotlib, xlrd, openpyxl
../chrtest: simulation data and results
../GM12878: results at 50 kb and 5 kb resolution on real Hi-C data
../IMR90: results at 50 kb and 5 kb resolution on real Hi-C data
run 'generate_test_structer.ipynb'
run the functions in 'normalize.py'
or
bash generateKR_hic_matrix.sh
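KR (Knight-Ruiz) normalization balances the contact matrix so that every bin ends up with the same total coverage. The sketch below is not the code behind 'normalize.py' or generateKR_hic_matrix.sh: it swaps in the simpler symmetric Sinkhorn-Knopp iteration, which aims at the same kind of balanced matrix but usually needs more iterations than KR. The name `balance_matrix` and its parameters are hypothetical.

```python
import numpy as np


def balance_matrix(contacts, max_iter=1000, tol=1e-8):
    """Symmetric Sinkhorn-Knopp balancing (a stand-in for KR, not KR itself).

    Returns the balanced matrix and the per-bin bias vector b such that
    balanced[i, j] = sym_contacts[i, j] * b[i] * b[j].
    """
    m = np.asarray(contacts, dtype=float)
    m = (m + m.T) / 2.0                    # enforce symmetry
    covered = m.sum(axis=1) > 0            # empty bins keep bias 1 and stay all-zero
    bias = np.ones(m.shape[0])
    for _ in range(max_iter):
        row_sums = (m * np.outer(bias, bias)).sum(axis=1)
        target = row_sums[covered].mean()
        ratio = row_sums[covered] / target
        if np.max(np.abs(ratio - 1.0)) < tol:
            break
        # Square-root update keeps the symmetric iteration from oscillating
        bias[covered] /= np.sqrt(ratio)
    return m * np.outer(bias, bias), bias
```

Bins without any contacts (for example unmappable regions) are excluded from the target and remain zero, which mirrors how balanced Hi-C matrices usually treat them.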
then call the Python function
tuple2matrix(in_dir, out_dir, resolution)
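The conversion from contact tuples to a matrix can be sketched as follows. This is a hypothetical `triples_to_matrix`, not the repository's tuple2matrix; it assumes the 3-column "position_1 position_2 frequency" format described for the Hierarchical3DGenome inputs, that each pair is listed only once (upper triangle), and that bin indices are counted from genomic position 0.

```python
import io

import numpy as np
import pandas as pd


def triples_to_matrix(source, resolution):
    """Build a dense symmetric matrix from 'position_1 position_2 frequency' lines.

    Assumes each pair appears once (upper triangle) and that
    bin index = genomic position // resolution, counted from position 0.
    """
    df = pd.read_csv(source, sep=r"\s+", header=None,
                     names=["pos1", "pos2", "freq"])
    i = (df["pos1"] // resolution).astype(int).to_numpy()
    j = (df["pos2"] // resolution).astype(int).to_numpy()
    n = int(max(i.max(), j.max())) + 1
    mat = np.zeros((n, n))
    # np.add.at accumulates duplicate (i, j) entries instead of overwriting them
    np.add.at(mat, (i, j), df["freq"].to_numpy(dtype=float))
    # Mirror into the lower triangle without doubling the diagonal
    return mat + mat.T - np.diag(np.diag(mat))


if __name__ == "__main__":
    demo = io.StringIO("0 0 5\n0 5000 2\n5000 10000 3.5\n")
    print(triples_to_matrix(demo, 5000))
```

If a file lists both (i, j) and (j, i), drop the mirroring step, otherwise off-diagonal counts are doubled.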
Change the paths of the input Hi-C contact file and the output file, and set the conversion factor alpha, then run:
bash generateStructue.sh
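The conversion factor alpha turns contact frequencies into target ("wish") distances. Assuming the common convention d = 1 / IF^alpha used by several Hi-C reconstruction tools (this README does not state the exact formula the scripts use), the step can be sketched with a hypothetical helper:

```python
import numpy as np


def contacts_to_distances(contacts, alpha):
    """Map contact frequencies to wish distances with d = 1 / IF**alpha (assumed convention).

    Pairs with zero contacts have no defined distance and are returned as NaN.
    """
    c = np.asarray(contacts, dtype=float)
    dist = np.full(c.shape, np.nan)
    observed = c > 0
    dist[observed] = 1.0 / c[observed] ** alpha
    np.fill_diagonal(dist, 0.0)   # each bin is at distance zero from itself
    return dist
```

A larger alpha makes distances fall off faster as contact frequency increases, which is why the scripts expose it as a tunable parameter.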
Change the paths of the input Hi-C contact file and the output file, and set the conversion factor alpha, then run:
bash generateStructuetruehic.sh
bash generateHighStructure.sh
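Assuming the "NeRV" in the method name refers to the Neighbor Retrieval Visualizer of Venna et al. (2010), the sketch below evaluates the generic NeRV cost of a candidate 3D structure: lambda * KL(P||Q) + (1 - lambda) * KL(Q||P), where P and Q are Gaussian neighbour distributions computed from the input wish distances and from the 3D coordinates. It is for orientation only and is not the objective or optimizer used by the scripts above; the function `nerv_cost`, the shared bandwidth `sigma` and the default `lam=0.5` are illustrative assumptions.

```python
import numpy as np


def _neighbour_probs(dist, sigma):
    """Row-wise probabilities p_ij proportional to exp(-d_ij^2 / sigma_i^2), with p_ii = 0."""
    logits = -(dist ** 2) / (sigma[:, None] ** 2)
    np.fill_diagonal(logits, -np.inf)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)


def nerv_cost(input_dist, coords, lam=0.5, sigma=None, eps=1e-12):
    """Generic NeRV cost; input_dist must be finite (fill NaN wish distances first)."""
    input_dist = np.asarray(input_dist, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = input_dist.shape[0]
    if sigma is None:
        sigma = np.ones(n)                        # same bandwidth in input and output space
    out_dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    p = np.clip(_neighbour_probs(input_dist, sigma), eps, None)
    q = np.clip(_neighbour_probs(out_dist, sigma), eps, None)
    off = ~np.eye(n, dtype=bool)                  # ignore self-pairs
    kl_pq = np.sum(p[off] * np.log(p[off] / q[off]))   # penalises missed neighbours (recall)
    kl_qp = np.sum(q[off] * np.log(q[off] / p[off]))   # penalises false neighbours (precision)
    return lam * kl_pq + (1.0 - lam) * kl_qp
```

The parameter lam trades recall against precision of the retrieved neighbours; a structure whose pairwise distances exactly match the input gives a cost of zero.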
run 'plot3D.ipynb'
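For a quick look outside the notebook, a reconstructed structure stored as an (n, 3) coordinate array can be drawn with matplotlib. This is a minimal sketch, not the contents of 'plot3D.ipynb'; the helper name `plot_structure` and the colouring by bin index are assumptions.

```python
import matplotlib
matplotlib.use("Agg")   # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers the 3d projection on old matplotlib)


def plot_structure(coords, out_path=None):
    """Draw an (n, 3) chromosome structure as a backbone line plus coloured beads."""
    coords = np.asarray(coords, dtype=float)
    fig = plt.figure(figsize=(6, 6))
    ax = fig.add_subplot(projection="3d")
    # Connect consecutive bins to show the chromatin backbone
    ax.plot(coords[:, 0], coords[:, 1], coords[:, 2], color="grey", linewidth=1)
    # Colour the beads by their position along the chromosome
    sc = ax.scatter(coords[:, 0], coords[:, 1], coords[:, 2],
                    c=np.arange(len(coords)), cmap="viridis", s=15)
    fig.colorbar(sc, ax=ax, label="bin index")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_zlabel("z")
    if out_path is not None:
        fig.savefig(out_path, dpi=150)
    return fig
```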
python evalMetrics.py
Before running, modify the paths to your structure files.
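This README does not list the metrics evalMetrics.py computes. One standard way to compare a reconstruction with a known simulated structure is RMSD after optimal superposition (the Kabsch algorithm), sketched below with a hypothetical `kabsch_rmsd`. Reconstructions from Hi-C have an arbitrary scale, so both structures may need to be rescaled to a common size before this comparison.

```python
import numpy as np


def kabsch_rmsd(model, reference):
    """RMSD between two (n, 3) structures after optimal translation and rotation."""
    P = np.asarray(model, dtype=float)
    Q = np.asarray(reference, dtype=float)
    P = P - P.mean(axis=0)          # remove translation
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Flip the last axis if needed so R is a proper rotation, not a reflection
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    aligned = P @ R.T
    return float(np.sqrt(np.mean(np.sum((aligned - Q) ** 2, axis=1))))
```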
python plotmetrics.py
or
run 'plotmetric.ipynb'
run calculateFISHRMSDLoop.sh
or
run 'evaluate_with_FISH.ipynb'
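The script name suggests an RMSD between model distances and FISH-measured distances. A minimal sketch under that assumption: pick the probe bin pairs, compute their 3D distances in the model, fit a single least-squares scale factor that maps model units onto FISH units, and report the RMSD. The helper `fish_rmsd` and its inputs are hypothetical; the repository's scripts may pair probes or scale differently.

```python
import numpy as np


def fish_rmsd(coords, pairs, fish_dist):
    """RMSD between scaled model distances and FISH distances for given bin pairs."""
    coords = np.asarray(coords, dtype=float)
    pairs = np.asarray(pairs, dtype=int)
    fish = np.asarray(fish_dist, dtype=float)
    model = np.linalg.norm(coords[pairs[:, 0]] - coords[pairs[:, 1]], axis=1)
    # Least-squares scale s minimising sum((s * model - fish)^2)
    scale = np.dot(model, fish) / np.dot(model, model)
    rmsd = float(np.sqrt(np.mean((scale * model - fish) ** 2)))
    return rmsd, float(scale)
```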
Hardware: CPUs: 95; CPU cores: 24*95; memory: 503 GB
1) miniMDS: download the source code from https://github.com/seqcode/miniMDS
By default, full MDS is used:
python minimds.py GM12878_combined_22_5kb.bed
To use partitioned MDS:
python minimds.py --partitioned GM12878_combined_22_5kb.bed
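For reference, the core idea of full MDS can be sketched as classical (Torgerson) MDS: double-centre the squared distance matrix and keep the top three eigenvectors. This is not necessarily the MDS variant that miniMDS implements, and `classical_mds` is a hypothetical name; it assumes a complete, finite distance matrix.

```python
import numpy as np


def classical_mds(dist, dim=3):
    """Embed an (n, n) distance matrix into `dim` dimensions with classical MDS."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (d ** 2) @ J                   # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(B)                # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:dim]          # keep the largest ones
    vals = np.clip(vals[order], 0.0, None)        # drop small negative eigenvalues
    return vecs[:, order] * np.sqrt(vals)
```

For exactly Euclidean input distances this recovers the points up to rotation, reflection and translation; Hi-C-derived distances are noisy and incomplete, which is one reason iterative or partitioned approaches are used in practice.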
2) Hierarchical3DGenome: download the source code from https://github.com/BDM-Lab/Hierarchical3DGenome
java -jar HierarchicalModeller.jar chr_id resolution observed_contact_data normalized_contact_data domain_file output_folder
Parameters:
chr_id: e.g. 1, 2, ...
resolution: e.g. 5000
observed_contact_data: observed Hi-C contact file; each line describes one contact as 3 space-separated numbers: position_1 position_2 interaction_frequency (e.g. input/chr10_5kb.RAWobserved; can be downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63525)
normalized_contact_data: normalized Hi-C contact file in the same 3-column format: position_1 position_2 interaction_frequency (e.g. input/chr10_5kb_gm12878_list.txt; can be downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63525 and then normalized)
domain_file: file containing the domains identified by Juicer (e.g. input/GSE63525_GM12878_primary+replicate_Arrowhead_domainlist_whole.txt; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63525)
output_folder: output folder
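If your normalized contacts are held as a dense matrix (for example after KR normalization), they can be written back to the 3-column list format described above. A minimal sketch with a hypothetical `write_contact_list`; it assumes bin i starts at genomic position start + i * resolution and writes only non-zero upper-triangle entries.

```python
import io

import numpy as np


def write_contact_list(matrix, resolution, handle, start=0):
    """Write the upper triangle of a contact matrix as 'position_1 position_2 frequency' lines."""
    m = np.asarray(matrix, dtype=float)
    rows, cols = np.nonzero(np.triu(m))          # skip zero entries and the lower triangle
    for i, j in zip(rows, cols):
        handle.write(f"{start + i * resolution} {start + j * resolution} {m[i, j]:g}\n")


if __name__ == "__main__":
    buf = io.StringIO()
    write_contact_list([[5, 2], [2, 0]], 5000, buf)
    print(buf.getvalue(), end="")
```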