Description
I have been looking into some publicly available Hi-C data from GEO (GSE147123). The dataset includes .hic and .mcool files for a number of cell lines we are interested in. I have tried using mustache for loop calling and plotgardener for visualization from the .hic files. Both seem to use straw to read the data, and with both packages I have run into problems reading data at most resolutions finer than 250 kb: a fraction of the cell line/chromosome combinations simply fail. This makes uniform processing and analysis difficult.
I think this traces back to non-convergence of KR normalization, which apparently leaves empty slots in the .hic files. However, the .hic files also contain data for the normalization method "GW_KR". This mostly works, as failed chromosomes are very rare. I found that manually changing the data read-in function in mustache.py to use "GW_KR" instead of "KR" worked across all the cell lines we are interested in at 5 kb and 10 kb resolution.
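For reference, the workaround can be sketched outside of mustache roughly as follows. This is a minimal sketch, not mustache's actual read-in code: `read_with_fallback` and `make_hicstraw_reader` are hypothetical helper names, and I am assuming that a missing KR vector shows up either as an exception or as an empty result. The only library call used is `hicstraw.straw(data_type, normalization, file, chr1, chr2, unit, binsize)`.

```python
def read_with_fallback(read_fn, norms=("KR", "GW_KR")):
    """Try each normalization in order; return (norm_used, records).

    read_fn is any callable that takes a normalization name and returns
    a sequence of contact records. An exception or an empty result is
    treated as a failure (an assumption about how a missing
    normalization vector manifests).
    """
    errors = []
    for norm in norms:
        try:
            records = read_fn(norm)
        except Exception as exc:
            errors.append(f"{norm}: {exc}")
            continue
        if len(records) > 0:
            return norm, records
        errors.append(f"{norm}: no records returned")
    raise RuntimeError("all normalizations failed: " + "; ".join(errors))


def make_hicstraw_reader(hic_path, chrom, resolution):
    """Hypothetical helper: build a read_fn for one intra-chromosomal map."""
    import hicstraw  # imported lazily so the fallback logic has no hard dependency

    def read_fn(norm):
        # Observed contacts for chrom vs. chrom at the given bin size (in bp).
        return hicstraw.straw("observed", norm, hic_path,
                              chrom, chrom, "BP", resolution)

    return read_fn
```

Usage would look something like `norm, records = read_with_fallback(make_hicstraw_reader("sample.hic", "1", 10000))`, where the file name and chromosome label are placeholders. Recording which normalization was actually used per chromosome makes it possible to check afterwards whether the results mix KR and GW_KR; passing `norms=("GW_KR",)` applies a single normalization everywhere.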
My question is whether making this change is appropriate. Are the results still roughly as valid? Or is there some other fix you would suggest?
I also found that installing via the Conda instructions on the landing page initially resulted in a broken package that was unable to read any .hic file. I think this traced back to mustache.py, which called "import straw" instead of "import hicstraw", in miniconda3/envs/mustache/lib/python3.8/site-packages/mustache/mustache.py. Replacing this file with the one from this repository, or changing that line, fixed it.
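In case it helps, an import that tolerates either module name could look roughly like the sketch below. `import_first_available` is a hypothetical helper, not something mustache provides, and I am assuming that the two modules expose a compatible interface for the calls that matter, which I have not verified.

```python
import importlib


def import_first_available(names):
    """Return the first module in `names` that can be imported."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Covers ModuleNotFoundError as well; try the next candidate.
            continue
    raise ImportError("none of these modules could be imported: "
                      + ", ".join(names))


# Intended use (not executed here): prefer the newer name, fall back to the old one.
# straw = import_first_available(("hicstraw", "straw"))
```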
Thanks!