I know this may not be the right forum for this question. I tried posting it on Biostars and the mailing list but haven't heard back, so I'm putting it here as well.
I'm hoping to run a segmentation using quite a few tracks derived from Hi-C across the human genome. Half of these will be continuous, with values ranging from 0 to 0.15, and the other half will be binary, with values 0 and 1. Both sets of tracks are very sparse, with many zero-valued regions. I have a few questions on the best way to approach this.
- Will the discrete and continuous tracks both be treated exactly the same, or can I specify within Segway which tracks are continuous and which are discrete?
- Do I need to normalize the values in some way (and if so, how best to do it) to keep the binary 0/1 tracks from drowning out the lower-valued 0 to 0.15 tracks?
- Should I be training on specific regions of the genome that I know have signal? I'm worried that including only 5% or less of the genome in minibatch training may not capture all the variation in the tracks, given how sparse the data is.
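To make the last two questions concrete, this is roughly the kind of preprocessing I have in mind. It is plain Python and not anything from Segway itself; the function names, the 10 kb bin size, the window length, and the 25% threshold are all made up for illustration, and I don't know whether either step is actually advisable.

```python
from typing import List, Tuple


def rescale_to_unit_max(values: List[float]) -> List[float]:
    """Divide a track by its largest absolute value (hypothetical normalization)."""
    peak = max((abs(v) for v in values), default=0.0)
    if peak == 0.0:
        return list(values)  # all-zero track: nothing to rescale
    return [v / peak for v in values]


def signal_windows(
    values: List[float],
    bin_size: int,
    window_bins: int,
    min_nonzero_frac: float,
) -> List[Tuple[int, int]]:
    """Return (start, end) coordinates of windows with enough nonzero bins.

    The track is cut into consecutive windows of `window_bins` bins, and a
    window is kept when at least `min_nonzero_frac` of its bins are nonzero.
    """
    windows = []
    for first in range(0, len(values), window_bins):
        chunk = values[first:first + window_bins]
        nonzero = sum(1 for v in chunk if v != 0)
        if nonzero / len(chunk) >= min_nonzero_frac:
            windows.append((first * bin_size, (first + len(chunk)) * bin_size))
    return windows


# Toy continuous track: 8 bins, assumed to be 10 kb each.
continuous = [0.0, 0.0, 0.15, 0.03, 0.0, 0.0, 0.0, 0.0]
scaled = rescale_to_unit_max(continuous)
regions = signal_windows(continuous, bin_size=10_000, window_bins=4,
                         min_nonzero_frac=0.25)
```

The resulting windows could then be written out as a BED file of regions to train on, if restricting training that way is a sensible thing to do.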
Thanks for any guidance.