unexpected behaviour with subsampling #21
Description
Describe the bug
Hi, and first of all thanks for developing this package. It's really cool to see quality work. I have recently been trying to run it on a mouse dataset and had a hard time getting it to work as described in the tutorial. After working through the code a bit, I found that this is due to an odd choice of settings made at runtime in the `sliding_window_chromunity` function and its lower-level child `concatemer_chromunity_sliding`.
The dataset at hand is a preliminary one and hence quite small. Running it as described in the tutorial failed, and I suspected the subsampling to be the culprit, so I decided to turn it off by setting `take_sub_sample = FALSE`. However, it still failed. Looking through the code, I found a conditional that unexpectedly turns the subsampling back on if `subsample.frac` is set. That is always the case, since it defaults to 0.5, but it really shouldn't happen when `take_sub_sample = FALSE`.
Furthermore, the subsampling itself fails because of a comparison that sets the sample size to 1000 whenever the number of rows is smaller than 1000. The sampler then fails because it cannot draw more than the actual number of rows without replacement.
My expectation would be that setting `take_sub_sample = FALSE` turns subsampling off, and that if subsampling is on, it at least respects the number of rows in my data. I know the last point may be disregarded because the statistics may not hold up with datasets this small, but a warning or something similar would still be appreciated, so one does not have to waste a day getting it to run only to find the dataset is too small.
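
To make the two problems concrete, here is a simplified model of the behaviour described above in plain base R. It is not copied from the package: `n_rows`, `do_subsample` and `n_sample` are hypothetical names, and the two expressions are only my reading of the conditional and the comparison.

```r
# Simplified model of the behaviour described above; NOT the package's code.
# `n_rows` stands in for the number of rows in one small window.
n_rows <- 150
take_sub_sample <- FALSE
subsample.frac <- 0.5  # the default, so it is never NULL unless set explicitly

# Problem 1: subsampling is switched back on whenever subsample.frac is set
do_subsample <- take_sub_sample || !is.null(subsample.frac)

# Problem 2: the sample size is raised to 1000 even if fewer rows exist
n_sample <- max(round(n_rows * subsample.frac), 1000)

# Base R cannot draw more items than exist without replacement
result <- tryCatch(
  sample(n_rows, n_sample),
  error = function(e) conditionMessage(e)
)

print(do_subsample)  # TRUE, despite take_sub_sample = FALSE
print(n_sample)      # 1000
print(result)        # the error message raised by sample()
```

With 150 rows, the sampler is asked for 1000 indices and stops with an error, no matter what `take_sub_sample` was set to.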
To Reproduce
Steps to reproduce the behavior:
- Run `sliding_window_chromunity` with a small enough dataset (in my case I had around 150 reads per window, or at least in one window).
- Try disabling subsampling with `take_sub_sample = FALSE` (you can always disable it by additionally setting `subsample.frac = NULL`, but this really should not be necessary).
- The first error encountered will be `object not found: tix`, so one has to run the steps manually to find the actual cause. Which is how I found out ;)
Expected behavior
Subsampling should be turned off when setting `take_sub_sample = FALSE`. In addition, the sample size should not be larger than the actual dataset size, and a warning would be appreciated when the requested sample size exceeds it.
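
A minimal sketch of what I would expect, assuming the sample size is derived from the row count, `subsample.frac` and a floor of 1000. The helper `resolve_sample_size` and its `min_size` argument are hypothetical and not part of the package.

```r
# Hypothetical helper, not part of the package: decides how many rows to use.
resolve_sample_size <- function(n_rows, take_sub_sample = TRUE,
                                subsample.frac = 0.5, min_size = 1000) {
  # Respect the switch: with subsampling off, use every row
  if (!isTRUE(take_sub_sample) || is.null(subsample.frac)) {
    return(n_rows)
  }
  wanted <- max(round(n_rows * subsample.frac), min_size)
  # Never ask for more rows than exist; warn so the user knows why
  if (wanted > n_rows) {
    warning("Requested sample size (", wanted,
            ") exceeds the number of rows (", n_rows,
            "); using all rows instead.")
    return(n_rows)
  }
  wanted
}

resolve_sample_size(150, take_sub_sample = FALSE)  # 150, no warning
suppressWarnings(resolve_sample_size(150))         # 150, would normally warn
resolve_sample_size(5000)                          # 2500
```

Whether falling back to all rows or stopping with an informative error is the better choice for tiny windows is up to you; the point is only that the switch is honoured and the user is told when the data is too small.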