Description
Version Info
Tested on v0.9.10, v0.9.11, v0.9.12
What is this all about
When I run the individual steps of the pipeline, I get different results than when I use the `batch` command.
Using `batch` leads to noisy results.
[Image: result when running the commands individually]
[Image: result when using the batch command]
Step-by-step differences
I have not included the coverage commands here, as I use multiple normals and their output did not look much different, except for the antitarget regions.
Binning
| Individual | Batch |
|---|---|
| Detected file format: bed<br>Detected file format: bed<br>Estimated read length 101.0<br>Wrote /tmp/tmp0od03n9s.bed with 100 regions<br>Splitting large targets<br>Wrote Agilent_SureSelect_XT_HS2_All_Exon_V8_Regions.target.bed with 204770 regions<br>Skipping untargeted chromosomes MT<br>Wrote Agilent_SureSelect_XT_HS2_All_Exon_V8_Regions.antitarget.bed with 134163 regions | Detected file format: bed<br>Splitting large targets<br>Wrote Agilent_SureSelect_XT_HS2_All_Exon_V8_Regions.target.bed with 232655 regions<br>Wrote Agilent_SureSelect_XT_HS2_All_Exon_V8_Regions.antitarget.bed with 38328 regions |
Reference
| Individual | Batch |
|---|---|
| Targets: 9665 (4.72%) bins failed filters (log2 < -5.0, log2 > 5.0, spread > 1.0)<br>Antitargets: 18937 (14.11%) bins failed filters<br>Wrote reference.cnn with 338933 regions | Targets: 13067 (5.616%) bins failed filters (log2 < -5.0, log2 > 5.0, spread > 1.0)<br>Antitargets: 1894 (4.764%) bins failed filters<br>Wrote reference.cnn with 272408 regions |
Difference in fix
| Individual | Batch |
|---|---|
| Processing target: tumour<br>Keeping 195105 of 204770 bins<br>Correcting for GC bias...<br>Correcting for density bias...<br>Processing antitarget: tumour<br>Keeping 115226 of 134163 bins<br>Correcting for GC bias...<br>Correcting for RepeatMasker bias...<br>Antitargets are 3.42 x more variable than targets | Processing target: tumour<br>Keeping 219588 of 232655 bins<br>Correcting for GC bias...<br>Correcting for density bias...<br>Processing antitarget: tumour<br>Keeping 37859 of 39753 bins<br>Correcting for GC bias...<br>Correcting for RepeatMasker bias...<br>Antitargets are 1.39 x more variable than targets |
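The bin counts quoted in the binning, reference and fix logs above can be cross-checked with a few lines of arithmetic. This is only a sketch: the helper name `implied_antitargets` is illustrative, and it assumes that `reference.cnn` contains exactly the target bins plus the antitarget bins.

```python
# Bin counts copied from the log excerpts in this issue.
runs = {
    "individual": {"target": 204770, "antitarget": 134163, "reference": 338933},
    "batch": {"target": 232655, "antitarget": 38328, "reference": 272408},
}


def implied_antitargets(run):
    """Antitarget bin count implied by the reference size and the target count.

    Assumption: the reference holds target bins plus antitarget bins only.
    """
    return run["reference"] - run["target"]


for name, run in runs.items():
    implied = implied_antitargets(run)
    status = "matches" if implied == run["antitarget"] else "differs from"
    print(
        f"{name}: reference implies {implied} antitarget bins, "
        f"which {status} the {run['antitarget']} written at binning"
    )
```

Under that assumption the individual run is self-consistent, while the batch reference implies 39753 antitarget bins (the same figure the batch `fix` log reports) rather than the 38328 written at the binning step. That may mean the batch excerpts come from different runs or that batch builds its antitarget bins differently; the logs alone do not say which.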
Difference in Segment
This step differs more, because the segment call, even when run simply as
cnvkit.py segment tumor.cnr -o tumor.cns
still applies smoothing by default, which is pretty strange: I thought `--smooth-cbs` was an opt-in feature. Or is this something else?
| Individual | Batch |
|---|---|
| Segmenting with method 'cbs', significance threshold 0.0001, in 1 processes<br>Smoothing overshot at 8 / 233 indices: (-30.268828150561895, -0.21054706747579377) vs. original (-27.9209, 0.53479)<br>Smoothing overshot at 10 / 595 indices: (-29.16372209174013, 1.8425761663484446) vs. original (-27.9546, -0.028386) | Segmenting with method 'cbs', significance threshold 0.0001, in 1 processes<br>Dropped 3 / 13645 bins on chromosome 1<br>Dropped 2 / 11956 bins on chromosome 1<br>Dropped 1 / 9698 bins on chromosome 5<br>Dropped 2 / 10534 bins on chromosome 12<br>Dropped 48 / 126 bins on chromosome Y<br>Dropped 254 / 375 bins on chromosome Y |
Then there are a bunch of post-processing steps in batch mode that are not documented as part of the batch pipeline in the stable release version of the Read the Docs documentation, such as `segmetrics` and `call` with filtering based on `ci`.
CI filtering
| Individual | Batch |
|---|---|
| Applying filter 'ci'<br>Filtered by 'ci' from 59 to 34 rows<br>Wrote tumor.ci.cns with 34 regions | Applying filter 'ci'<br>Filtered by 'ci' from 729 to 395 rows |
This was followed by median centering and the p-t-test, and it finally ended with bintest.
Bintest
| Individual | Batch |
|---|---|
| Ignoring 115226 off-target bins<br>Significant hits in 7141/195105 bins (3.66%) | Ignoring 37859 off-target bins<br>Significant hits in 5976/219588 bins (2.72%) |
Overall, I see two possible explanations:
- The binning step, which uses target and antitarget instead of autobin, if I am not wrong (refer to my comment on "batch hybrid: Use autobin for target and antitarget bin sizes" #302).
- Or the automatic smoothing in the segment step, which I do not understand: how is it even happening?
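To test the first explanation, the target and antitarget BED files from the two runs could be compared directly. The following is a minimal sketch, not part of CNVkit: `summarize_bed` and `compare_beds` are hypothetical helpers, and they assume plain tab-separated BED files with the start and end coordinates in columns 2 and 3.

```python
import statistics


def summarize_bed(lines):
    """Return (bin count, median bin size, total bases covered) for BED lines."""
    sizes = []
    for line in lines:
        # Skip blank lines and the optional BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        start, end = int(fields[1]), int(fields[2])
        sizes.append(end - start)
    if not sizes:
        return 0, 0, 0
    return len(sizes), statistics.median(sizes), sum(sizes)


def compare_beds(path_a, path_b):
    """Print a one-line summary for each of two BED files."""
    for path in (path_a, path_b):
        with open(path) as handle:
            count, median, total = summarize_bed(handle)
        print(f"{path}: {count} bins, median size {median}, {total} bp covered")


# Small in-memory demonstration with made-up intervals.
demo = ["chr1\t100\t300\n", "chr1\t300\t400\n", "chr2\t0\t50\n"]
print(summarize_bed(demo))
```

Calling `compare_beds` on the `.target.bed` files written by the individual run and by the batch run (and likewise on the `.antitarget.bed` files) would show whether the bin sizes themselves differ, or only the number of bins.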

