prepare_detergent failing when using all samples

Hello,

After collecting a test set of fragCounter coverage profiles for 4 normal samples, I attempted to run the `dryclean` workflow.
I encountered the following error in the first step, creating the PoN with `prepare_detergent`:

```
pon_detergent <- prepare_detergent(normal.table.path = "/drycleanRun/test_ton.rds",
                                   use.all = TRUE,
                                   num.cores = 2,
                                   build = "hg38",
                                   path.to.save = "drycleanRun/",
                                   nochr = T,
                                   save.pon = T)

### OUTPUT ###
Starting the preparation of Panel of Normal samples a.k.a detergent
4 samples available
Using all samples
PAR file not provided, using hg38 default. If this is not the correct build, please provide a GRange object delineating for corresponding build
PAR read
Checking for existence of files
4 files present
  |=====================================================================================================================| 100%, Elapsed 07:21
Error in setattr(ans, "names", c(keep.names, paste0("V", seq_len(length(ans) -  : 
  'names' attribute [1] must be the same length as the vector [0]
```

While troubleshooting, I found that others have hit the same error at a different stage of the workflow (#2).
Based on the output, the error appears to occur within the `pbmclapply` call at line 259, although I am not sure exactly where.
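
One way to narrow this down might be to rerun the per-sample step serially, with each call wrapped in `tryCatch`, so that the failing sample and its message are reported instead of being lost inside the parallel workers. The sketch below is generic base R: `process_one`, `find_failures`, and the file names are hypothetical stand-ins for illustration, not dryclean functions or my actual inputs.

```r
# Hypothetical stand-in for the per-sample work; NOT part of dryclean.
# It fails for any path containing "bad" to mimic the error above.
process_one <- function(path) {
  if (grepl("bad", path)) {
    stop("'names' attribute [1] must be the same length as the vector [0]")
  }
  nchar(path)
}

# Run fn on each item serially, capturing errors instead of aborting.
# Returns a named character vector: failing item -> error message.
find_failures <- function(items, fn) {
  results <- lapply(items, function(x) {
    tryCatch(
      list(ok = TRUE, value = fn(x), error = NA_character_),
      error = function(e) {
        list(ok = FALSE, value = NULL, error = conditionMessage(e))
      }
    )
  })
  names(results) <- items
  failed <- Filter(function(r) !r$ok, results)
  vapply(failed, function(r) r$error, character(1))
}

# Hypothetical file names, for illustration only.
paths <- c("sample1.rds", "sample2_bad.rds", "sample3.rds")
failures <- find_failures(paths, process_one)
print(failures)
```

Running serially is slower, but it keeps the sample name attached to each error message, which the parallel progress bar does not show.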

I then tested `prepare_detergent` with the other sample-selection options instead of using all samples.
Interestingly, both alternatives, `choose.randomly = TRUE` and `choose.by.clustering = TRUE`, ran without error.

Here is the run using `choose.randomly = TRUE`, selecting 2 of the 4 samples:
```
pon_detergent <- prepare_detergent(normal.table.path = "/drycleanRun/test_ton.rds",
                                   use.all = FALSE,
                                   choose.randomly = TRUE,
                                   number.of.samples = 2,
                                   choose.by.clustering = FALSE,
                                   num.cores = 2,
                                   build = "hg38",
                                   path.to.save = "drycleanRun/",
                                   nochr = T,
                                   save.pon = T)

### OUTPUT ###
Starting the preparation of Panel of Normal samples a.k.a detergent
4 samples available
Selecting 2 normal samples randomly
PAR file not provided, using hg38 default. If this is not the correct build, please provide a GRange object delineating for corresponding build
PAR read
Checking for existence of files
2 files present
  |============================================================================================================| 100%, Elapsed 03:28
Starting decomposition
This is version 2
Warning: Item 1 has 3031053 rows but longest item has 15155223; recycled with remainder.Finished making the PON or detergent and saving it to the path provided
```

And here is the run using `choose.by.clustering = TRUE`:
```
pon_detergent <- prepare_detergent(normal.table.path = "/drycleanRun/test_ton.rds",
                                   use.all = FALSE,
                                   choose.randomly = FALSE,
                                   number.of.samples = 2,
                                   choose.by.clustering = TRUE,
                                   num.cores = 2,
                                   build = "hg38",
                                   path.to.save = "drycleanRun/",
                                   nochr = T,
                                   save.pon = T)

### OUTPUT ###
Starting the preparation of Panel of Normal samples a.k.a detergent
4 samples available
Starting the clustering
Starting decomposition on a small section of genome
This is version 2
Starting clustering
PAR file not provided, using hg38 default. If this is not the correct build, please provide a GRange object delineating for corresponding build
PAR read
Checking for existence of files
2 files present
  |============================================================================================================| 100%, Elapsed 01:52
Starting decomposition
This is version 2
Warning: Item 1 has 3031053 rows but longest item has 15155223; recycled with remainder.Finished making the PON or detergent and saving it to the path provided
```

The output `detergent.rds` appears to be in working order, as I was able to run `start_wash_cycle` without any problems.
I will likely use the clustering method for further analysis, but wanted to report this issue for others who encounter it.
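
Both successful runs print a recycling warning (3031053 rows versus 15155223), which suggests that at least one of the items being combined is shorter than the others. A quick check along the following lines could confirm whether the per-sample inputs have equal lengths. This is only a sketch: the `coverage` list holds made-up vectors, and how the real per-sample data would be loaded is not shown here.

```r
# Made-up per-sample vectors standing in for real coverage data.
coverage <- list(
  s1 = c(1.0, 0.9, 1.1, 1.0, 0.8),
  s2 = c(1.2, 1.0, 0.9, 1.1, 1.0),
  s3 = c(0.9, 1.0, 1.1)            # deliberately shorter
)

# Length of each item, and the names of those shorter than the longest.
lens <- lengths(coverage)
mismatched <- names(lens)[lens != max(lens)]

print(lens)
print(mismatched)
```

If `mismatched` is non-empty for the real inputs, those samples would be the first place to look.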

Best,
Patrick
