@nf-core-bot fix linting

Moving this back to draft to keep polishing the missing and extra files

The

I had to patch |
The problem with sgdemux (and sgdemux's flowcell) tests:
sgdemux in this test basically reads from the fastq directory and writes the output files in the same directory:
sgdemux \
    --sample-metadata out.sample_meta.csv \
    --fastqs sim-data \
    --output-dir sim-data \
    --demux-threads 4 \
    --compressor-threads 4 \
    --writer-threads 4 \
So that's why there are some extra files:
- The files prefixed with `out`, such as `out_L001_R1_001.fastq.gz`, are the input fastq files.
- The files such as `s10_S10_L001_R1_001.fastq.gz` and `s10_S10_L001_R2_001.fastq.gz` are the produced output files.
- The module does not differentiate between undetermined and regular fastq when capturing outputs, which is why undetermined fastq (such as `Undetermined_S25_L001_R1_001.fastq.gz`) also appear (and flow into fastp).
So with workflow outputs we are now capturing everything from sample_fastq, which is why we see more files in the snapshot. Changing (basically, improving) the module to separate input fastq, output fastq and undetermined fastq will change this snapshot again (all the undetermined fastp results will also disappear). To keep the changes in this PR from getting too big, I would merge this as-is and then do a full refactoring of the sgdemux module and of the outputs that are captured in the workflow.
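A rough sketch of the separation described above, going only by the filenames seen in this snapshot. It assumes inputs start with `out_` and unassigned reads start with `Undetermined_`; `classify_fastq` is a hypothetical helper for illustration, not part of the sgdemux module, and a real refactor would more likely express this through the module's output declarations:

```shell
# Hypothetical helper: sort a fastq path into one of the three groups
# discussed above, based only on its filename prefix (an assumption
# drawn from the files in this snapshot).
classify_fastq() {
    # ${1##*/} strips any leading directory, e.g. sim-data/
    case "${1##*/}" in
        Undetermined_*.fastq.gz) echo "undetermined" ;;
        out_*.fastq.gz)          echo "input" ;;
        *.fastq.gz)              echo "sample" ;;
        *)                       echo "other" ;;
    esac
}

# Print the group for each of the example files mentioned above.
for f in \
    sim-data/out_L001_R1_001.fastq.gz \
    sim-data/s10_S10_L001_R1_001.fastq.gz \
    sim-data/Undetermined_S25_L001_R1_001.fastq.gz
do
    printf '%s\t%s\n' "$(classify_fastq "$f")" "$f"
done
```

Matching on prefixes like this is fragile (a sample whose name starts with `out_` would be misclassified), which is one more reason to write the demultiplexed files to a separate output directory in the refactor.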
Fully agree on that approach 👍🏻
apeltzer left a comment
Looks good to me, follow-up issues are tracked accordingly, nice :-)
Closes #367