The problem with sgdemux (and sgdemux's flowcell) tests:
In this test, `sgdemux` reads its input from the FASTQ directory and writes its output files into that same directory:
sgdemux \
--sample-metadata out.sample_meta.csv \
--fastqs sim-data \
--output-dir sim-data \
--demux-threads 4 \
--compressor-threads 4 \
--writer-threads 4
That is why the snapshot contains some extra files:
- The files prefixed with `out`, such as `out_L001_R1_001.fastq.gz`, are the input FASTQ files.
- The files such as `s10_S10_L001_R1_001.fastq.gz` and `s10_S10_L001_R2_001.fastq.gz` are the produced output files.
- The module does not distinguish between undetermined and regular FASTQ files when capturing outputs, which is why undetermined FASTQ files (such as `Undetermined_S25_L001_R1_001.fastq.gz`) also appear (and flow into fastp).
So with workflow outputs we are now capturing everything from `sample_fastq`, which is why we see more files in the snapshot. Improving the module to separate input FASTQ, output FASTQ, and undetermined FASTQ files will also change this snapshot (all the undetermined fastp results will disappear). To keep the changes small, I would merge this as-is and then do a full refactoring of the sgdemux module and of the outputs that are captured in the workflow.
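
As a rough illustration of the three-way split described above, here is a minimal shell sketch that classifies a file by its name alone. The function name `classify_fastq` is hypothetical, and the `out_` prefix for inputs is an assumption taken from this test's file names. It is not how the sgdemux module currently captures files, and a real refactoring would most likely express this in the module's output declarations instead.

```shell
# Hypothetical helper: classify a FASTQ file by its basename.
# Assumes the naming seen in this test: inputs start with "out_",
# undetermined reads start with "Undetermined_", and any other
# *.fastq.gz file is treated as a demultiplexed sample output.
classify_fastq() {
    name="${1##*/}"   # strip any leading directory components
    case "$name" in
        Undetermined_*.fastq.gz) echo "undetermined" ;;
        out_*.fastq.gz)          echo "input" ;;
        *.fastq.gz)              echo "output" ;;
        *)                       echo "other" ;;
    esac
}
```

For example, `classify_fastq sim-data/s10_S10_L001_R1_001.fastq.gz` would report that file as an output, while the `Undetermined_` pattern is checked first so those reads never fall into the sample category.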
Originally posted by @atrigila in #379 (comment)