sgdemux reads the input and writes the output files in the same directory #380

@atrigila

The problem with sgdemux (and sgdemux's flowcell) tests:

In this test, sgdemux reads from the FASTQ directory and writes its output files to that same directory:

sgdemux \
    --sample-metadata out.sample_meta.csv \
    --fastqs sim-data \
    --output-dir sim-data \
    --demux-threads 4 \
    --compressor-threads 4 \
    --writer-threads 4

So that's why there are some extra files:

  • The files prefixed with out such as out_L001_R1_001.fastq.gz are the input fastq files.
  • The files such as s10_S10_L001_R1_001.fastq.gz and s10_S10_L001_R2_001.fastq.gz are the produced output files.
  • The module does not distinguish between undetermined and regular FASTQ files when capturing outputs, which is why undetermined FASTQ files (such as Undetermined_S25_L001_R1_001.fastq.gz) also appear (and flow into fastp).
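The three groups above can be told apart by file name alone. The following is a minimal sketch, not the module's actual logic: the helper name classify_fastq is hypothetical, and it assumes the naming seen in this test (inputs start with out_, undetermined reads start with Undetermined_, everything else ending in .fastq.gz is a demultiplexed output). A sample whose name itself starts with out_ would be misclassified.

```shell
# Hypothetical helper: classify a FASTQ file by its name.
# Assumes the naming from this test only:
#   out_*           -> input FASTQ
#   Undetermined_*  -> undetermined reads
#   other *.fastq.gz -> demultiplexed sample output
classify_fastq() {
    case "$(basename "$1")" in
        out_*.fastq.gz)          echo "input" ;;
        Undetermined_*.fastq.gz) echo "undetermined" ;;
        *.fastq.gz)              echo "output" ;;
        *)                       echo "other" ;;
    esac
}

# Example: label the file names mentioned in this issue.
for f in out_L001_R1_001.fastq.gz \
         s10_S10_L001_R1_001.fastq.gz \
         Undetermined_S25_L001_R1_001.fastq.gz; do
    printf '%s\t%s\n' "$(classify_fastq "$f")" "$f"
done
```

Something along these lines could inform how a refactored module splits its captured outputs, but the real separation would more likely be done with distinct output directories or output patterns in the module itself.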

With workflow outputs we now capture everything from sample_fastq, which is why the snapshot contains more files. Improving the module to separate input FASTQ, output FASTQ, and undetermined FASTQ will change this snapshot again (all the undetermined fastp results will also disappear). To keep the changes small, I would merge this as-is and then do a full refactoring of the sgdemux module and of the outputs that are captured in the workflow.

Originally posted by @atrigila in #379 (comment)

Metadata

Assignees

No one assigned

    Labels

    enhancement: Improvement for existing functionality
