| ## Pipeline tools | ||
| - [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) |
| - [GffRead](https://pubmed.ncbi.nlm.nih.gov/32489650/) | ||
| > Pertea G, Pertea M. GFF Utilities: GffRead and GffCompare. F1000Res. 2020 Apr 28;9:ISCB Comm J-304. doi: 10.12688/f1000research.23297.2. eCollection 2020. PubMed PMID: 32489650; PubMed Central PMCID: PMC7222033. | ||
| - [HISAT2](https://pubmed.ncbi.nlm.nih.gov/31375807/) | ||
Updated the pipeline tools.
| samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//') | ||
| END_VERSIONS | ||
| """ | ||
| } |
| output: | ||
| tuple val(meta), path("*.hisat2_Aligned.bam") , emit: bam | ||
| tuple val(meta), path("*_summary.txt") , emit: summary | ||
| tuple val(meta), path("*fastq.gz"), optional:true, emit: fastq |
Is this output for unmapped reads?
I'm not sure we're producing unmapped reads; see the `--un-conc` option in the HISAT2 manual:
https://daehwankimlab.github.io/hisat2/manual/#:~:text=in%20the%20input.-,%2D%2Dun%2Dconc,-%3Cpath%3E%2C
| genome_annotation: "${params.genome_annotation}", | ||
| read_groups_count: "${meta.numLanes}", | ||
| study_id : "${meta.study_id}", | ||
| date :"${new Date().format("yyyyMMdd")}", |
Is the date defined twice? Ideally the date should be set once, prior to payload generation. If it is used earlier in the workflow, we run the risk that a terminated and restarted run generates duplicate work because the date variable has a new value.
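A minimal sketch of what I mean, assuming a hypothetical `params.run_date` parameter (not something in this PR) and a placeholder meta map standing in for the real samples. The date is computed once at the top of the script and every payload reuses that value instead of calling `new Date()` again:

```nextflow
// Hypothetical sketch: `params.run_date` is an assumed name, not part of this PR.
// The default is evaluated once when the script is loaded.
params.run_date = new Date().format('yyyyMMdd')

workflow {
    Channel
        .of([id: 'sample1', study_id: 'TEST-PR'])        // placeholder meta map
        .map { meta -> meta + [date: params.run_date] }  // reuse the single date value
        .view()
}
```

Because it is a parameter, a restarted run could also pass the original value explicitly (e.g. `--run_date 20240101`) so the payload does not change between attempts.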
| .set{ch_h_aln_payload} | ||
| // Make ALN payload | ||
| PAYLOAD_ALIGNMENT_H( // [val (meta), [path(cram),path(crai)],path(analysis_json)] |
Minor nitpick: the comment should be inline with the argument it describes, e.g.
PAYLOAD_ALIGNMENT_H(
ch_h_aln_payload.upload, // [val (meta), [path(cram),path(crai)],path(analysis_json)]
Channel.empty()
.mix(STAGE_INPUT.out.versions)
.mix(HISAT2_ALIGN.out.versions)
.mix(MERG_DUP_H.out.versions)
.collectFile(name: 'collated_versions.yml')
)
| experiment:"${meta.experiment}", | ||
| date:"${meta.date}", | ||
| read_group:"${info.read_group.collect()}", | ||
| data_type:"${info.data_type.collect()}", // later check whether data type is correct ** |
| if (params.tools.split(',').contains('hisat2_aln')){ | ||
| // HISAT2 - ALIGN // | ||
| index = Channel.fromPath(params.hisat2_index).collect() |
I don't recall: was it decided to add an indexing step to the workflow?
| ch_multiqc = Channel.empty() | ||
| ch_multiqc = ch_multiqc.mix(ch_reports.collect{meta, report -> report}).ifEmpty([]) | ||
| ch_multiqc_config = Channel.fromPath("$projectDir/assets/multiqc_config.yml", checkIfExists: true) |
When defining variables, it is better to declare all files at the start of the workflow. This makes them easier to manage and read.
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
| */ | ||
| params.study_id = WorkflowMain.getGenomeAttribute(params, 'study_id') |
Redefining these is not needed.
See:
https://github.com/icgc-argo-workflows/prealnqc/blob/main/main.nf
RNA Seq Alignment Workflow Version 1.0.0
Please refer to README for test details