-
Notifications
You must be signed in to change notification settings - Fork 6
Strict syntax and linting #220
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
|
c6d19c8 to
5b7f6b6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request applies strict syntax and linting improvements to the sanger-tol/blobtoolkit Nextflow pipeline. The changes enforce stricter coding standards including lowercase channel factory methods, explicit closure parameter naming, and conversion from switch statements to if/else blocks.
Changes:
- Standardized all Channel factory methods to lowercase (channel.empty(), channel.fromPath(), channel.value())
- Added explicit parameter names to all closures instead of implicit
itparameter, with unused parameters prefixed with underscore - Replaced switch statements with if/else if/else blocks for better linting compliance
- Moved conda profile compatibility checks from module-level to script section in BlobToolKit modules
- Updated DIAMOND modules from version 2.1.8 to 2.1.16
- Removed unused variables and helper functions from configuration files
- Replaced inline function calls with explicit mathematical expressions in resource configuration
Reviewed changes
Copilot reviewed 47 out of 47 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| workflows/blobtoolkit.nf | Converted Channel factory methods to lowercase |
| subworkflows/local/*.nf | Applied lowercase Channel methods and explicit closure parameters |
| modules/nf-core/diamond/blastp/* | Updated DIAMOND version to 2.1.16, converted switch to if/else, changed API from string out_ext to integer outfmt |
| modules/nf-core/diamond/blastx/* | Updated DIAMOND version to 2.1.16, converted switch to if/else |
| modules/local/blobtoolkit/*.nf | Moved conda checks to script section |
| modules/local/blobtk/*.nf | Moved conda checks to script section |
| modules/local/*.nf | Removed unused variables, added explicit closure parameters |
| nextflow.config | Removed unused helper functions (log_increase_cpus, positive_log) |
| conf/base.config | Replaced function calls with inline mathematical expressions |
| tests/*.snap | Updated test snapshots for DIAMOND version change and timestamps |
Comments suppressed due to low confidence (11)
modules/local/blobtoolkit/createblobdir.nf:40
BLOBTOOLKIT_CREATEBLOBDIRderivesprefixfrommeta.idand injects it unquoted intoblobtools replaceand filesystem operations (mkdir ${prefix},cp ... ${prefix}/), and also buildsbusco_argsby concatenating rawbuscopaths. This means a malicious sample ID or manipulated BUSCO path containing shell metacharacters could be expanded by the shell and used to execute arbitrary commands as part of this process. Quote or shell-escapeprefixand each BUSCO path before use so that user-controlled names and paths cannot break out of the intended command context.
def args = task.ext.args ?: ''
prefix = task.ext.prefix ?: "${meta.id}"
def busco_args = (busco instanceof List ? busco : [busco]).collect { file -> "--busco " + file } .join(' ')
def hits_blastp = blastp ? "--hits ${blastp}" : ""
"""
blobtools replace \\
--bedtsvdir windowstats \\
--meta ${yaml} \\
--taxdump \$(dirname ${taxdump}) \\
--taxrule buscogenes \\
${busco_args} \\
${hits_blastp} \\
--threads ${task.cpus} \\
$args \\
${prefix}
modules/local/blobtoolkit/windowstats.nf:27
BLOBTOOLKIT_WINDOWSTATSderivesprefixfrommeta.idand uses it unquoted in the--out ${prefix}_window_stats.tsvargument tobtk pipeline window-stats. Sincemeta.idcomes from user-supplied sample identifiers that may contain shell metacharacters, this allows command substitution or other shell injection when the process runs. Quote or shell-escapeprefix(ormeta.idupstream) before using it in the shell command so that sample IDs cannot alter command execution.
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
"""
btk pipeline window-stats \\
--in ${tsv} \\
$args \\
--out ${prefix}_window_stats.tsv
modules/local/blobtoolkit/updatemeta.nf:30
BLOBTOOLKIT_UPDATEMETAusesprefix = task.ext.prefix ?: "${meta.id}"and then writes to${prefix}.meta.jsonin the shell command without any quoting. A maliciousmeta.idvalue from the samplesheet containing shell metacharacters such as$(...)would be expanded by the shell here and can be used to run arbitrary commands. Quote or shell-escapeprefix/meta.idbefore interpolation so that file names derived from sample IDs cannot inject code.
def args = task.ext.args ?: ''
prefix = task.ext.prefix ?: "${meta.id}"
"""
update_versions.py \\
${args} \\
--meta_in ${input}/meta.json \\
--software ${versions} \\
--meta_out ${prefix}.meta.json
modules/local/blobtoolkit/summary.nf:27
BLOBTOOLKIT_SUMMARYsetsprefixfrommeta.idand then passes it unquoted toblobtools filterin--summary ${prefix}.summary.json. Becausemeta.idis derived from the user-controlled sample ID, shell metacharacters in that field (e.g.$(...)) can be used to inject arbitrary commands into this process. Quote or shell-escapeprefix/meta.idbefore building the command so that sample identifiers cannot alter shell execution.
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
"""
blobtools filter \\
${args} \\
--summary ${prefix}.summary.json ${blobdir}
modules/local/blobtk/images.nf:33
BLOBTK_IMAGEStakesprefixfrommeta.idand uses it in the unquoted output path-o ${prefix}.${plot}.${format}in theblobtk plotinvocation. Becausemeta.idis user-controlled via the samplesheet, any shell metacharacters embedded in the sample ID will be interpreted by the shell and can be abused for arbitrary command execution. Quote or shell-escapeprefixand other user-influenced values before interpolation to ensure they are treated as literal file names.
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
def legend = plot.equals("snail") ? "" : "--legend full"
"""
blobtk plot \\
-v ${plot} \\
-d ${blobdir} \\
-o ${prefix}.${plot}.${format} \\
${legend} \\
modules/local/compressblobdir.nf:29
COMPRESSBLOBDIRderivesprefixfrommeta.idand then uses it unquoted in shell commands (mkdir ${prefix},cp ${input}/* ${prefix}/, etc.). Sincemeta.idis controlled via the samplesheet, an attacker can supply a sample name containing shell metacharacters so that these operations execute unintended commands instead of just creating/copying directories. Quote or shell-escapeprefixat each use so that sample IDs are treated as literal directory names in the shell.
prefix = task.ext.prefix ?: "${meta.id}"
"""
mkdir ${prefix}
cp ${input}/* ${prefix}/
cp ${summary_json} ${prefix}/summary.json
cp ${meta_json} ${prefix}/meta.json
pigz --processes $task.cpus ${prefix}/*.json
modules/local/blobtoolkit/unchunk.nf:27
BLOBTOOLKIT_UNCHUNKsetsprefixfrom theblast_tablepath and interpolates both${blast_table}and${prefix}directly into thebtk pipeline unchunk-blastcommand without quoting. If the staged BLAST table path contains shell metacharacters (for example propagated from an unsafeprefixearlier in the pipeline), the shell will interpret them here and can be used to execute arbitrary commands. Quote or shell-escapeblast_tableandprefixso that file paths cannot be used as a shell injection vector.
def prefix = task.ext.prefix ?: "${blast_table}"
"""
btk pipeline unchunk-blast \\
--in ${blast_table} \\
--out ${prefix}.out \\
modules/nf-core/diamond/blastx/main.nf:81
- In
DIAMOND_BLASTX,prefixis derived frommeta.idand then embedded unquoted into thediamondCLI via--out ${prefix}.${out_ext}along with unescapedblast_columns/taxidarguments. Sincemeta.idcomes from the user-controlled samplesheet (validated only to disallow whitespace), a crafted sample ID likemy$(malicious_cmd)would trigger shell command substitution and run arbitrary code when this process executes. Ensuremeta.id,blast_columns,taxidand other user-influenced values are safely quoted or shell-escaped before interpolation, or passed as separate, properly quoted arguments.
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
def is_compressed = fasta.getExtension() == "gz" ? true : false
def fasta_name = is_compressed ? fasta.getBaseName() : fasta
def columns = blast_columns ? "${blast_columns}" : ''
def exclude_taxon = taxid ? "--taxon-exclude ${taxid}" : ''
if (out_ext == 'blast') {
outfmt = 0
}
else if (out_ext == 'xml') {
outfmt = 5
}
else if (out_ext == 'txt') {
outfmt = 6
}
else if (out_ext == 'daa') {
outfmt = 100
}
else if (out_ext == 'sam') {
outfmt = 101
}
else if (out_ext == 'tsv') {
outfmt = 102
}
else if (out_ext == 'paf') {
outfmt = 103
}
else {
outfmt = 6
out_ext = 'txt'
log.warn("Unknown output file format provided (${out_ext}): selecting DIAMOND default of tabular BLAST output (txt)")
}
"""
if [ "${is_compressed}" == "true" ]; then
gzip -c -d ${fasta} > ${fasta_name}
fi
mkdir -p ./blastx_tmp
DB=`find -L ./ -name "*.dmnd" | sed 's/\\.dmnd\$//'`
diamond \\
blastx \\
--threads ${task.cpus} \\
--db \$DB \\
--query ${fasta_name} \\
--outfmt ${outfmt} ${columns} \\
${exclude_taxon} \\
${args} \\
--out ${prefix}.${out_ext} \\
modules/nf-core/diamond/blastp/main.nf:80
- In
DIAMOND_BLASTP,prefixis set frommeta.idand used unquoted in thediamondcommand as part of the--out ${prefix}.${out_ext}argument. Becausemeta.idoriginates from the samplesheet (sample/run_accession) and is only constrained to be non-whitespace, a value containing shell metacharacters (e.g.$(...)or backticks) will be expanded by the shell and can execute arbitrary commands. Protect against this by shell-escaping or quotingmeta.id(and any other user-controlled values) before interpolation, or by constructing the command so that these values are passed as literal arguments rather than raw string fragments.
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
def columns = blast_columns ? "${blast_columns}" : ''
def exclude_taxon = taxid ? "--taxon-exclude ${taxid}" : ''
def out_ext = ""
if (outfmt == 0) {
out_ext = "blast"
}
else if (outfmt == 5) {
out_ext = "xml"
}
else if (outfmt == 6) {
out_ext = "txt"
}
else if (outfmt == 100) {
out_ext = "daa"
}
else if (outfmt == 101) {
out_ext = "sam"
}
else if (outfmt == 102) {
out_ext = "tsv"
}
else if (outfmt == 103) {
out_ext = "paf"
}
else {
log.warn("Unknown output file format provided (${outfmt}): selecting DIAMOND default of tabular BLAST output (txt)")
outfmt = 6
out_ext = 'txt'
}
if (args =~ /--compress\s+1/) {
out_ext += '.gz'
}
"""
mkdir -p ./blastp_tmp
diamond \\
blastp \\
--threads ${task.cpus} \\
--db ${db} \\
--query ${fasta} \\
--outfmt ${outfmt} ${columns} \\
${exclude_taxon} \\
${args} \\
--out ${prefix}.${out_ext}
modules/local/blobtoolkit/updateblobdir.nf:47
BLOBTOOLKIT_UPDATEBLOBDIRconstructsprefixfrommeta.idand uses it unquoted in multiple shell contexts (mkdir ${prefix},cp ... ${prefix}/, and as the finalblobtools replaceargument). If a user-supplied sample ID contains shell metacharacters, the shell will expand them in these positions, giving an attacker code execution in any environment where sample sheets or params can be influenced by an untrusted party. Make sureprefixis safely quoted or shell-escaped everywhere it is used so that sample IDs are treated as literal directory/file names.
def args = task.ext.args ?: ''
prefix = task.ext.prefix ?: "${meta.id}"
def hits_blastx = blastx ? "--hits ${blastx}" : ""
def hits_blastn = blastn ? "--hits ${blastn}" : ""
def syn = synonyms_tsv ? "--synonyms ${synonyms_tsv}" : ""
def cat = categories_tsv ? "--text ${categories_tsv}" : ""
def head = (synonyms_tsv || categories_tsv) ? "--text-header" : ""
"""
# In-place modifications are not great in Nextflow, so work on a copy of ${input}
mkdir ${prefix}
cp --preserve=timestamp ${input}/* ${prefix}/
blobtools replace \\
--taxdump \$(dirname ${taxdump}) \\
--taxrule bestdistorder=buscoregions \\
${hits_blastx} \\
${hits_blastn} \\
${syn} ${cat} ${head} \\
--threads ${task.cpus} \\
$args \\
${prefix}
modules/local/blobtk/depth.nf:28
BLOBTK_DEPTHusesprefix = task.ext.prefix ?: "${meta.id}"and then interpolates it unquoted into-O ${prefix}.regions.bed.gzfor theblobtk depthcall. Sincemeta.idoriginates from the samplesheet and is only restricted to be non-whitespace, an attacker-controlled sample ID containing$(...)or backticks would be executed by the shell at this point. To prevent command injection, ensureprefixis correctly quoted or shell-escaped before being used in the shell command.
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
"""
blobtk depth \\
-b ${bam} \\
$args \\
-O ${prefix}.regions.bed.gz \\
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
6921341 to
5049a3e
Compare
sainsachiko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, just one very small comment about meta
| input: | ||
| tuple val(meta), path(table, stageAs: 'dir??/*') | ||
| tuple val(meta1), path(table, stageAs: 'dir??/*') | ||
| tuple val(meta), path(bed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the meta for this be meta2, and the modified one be meta (to be consistent with other files).
Still need to get the nf-core modules updated, but everything else is done and good for review
PR checklist
nf-core pipelines lint).nextflow run . -profile test,docker --outdir <OUTDIR>).nextflow run . -profile debug,test,docker --outdir <OUTDIR>).docs/usage.mdis updated.docs/output.mdis updated.CHANGELOG.mdis updated.README.mdis updated (including new tool citations and authors/contributors).