
@muffato (Member) commented Feb 6, 2026

Still need to get the nf-core modules updated, but everything else is done and ready for review.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool, have you followed the pipeline conventions in the contribution docs?
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@muffato self-assigned this Feb 6, 2026
Copilot AI review requested due to automatic review settings February 6, 2026 08:17
github-actions bot commented Feb 6, 2026

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 5049a3e

✅ 197 tests passed
❔ 28 tests were ignored
❗ 20 tests had warnings

❗ Test warnings:

❔ Tests ignored:

  • files_exist - File is ignored: CODE_OF_CONDUCT.md
  • files_exist - File is ignored: assets/nf-core-blobtoolkit_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-blobtoolkit_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-blobtoolkit_logo_dark.png
  • files_exist - File is ignored: .github/ISSUE_TEMPLATE/config.yml
  • files_exist - File is ignored: .github/workflows/awstest.yml
  • files_exist - File is ignored: .github/workflows/awsfulltest.yml
  • files_exist - File is ignored: conf/igenomes.config
  • nextflow_config - Config variable ignored: manifest.name
  • nextflow_config - Config variable ignored: manifest.homePage
  • files_unchanged - File ignored due to lint config: CODE_OF_CONDUCT.md
  • files_unchanged - File ignored due to lint config: LICENSE or LICENSE.md or LICENCE or LICENCE.md
  • files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
  • files_unchanged - File does not exist: .github/ISSUE_TEMPLATE/config.yml
  • files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
  • files_unchanged - File ignored due to lint config: .github/workflows/branch.yml
  • files_unchanged - File ignored due to lint config: .github/workflows/linting_comment.yml
  • files_unchanged - File ignored due to lint config: .github/workflows/linting.yml
  • files_unchanged - File ignored due to lint config: assets/email_template.html
  • files_unchanged - File ignored due to lint config: assets/email_template.txt
  • files_unchanged - File ignored due to lint config: assets/sendmail_template.txt
  • files_unchanged - File ignored due to lint config: assets/nf-core-blobtoolkit_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-blobtoolkit_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-blobtoolkit_logo_dark.png
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/blobtoolkit/blobtoolkit/.github/workflows/awstest.yml
  • template_strings - template_strings
  • merge_markers - merge_markers

✅ Tests passed:

Run details

  • nf-core/tools version 3.5.2
  • Run at 2026-02-07 02:01:52

Copilot AI left a comment

Pull request overview

This pull request applies strict syntax and linting improvements to the sanger-tol/blobtoolkit Nextflow pipeline. The changes enforce stricter coding standards including lowercase channel factory methods, explicit closure parameter naming, and conversion from switch statements to if/else blocks.

Changes:

  • Standardized all Channel factory methods to lowercase (channel.empty(), channel.fromPath(), channel.value())
  • Added explicit parameter names to all closures instead of relying on the implicit it parameter, with unused parameters prefixed with an underscore
  • Replaced switch statements with if/else if/else blocks, because switch statements are not supported by Nextflow's strict syntax
  • Moved conda profile compatibility checks from module-level to script section in BlobToolKit modules
  • Updated DIAMOND modules from version 2.1.8 to 2.1.16
  • Removed unused variables and helper functions from configuration files
  • Replaced inline function calls with explicit mathematical expressions in resource configuration

Reviewed changes

Copilot reviewed 47 out of 47 changed files in this pull request and generated 9 comments.

| File | Description |
|------|-------------|
| workflows/blobtoolkit.nf | Converted Channel factory methods to lowercase |
| subworkflows/local/*.nf | Applied lowercase Channel methods and explicit closure parameters |
| modules/nf-core/diamond/blastp/* | Updated DIAMOND version to 2.1.16, converted switch to if/else, changed API from string out_ext to integer outfmt |
| modules/nf-core/diamond/blastx/* | Updated DIAMOND version to 2.1.16, converted switch to if/else |
| modules/local/blobtoolkit/*.nf | Moved conda checks to script section |
| modules/local/blobtk/*.nf | Moved conda checks to script section |
| modules/local/*.nf | Removed unused variables, added explicit closure parameters |
| nextflow.config | Removed unused helper functions (log_increase_cpus, positive_log) |
| conf/base.config | Replaced function calls with inline mathematical expressions |
| tests/*.snap | Updated test snapshots for DIAMOND version change and timestamps |
Comments suppressed due to low confidence (11)

modules/local/blobtoolkit/createblobdir.nf:40

  • BLOBTOOLKIT_CREATEBLOBDIR derives prefix from meta.id and injects it unquoted into blobtools replace and filesystem operations (mkdir ${prefix}, cp ... ${prefix}/), and also builds busco_args by concatenating raw busco paths. This means a malicious sample ID or manipulated BUSCO path containing shell metacharacters could be expanded by the shell and used to execute arbitrary commands as part of this process. Quote or shell-escape prefix and each BUSCO path before use so that user-controlled names and paths cannot break out of the intended command context.
    def args = task.ext.args ?: ''
    prefix = task.ext.prefix ?: "${meta.id}"
    def busco_args = (busco instanceof List ? busco : [busco]).collect { file -> "--busco " + file } .join(' ')
    def hits_blastp = blastp ? "--hits ${blastp}" : ""
    """
    blobtools replace \\
        --bedtsvdir windowstats \\
        --meta ${yaml} \\
        --taxdump \$(dirname ${taxdump}) \\
        --taxrule buscogenes \\
        ${busco_args} \\
        ${hits_blastp} \\
        --threads ${task.cpus} \\
        $args \\
        ${prefix}
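A quick way to see why these comments ask for shell-escaping rather than just quoting: Nextflow interpolates ${prefix} into the script text before bash parses it, so a value that reaches the script unescaped is parsed as shell code. The bash sketch below is not from the PR; the sample ID, the directory names and the run_case helper are hypothetical. It mimics that order with bash -c and compares three splices: unquoted, wrapped in double quotes, and escaped with printf '%q'.

```bash
#!/usr/bin/env bash
# Sketch: splicing an untrusted sample ID into generated script text.
# Nextflow builds the script string first and bash parses it afterwards;
# building a string and handing it to `bash -c` reproduces that order.
set -euo pipefail
cd "$(mktemp -d)"

# Hypothetical hostile sample ID. It contains no whitespace, so a
# "no spaces" rule accepts it; ${IFS} only turns into a space when the
# generated script runs the command substitution.
sample_id='x$(touch${IFS}PWNED)'

run_case() {
  # $1: scratch directory name, $2: generated script text to run inside it
  mkdir -p "$1"
  (cd "$1" && bash -c "$2")
}

# 1. Unquoted, as in `mkdir ${prefix}`: the substitution runs and creates PWNED.
run_case case_unquoted "mkdir ${sample_id}"

# 2. Double quotes are not enough: $(...) still expands inside "...".
run_case case_double_quoted "mkdir \"${sample_id}\""

# 3. printf %q emits a token that bash reads back as the literal string,
#    so a directory with exactly that name is created and nothing else runs.
printf -v escaped '%q' "$sample_id"
run_case case_escaped "mkdir ${escaped}"
```

Only the escaped splice keeps the ID literal. Single quotes would also block substitution, but they break as soon as the ID itself contains a single quote, which is why escaping is the more robust option.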

modules/local/blobtoolkit/windowstats.nf:27

  • BLOBTOOLKIT_WINDOWSTATS derives prefix from meta.id and uses it unquoted in the --out ${prefix}_window_stats.tsv argument to btk pipeline window-stats. Since meta.id comes from user-supplied sample identifiers that may contain shell metacharacters, this allows command substitution or other shell injection when the process runs. Quote or shell-escape prefix (or meta.id upstream) before using it in the shell command so that sample IDs cannot alter command execution.
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    btk pipeline window-stats \\
            --in ${tsv} \\
            $args \\
            --out ${prefix}_window_stats.tsv
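Copilot also notes, in the DIAMOND comments, that meta.id is validated only to disallow whitespace. Escaping at each use can be complemented by rejecting unsafe IDs before any process sees them. A minimal bash sketch of such an allowlist check; the character set and the is_safe_sample_id name are assumptions for illustration, not the pipeline's actual samplesheet schema.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical allowlist: the ID must start with a letter or digit and then
# contain only letters, digits, '.', '_' or '-'. Requiring an alphanumeric
# first character also stops IDs such as "-rf" being read as options.
is_safe_sample_id() {
  local re='^[A-Za-z0-9][A-Za-z0-9._-]*$'
  [[ $1 =~ $re ]]
}

for id in 'sample_01' 'ERR1234567.1' 'my$(malicious_cmd)' 'a`id`b' '-rf' ''; do
  if is_safe_sample_id "$id"; then
    printf 'accept: %s\n' "$id"
  else
    printf 'reject: %s\n' "$id"
  fi
done
```

A check like this narrows what can reach the process scripts; it does not replace escaping in modules that build commands from other user-influenced strings such as file paths.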

modules/local/blobtoolkit/updatemeta.nf:30

  • BLOBTOOLKIT_UPDATEMETA uses prefix = task.ext.prefix ?: "${meta.id}" and then writes to ${prefix}.meta.json in the shell command without any quoting. A malicious meta.id value from the samplesheet containing shell metacharacters such as $(...) would be expanded by the shell here and can be used to run arbitrary commands. Quote or shell-escape prefix/meta.id before interpolation so that file names derived from sample IDs cannot inject code.
    def args = task.ext.args ?: ''
    prefix = task.ext.prefix ?: "${meta.id}"
    """
    update_versions.py \\
        ${args} \\
        --meta_in ${input}/meta.json \\
        --software ${versions} \\
        --meta_out ${prefix}.meta.json

modules/local/blobtoolkit/summary.nf:27

  • BLOBTOOLKIT_SUMMARY sets prefix from meta.id and then passes it unquoted to blobtools filter in --summary ${prefix}.summary.json. Because meta.id is derived from the user-controlled sample ID, shell metacharacters in that field (e.g. $(...)) can be used to inject arbitrary commands into this process. Quote or shell-escape prefix/meta.id before building the command so that sample identifiers cannot alter shell execution.
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    blobtools filter \\
        ${args} \\
        --summary ${prefix}.summary.json ${blobdir}

modules/local/blobtk/images.nf:33

  • BLOBTK_IMAGES takes prefix from meta.id and uses it in the unquoted output path -o ${prefix}.${plot}.${format} in the blobtk plot invocation. Because meta.id is user-controlled via the samplesheet, any shell metacharacters embedded in the sample ID will be interpreted by the shell and can be abused for arbitrary command execution. Quote or shell-escape prefix and other user-influenced values before interpolation to ensure they are treated as literal file names.
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    def legend = plot.equals("snail") ? "" : "--legend full"
    """
    blobtk plot \\
        -v ${plot} \\
        -d ${blobdir} \\
        -o ${prefix}.${plot}.${format} \\
        ${legend} \\

modules/local/compressblobdir.nf:29

  • COMPRESSBLOBDIR derives prefix from meta.id and then uses it unquoted in shell commands (mkdir ${prefix}, cp ${input}/* ${prefix}/, etc.). Since meta.id is controlled via the samplesheet, an attacker can supply a sample name containing shell metacharacters so that these operations execute unintended commands instead of just creating/copying directories. Quote or shell-escape prefix at each use so that sample IDs are treated as literal directory names in the shell.
    prefix = task.ext.prefix ?: "${meta.id}"
    """
    mkdir ${prefix}
    cp ${input}/* ${prefix}/
    cp ${summary_json} ${prefix}/summary.json
    cp ${meta_json} ${prefix}/meta.json
    pigz --processes $task.cpus ${prefix}/*.json
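When a module reuses ${prefix} in several commands, as COMPRESSBLOBDIR does, one way to limit the number of places that need escaping is to escape the value once, in a shell assignment, and afterwards refer to it only as "$prefix". Bash does not re-parse the result of a quoted variable expansion, so the value stays literal. This is a hypothetical bash sketch: in a real module the escaping would have to happen on the Groovy side while the script string is built, and printf '%q' only stands in for that step here; gzip replaces pigz so the example needs no extra tools.

```bash
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"

# Hypothetical hostile sample ID.
sample_id='x$(touch${IFS}PWNED)'

# Stand-in for escaping performed when the script text is generated.
printf -v escaped '%q' "$sample_id"

# Generated script: the ID appears exactly once, in an assignment; every
# later use is a quoted expansion ("$prefix"), which bash does not re-parse.
# "--" ends option parsing in case a value ever starts with "-".
generated=$(cat <<EOF
prefix=${escaped}
mkdir -- "\$prefix"
echo '{}' > "\$prefix/summary.json"
gzip -- "\$prefix"/*.json
EOF
)

bash -c "$generated"
```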

modules/local/blobtoolkit/unchunk.nf:27

  • BLOBTOOLKIT_UNCHUNK sets prefix from the blast_table path and interpolates both ${blast_table} and ${prefix} directly into the btk pipeline unchunk-blast command without quoting. If the staged BLAST table path contains shell metacharacters (for example propagated from an unsafe prefix earlier in the pipeline), the shell will interpret them here and can be used to execute arbitrary commands. Quote or shell-escape blast_table and prefix so that file paths cannot be used as a shell injection vector.
    def prefix = task.ext.prefix ?: "${blast_table}"
    """
    btk pipeline unchunk-blast \\
        --in ${blast_table} \\
        --out ${prefix}.out \\

modules/nf-core/diamond/blastx/main.nf:81

  • In DIAMOND_BLASTX, prefix is derived from meta.id and then embedded unquoted into the diamond CLI via --out ${prefix}.${out_ext} along with unescaped blast_columns/taxid arguments. Since meta.id comes from the user-controlled samplesheet (validated only to disallow whitespace), a crafted sample ID like my$(malicious_cmd) would trigger shell command substitution and run arbitrary code when this process executes. Ensure meta.id, blast_columns, taxid and other user-influenced values are safely quoted or shell-escaped before interpolation, or passed as separate, properly quoted arguments.
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    def is_compressed = fasta.getExtension() == "gz" ? true : false
    def fasta_name = is_compressed ? fasta.getBaseName() : fasta
    def columns = blast_columns ? "${blast_columns}" : ''
    def exclude_taxon = taxid ? "--taxon-exclude ${taxid}" : ''
    if (out_ext == 'blast') {
        outfmt = 0
    }
    else if (out_ext == 'xml') {
        outfmt = 5
    }
    else if (out_ext == 'txt') {
        outfmt = 6
    }
    else if (out_ext == 'daa') {
        outfmt = 100
    }
    else if (out_ext == 'sam') {
        outfmt = 101
    }
    else if (out_ext == 'tsv') {
        outfmt = 102
    }
    else if (out_ext == 'paf') {
        outfmt = 103
    }
    else {
        log.warn("Unknown output file format provided (${out_ext}): selecting DIAMOND default of tabular BLAST output (txt)")
        outfmt = 6
        out_ext = 'txt'
    }
    """
    if [ "${is_compressed}" == "true" ]; then
        gzip -c -d ${fasta} > ${fasta_name}
    fi

    mkdir -p ./blastx_tmp

    DB=`find -L ./ -name "*.dmnd" | sed 's/\\.dmnd\$//'`

    diamond \\
        blastx \\
        --threads ${task.cpus} \\
        --db \$DB \\
        --query ${fasta_name} \\
        --outfmt ${outfmt} ${columns} \\
        ${exclude_taxon} \\
        ${args} \\
        --out ${prefix}.${out_ext} \\
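The extension-to-outfmt pairs encoded by the DIAMOND_BLASTX if/else chain amount to a small lookup table. For reference, a bash case sketch of the same mapping; the function name diamond_outfmt_for_ext is hypothetical, and the codes are those in the excerpt. It emits the warning before falling back to tabular output, so the message names the extension the caller actually requested.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Map a requested output extension to DIAMOND's --outfmt code, using the
# pairs from the DIAMOND_BLASTX module. Prints "<outfmt> <ext>".
diamond_outfmt_for_ext() {
  local ext=$1
  case "$ext" in
    blast) echo "0 blast" ;;
    xml)   echo "5 xml" ;;
    txt)   echo "6 txt" ;;
    daa)   echo "100 daa" ;;
    sam)   echo "101 sam" ;;
    tsv)   echo "102 tsv" ;;
    paf)   echo "103 paf" ;;
    *)
      # Warn with the unrecognised value first, then fall back to tabular.
      echo "WARN: Unknown output file format provided (${ext}): selecting DIAMOND default of tabular BLAST output (txt)" >&2
      echo "6 txt"
      ;;
  esac
}

diamond_outfmt_for_ext xml     # prints: 5 xml
diamond_outfmt_for_ext fasta   # prints: 6 txt (the warning naming "fasta" goes to stderr)
```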

modules/nf-core/diamond/blastp/main.nf:80

  • In DIAMOND_BLASTP, prefix is set from meta.id and used unquoted in the diamond command as part of the --out ${prefix}.${out_ext} argument. Because meta.id originates from the samplesheet (sample / run_accession) and is only constrained to be non-whitespace, a value containing shell metacharacters (e.g. $(...) or backticks) will be expanded by the shell and can execute arbitrary commands. Protect against this by shell-escaping or quoting meta.id (and any other user-controlled values) before interpolation, or by constructing the command so that these values are passed as literal arguments rather than raw string fragments.
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"

    def columns = blast_columns ? "${blast_columns}" : ''
    def exclude_taxon = taxid ? "--taxon-exclude ${taxid}" : ''
    def out_ext = ""

    if (outfmt == 0) {
        out_ext = "blast"
    }
    else if (outfmt == 5) {
        out_ext = "xml"
    }
    else if (outfmt == 6) {
        out_ext = "txt"
    }
    else if (outfmt == 100) {
        out_ext = "daa"
    }
    else if (outfmt == 101) {
        out_ext = "sam"
    }
    else if (outfmt == 102) {
        out_ext = "tsv"
    }
    else if (outfmt == 103) {
        out_ext = "paf"
    }
    else {
        log.warn("Unknown output file format provided (${outfmt}): selecting DIAMOND default of tabular BLAST output (txt)")
        outfmt = 6
        out_ext = 'txt'
    }

    if (args =~ /--compress\s+1/) {
        out_ext += '.gz'
    }

    """
    mkdir -p ./blastp_tmp

    diamond \\
        blastp \\
        --threads ${task.cpus} \\
        --db ${db} \\
        --query ${fasta} \\
        --outfmt ${outfmt} ${columns} \\
        ${exclude_taxon} \\
        ${args} \\
        --out ${prefix}.${out_ext}
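DIAMOND_BLASTP now takes an integer outfmt and derives the file extension from it, appending .gz when task.ext.args requests --compress 1. A bash sketch of that logic, with a hypothetical function name; since POSIX extended regular expressions have no \s, [[:space:]] stands in for the whitespace class in the Groovy pattern.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Given an --outfmt code and the extra arguments string, print the output
# file extension the way the DIAMOND_BLASTP excerpt derives it.
diamond_ext_for_outfmt() {
  local outfmt=$1 args=${2:-}
  local ext
  case "$outfmt" in
    0)   ext=blast ;;
    5)   ext=xml ;;
    6)   ext=txt ;;
    100) ext=daa ;;
    101) ext=sam ;;
    102) ext=tsv ;;
    103) ext=paf ;;
    *)
      # The module also resets outfmt to 6 here; only the extension is returned.
      echo "WARN: Unknown output file format provided (${outfmt}): selecting DIAMOND default of tabular BLAST output (txt)" >&2
      ext=txt
      ;;
  esac
  # Like Groovy's =~ in a boolean context, this matches anywhere in args.
  local re='--compress[[:space:]]+1'
  if [[ $args =~ $re ]]; then
    ext+=.gz
  fi
  echo "$ext"
}

diamond_ext_for_outfmt 6                    # prints: txt
diamond_ext_for_outfmt 6 "--compress 1"     # prints: txt.gz
diamond_ext_for_outfmt 102 "--fast"         # prints: tsv
```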

modules/local/blobtoolkit/updateblobdir.nf:47

  • BLOBTOOLKIT_UPDATEBLOBDIR constructs prefix from meta.id and uses it unquoted in multiple shell contexts (mkdir ${prefix}, cp ... ${prefix}/, and as the final blobtools replace argument). If a user-supplied sample ID contains shell metacharacters, the shell will expand them in these positions, giving an attacker code execution in any environment where sample sheets or params can be influenced by an untrusted party. Make sure prefix is safely quoted or shell-escaped everywhere it is used so that sample IDs are treated as literal directory/file names.
    def args = task.ext.args ?: ''
    prefix = task.ext.prefix ?: "${meta.id}"
    def hits_blastx = blastx ? "--hits ${blastx}" : ""
    def hits_blastn = blastn ? "--hits ${blastn}" : ""
    def syn = synonyms_tsv ? "--synonyms ${synonyms_tsv}" : ""
    def cat = categories_tsv ? "--text ${categories_tsv}" : ""
    def head = (synonyms_tsv || categories_tsv) ? "--text-header" : ""
    """
    # In-place modifications are not great in Nextflow, so work on a copy of ${input}
    mkdir ${prefix}
    cp --preserve=timestamp ${input}/* ${prefix}/
    blobtools replace \\
        --taxdump \$(dirname ${taxdump}) \\
        --taxrule bestdistorder=buscoregions \\
        ${hits_blastx} \\
        ${hits_blastn} \\
        ${syn} ${cat} ${head} \\
        --threads ${task.cpus} \\
        $args \\
        ${prefix}

modules/local/blobtk/depth.nf:28

  • BLOBTK_DEPTH uses prefix = task.ext.prefix ?: "${meta.id}" and then interpolates it unquoted into -O ${prefix}.regions.bed.gz for the blobtk depth call. Since meta.id originates from the samplesheet and is only restricted to be non-whitespace, an attacker-controlled sample ID containing $(...) or backticks would be executed by the shell at this point. To prevent command injection, ensure prefix is correctly quoted or shell-escaped before being used in the shell command.
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    blobtk depth \\
        -b ${bam} \\
        $args \\
        -O ${prefix}.regions.bed.gz \\


@sainsachiko left a comment

This looks good to me; just one very small comment about meta.

input:
- tuple val(meta), path(table, stageAs: 'dir??/*')
+ tuple val(meta1), path(table, stageAs: 'dir??/*')
  tuple val(meta), path(bed)


Should the meta for this input be meta2, and the modified one be meta, to be consistent with other files?
