
Update artic pipeline to 1.8.4 #283

Open
fischer-hub wants to merge 20 commits into replikation:master from fischer-hub:update_artic

@fischer-hub fischer-hub commented Jul 22, 2025

With newer artic releases, the usage of medaka and nanopolish is deprecated. Instead, Clair3 is used by default. This PR updates the artic version and adjusts some code around the artic sub workflow to work with the new output. It also removes some of the old code related to nanopolish/medaka, adjusts the parameters, and throws warnings when the old flags are still used.

I also had to patch the main script of CoVarPlot since the new VCF output writes e.g. the allele frequency and read depth to the FORMAT and SAMPLE fields instead of INFO. I created a PR on the CoVarPlots repo, but until this is merged, we can get away with running the patched main script from the workflow's /bin directory. Once we get a merge, we can rebuild the container and switch back to the original tool.

EDIT: this was merged already and Martin helped update the respective container :)

Since the allele frequency is added to the VCF by default with Clair3, the add_allel_frequencies() process was renamed to count_mixed_sites(). It's only counting mixed sites in the VCF file now (from the correct VCF fields again).
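For illustration only, here is a minimal Python sketch of reading per-sample values from the FORMAT and SAMPLE columns and counting mixed sites. It is not the code from this PR: the function names, the `AF` key lookup, and the 0.3 to 0.8 thresholds are assumptions chosen for the example.

```python
def sample_fields(vcf_line):
    """Map FORMAT keys to the first sample's values for one VCF data line."""
    cols = vcf_line.rstrip("\n").split("\t")
    # Columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE...
    keys = cols[8].split(":")
    values = cols[9].split(":")
    return dict(zip(keys, values))


def count_mixed_sites(vcf_lines, low=0.3, high=0.8):
    """Count records whose allele frequency lies in [low, high].

    The thresholds are illustrative assumptions, not poreCov's values.
    """
    mixed = 0
    for line in vcf_lines:
        # Skip header lines and blank lines
        if line.startswith("#") or not line.strip():
            continue
        af = sample_fields(line).get("AF")
        if af is None or af == ".":
            continue
        # Multi-allelic records may carry comma-separated AF values
        if any(low <= float(x) <= high for x in af.split(",") if x != "."):
            mixed += 1
    return mixed
```

With the values living in FORMAT/SAMPLE, the lookup is by key position rather than by parsing `KEY=VALUE` pairs from INFO, which is the main structural difference a downstream tool has to handle.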

I also had to rename some hardcoded paths in process seqrs(). I just noticed that some part of the artic pipeline is actually pulling primer schemes into 'data/external_primer_schemes/artic-sars-cov-2', and I think it's `artic minion`. I will check what is happening there ^^

Everything is running fine with the test data that I have (some fastq files), but I feel like it's quite a lot of changes in this PR. What do you think is the best way to test this fully? Comparing the VCF output? It has quite a different structure, especially in the INFO, FORMAT, SAMPLE fields. I also couldn't test the custom BED file subworkflow since I don't have any custom BED files, haha. If you have some in mind that I can use for testing, that would be great.

Lastly, I wasn't sure whether to remove the --fast5 parameters completely. From my understanding, this only worked with --nanopolish, but maybe I'm missing something here - lmk what you think.

EDIT 2: Since I started on this, artic got a few more updates, with 1.8.4 now being the newest version. I created a new Seqera container for that version and had to adjust some more things, but this now seems good to merge!

  • artic dropped support for old V1 primer scheme BED files (which are missing the 7th column, the primer sequence); I added a Python script to patch that column in case the BED file is missing it
  • somewhere along the way artic_make_depth_mask was integrated to run as part of artic minion, and running it manually afterwards results in duplicated lines in the coverage_mask.depth files; this is fixed by writing the output of both calls to artic_make_depth_mask to separate files
  • the primer.bed files are sorted by primer seq name in the artic_custom_bed subworkflow, but they were not naturally sorted (e.g. SARS-CoV-10_LEFT would come before SARS-CoV-2_LEFT); this is now fixed
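The first and last points can be sketched roughly as follows. This is a hedged illustration, not the script added in this PR: `natural_key` and `add_primer_sequence` are hypothetical names, and the sketch assumes the old BED files have six columns (chrom, start, end, name, pool, strand) and that the reference is available as a plain string.

```python
import re

# Translation table used to reverse-complement primers on the minus strand
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def natural_key(name):
    """Split a primer name into text and integer chunks for natural sorting."""
    parts = re.split(r"(\d+)", name)
    # With a capturing group, odd indices are always the digit runs
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def add_primer_sequence(fields, reference):
    """Return a 7-column BED record, appending the primer sequence if absent.

    Assumes `fields` holds chrom, start, end, name, pool, strand.
    """
    if len(fields) >= 7:
        return fields
    start, end, strand = int(fields[1]), int(fields[2]), fields[5]
    seq = reference[start:end]  # BED intervals are 0-based, half-open
    if strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]
    return fields + [seq]


names = ["SARS-CoV-10_LEFT", "SARS-CoV-2_LEFT", "SARS-CoV-1_LEFT"]
print(sorted(names, key=natural_key))
```

A plain string sort compares "1" against "2" character by character, which is why SARS-CoV-10_LEFT ends up before SARS-CoV-2_LEFT; comparing the digit runs as integers avoids that.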

closes #280

@fischer-hub fischer-hub reopened this Jul 28, 2025
@replikation
Owner

Currently updating poreCov to some adjusted syntax and thus a higher Nextflow version requirement (see #281). I will merge this into master this week. You might need a git rebase to avoid some merge conflicts later on.

@replikation
Owner

@fischer-hub now in master


script:
def normalise_arg = normalise_threshold ? "--normalise ${normalise_threshold}" : '--normalise 0'
def normalise_arg = normalise_threshold ? "--normalise ${normalise_threshold}" : '--normalise 0' // why is the --normalise flag not part of the bash script ^^
Collaborator

Not sure if I understand your comment 🤔 Can you elaborate?

@replikation
Owner

I would recommend adding some log.error logic since you changed the primer version names.

e.g.

    if ( params.primerV == 'V1') { 
        log.error "V1 has been renamed to V1.0.0 please use --primerV V1.0.0 instead" 
        exit 10
    }
// and so on

Alternatively, still allow the old names and map them to the new ones in the workflow (if this is possible).

@MarieLataretu
Collaborator

I also couldn't test the custom BED file subworkflow since I don't have any custom BED files, haha. If you have some in mind that I can use for testing that would be great.

You could copy the BED file from an ARTIC primer scheme and use it as custom input. The consensus sequences should then be identical (between the poreCov run w/ the custom primer and the poreCov run w/ the same pre-defined primer).
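A small, hedged sketch of how the two consensus files from such a pair of runs could be compared. The helper names are hypothetical, and it assumes each FASTA file holds a single consensus record; headers are ignored because they may differ between runs.

```python
def read_fasta(path):
    """Return {record_id: sequence} for a FASTA file, sequences uppercased."""
    records, name = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                header = line[1:].split()
                name = header[0] if header else ""
                records[name] = []
            elif line and name is not None:
                records[name].append(line.upper())
    return {key: "".join(chunks) for key, chunks in records.items()}


def consensus_differences(path_a, path_b):
    """List (position, base_a, base_b) mismatches, positions 1-based.

    Compares the first record of each file; a length mismatch is
    reported as a final ("length", len_a, len_b) entry.
    """
    seq_a = next(iter(read_fasta(path_a).values()))
    seq_b = next(iter(read_fasta(path_b).values()))
    diffs = [(i + 1, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]
    if len(seq_a) != len(seq_b):
        diffs.append(("length", len(seq_a), len(seq_b)))
    return diffs
```

An empty list from `consensus_differences` would indicate that the custom-BED run and the pre-defined-primer run produced the same consensus.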

@fischer-hub fischer-hub marked this pull request as draft August 15, 2025 10:01
@fischer-hub
Author

Some clarifying comments:

The Clair3 models that are the default after the update are pre-installed; they come from the artic conda installation that I used to create the Seqera container. Additional models can be downloaded from the Clair3 GitHub page and used in the pipeline with --clair3_model_path and --clair3_model_name.

would recommend to add some log.err logic since you changed the primer version names.

e.g.

    if ( params.primerV == 'V1') { 
        log.error "V1 has been renamed to V1.0.0 please use --primerV V1.0.0 instead" 
        exit 10
    }
// and so on

alternatively still allow the old ones and change them via workflow (if this is possible).

Regarding the new primer version formats, after talking with @MarieLataretu, the current behaviour is:

  • old format V1, V2, ... etc will use the respective old pre-installed primer schemes within data/external_primer_schemes/nCov-2019 via the artic_custom_bed sub workflow
  • new format v1.0.0, v2.0.0, ... etc will trigger the standard artic subworkflow which pulls primer schemes from the quicklab repository, where --primerLength 1200 pulls varvamp primer schemes and --primerLength 1200 pulls artic primer schemes to data/external_primer_schemes/$scheme_name
  • providing a .bed file to --primerV requires the user to also provide a fasta file to --primerRef and triggers the artic_custom_bed subworkflow

I also added a version format check, so there will be warnings if the format is neither the old nor the new one, and if a bed file is provided without a reference fasta. There are also warnings when using the old --nanopolish and --medaka_model parameters, stating that those are not used anymore and what to use instead.
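The routing and format check described above could look roughly like this, written in Python purely for illustration (the actual check lives in the Nextflow workflow). The function name, the two regular expressions, and the warning texts are assumptions; in particular, the legacy pattern only covers plain names such as V1 or V2.

```python
import re

LEGACY = re.compile(r"^V\d+$")               # e.g. V1, V2 (assumed pattern)
SEMVER = re.compile(r"^[vV]\d+\.\d+\.\d+$")  # e.g. v1.0.0 (assumed pattern)


def classify_primer_version(primer_v, primer_ref=None):
    """Return (route, warnings) for a --primerV value."""
    warnings = []
    if primer_v.lower().endswith(".bed"):
        # A custom BED file also needs a reference fasta via --primerRef
        if not primer_ref:
            warnings.append("--primerV points to a BED file but --primerRef is missing")
        return "artic_custom_bed", warnings
    if SEMVER.match(primer_v):
        # New format: standard artic subworkflow pulls the scheme
        return "artic", warnings
    if LEGACY.match(primer_v):
        # Old format: pre-installed schemes via artic_custom_bed
        return "artic_custom_bed", warnings
    warnings.append(f"unrecognised --primerV format: {primer_v}")
    return "unknown", warnings
```

Returning warnings instead of exiting mirrors the behaviour described in this comment, where unexpected formats produce warnings rather than a hard stop.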

I'll check if the output with these changes is reasonable, and then I think this would be ready to merge from my side :)

@MarieLataretu
Collaborator

Thanks for the summary @fischer-hub!

I think there are two general questions to discuss @replikation, @hoelzer :
a) which ARTIC container to use, and
b) how to manage the primer schemes.

ad a):
Do we keep the Seqera container, or do you want to set up a nanozoo container?
Or should we (try to) set it up on our side? (I ran into a Python version conflict, which I didn't debug, since the Seqera container was working.)
I think you tested the ARTIC container from biocontainers, but it didn't work, @fischer-hub?

With the new ARTIC version and Clair3 it's at least easier to update the models.

ad b):
Should we keep the primer resources in poreCov's repository (data/external_primer_schemes/nCov-2019) and offer all three options that David listed?
That would make poreCov more dependent on external resources.

@fischer-hub
Author

ad a): Do we keep the Seqera container, or do you want to set up a nanozoo container? Or should we (try to) set it up on our side? (I ran into a Python version conflict, which I didn't debug, since the Seqera container was working.) I think you tested the ARTIC container from biocontainers, but it didn't work, @fischer-hub?

I tested this container: quay.io/artic/fieldbioinformatics:1.6.0, which apparently is their automatic release build, but I didn't get it to run.
So I ran with the Seqera container instead.

@replikation
Owner

As long as the container runs, I'm fine with it. Most of the nanozoo containers we use are because alternatives were not compatible with a cluster or cloud setting, or simply weren't working.

@fischer-hub fischer-hub changed the title Update artic pipeline to 1.7.4 Update artic pipeline to 1.8.4 Sep 29, 2025
@fischer-hub fischer-hub marked this pull request as ready for review September 29, 2025 13:58

Development

Successfully merging this pull request may close these issues.

ARTIC/fieldbioinformatics update

3 participants