feat: VEP annotation with REVEL, Sift and PolyPhen Scores to fp/fn vcfs #154
base: main
📝 Walkthrough
Adds VEP/REVEL annotation to the variant benchmarking workflow: new annotation rules for fetching VEP caches/plugins and REVEL scores, processing/indexing REVEL, and annotating FP/FN VCFs. FP/FN output paths changed to reference the annotated VCFs.
Sequence Diagram(s)
sequenceDiagram
participant Workflow as Workflow Engine
participant Cache as VEP Cache/Plugins
participant REVEL as REVEL Provider
participant Index as Tabix Indexer
participant VEP as VEP Annotator
Workflow->>Cache: get_downsampled_vep_cache / get_vep_cache / get_vep_plugins
Cache-->>Workflow: VEP cache & plugins available
Workflow->>REVEL: download_revel (zip)
REVEL->>REVEL: process_revel_scores (build-specific TSV)
REVEL-->>Workflow: revel TSV
Workflow->>Index: tabix_revel_scores (TSV -> .tbi)
Index-->>Workflow: TSV + .tbi
Workflow->>VEP: annotate_shared_fn / annotate_unique_fp_fn (VCF + plugins + cache)
VEP-->>Workflow: annotated VCF (.annotated.vcf.gz) + stats HTML
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 3 passed
Actionable comments posted: 4
🧹 Nitpick comments (2)
workflow/rules/common.smk (1)
744-748: Type inconsistency: returns string vs empty list.
`get_plugin_aux` returns a string when `plugin == "REVEL"` but an empty list `[]` otherwise. In Snakemake, this mixed return type can work for inputs, but consider returning an empty string or consistently using lists for clarity.
🔎 Proposed fix for consistent return type
 def get_plugin_aux(plugin, index=False):
     if plugin == "REVEL":
         suffix = ".tbi" if index else ""
         return "resources/revel_scores.tsv.gz{suffix}".format(suffix=suffix)
-    return []
+    return ""
workflow/envs/htslib.yaml (1)
1-6: Consider updating htslib to version 1.22.1.
htslib 1.12 was released in March 2021. The latest available version in bioconda is 1.22.1, which includes bug fixes and performance improvements. Updating is straightforward and poses minimal compatibility risk for tabix/bgzip operations.
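Regarding the `get_plugin_aux` return-type nitpick above, the other option the comment mentions (consistently using lists) could look roughly like this. This is a minimal sketch based on the function body quoted in the proposed fix; wrapping the REVEL path in a list is an illustration, not the change adopted in the PR.

```python
def get_plugin_aux(plugin, index=False):
    """Return auxiliary files for a VEP plugin, always as a list."""
    if plugin == "REVEL":
        suffix = ".tbi" if index else ""
        # Wrapped in a list so both branches return the same type (assumption).
        return ["resources/revel_scores.tsv.gz{suffix}".format(suffix=suffix)]
    # No auxiliary files for any other plugin.
    return []
```

With this variant, callers can treat the result uniformly as a (possibly empty) list of paths.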
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- workflow/Snakefile (2 hunks)
- workflow/envs/curl.yaml (1 hunks)
- workflow/envs/htslib.yaml (1 hunks)
- workflow/rules/annotation.smk (1 hunks)
- workflow/rules/common.smk (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-11T08:47:27.118Z
Learnt from: BiancaStoecker
Repo: snakemake-workflows/dna-seq-benchmark PR: 149
File: workflow/Snakefile:36-40
Timestamp: 2025-11-11T08:47:27.118Z
Learning: In the dna-seq-benchmark workflow, VCF files are sorted by a generic sort_vcf rule in workflow/rules/utils.smk that transforms {prefix}.vcf.gz to {prefix}.sorted.vcf.gz, so rules that produce VCF outputs don't include .sorted in their filenames - the sorting is handled as a separate downstream step.
Applied to files:
- workflow/Snakefile
- workflow/rules/annotation.smk
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Testing
🔇 Additional comments (4)
workflow/rules/common.smk (1)
738-741: LGTM!
The tabix parameter logic correctly maps the reference genome to the appropriate column for REVEL score indexing.
workflow/Snakefile (2)
18-18: LGTM!
The annotation rules are correctly included within the `variant-calls` config block, making them conditionally available only when variant calls are configured.
33-41: LGTM!
The output paths are correctly updated to reference the new annotated VCF outputs, which aligns with the annotation pipeline that takes sorted VCFs as input and produces annotated VCFs.
workflow/rules/annotation.smk (1)
76-99: Clarify the `access.random()` version requirement and confirm intentional parameter difference.
The `access.random()` function is available in Snakemake 7.17.1+ (the current min_version requirement), not exclusively an 8.x feature. The two annotation rules intentionally differ in their extra parameters: `annotate_shared_fn` includes `--sift b --polyphen b` while `annotate_unique_fp_fn` omits these flags. Verify this difference aligns with your analysis requirements.
…s/dna-seq-benchmark into feat/add-revel-scores
Actionable comments posted: 0
♻️ Duplicate comments (2)
workflow/rules/annotation.smk (2)
26-34: Add `-L` and `--fail` flags to curl command.
The curl command should include `-L` to follow redirects and `--fail` to exit with error status on HTTP failures, ensuring robust downloads from Zenodo.
🔎 Proposed fix
     shell:
-        "curl https://zenodo.org/records/7072866/files/revel-v1.3_all_chromosomes.zip -o {output} &> {log}"
+        "curl -L --fail https://zenodo.org/records/7072866/files/revel-v1.3_all_chromosomes.zip -o {output} &> {log}"
37-60: Missing `resources` block causes runtime error.
The shell script references `{resources.tmpdir}` on line 50, but no `resources` block is declared in the rule. This will cause a Snakemake runtime error. Additionally, the temporary file is not cleaned up on exit.
🔎 Proposed fix
 rule process_revel_scores:
     input:
         "resources/revel_scores.zip",
     output:
         "resources/revel_scores.tsv.gz",
     params:
         build=get_reference_genome_build(),
     log:
         "logs/vep_plugins/process_revel_scores.log",
+    resources:
+        tmpdir=config.get("tmpdir", "/tmp"),
     conda:
         "../envs/htslib.yaml"
     shell:
         """
-        tmpfile=$(mktemp {resources.tmpdir}/revel_scores.XXXXXX)
+        tmpfile=$(mktemp -p {resources.tmpdir} revel_scores.XXXXXX)
+        trap 'rm -f "$tmpfile"' EXIT
         unzip -p {input} | tr "," "\t" | sed '1s/.*/#&/' | bgzip -c > $tmpfile
         if [ "{params.build}" == "GRCh38" ] ; then
             zgrep -h -v ^#chr $tmpfile | awk '$3 != "." ' | sort -k1,1 -k3,3n - | cat <(zcat $tmpfile | head -n1) - | bgzip -c > {output}
         elif [ "{params.build}" == "GRCh37" ] ; then
             cat $tmpfile > {output}
         else
             echo "Annotation of REVEL scores only supported for GRCh37 or GRCh38" > {log}
             exit 125
         fi
         """
🧹 Nitpick comments (1)
workflow/rules/annotation.smk (1)
102-125: LGTM! SIFT and PolyPhen flags are correctly included.
The rule structure is correct and now includes the `--sift b --polyphen b` flags in the `extra` parameter (line 118) as intended per the PR objectives.
Note: There's a minor trailing space after `"...polyphen b "` at line 118 that can be trimmed (optional formatting nitpick).
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- workflow/envs/curl.yaml (1 hunks)
- workflow/rules/annotation.smk (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- workflow/envs/curl.yaml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-11T08:47:27.118Z
Learnt from: BiancaStoecker
Repo: snakemake-workflows/dna-seq-benchmark PR: 149
File: workflow/Snakefile:36-40
Timestamp: 2025-11-11T08:47:27.118Z
Learning: In the dna-seq-benchmark workflow, VCF files are sorted by a generic sort_vcf rule in workflow/rules/utils.smk that transforms {prefix}.vcf.gz to {prefix}.sorted.vcf.gz, so rules that produce VCF outputs don't include .sorted in their filenames - the sorting is handled as a separate downstream step.
Applied to files:
workflow/rules/annotation.smk
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Testing
🔇 Additional comments (5)
workflow/rules/annotation.smk (5)
15-23: LGTM!
Standard VEP plugins setup with appropriate wrapper usage.
12-12: No action needed.
The snakemake-wrappers version v8.0.2 used throughout the workflow is valid and current.
76-99: No issues found.
The helper function `get_plugin_aux()` is correctly implemented in `workflow/rules/common.smk`, the lambda function calls with arguments are proper, and `access.random()` is the correct Snakemake API for this resource access pattern. The trailing space in the `extra` parameter at line 92 can optionally be trimmed for consistency.
63-73: Rule structure and implementation are correct.
The `get_tabix_revel_params()` function in `workflow/rules/common.smk` (lines 738-741) correctly returns build-appropriate tabix parameters for REVEL score indexing. It selects the correct column (2 for GRCh37, 3 otherwise) and uses appropriate tabix flags (`-f -s 1 -b {column} -e {column}`) for indexing the TSV file across different reference genomes.
1-12: Helper function is properly implemented and returns correct build strings.
The `get_reference_genome_build()` function in `workflow/rules/common.smk` is correctly implemented. Wrapper version v8.0.2 exists and is available in the snakemake-wrappers repository. The function validates the configuration and returns the expected values:
- "GRCh37" for grch37 configuration
- "GRCh38" for grch38 configuration
The rule structure correctly passes this value to the VEP cache wrapper as the build parameter.
famosab left a comment
Some comments / questions :)
We still get this error:
Co-authored-by: Famke Bäuerle <45968370+famosab@users.noreply.github.com>
Added a step to free disk space on Ubuntu before testing.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @.github/workflows/main.yml:
- Around line 45-59: The Free Disk Space step "Free Disk Space (Ubuntu)" has
inconsistent indentation under the with: block and may reference a non-existent
tag; fix by aligning the with: children (tool-cache, android, dotnet, haskell,
large-packages, swap-storage, docker-images) to use the same 8-space indentation
as other workflow steps, and verify that jlumbroso/free-disk-space@v1.3.1 is a
valid released tag — if not, change the action reference to a stable branch like
`@main` or a valid release tag.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@workflow/rules/annotation.smk`:
- Around line 113-136: The rule annotate_unique_fp_fn uses a hardcoded cache
access.random("resources/vep/cache") which can diverge from the cache path
resolved by get_vep_cache_dir() used in annotate_shared_fn; update
annotate_unique_fp_fn to use get_vep_cache_dir() (same symbol used by
annotate_shared_fn) for the cache input so both rules reference the same VEP
cache path (replace the cache=access.random(...) entry with
cache=get_vep_cache_dir()).
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@workflow/rules/common.smk`:
- Around line 751-754: The function get_vep_cache_dir has inconsistent return
types: the "limit-reads" branch returns a plain value while the other branch
returns a single-element tuple because of the trailing comma; update
get_vep_cache_dir so both branches return the same type (either both plain
values or both tuples) by removing the trailing comma in the second return or by
wrapping the first return in a tuple, and ensure callers expect that unified
type.
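The trailing-comma issue described above is easy to reproduce in plain Python. The sketch below uses hypothetical function names and leaves out the `access.random(...)` wrapper, so it only illustrates the return-type mismatch, not the actual `get_vep_cache_dir` body.

```python
def cache_dir_with_trailing_comma(limit_reads):
    """Mimics the reported pattern: one branch ends in a stray comma."""
    if limit_reads:
        return "resources/vep/cache_downsampled"
    return "resources/vep/cache",  # trailing comma makes this a 1-tuple


def cache_dir_consistent(limit_reads):
    """Both branches return a plain string."""
    if limit_reads:
        return "resources/vep/cache_downsampled"
    return "resources/vep/cache"


print(type(cache_dir_with_trailing_comma(False)).__name__)  # tuple
print(type(cache_dir_consistent(False)).__name__)  # str
```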
♻️ Duplicate comments (2)
workflow/rules/annotation.smk (2)
37-47: Add `-L` and `--fail` flags to curl for redirect handling and error detection.
The Zenodo URL may redirect, and curl error 78 ("Remote file not found") reported in PR comments could stem from this. Adding `-L` follows redirects; `--fail` ensures curl exits with an error on HTTP failures.
     shell:
-        "curl {params.url} -o {output} &> {log}"
+        "curl -L --fail {params.url} -o {output} &> {log}"
50-73: Missing `resources` declaration for `{resources.tmpdir}` and temp file cleanup.
The shell script references `{resources.tmpdir}` (line 63) but no `resources` block is declared, causing a runtime error. Additionally, the temporary file `$tmpfile` is never cleaned up.
 rule process_revel_scores:
     input:
         "resources/revel_scores.zip",
     output:
         "resources/revel_scores.tsv.gz",
     params:
         build=get_reference_genome_build(),
     log:
         "logs/vep_plugins/process_revel_scores.log",
+    resources:
+        tmpdir=temp("tmpdir"),
     conda:
         "../envs/tools.yaml"
     shell:
         """
-        tmpfile=$(mktemp {resources.tmpdir}/revel_scores.XXXXXX)
+        tmpfile=$(mktemp "${{TMPDIR:-/tmp}}"/revel_scores.XXXXXX)
+        trap "rm -f $tmpfile" EXIT
         unzip -p {input} | tr "," "\t" | sed '1s/.*/#&/' | bgzip -c > $tmpfile
         if [ "{params.build}" == "GRCh38" ] ; then
             zgrep -h -v ^#chr $tmpfile | awk '$3 != "." ' | sort -k1,1 -k3,3n - | cat <(zcat $tmpfile | head -n1) - | bgzip -c > {output}
         elif [ "{params.build}" == "GRCh37" ] ; then
             cat $tmpfile > {output}
         else
             echo "Annotation of REVEL scores only supported for GRCh37 or GRCh38" > {log}
             exit 125
         fi
         """
🧹 Nitpick comments (1)
workflow/rules/annotation.smk (1)
1-23: Log file path conflict between rules.
Both `get_downsampled_vep_cache` (line 5) and `get_vep_cache` (line 20) write to the same log path `logs/vep/cache.log`. While only one rule should run based on the `limit-reads` config, using distinct log paths improves traceability and avoids potential conflicts.
 rule get_downsampled_vep_cache:
     output:
         directory("resources/vep/cache_downsampled"),
     log:
-        "logs/vep/cache.log",
+        "logs/vep/cache_downsampled.log",
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@workflow/rules/annotation.smk`:
- Around line 1-9: The two rules get_downsampled_vep_cache and get_vep_cache
both write to the same log file "logs/vep/cache.log", which can cause
overwrites; update the get_downsampled_vep_cache rule (the rule header named
get_downsampled_vep_cache and its log directive) to use a distinct log path (for
example "logs/vep/cache_downsampled.log" or include a unique suffix/wildcard) so
each rule writes to its own log file and preserves logs when both run.
♻️ Duplicate comments (2)
workflow/rules/annotation.smk (2)
37-47: Add `-L` and `--fail` flags to curl - likely cause of reported runtime error.
The reported runtime error "Libcurl reported error 78 (Remote file not found)" is likely caused by the Zenodo URL redirecting while curl is not configured to follow redirects. Add `-L` to follow redirects and `--fail` to exit on HTTP errors.
Proposed fix
     shell:
-        "curl {params.url} -o {output} &> {log}"
+        "curl -L --fail {params.url} -o {output} &> {log}"
50-73: Missing `resources` declaration for `tmpdir` - will cause runtime error.
Line 63 references `{resources.tmpdir}` but no `resources:` block is declared. This will cause Snakemake to fail. Additionally, the temporary file `$tmpfile` is never cleaned up.
Proposed fix
 rule process_revel_scores:
     input:
         "resources/revel_scores.zip",
     output:
         "resources/revel_scores.tsv.gz",
     params:
         build=get_reference_genome_build(),
     log:
         "logs/vep_plugins/process_revel_scores.log",
+    resources:
+        tmpdir=temp("tmpdir"),
     conda:
         "../envs/tools.yaml"
     shell:
         """
-        tmpfile=$(mktemp {resources.tmpdir}/revel_scores.XXXXXX)
+        tmpfile=$(mktemp "${{TMPDIR:-/tmp}}"/revel_scores.XXXXXX)
+        trap "rm -f $tmpfile" EXIT
         unzip -p {input} | tr "," "\t" | sed '1s/.*/#&/' | bgzip -c > $tmpfile
         if [ "{params.build}" == "GRCh38" ] ; then
             zgrep -h -v ^#chr $tmpfile | awk '$3 != "." ' | sort -k1,1 -k3,3n - | cat <(zcat $tmpfile | head -n1) - | bgzip -c > {output}
         elif [ "{params.build}" == "GRCh37" ] ; then
             cat $tmpfile > {output}
         else
             echo "Annotation of REVEL scores only supported for GRCh37 or GRCh38" > {log}
             exit 125
         fi
         """
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@workflow/rules/common.smk`:
- Around line 757-763: The function get_revel_tsv returns None when
config.get("limit-reads") is true because the if branch calls
workflow.source_path("../resources/new_tabbed_revel_grch38.1pct.tsv.gz") but
does not return its value; update the if branch to return the result of
workflow.source_path(...) so get_revel_tsv always returns the expected file path
(keep the else branch returning "resources/revel_scores.tsv.gz"), referencing
get_revel_tsv, workflow.source_path, and config.get("limit-reads") to locate the
code.
- Around line 751-754: The function get_vep_cache_dir() calls access.random(...)
but access is not defined; fix by either importing/defining the access object or
replacing those calls with the proper Snakemake remote provider (e.g., use
snakemake.remote.S3.random("resources/vep/cache") or the appropriate remote
class for your storage). Update get_vep_cache_dir() and the corresponding
access.random usages in annotation.smk (the four occurrences) so they reference
a defined symbol (imported access) or use snakemake.remote.S3 (or the correct
snakemake.remote.* class) to obtain the random path.
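The missing-`return` problem in `get_revel_tsv` described above can be sketched in isolation. In this illustration `source_path` is a hypothetical stand-in for `workflow.source_path` and `config` is a made-up dictionary, so the block runs outside Snakemake; only the control flow mirrors the comment.

```python
config = {"limit-reads": True}  # hypothetical config, for illustration only


def source_path(rel_path):
    """Stand-in for workflow.source_path (assumption): returns the path unchanged."""
    return rel_path


def get_revel_tsv_buggy():
    if config.get("limit-reads"):
        # The result is computed but discarded, so the function falls through
        # and implicitly returns None.
        source_path("../resources/new_tabbed_revel_grch38.1pct.tsv.gz")
    else:
        return "resources/revel_scores.tsv.gz"


def get_revel_tsv_fixed():
    if config.get("limit-reads"):
        return source_path("../resources/new_tabbed_revel_grch38.1pct.tsv.gz")
    return "resources/revel_scores.tsv.gz"


print(get_revel_tsv_buggy())  # None
print(get_revel_tsv_fixed())
```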
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@workflow/rules/common.smk`:
- Around line 744-754: get_plugin_aux currently returns a GRCh38-only downsample
file when config.get("limit-reads") is true, which will be wrong for GRCh37
runs; update get_plugin_aux to first detect the active reference (e.g., from
config.get("reference") or the project reference setting) and if plugin ==
"REVEL" and limit-reads is true choose the matching GRCh37 downsample resource
when available or else raise a clear, fast-fail error explaining that a GRCh37
downsample resource is missing and how to provide one; keep the existing GRCh38
path for GRCh38 runs and ensure the error mentions get_plugin_aux and the
config.get("limit-reads") condition so it is easy to locate.
- Around line 738-741: get_tabix_revel_params currently treats any non-"grch37"
reference as grch38 and silently falls back; change it to validate the
config["reference-genome"] explicitly using the project's validated helper
(e.g., the reference validation function used elsewhere) and raise or fail fast
on unknown values, then return "-f -s 1 -b 2 -e 2" for "grch37" and "-f -s 1 -b
3 -e 3" for "grch38"; reference get_tabix_revel_params and
config["reference-genome"] when locating where to replace the fallback logic.
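Both points above amount to validating the reference build before using it. A rough sketch follows, assuming the helpers receive `config` as an argument (in the workflow they read a global `config`); `SUPPORTED_BUILDS` and `check_revel_downsample` are hypothetical names introduced only for this illustration, and `get_reference_genome_build` here is a stand-in rather than the workflow's implementation.

```python
# Hypothetical mapping from config values to build names.
SUPPORTED_BUILDS = {"grch37": "GRCh37", "grch38": "GRCh38"}


def get_reference_genome_build(config):
    """Stand-in for the workflow's validated helper; fails fast on unknown values."""
    genome = config.get("reference-genome")
    if genome not in SUPPORTED_BUILDS:
        raise ValueError(
            f"Unsupported reference-genome {genome!r}; expected grch37 or grch38"
        )
    return SUPPORTED_BUILDS[genome]


def get_tabix_revel_params(config):
    """Column 2 holds the GRCh37 position, column 3 the GRCh38 position."""
    column = 2 if get_reference_genome_build(config) == "GRCh37" else 3
    return f"-f -s 1 -b {column} -e {column}"


def check_revel_downsample(config):
    """Guard for get_plugin_aux: the downsampled REVEL file is GRCh38-only."""
    if config.get("limit-reads") and get_reference_genome_build(config) != "GRCh38":
        raise ValueError(
            "get_plugin_aux: limit-reads REVEL downsample is only available for GRCh38"
        )
```

With this shape, a typo such as `reference-genome: grch83` raises immediately instead of silently selecting column 3.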
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@workflow/rules/common.smk`:
- Around line 744-753: The path construction in get_plugin_aux (function) uses
workflow.basedir + "../.test/resources/..." which can break resolution; replace
that concatenation with
workflow.source_path(".test/resources/new_tabbed_revel_grch38.1pct.tsv.gz{suffix}".format(suffix=suffix))
so the REVEL test resource path is resolved via Snakemake's API (use the same
approach for the indexed suffix when index is True) instead of direct basedir
manipulation.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@workflow/rules/annotation.smk`:
- Around line 1-11: The shell command in rule get_downsampled_vep_cache
incorrectly calls tar without -f so it treats {input} as a member name; update
the shell line in get_downsampled_vep_cache to pass the archive file to tar
using the -f option (i.e., include -f {input} before -C {output}), preserving
the mkdir, --strip-components 1, and stderr redirection to {log}.
♻️ Duplicate comments (4)
workflow/rules/common.smk (2)
738-741: Validate `reference-genome` instead of defaulting to GRCh38.
Any non-`grch37` value currently falls through to column 3, which can mask config mistakes. Use the validated helper to fail fast on invalid values.
♻️ Proposed fix
 def get_tabix_revel_params():
     # Indexing of REVEL-score file where the column depends on the reference
-    column = 2 if config["reference-genome"] == "grch37" else 3
+    build = get_reference_genome_build()
+    column = 2 if build == "GRCh37" else 3
     return f"-f -s 1 -b {column} -e {column}"
744-754: Fail fast when `limit-reads` uses a GRCh38-only REVEL downsample.
With `limit-reads`, the helper always points to a GRCh38 downsample, which is wrong for GRCh37 runs. Guard this and raise a clear error (or add a GRCh37 downsample).
🐛 Proposed fix (fail fast if unsupported)
 def get_plugin_aux(plugin, index=False):
     if plugin == "REVEL":
         suffix = ".tbi" if index else ""
         if config.get("limit-reads"):
+            if get_reference_genome_build() != "GRCh38":
+                raise ValueError(
+                    "limit-reads REVEL downsample is only available for GRCh38"
+                )
             return workflow.source_path(
                 "../../.test/resources/downsampled_tabbed_revel_grch38.1pct.tsv.gz{suffix}".format(
                     suffix=suffix
                 )
             )
         return "resources/revel_scores.tsv.gz{suffix}".format(suffix=suffix)
     return []
workflow/rules/annotation.smk (2)
39-47: Follow redirects / fail on HTTP errors for the REVEL download.
This avoids silent 404s/redirects and aligns with the reported libcurl error 78.
🔧 Proposed fix
-    shell:
-        "curl https://zenodo.org/records/7072866/files/revel-v1.3_all_chromosomes.zip -o {output} &> {log}"
+    shell:
+        "curl -L --fail https://zenodo.org/records/7072866/files/revel-v1.3_all_chromosomes.zip -o {output} &> {log}"
50-73: Define a tmpdir source and clean up the temp file.
`{resources.tmpdir}` is undefined here, and the temp file is never removed.
🐛 Proposed fix
-        tmpfile=$(mktemp {resources.tmpdir}/revel_scores.XXXXXX)
+        tmpfile=$(mktemp "${{TMPDIR:-/tmp}}"/revel_scores.XXXXXX)
+        trap "rm -f $tmpfile" EXIT
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@workflow/rules/annotation.smk`:
- Around line 16-27: The get_vep_plugins rule currently hardcodes release="111"
while get_vep_cache may use cache_downsampled (release 113) when
limit-reads=true; update get_vep_plugins (the rule named get_vep_plugins and its
release param) so its release value is computed the same way as get_vep_cache
(i.e., use the same helper/variable that picks cache release or branch on
limit-reads/cache_downsampled) or pass through the cache-matched release value,
ensuring the VEP plugins release matches the cache release when downsampled
caches are used.
In `@workflow/rules/common.smk`:
- Around line 745-758: The REVEL branch in get_plugin_aux is returning a CI
resource path that doesn't match the actual file name; update the returned
workflow.source_path call in get_plugin_aux (REVEL branch) to reference the
correct file name "new_tabbed_revel_grch38.1pct.tsv.gz{suffix}" (or
alternatively rename the resource to the current string) so the returned path
matches the CI resource and avoids FileNotFoundError.
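One way to keep the plugin release in step with the cache release, as the first comment above asks, is a single helper that both rules consult. The sketch below is hypothetical: `get_vep_release` is not a function from the PR, and it takes `config` as an argument so it runs standalone. The release numbers 111 and 113 are the ones named in the comment.

```python
def get_vep_release(config):
    """Hypothetical helper: one VEP release for both the cache and the plugins."""
    # The downsampled CI cache is release 113; the full cache uses release 111.
    return "113" if config.get("limit-reads") else "111"


print(get_vep_release({"limit-reads": True}))  # 113
print(get_vep_release({}))  # 111
```

Both `get_vep_cache` and `get_vep_plugins` could then take their `release` param from this one place, so the two values cannot drift apart.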
♻️ Duplicate comments (1)
workflow/rules/annotation.smk (1)
41-49: Harden curl downloads against redirects/HTTP errors.
The current curl command doesn’t follow redirects or fail on HTTP errors, which can surface as libcurl error 78. Add `-L --fail` for reliability.
🔧 Suggested tweak
-        "curl https://zenodo.org/records/7072866/files/revel-v1.3_all_chromosomes.zip -o {output} &> {log}"
+        "curl -L --fail https://zenodo.org/records/7072866/files/revel-v1.3_all_chromosomes.zip -o {output} &> {log}"
    conda:
        "../envs/tools.yaml"
    shell:
        "(mkdir -p {output}; curl -L https://github.com/snakemake-workflows/dna-seq-benchmark/raw/0181ccf16c5483c0d7d1ad1b8f9dfa87376b5b1f/workflow/resources/ci-test-references/vep_cache_113_GRCh38_chr22.tar.gz | tar -xz -C {output} --strip-components 1) 2> {log}"
Somehow I couldn't get this to work so that it picks up the file relative to the .smk file. That's why it now pulls the file from our repo with curl. I also wanted to have it available ourselves rather than depend on another repo, so I've stored it in our repo.