feat: VEP annotation with REVEL, Sift and PolyPhen Scores to fp/fn vcfs #154

BiancaStoecker · 2025-12-19T13:10:51Z

Summary by CodeRabbit

New Features
- Added VEP- and REVEL-based variant annotation and processing to produce annotated VCF outputs and summary reports.
Documentation
- Added CI-friendly reference docs and scripts describing downsampled resources and indexing/subsampling procedures.
Chores
- Updated workflow outputs and paths to reflect annotated VCF files and integrated annotation steps.


coderabbitai · 2025-12-19T13:11:01Z

📝 Walkthrough

Adds VEP/REVEL annotation to the variant benchmarking workflow: new annotation rules for fetching VEP caches/plugins and REVEL scores, processing/indexing REVEL, and annotating FP/FN VCFs. FP/FN output paths changed from results/fp-fn/vcf/...*.sorted.vcf.gz to results/fp-fn/annotated_vcf/...*.annotated.vcf.gz.
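The summary elides the subdirectories and file stems under these two directories, so the example path below is hypothetical. `annotated_vcf_path` is also an illustrative name, not a helper that exists in the workflow. With those caveats, a minimal sketch of the old-to-new path mapping looks like this:

```python
def annotated_vcf_path(sorted_vcf: str) -> str:
    """Map an FP/FN VCF path from the old layout to the new annotated layout.

    Hypothetical helper: the workflow declares these paths directly in the
    Snakefile; this only illustrates the directory and suffix change.
    """
    old_dir, old_suffix = "results/fp-fn/vcf/", ".sorted.vcf.gz"
    new_dir, new_suffix = "results/fp-fn/annotated_vcf/", ".annotated.vcf.gz"
    if not (sorted_vcf.startswith(old_dir) and sorted_vcf.endswith(old_suffix)):
        raise ValueError(f"not an FP/FN sorted VCF path: {sorted_vcf}")
    stem = sorted_vcf[len(old_dir):-len(old_suffix)]
    return f"{new_dir}{stem}{new_suffix}"


# Hypothetical example path
print(annotated_vcf_path("results/fp-fn/vcf/example/fp.sorted.vcf.gz"))
```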

Changes

- Snakefile configuration (`workflow/Snakefile`): Includes `rules/annotation.smk`; updates FP/FN output paths from `results/fp-fn/vcf/....sorted.vcf.gz` to `results/fp-fn/annotated_vcf/....annotated.vcf.gz`
- Annotation rules (`workflow/rules/annotation.smk`): New file with 8 rules: `get_downsampled_vep_cache`, `get_vep_cache`, `get_vep_plugins`, `download_revel`, `process_revel_scores`, `tabix_revel_scores`, `annotate_shared_fn`, `annotate_unique_fp_fn` (uses VEP wrappers v8.1.1, REVEL plugin integration, grouping "annotation")
- Helper functions (`workflow/rules/common.smk`): Adds `get_tabix_revel_params()`, `get_plugin_aux(plugin, index=False)`, and `get_vep_cache_dir()` to select tabix columns, plugin auxiliary paths (.tbi/downsampled), and the VEP cache dir based on genome build and limit-reads
- CI documentation / resources (`workflow/resources/ci-test-references/README.md`): New README describing the CI-friendly downsampled REVEL table and reduced VEP cache, plus two helper scripts (`subsample_all_vars.sh`, `index_subsample.sh`) and example commands

Sequence Diagram(s)

sequenceDiagram
    participant Workflow as Workflow Engine
    participant Cache as VEP Cache/Plugins
    participant REVEL as REVEL Provider
    participant Index as Tabix Indexer
    participant VEP as VEP Annotator

    Workflow->>Cache: get_downsampled_vep_cache / get_vep_cache / get_vep_plugins
    Cache-->>Workflow: VEP cache & plugins available
    Workflow->>REVEL: download_revel (zip)
    REVEL->>REVEL: process_revel_scores (build-specific TSV)
    REVEL-->>Workflow: revel TSV
    Workflow->>Index: tabix_revel_scores (TSV -> .tbi)
    Index-->>Workflow: TSV + .tbi
    Workflow->>VEP: annotate_shared_fn / annotate_unique_fp_fn (VCF + plugins + cache)
    VEP-->>Workflow: annotated VCF (.annotated.vcf.gz) + stats HTML

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

- feat: add fp-fn-filtering and write to vcf #137: Changes FP/FN VCF generation and is directly related to introducing VEP/REVEL annotation consumers and path updates.
- fix: add dirs with results to rule all #120: Modifies results/fp-fn targets/paths; overlaps with the FP/FN output path changes here.
- feat: automate picking of rename file based on reference genome used for calling + setup for more than 2 reference genomes #113: Introduces genome-build helpers and get_reference_genome_build(), related to the new helper functions and build-specific REVEL handling.

Suggested reviewers

johanneskoester

Poem

🐰 I hopped through caches, plugins bright and new,

VEP and REVEL stitched each variant true,
Logs and TSVs, indexed with care,
Annotated VCFs now dance in the air,
A rabbit's cheer for pipelines made anew.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

- Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
- Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
- Title check: ✅ Passed. The title accurately describes the main change: adding VEP annotation with REVEL, SIFT and PolyPhen scores to FP/FN VCFs, which is the core objective reflected throughout the changeset.


coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (2)

workflow/rules/common.smk (1)
744-748: Type inconsistency: returns string vs empty list.

get_plugin_aux returns a string when plugin == "REVEL" but an empty list [] otherwise. In Snakemake, this mixed return type can work for inputs, but consider returning an empty string or consistently using lists for clarity.
🔎 Proposed fix for consistent return type
 def get_plugin_aux(plugin, index=False):
     if plugin == "REVEL":
         suffix = ".tbi" if index else ""
         return "resources/revel_scores.tsv.gz{suffix}".format(suffix=suffix)
-    return []
+    return ""
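An alternative to the string fix is to use lists in both branches. The sketch below is a minimal standalone version that assumes the REVEL path from the diff above. It is not the workflow's actual code. Returning `[]` is the usual way for a Snakemake input function to contribute no files. Whether the VEP wrapper's plugin input accepts a one-element list in place of a string is an assumption to verify.

```python
def get_plugin_aux(plugin: str, index: bool = False) -> list[str]:
    """Return auxiliary files for a VEP plugin as a list (possibly empty).

    Sketch only: mirrors the REVEL path from the review diff; other
    plugins contribute no auxiliary files.
    """
    if plugin == "REVEL":
        suffix = ".tbi" if index else ""
        return [f"resources/revel_scores.tsv.gz{suffix}"]
    # An empty list means "no files" for a Snakemake input function.
    return []


print(get_plugin_aux("REVEL"))
print(get_plugin_aux("REVEL", index=True))
print(get_plugin_aux("LoF"))
```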
workflow/envs/htslib.yaml (1)

1-6: Consider updating htslib to version 1.22.1.

htslib 1.12 was released in March 2021. The latest available version in bioconda is 1.22.1, which includes bug fixes and performance improvements. Updating is straightforward and poses minimal compatibility risk for tabix/bgzip operations.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d820558 and 30894c7.

📒 Files selected for processing (5)

workflow/Snakefile (2 hunks)
workflow/envs/curl.yaml (1 hunks)
workflow/envs/htslib.yaml (1 hunks)
workflow/rules/annotation.smk (1 hunks)
workflow/rules/common.smk (1 hunks)

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-11-11T08:47:27.118Z

Learnt from: BiancaStoecker
Repo: snakemake-workflows/dna-seq-benchmark PR: 149
File: workflow/Snakefile:36-40
Timestamp: 2025-11-11T08:47:27.118Z
Learning: In the dna-seq-benchmark workflow, VCF files are sorted by a generic sort_vcf rule in workflow/rules/utils.smk that transforms {prefix}.vcf.gz to {prefix}.sorted.vcf.gz, so rules that produce VCF outputs don't include .sorted in their filenames - the sorting is handled as a separate downstream step.

Applied to files:

workflow/Snakefile
workflow/rules/annotation.smk

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Testing

🔇 Additional comments (4)

workflow/rules/common.smk (1)

738-741: LGTM!

The tabix parameter logic correctly maps the reference genome to the appropriate column for REVEL score indexing.

workflow/Snakefile (2)

18-18: LGTM!

The annotation rules are correctly included within the variant-calls config block, making them conditionally available only when variant calls are configured.

33-41: LGTM!

The output paths are correctly updated to reference the new annotated VCF outputs, which aligns with the annotation pipeline that takes sorted VCFs as input and produces annotated VCFs.

workflow/rules/annotation.smk (1)

76-99: Clarify the access.random() version requirement and confirm intentional parameter difference.

The access.random() function is available in Snakemake 7.17.1+ (the current min_version requirement), not exclusively an 8.x feature. The two annotation rules intentionally differ in their extra parameters: annotate_shared_fn includes --sift b --polyphen b while annotate_unique_fp_fn omits these flags. Verify this difference aligns with your analysis requirements.


coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (2)

workflow/rules/annotation.smk (2)

26-34: Add -L and --fail flags to curl command.

The curl command should include -L to follow redirects and --fail to exit with error status on HTTP failures, ensuring robust downloads from Zenodo.

🔎 Proposed fix

     shell:
-        "curl https://zenodo.org/records/7072866/files/revel-v1.3_all_chromosomes.zip -o {output} &> {log}"
+        "curl -L --fail https://zenodo.org/records/7072866/files/revel-v1.3_all_chromosomes.zip -o {output} &> {log}"

37-60: Missing resources block causes runtime error.

The shell script references {resources.tmpdir} on line 50, but no resources block is declared in the rule. This will cause a Snakemake runtime error. Additionally, the temporary file is not cleaned up on exit.

🔎 Proposed fix

 rule process_revel_scores:
     input:
         "resources/revel_scores.zip",
     output:
         "resources/revel_scores.tsv.gz",
     params:
         build=get_reference_genome_build(),
     log:
         "logs/vep_plugins/process_revel_scores.log",
+    resources:
+        tmpdir=config.get("tmpdir", "/tmp"),
     conda:
         "../envs/htslib.yaml"
     shell:
         """
-        tmpfile=$(mktemp {resources.tmpdir}/revel_scores.XXXXXX)
+        tmpfile=$(mktemp -p {resources.tmpdir} revel_scores.XXXXXX)
+        trap 'rm -f "$tmpfile"' EXIT
         unzip -p {input} | tr "," "\t" | sed '1s/.*/#&/' | bgzip -c > $tmpfile
         if [ "{params.build}" == "GRCh38" ] ; then
             zgrep -h -v ^#chr $tmpfile | awk '$3 != "." ' | sort -k1,1 -k3,3n - | cat <(zcat $tmpfile | head -n1) - | bgzip -c > {output}
         elif [ "{params.build}" == "GRCh37" ] ; then
             cat $tmpfile > {output}
         else
             echo "Annotation of REVEL scores only supported for GRCh37 or GRCh38" > {log}
             exit 125
         fi
         """
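The shell pipeline in this rule packs several steps into one line. The Python sketch below walks through the same transformation on a few toy rows. It assumes the REVEL v1.3 column order (`chr,hg19_pos,grch38_pos,...`) and that the data fits in memory; the real rule streams a multi-gigabyte file through `unzip`, `sort` and `bgzip`, which this sketch does not replicate. `process_revel` is a hypothetical name, and the sample rows are made-up placeholders, not real REVEL scores.

```python
import csv
import io


def process_revel(csv_text: str, build: str) -> str:
    """Turn REVEL CSV text into a '#'-headed, tab-separated table for tabix.

    Hypothetical sketch of the process_revel_scores shell pipeline; assumes the
    REVEL v1.3 column order (chr, hg19_pos, grch38_pos, ...).
    """
    if build not in ("GRCh37", "GRCh38"):
        raise ValueError("Annotation of REVEL scores only supported for GRCh37 or GRCh38")
    rows = list(csv.reader(io.StringIO(csv_text)))  # the tr "," "\t" step
    header, body = rows[0], rows[1:]
    header[0] = "#" + header[0]  # sed '1s/.*/#&/': header becomes a comment line
    if build == "GRCh38":
        # awk '$3 != "."': drop variants that have no GRCh38 position
        body = [row for row in body if row[2] != "."]
        # sort -k1,1 -k3,3n: chromosome as text (C-locale order), then the
        # GRCh38 position numerically, so tabix can index column 3
        body.sort(key=lambda row: (row[0], int(row[2])))
    # GRCh37: keep the original row order (the rule copies the file unchanged)
    return "".join("\t".join(row) + "\n" for row in [header, *body])


# Toy rows, not real REVEL data
sample = (
    "chr,hg19_pos,grch38_pos,ref,alt,aaref,aaalt,REVEL,Ensembl_transcriptid\n"
    "1,300,310,G,A,T,M,0.027,ENST0001\n"
    "1,200,.,A,C,K,T,0.100,ENST0002\n"
    "1,100,50,A,G,K,E,0.200,ENST0003\n"
)
print(process_revel(sample, "GRCh38"))
```

The GRCh38 branch needs the extra filter and sort because tabix requires rows sorted by the position column it indexes. The GRCh37 branch can pass the file through unchanged.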

🧹 Nitpick comments (1)

workflow/rules/annotation.smk (1)

102-125: LGTM! SIFT and PolyPhen flags are correctly included.

The rule structure is correct and now includes the --sift b --polyphen b flags in the extra parameter (line 118) as intended per the PR objectives.

Note: There's a minor trailing space after "...polyphen b " at line 118 that can be trimmed (optional formatting nitpick).

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 30894c7 and ba99965.

📒 Files selected for processing (2)

workflow/envs/curl.yaml (1 hunks)
workflow/rules/annotation.smk (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

workflow/envs/curl.yaml

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-11-11T08:47:27.118Z

Learnt from: BiancaStoecker
Repo: snakemake-workflows/dna-seq-benchmark PR: 149
File: workflow/Snakefile:36-40
Timestamp: 2025-11-11T08:47:27.118Z
Learning: In the dna-seq-benchmark workflow, VCF files are sorted by a generic sort_vcf rule in workflow/rules/utils.smk that transforms {prefix}.vcf.gz to {prefix}.sorted.vcf.gz, so rules that produce VCF outputs don't include .sorted in their filenames - the sorting is handled as a separate downstream step.

Applied to files:

workflow/rules/annotation.smk

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Testing

🔇 Additional comments (5)

workflow/rules/annotation.smk (5)

15-23: LGTM!

Standard VEP plugins setup with appropriate wrapper usage.

12-12: No action needed. The snakemake-wrappers version v8.0.2 used throughout the workflow is valid and current.

76-99: No issues found. The helper function get_plugin_aux() is correctly implemented in workflow/rules/common.smk, the lambda function calls with arguments are proper, and access.random() is the correct Snakemake API for this resource access pattern. The trailing space in the extra parameter at line 92 can optionally be trimmed for consistency.

63-73: Rule structure and implementation are correct.

The get_tabix_revel_params() function in workflow/rules/common.smk (lines 738-741) correctly returns build-appropriate tabix parameters for REVEL score indexing. It selects the correct column (2 for GRCh37, 3 otherwise) and uses appropriate tabix flags (-f -s 1 -b {column} -e {column}) for indexing the TSV file across different reference genomes.

1-12: Helper function is properly implemented and returns correct build strings.

The get_reference_genome_build() function in workflow/rules/common.smk is correctly implemented. Wrapper version v8.0.2 exists and is available in the snakemake-wrappers repository. The function validates the configuration and returns the expected values:

- "GRCh37" for the grch37 configuration
- "GRCh38" for the grch38 configuration

The rule structure correctly passes this value to the VEP cache wrapper as the build parameter.

famosab

Some comments / questions :)

workflow/envs/curl.yaml

workflow/envs/htslib.yaml

workflow/rules/annotation.smk

famosab · 2026-01-16T15:04:05Z

We still get this error:

[E::easy_errno] Libcurl reported error 78 (Remote file not found)
[E::easy_errno] Libcurl reported error 78 (Remote file not found)


coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In @.github/workflows/main.yml:
- Around line 45-59: The Free Disk Space step "Free Disk Space (Ubuntu)" has
inconsistent indentation under the with: block and may reference a non-existent
tag; fix by aligning the with: children (tool-cache, android, dotnet, haskell,
large-packages, swap-storage, docker-images) to use the same 8-space indentation
as other workflow steps, and verify that jlumbroso/free-disk-space@v1.3.1 is a
valid released tag — if not, change the action reference to a stable branch like
`@main` or a valid release tag.


coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@workflow/rules/annotation.smk`:
- Around line 113-136: The rule annotate_unique_fp_fn uses a hardcoded cache
access.random("resources/vep/cache") which can diverge from the cache path
resolved by get_vep_cache_dir() used in annotate_shared_fn; update
annotate_unique_fp_fn to use get_vep_cache_dir() (same symbol used by
annotate_shared_fn) for the cache input so both rules reference the same VEP
cache path (replace the cache=access.random(...) entry with
cache=get_vep_cache_dir()).


coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@workflow/rules/common.smk`:
- Around line 751-754: The function get_vep_cache_dir has inconsistent return
types: the "limit-reads" branch returns a plain value while the other branch
returns a single-element tuple because of the trailing comma; update
get_vep_cache_dir so both branches return the same type (either both plain
values or both tuples) by removing the trailing comma in the second return or by
wrapping the first return in a tuple, and ensure callers expect that unified
type.
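The sketch below shows the fix as plain Python. It assumes the two cache directories named elsewhere in this review (`resources/vep/cache_downsampled` and `resources/vep/cache`). The workflow wraps these paths in `access.random(...)` and reads the global `config`; both are omitted here so the snippet runs outside Snakemake.

```python
def get_vep_cache_dir(config: dict) -> str:
    """Return the VEP cache directory as a plain string in both branches."""
    if config.get("limit-reads"):
        return "resources/vep/cache_downsampled"
    # No trailing comma: `return "resources/vep/cache",` would return a tuple.
    return "resources/vep/cache"


def trailing_comma_pitfall():
    return "resources/vep/cache",  # the bug pattern described in the review


print(get_vep_cache_dir({"limit-reads": True}))
print(get_vep_cache_dir({}))
print(trailing_comma_pitfall())
```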

♻️ Duplicate comments (2)

workflow/rules/annotation.smk (2)

37-47: Add -L and --fail flags to curl for redirect handling and error detection.

The Zenodo URL may redirect, and curl error 78 ("Remote file not found") reported in PR comments could stem from this. Adding -L follows redirects; --fail ensures curl exits with an error on HTTP failures.

     shell:
-        "curl {params.url} -o {output} &> {log}"
+        "curl -L --fail {params.url} -o {output} &> {log}"

50-73: Missing resources declaration for {resources.tmpdir} and temp file cleanup.

The shell script references {resources.tmpdir} (line 63) but no resources block is declared, causing a runtime error. Additionally, the temporary file $tmpfile is never cleaned up.

 rule process_revel_scores:
     input:
         "resources/revel_scores.zip",
     output:
         "resources/revel_scores.tsv.gz",
     params:
         build=get_reference_genome_build(),
     log:
         "logs/vep_plugins/process_revel_scores.log",
+    resources:
+        tmpdir=temp("tmpdir"),
     conda:
         "../envs/tools.yaml"
     shell:
         """
-        tmpfile=$(mktemp {resources.tmpdir}/revel_scores.XXXXXX)
+        tmpfile=$(mktemp "${{TMPDIR:-/tmp}}"/revel_scores.XXXXXX)
+        trap "rm -f $tmpfile" EXIT
         unzip -p {input} | tr "," "\t" | sed '1s/.*/#&/' | bgzip -c > $tmpfile
         if [ "{params.build}" == "GRCh38" ] ; then
             zgrep -h -v ^#chr $tmpfile | awk '$3 != "." ' | sort -k1,1 -k3,3n - | cat <(zcat $tmpfile | head -n1) - | bgzip -c > {output}
         elif [ "{params.build}" == "GRCh37" ] ; then
             cat $tmpfile > {output}
         else
             echo "Annotation of REVEL scores only supported for GRCh37 or GRCh38" > {log}
             exit 125
         fi
         """

🧹 Nitpick comments (1)

workflow/rules/annotation.smk (1)
1-23: Log file path conflict between rules.

Both get_downsampled_vep_cache (line 5) and get_vep_cache (line 20) write to the same log path logs/vep/cache.log. While only one rule should run based on the limit-reads config, using distinct log paths improves traceability and avoids potential conflicts.
 rule get_downsampled_vep_cache:
     output:
         directory("resources/vep/cache_downsampled"),
     log:
-        "logs/vep/cache.log",
+        "logs/vep/cache_downsampled.log",


coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@workflow/rules/annotation.smk`:
- Around line 1-9: The two rules get_downsampled_vep_cache and get_vep_cache
both write to the same log file "logs/vep/cache.log", which can cause
overwrites; update the get_downsampled_vep_cache rule (the rule header named
get_downsampled_vep_cache and its log directive) to use a distinct log path (for
example "logs/vep/cache_downsampled.log" or include a unique suffix/wildcard) so
each rule writes to its own log file and preserves logs when both run.

♻️ Duplicate comments (2)

workflow/rules/annotation.smk (2)
37-47: Add -L and --fail flags to curl - likely cause of reported runtime error.

The reported runtime error "Libcurl reported error 78 (Remote file not found)" is likely caused by the Zenodo URL redirecting while curl is not configured to follow redirects. Add -L to follow redirects and --fail to exit on HTTP errors.
Proposed fix
     shell:
-        "curl {params.url} -o {output} &> {log}"
+        "curl -L --fail {params.url} -o {output} &> {log}"
50-73: Missing resources declaration for tmpdir - will cause runtime error.

Line 63 references {resources.tmpdir} but no resources: block is declared. This will cause Snakemake to fail. Additionally, the temporary file $tmpfile is never cleaned up.
Proposed fix
 rule process_revel_scores:
     input:
         "resources/revel_scores.zip",
     output:
         "resources/revel_scores.tsv.gz",
     params:
         build=get_reference_genome_build(),
     log:
         "logs/vep_plugins/process_revel_scores.log",
+    resources:
+        tmpdir=temp("tmpdir"),
     conda:
         "../envs/tools.yaml"
     shell:
         """
-        tmpfile=$(mktemp {resources.tmpdir}/revel_scores.XXXXXX)
+        tmpfile=$(mktemp "${{TMPDIR:-/tmp}}"/revel_scores.XXXXXX)
+        trap "rm -f $tmpfile" EXIT
         unzip -p {input} | tr "," "\t" | sed '1s/.*/#&/' | bgzip -c > $tmpfile
         if [ "{params.build}" == "GRCh38" ] ; then
             zgrep -h -v ^#chr $tmpfile | awk '$3 != "." ' | sort -k1,1 -k3,3n - | cat <(zcat $tmpfile | head -n1) - | bgzip -c > {output}
         elif [ "{params.build}" == "GRCh37" ] ; then
             cat $tmpfile > {output}
         else
             echo "Annotation of REVEL scores only supported for GRCh37 or GRCh38" > {log}
             exit 125
         fi
         """


coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@workflow/rules/common.smk`:
- Around line 757-763: The function get_revel_tsv returns None when
config.get("limit-reads") is true because the if branch calls
workflow.source_path("../resources/new_tabbed_revel_grch38.1pct.tsv.gz") but
does not return its value; update the if branch to return the result of
workflow.source_path(...) so get_revel_tsv always returns the expected file path
(keep the else branch returning "resources/revel_scores.tsv.gz"), referencing
get_revel_tsv, workflow.source_path, and config.get("limit-reads") to locate the
code.
- Around line 751-754: The function get_vep_cache_dir() calls access.random(...)
but access is not defined; fix by either importing/defining the access object or
replacing those calls with the proper Snakemake remote provider (e.g., use
snakemake.remote.S3.random("resources/vep/cache") or the appropriate remote
class for your storage). Update get_vep_cache_dir() and the corresponding
access.random usages in annotation.smk (the four occurrences) so they reference
a defined symbol (imported access) or use snakemake.remote.S3 (or the correct
snakemake.remote.* class) to obtain the random path.
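The sketch below covers only the `get_revel_tsv` fix. `workflow.source_path` exists only inside a running Snakemake workflow, so this version takes a `source_path` callable parameter in its place. That parameter and `fake_source_path` are stand-ins so the snippet runs on its own, and `config` is passed explicitly.

```python
from typing import Callable


def get_revel_tsv(config: dict, source_path: Callable[[str], str]) -> str:
    """Return the REVEL table path; every branch must `return` its value."""
    if config.get("limit-reads"):
        # The reported bug: calling source_path(...) without `return` makes the
        # function fall through and implicitly return None.
        return source_path("../resources/new_tabbed_revel_grch38.1pct.tsv.gz")
    return "resources/revel_scores.tsv.gz"


def fake_source_path(rel: str) -> str:
    # Hypothetical stand-in for workflow.source_path: resolve against a fixed dir.
    return "/workflow/rules/" + rel


print(get_revel_tsv({"limit-reads": True}, fake_source_path))
print(get_revel_tsv({}, fake_source_path))
```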


coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@workflow/rules/common.smk`:
- Around line 744-754: get_plugin_aux currently returns a GRCh38-only downsample
file when config.get("limit-reads") is true, which will be wrong for GRCh37
runs; update get_plugin_aux to first detect the active reference (e.g., from
config.get("reference") or the project reference setting) and if plugin ==
"REVEL" and limit-reads is true choose the matching GRCh37 downsample resource
when available or else raise a clear, fast-fail error explaining that a GRCh37
downsample resource is missing and how to provide one; keep the existing GRCh38
path for GRCh38 runs and ensure the error mentions get_plugin_aux and the
config.get("limit-reads") condition so it is easy to locate.
- Around line 738-741: get_tabix_revel_params currently treats any non-"grch37"
reference as grch38 and silently falls back; change it to validate the
config["reference-genome"] explicitly using the project's validated helper
(e.g., the reference validation function used elsewhere) and raise or fail fast
on unknown values, then return "-f -s 1 -b 2 -e 2" for "grch37" and "-f -s 1 -b
3 -e 3" for "grch38"; reference get_tabix_revel_params and
config["reference-genome"] when locating where to replace the fallback logic.


coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@workflow/rules/common.smk`:
- Around line 744-753: The path construction in get_plugin_aux (function) uses
workflow.basedir + "../.test/resources/..." which can break resolution; replace
that concatenation with
workflow.source_path(".test/resources/new_tabbed_revel_grch38.1pct.tsv.gz{suffix}".format(suffix=suffix))
so the REVEL test resource path is resolved via Snakemake's API (use the same
approach for the indexed suffix when index is True) instead of direct basedir
manipulation.

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@workflow/rules/annotation.smk`:
- Around line 1-11: The shell command in rule get_downsampled_vep_cache
incorrectly calls tar without -f so it treats {input} as a member name; update
the shell line in get_downsampled_vep_cache to pass the archive file to tar
using the -f option (i.e., include -f {input} before -C {output}), preserving
the mkdir, --strip-components 1, and stderr redirection to {log}.
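In `tar`, `-f` names the archive, `-C` sets the target directory, and `--strip-components 1` drops the first leading path component of each member. The Python `tarfile` sketch below shows the same extraction. It is illustrative only; the rule itself should keep using `tar`. `extract_strip_first` is a hypothetical name, and the sketch handles only regular files and directories.

```python
import tarfile
from pathlib import Path


def extract_strip_first(archive: str, dest: str) -> list[str]:
    """Extract a .tar.gz into dest, dropping the first path component.

    Rough equivalent of `tar -xzf archive -C dest --strip-components 1`
    for regular files and directories; links and special files are skipped.
    """
    Path(dest).mkdir(parents=True, exist_ok=True)
    # Use the safer 'data' extraction filter when this Python version has it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    extracted = []
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            parts = [p for p in member.name.split("/") if p]
            # Skip entries with nothing left after stripping (e.g. the top dir).
            if len(parts) < 2 or not (member.isfile() or member.isdir()):
                continue
            member.name = "/".join(parts[1:])
            tar.extract(member, dest, **kwargs)
            extracted.append(member.name)
    return extracted
```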

♻️ Duplicate comments (4)

workflow/rules/common.smk (2)
738-741: Validate reference-genome instead of defaulting to GRCh38.

Any non-grch37 value currently falls through to column 3, which can mask config mistakes. Use the validated helper to fail fast on invalid values.
♻️ Proposed fix
 def get_tabix_revel_params():
     # Indexing of REVEL-score file where the column depends on the reference
-    column = 2 if config["reference-genome"] == "grch37" else 3
+    build = get_reference_genome_build()
+    column = 2 if build == "GRCh37" else 3
     return f"-f -s 1 -b {column} -e {column}"
744-754: Fail fast when limit-reads uses a GRCh38-only REVEL downsample.

With limit-reads, the helper always points to a GRCh38 downsample, which is wrong for GRCh37 runs. Guard this and raise a clear error (or add a GRCh37 downsample).
🐛 Proposed fix (fail fast if unsupported)
 def get_plugin_aux(plugin, index=False):
     if plugin == "REVEL":
         suffix = ".tbi" if index else ""
         if config.get("limit-reads"):
+            if get_reference_genome_build() != "GRCh38":
+                raise ValueError(
+                    "limit-reads REVEL downsample is only available for GRCh38"
+                )
             return workflow.source_path(
                 "../../.test/resources/downsampled_tabbed_revel_grch38.1pct.tsv.gz{suffix}".format(
                     suffix=suffix
                 )
             )
         return "resources/revel_scores.tsv.gz{suffix}".format(suffix=suffix)
     return []
workflow/rules/annotation.smk (2)
39-47: Follow redirects / fail on HTTP errors for the REVEL download.

This avoids silently writing 404 pages or unfollowed redirect responses to the output, and addresses a possible cause of the reported libcurl error 78.
🔧 Proposed fix
-    shell:
-        "curl https://zenodo.org/records/7072866/files/revel-v1.3_all_chromosomes.zip -o {output} &> {log}"
+    shell:
+        "curl -L --fail https://zenodo.org/records/7072866/files/revel-v1.3_all_chromosomes.zip -o {output} &> {log}"
50-73: Define a tmpdir source and clean up the temp file.

{resources.tmpdir} is undefined here, and the temp file is never removed.
🐛 Proposed fix
-        tmpfile=$(mktemp {resources.tmpdir}/revel_scores.XXXXXX)
+        tmpfile=$(mktemp "${{TMPDIR:-/tmp}}"/revel_scores.XXXXXX)
+        trap "rm -f $tmpfile" EXIT


coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@workflow/rules/annotation.smk`:
- Around line 16-27: The get_vep_plugins rule currently hardcodes release="111"
while get_vep_cache may use cache_downsampled (release 113) when
limit-reads=true; update get_vep_plugins (the rule named get_vep_plugins and its
release param) so its release value is computed the same way as get_vep_cache
(i.e., use the same helper/variable that picks cache release or branch on
limit-reads/cache_downsampled) or pass through the cache-matched release value,
ensuring the VEP plugins release matches the cache release when downsampled
caches are used.

In `@workflow/rules/common.smk`:
- Around line 745-758: The REVEL branch in get_plugin_aux is returning a CI
resource path that doesn't match the actual file name; update the returned
workflow.source_path call in get_plugin_aux (REVEL branch) to reference the
correct file name "new_tabbed_revel_grch38.1pct.tsv.gz{suffix}" (or
alternatively rename the resource to the current string) so the returned path
matches the CI resource and avoids FileNotFoundError.
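One way to keep the plugin release in step with the cache is a single helper that supplies the `release` parameter. The sketch below assumes release 113 for the downsampled CI cache and 111 for the full cache and plugins; these are the numbers mentioned in this comment, and 111 for the full cache is an inference. `get_vep_release` is a hypothetical helper name, and `config` is passed explicitly.

```python
# Release numbers as mentioned in the review; treat them as assumptions.
VEP_RELEASE_FULL = "111"
VEP_RELEASE_DOWNSAMPLED = "113"


def get_vep_release(config: dict) -> str:
    """Return the VEP release matching the cache in use.

    Hypothetical helper: get_vep_plugins would use
    release=get_vep_release(config) so the plugins match whichever cache
    (full or downsampled) the annotation rules read.
    """
    return VEP_RELEASE_DOWNSAMPLED if config.get("limit-reads") else VEP_RELEASE_FULL


print(get_vep_release({"limit-reads": True}))
print(get_vep_release({}))
```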

♻️ Duplicate comments (1)

workflow/rules/annotation.smk (1)
41-49: Harden curl downloads against redirects/HTTP errors.
The current curl command doesn’t follow redirects or fail on HTTP errors, which can surface as libcurl error 78. Add -L --fail for reliability.
🔧 Suggested tweak
-        "curl https://zenodo.org/records/7072866/files/revel-v1.3_all_chromosomes.zip -o {output} &> {log}"
+        "curl -L --fail https://zenodo.org/records/7072866/files/revel-v1.3_all_chromosomes.zip -o {output} &> {log}"


famosab · 2026-01-23T12:42:21Z

workflow/rules/annotation.smk

+    conda:
+        "../envs/tools.yaml"
+    shell:
+        "(mkdir -p {output}; curl -L https://github.com/snakemake-workflows/dna-seq-benchmark/raw/0181ccf16c5483c0d7d1ad1b8f9dfa87376b5b1f/workflow/resources/ci-test-references/vep_cache_113_GRCh38_chr22.tar.gz | tar -xz -C {output} --strip-components 1) 2> {log}"


I somehow couldn't get this to work so that it picks up the file relative to the .smk file. That's why it now pulls the file from our repo with curl. I also wanted to have it available ourselves rather than depend on another repo, so I've stored it in ours now.

feat: added VEP annotation with REVEL, Sift and PolyPhen Scores to fp/fn vcfs.

1e63fb0

Merge branch 'main' into feat/add-revel-scores

30894c7

coderabbitai bot reviewed Dec 19, 2025

workflow/envs/curl.yaml (outdated, resolved)

workflow/rules/annotation.smk (resolved)

workflow/rules/annotation.smk (resolved)

workflow/rules/annotation.smk (outdated, resolved)

BiancaStoecker added 3 commits December 19, 2025 14:22

fix: also all sift and polyphen to unique fp fn.

e991968

fix: update curl version

b6e3955

Merge branch 'feat/add-revel-scores' of github.com:snakemake-workflows/dna-seq-benchmark into feat/add-revel-scores

ba99965

coderabbitai bot reviewed Dec 19, 2025

Merge branch 'main' into feat/add-revel-scores

9777d13

famosab reviewed Jan 16, 2026

BiancaStoecker and others added 8 commits January 20, 2026 10:17

fix: Update wrapper to latests version

e9657ce

Co-authored-by: Famke Bäuerle <45968370+famosab@users.noreply.github.com>

fix: use existing env tools

4812629

Co-authored-by: Famke Bäuerle <45968370+famosab@users.noreply.github.com>

fix: Update wrapper to latest version

49ef78e

Co-authored-by: Famke Bäuerle <45968370+famosab@users.noreply.github.com>

fix: Update wrapper to latest version

2aba570

Co-authored-by: Famke Bäuerle <45968370+famosab@users.noreply.github.com>

fix: Update wrapper to latest version

90d50f1

Co-authored-by: Famke Bäuerle <45968370+famosab@users.noreply.github.com>

fix: Update wrapper to latest version

45a13e3

Co-authored-by: Famke Bäuerle <45968370+famosab@users.noreply.github.com>

fix: use existing env tools

1caa2e0

Co-authored-by: Famke Bäuerle <45968370+famosab@users.noreply.github.com>

test: Add disk space cleanup step for github action

8725eb2

Added a step to free disk space on Ubuntu before testing.

coderabbitai bot reviewed Jan 20, 2026

.github/workflows/main.yml (outdated, resolved)

BiancaStoecker and others added 2 commits January 20, 2026 13:27

test: remove docker-images in free disc space action

24d716f

try: using downsampled VEP cache for CI tests

8f63d86

coderabbitai bot reviewed Jan 22, 2026

workflow/rules/annotation.smk (resolved)

famosab added 3 commits January 22, 2026 13:30

fix: use downsampled cache for both annotations

44f5fe5

fix: gunzip vep tar file

bf82976

fix: utilize chr22 revel for testing

66888a2

coderabbitai bot reviewed Jan 22, 2026

workflow/rules/common.smk (outdated, resolved)

fix: linting

8c2a640

coderabbitai bot reviewed Jan 22, 2026

workflow/rules/annotation.smk (outdated, resolved)

test: use downsampled revel score file

dcb06ca

fix: linting

e5ff56b

coderabbitai bot reviewed Jan 22, 2026

workflow/rules/common.smk (resolved)

workflow/rules/common.smk (outdated, resolved)

famosab added 4 commits January 22, 2026 16:05

fix: correct path

7b5b5dc

fix: use locally created tbi

a05f625

fix: use locally created tbi

99441a4

fix: linting

9bf2d89

coderabbitai bot reviewed Jan 23, 2026

workflow/rules/common.smk (resolved)

workflow/rules/common.smk (resolved)

test: move resources to test folder

da06c03

coderabbitai bot reviewed Jan 23, 2026

test: change to source_path

6a0db08

coderabbitai bot reviewed Jan 23, 2026

workflow/rules/annotation.smk (outdated, resolved)

famosab added 4 commits January 23, 2026 12:43

fix: move to resources folder

02b09b7

fix: gitignore

8772120

fix: tar

e469e50

fix: path

bb8f426

coderabbitai bot reviewed Jan 23, 2026

workflow/rules/annotation.smk (resolved)

workflow/rules/common.smk (resolved)

famosab added 3 commits January 23, 2026 13:11

fix: path

8b23d23

fix: path

0181ccf

fix: get tar from git

7e291c7

famosab reviewed Jan 23, 2026

famosab changed the title ~~feat: Add VEP annotation with REVEL, Sift and PolyPhen Scores to fp…~~ feat: VEP annotation with REVEL, Sift and PolyPhen Scores to fp/fn vcfs Jan 23, 2026
