Description
Introduce a new pipeline step named variant_selection responsible for selecting reportable variants from the variant_collection attribute stored in the Sample object.
This step will act as a logical filter layer between variant collection and downstream reporting, ensuring that only variants relevant to each analysis context are propagated further in the pipeline.
Motivation
Currently, variants are collected and stored in variant_collection according to category (e.g. PR, RR) and source, but there is no dedicated step that encapsulates the logic required to select subsets of variants depending on clinical and analytical context.
Adding a variant_selection step will:
- Centralize and formalize variant selection logic.
- Clearly separate variant collection from variant interpretation/selection.
- Improve maintainability as selection rules evolve.
- Facilitate future extensions (new categories, modes, or selection criteria).
Current behavior
- Variants are aggregated and stored in `variant_collection` at the sample level.
- Selection criteria are either applied implicitly downstream or are not explicitly structured as a pipeline step.
- There is no single orchestration point responsible for variant selection across categories and analytical modes.
Proposed refactor
Add a new step named variant_selection. This step will operate per sample and will orchestrate variant selection across categories (PR, RR) and, for RR, across modes (screening, advanced).
High-level design
- The `variant_selection` step will:
  - Iterate over the categories present in the sample (e.g. PR, RR).
  - For each category, call an auxiliary variant selection function.
  - Store the resulting selected variants in a structured, category-aware output (exact storage model to be defined later).
- The auxiliary selection function will:
  - Iterate over the set of variants stored in `variant_collection`.
  - Apply selection rules based on:
    - Variant category (PR vs RR).
    - Sex of the individual.
    - RR mode (e.g. screening vs advanced).
  - Return only the variants that satisfy the applicable criteria.
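
The orchestration described in the high-level design could look roughly like the sketch below. It is only an illustration under stated assumptions: the `Variant` and `Sample` classes, the `SELECTORS` registry, and the idea that `variant_collection` is a dict keyed by category are all hypothetical, and the two selector functions contain placeholder rules rather than the real PR/RR criteria.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Variant:
    # Hypothetical minimal variant record; the real model is not defined in this issue.
    gene: str
    zygosity: str  # e.g. "het" or "hom"


@dataclass
class Sample:
    # Hypothetical stand-in for the pipeline's Sample object.
    sex: str
    rr_mode: Optional[str] = None  # e.g. "screening" or "advanced"
    # Assumption: variants are grouped by category ("PR", "RR").
    variant_collection: Dict[str, List[Variant]] = field(default_factory=dict)
    variant_selection: Dict[str, List[Variant]] = field(default_factory=dict)


Selector = Callable[[List[Variant], Sample], List[Variant]]


def select_pr(variants: List[Variant], sample: Sample) -> List[Variant]:
    # Placeholder rule only: keep homozygous variants.
    return [v for v in variants if v.zygosity == "hom"]


def select_rr(variants: List[Variant], sample: Sample) -> List[Variant]:
    # Placeholder rule only: keep heterozygous variants.
    return [v for v in variants if v.zygosity == "het"]


# Registering selectors per category keeps the step open to new categories.
SELECTORS: Dict[str, Selector] = {"PR": select_pr, "RR": select_rr}


def variant_selection(sample: Sample) -> Sample:
    """Run the category-specific selector for each category in the sample."""
    for category, variants in sample.variant_collection.items():
        selector = SELECTORS.get(category)
        if selector is None:
            raise ValueError(f"No selector registered for category {category!r}")
        sample.variant_selection[category] = selector(variants, sample)
    return sample
```

With this shape, supporting an additional category would only require registering one more function in `SELECTORS`; the step itself would stay unchanged.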
Selection logic (high-level, non-exhaustive)
- Personal Risk (PR)
  - Variants will be selected according to disease/gene inheritance models.
  - Different cases (e.g. heterozygous, homozygous, compound heterozygous) will be considered depending on the inheritance pattern.
- Reproductive Risk (RR)
  - Variant selection will depend on:
    - RR mode (screening or advanced).
    - Sex of the individual.
  - Different zygosity-based selection rules will apply depending on the context.
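
To make the shape of these rules concrete, here is a minimal sketch of how the two selectors might be structured. Everything in it is an assumption for illustration: the `Variant` fields, the `"AD"`/`"AR"` inheritance labels, the simplified compound heterozygous check (two or more heterozygous variants in the same gene, with no phasing), and above all the contents of `RR_ALLOWED_ZYGOSITIES`, which are placeholder values rather than the actual RR criteria.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Variant:
    # Hypothetical fields; the real variant model is not defined in this issue.
    gene: str
    zygosity: str     # "het", "hom" or "hemi"
    inheritance: str  # "AD" (dominant) or "AR" (recessive)


def select_pr(variants: List[Variant]) -> List[Variant]:
    """Select PR variants by inheritance model (simplified illustration)."""
    by_gene: Dict[str, List[Variant]] = defaultdict(list)
    for v in variants:
        by_gene[v.gene].append(v)

    selected: List[Variant] = []
    for gene_variants in by_gene.values():
        # Two or more het variants in one gene are treated as a candidate
        # compound heterozygous case (phase is not checked here).
        het_count = sum(v.zygosity == "het" for v in gene_variants)
        for v in gene_variants:
            if v.inheritance == "AD":
                selected.append(v)  # a single affected allele is enough
            elif v.inheritance == "AR" and (v.zygosity == "hom" or het_count >= 2):
                selected.append(v)
    return selected


# Placeholder values only: (mode, sex) -> zygosities kept for RR.
RR_ALLOWED_ZYGOSITIES: Dict[Tuple[str, str], FrozenSet[str]] = {
    ("screening", "female"): frozenset({"het"}),
    ("screening", "male"): frozenset({"het", "hemi"}),
    ("advanced", "female"): frozenset({"het", "hom"}),
    ("advanced", "male"): frozenset({"het", "hom", "hemi"}),
}


def select_rr(variants: List[Variant], mode: str, sex: str) -> List[Variant]:
    """Select RR variants using a (mode, sex) lookup table."""
    try:
        allowed = RR_ALLOWED_ZYGOSITIES[(mode, sex)]
    except KeyError:
        raise ValueError(f"Unsupported RR context: mode={mode!r}, sex={sex!r}") from None
    return [v for v in variants if v.zygosity in allowed]
```

Keeping the RR rules in a lookup table rather than nested conditionals is one possible design choice; it would let the criteria be reviewed and changed without touching the selection code.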
Tasks
- Add `variant_selection` attribute to Sample / SampleContext
- Add `variant_selection.py` step to steps folder
- Add `pr_variant_selection.py` to variant_selection folder (Personal Risk module) and auxiliary functions to `utils.py` in variant_selection folder
- Add `rr_variant_selection.py` to variant_selection folder (Reproductive Risk module)
Additional context
- This step is conceptually downstream of `variant_collection`.
- It should be designed to be extensible to additional categories or analytical modes.
- The implementation should remain compatible with previous refactors.