Improve speed (& logging) of file QC checks #1

@selkamand

When processing 288 samples (each with SNV, SV, and CNV entries), the log shows almost 4 minutes (08:57:09 to 09:01:04) between checking the manifest headers and counting samples. During that time users get no progress updates and no explanation for the delay.

[2025-03-26T08:57:09Z INFO scarscape::utils] Found Manifest with 4 columns: sample,snv,cnv,sv
[2025-03-26T09:01:04Z INFO scarscape::utils] Manifest describes 288 unique samples

During this unlogged period, we validate the files listed in the manifest for downstream analysis: checking file extensions and ensuring VCFs are bgzipped and indexed. We should make two changes to the codebase:

  1. Log the ongoing checks to inform users of the current activity:
    info!("Checking all files described in manifest are suitable for analysis");
  2. Optimize these validations for speed. A first step is to investigate whether the bgzip compression check is causing the delay.
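
If the bgzip check turns out to be the slow part, one possible fix is to inspect only the first 16 bytes of each file rather than reading or decompressing more of it. The sketch below is an assumption about what a cheaper check could look like, not a description of the current scarscape code. The function names `header_is_bgzf` and `file_looks_bgzipped` are hypothetical. It relies on the fixed layout of a standard BGZF block header (gzip magic, deflate method, FEXTRA flag, then a `BC` extra subfield of length 2).

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Returns true if `header` starts with a standard BGZF block header.
/// Hypothetical helper: assumes the `BC` subfield is the first extra
/// subfield, which is how bgzip writes it.
fn header_is_bgzf(header: &[u8]) -> bool {
    header.len() >= 16
        && header[0] == 0x1f
        && header[1] == 0x8b // gzip magic bytes
        && header[2] == 8 // compression method: deflate
        && (header[3] & 4) != 0 // FEXTRA flag set
        && header[12] == b'B'
        && header[13] == b'C' // BGZF subfield identifier
        && header[14] == 2
        && header[15] == 0 // subfield length (little-endian 2)
}

/// Reads only the first 16 bytes of `path` and checks them.
/// Files shorter than 16 bytes are reported as not bgzipped.
fn file_looks_bgzipped(path: &Path) -> io::Result<bool> {
    let mut header = [0u8; 16];
    let mut file = File::open(path)?;
    match file.read_exact(&mut header) {
        Ok(()) => Ok(header_is_bgzf(&header)),
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(false),
        Err(e) => Err(e),
    }
}
```

This only confirms the first block looks like BGZF; it does not validate the rest of the file or the index. Timing the existing check on a few manifest files first would confirm whether this is actually the bottleneck.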
