simpleaf index: runtime expectations #166

@kevinrue

Cross-posting from https://www.reddit.com/r/bioinformatics/comments/1g6zfu6/simpleaf_index_long_runtime/

Is there some guidance about the expected runtime of simpleaf index anywhere?

The post above reports a 20 min runtime for the human genome using 16 CPUs.

In my case, Drosophila has a genome of approx. 180 Mb, yet my HPC job with 16 CPUs timed out after an hour.

  • Is there a rule of thumb that can help users guesstimate runtime based on genome size and/or annotated features?
  • Is there guidance on a reasonable range for the number of CPUs (i.e. a maximum beyond which more CPUs don't help much)?
  • Any other guidance on sanity checks and steps users can take to optimise performance and runtime?

PS: my command is simpleaf index --output resources/genome/index/alevin --fasta tmp_alevin_index.fa --gtf resources/genome/genome.gtf.gz --rlen 150 --threads 16 --use-piscem

In particular, I've set --rlen 150 based on the length of my scRNAseq reads. Is that alright?

Thanks!
