simpleaf index: runtime expectations

Cross-posting from https://www.reddit.com/r/bioinformatics/comments/1g6zfu6/simpleaf_index_long_runtime/

Is there some guidance about the expected runtime of `simpleaf index` anywhere?

The post above reports a 20 min runtime for the human genome using 16 CPUs.

In my case, the Drosophila genome is approx. 180 Mb, yet my HPC job with 16 CPUs timed out after an hour.

- Is there a rule of thumb that can help users guesstimate runtime based on genome size and/or annotated features?
- Is there guidance on a reasonable range of values for the number of CPUs (i.e. a maximum beyond which more CPUs don't help much)?
- Any other guidance on sanity checks and steps users can take to optimise performance and runtime?

PS: my command is `simpleaf index --output resources/genome/index/alevin --fasta tmp_alevin_index.fa --gtf resources/genome/genome.gtf.gz --rlen 150 --threads 16 --use-piscem`

In particular, I've set `--rlen 150` based on the length of my scRNAseq reads. Is that alright?

Thanks!


simpleaf index: runtime expectations #166
