Open
Description
Cross-posting from https://www.reddit.com/r/bioinformatics/comments/1g6zfu6/simpleaf_index_long_runtime/
Is there some guidance about the expected runtime of simpleaf index anywhere?
The post above reports 20 min runtime for human using 16 CPUs.
In my case, the Drosophila genome is approx. 180 Mb, yet my HPC job with 16 CPUs timed out after an hour.
- Is there a rule of thumb that can help users guesstimate runtime based on genome size and/or annotated features?
- Is there guidance on a reasonable range for the number of CPUs (i.e. a maximum beyond which more CPUs don't help much)?
- Any other guidance on sanity checks and steps users can take to optimise performance and runtime?
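In the absence of a documented rule of thumb, one way to get numbers for the thread-count question is to time the same index build at several values of `--threads`. The following is only a minimal sketch: it reuses the flags from the command in this post and nothing else, and the function names (`build_index_command`, `time_command`, `benchmark_threads`) and the output-directory naming are hypothetical choices made for illustration. It assumes `simpleaf` is on the `PATH` and that its environment is already configured.

```python
import subprocess
import time


def build_index_command(output_dir, fasta, gtf, rlen, threads):
    """Assemble a simpleaf index command using only the flags shown in this post."""
    return [
        "simpleaf", "index",
        "--output", str(output_dir),
        "--fasta", str(fasta),
        "--gtf", str(gtf),
        "--rlen", str(rlen),
        "--threads", str(threads),
        "--use-piscem",
    ]


def time_command(cmd):
    """Run a command to completion and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
    return time.perf_counter() - start


def benchmark_threads(fasta, gtf, rlen, thread_counts, output_prefix):
    """Time one index build per thread count, each in its own output directory."""
    timings = {}
    for n in thread_counts:
        # Hypothetical naming scheme: <prefix>_t<threads>, so runs do not overwrite each other.
        cmd = build_index_command(f"{output_prefix}_t{n}", fasta, gtf, rlen, n)
        timings[n] = time_command(cmd)
    return timings
```

Calling `benchmark_threads("tmp_alevin_index.fa", "resources/genome/genome.gtf.gz", 150, [4, 8, 16], "bench/alevin")` would return a dictionary mapping each thread count to elapsed seconds. If the timings stop improving between two values, adding more CPUs beyond that point is unlikely to help for this genome.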
PS: my command is `simpleaf index --output resources/genome/index/alevin --fasta tmp_alevin_index.fa --gtf resources/genome/genome.gtf.gz --rlen 150 --threads 16 --use-piscem`
In particular, I've set --rlen 150 based on the length of my scRNAseq reads. Is that alright?
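As a sanity check on the value passed to `--rlen`, the read lengths can be tallied directly from the FASTQ file instead of relying on the nominal sequencing length. This is a minimal sketch, assuming a standard 4-line-per-record FASTQ (optionally gzipped); the function name `read_length_counts` and the file name in the usage note are hypothetical. For protocols where one read carries the barcode/UMI and the other the cDNA, it is the cDNA read that should be checked.

```python
import gzip
from collections import Counter
from itertools import islice


def read_length_counts(fastq_path, max_reads=10000):
    """Count sequence lengths among the first `max_reads` records of a FASTQ file."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        # In a 4-line FASTQ record the sequence is the second line,
        # i.e. zero-based line indices 1, 5, 9, ...
        for line in islice(handle, 1, max_reads * 4, 4):
            counts[len(line.strip())] += 1
    return counts
```

For example, `read_length_counts("sample_R2.fastq.gz").most_common(3)` lists the three most frequent read lengths with their counts. If the dominant length differs from 150 (for instance because reads were trimmed), that observed length is probably the more appropriate value to consider.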
Thanks!