Two ideas on optimization #8

@shelkmike

For large eukaryotic genomes, the file overlap.paf can become very large. I think VeChat could be optimized in two ways to deal with this:

  1. Instead of writing overlap.paf, it could write overlap.paf.gz. This can be achieved by compressing the output of fpa with " | gzip -1 >". Racon accepts gzipped overlap files as input.
  2. It is probably worth adding a parameter that sets the minimum overlap length. If the reads' N50 is, for example, 20 kbp, the minimum overlap can be safely raised from the default 500 bp to, for example, 5000 bp. This would not only decrease the size of the PAF file, but would probably also speed up error correction by skipping short overlaps.
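
To illustrate both ideas together, here is a minimal Python sketch that drops short overlaps from PAF records and writes the rest gzip-compressed at level 1. It is not VeChat's code: the function names `filter_paf` and `write_gzipped_paf` and the `min_overlap` parameter are hypothetical. Taking the overlap length as the shorter of the query and target spans is an assumption and may differ from how fpa measures it.

```python
import gzip


def filter_paf(lines, min_overlap=5000):
    """Yield PAF lines whose overlap spans at least min_overlap bp.

    Assumption: overlap length is the shorter of the query span and the
    target span; fpa may define it differently.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # PAF has 12 mandatory columns; skip malformed records
        query_span = int(fields[3]) - int(fields[2])   # query end - query start
        target_span = int(fields[8]) - int(fields[7])  # target end - target start
        if min(query_span, target_span) >= min_overlap:
            yield line


def write_gzipped_paf(lines, out_path, min_overlap=5000):
    """Write the filtered overlaps to out_path with fast gzip compression.

    Returns the number of records kept.
    """
    kept = 0
    # compresslevel=1 mirrors "gzip -1": fastest setting, larger output
    with gzip.open(out_path, "wt", compresslevel=1) as out:
        for line in filter_paf(lines, min_overlap):
            out.write(line if line.endswith("\n") else line + "\n")
            kept += 1
    return kept
```

In a real pipeline the filtering would more likely be done by fpa itself, with the Python side only exposing the threshold as a command-line option. The sketch is only meant to show how small the change is.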

Labels: enhancement