Skip to content

Speed up GenomicArray.intersect for nested intervals #232

@etal

Description

@etal

Currently, GA.intersect checks that endpoints are monotonic, then uses simple binary search (np.searchsorted), otherwise falls back to O(n^2) iteration over both arrays. Not a problem for most of CNVkit, since bins are non-overlapping, but e.g. genome annotation can get ugly.

Explore other methods that make O(n log n) intersection possible even when regions are fully nested:

  1. pygr's Nested Containment List (NCList)
  2. Binary Interval Search (BITS)
  3. Layer & Quinlan (2015)

Look for other efficiencies here too.

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions