Currently, GA.intersect checks that endpoints are monotonic, then uses simple binary search (np.searchsorted), otherwise falls back to O(n^2) iteration over both arrays. Not a problem for most of CNVkit, since bins are non-overlapping, but e.g. genome annotation can get ugly.
Explore other methods that make O(n log n) intersection possible even when regions are fully nested:
- pygr's Nested Containment List (NCList)
- Binary Interval Search (BITS)
- Layer & Quinlan (2015)
Look for other efficiencies here too.