Add Nanopore support #31

Open
GilbertHan1011 wants to merge 7 commits into regulatory-genomics:main from GilbertHan1011:main

@GilbertHan1011
Collaborator

In Nanopore ATAC and Hi-C sequencing, reads from the same library can map to either the forward or reverse strand. To handle this, this commit introduces an Unstranded mark in seqspec.

Key changes include:

  1. Two-Phase Parsing: If marked as unstranded, the parser will evaluate both forward and reverse orientations.
  2. Barcode Rescue: Because a single Nanopore read can contain both forward and reverse linkers at the same time, the parser now returns both results when both orientations parse successfully, so that barcodes can be rescued in downstream tasks.
  3. Indel Tolerance: Added bounded edit-distance logic to account for the high insertion/deletion error rates characteristic of scNanopore sequencing.
  4. Algorithmic Optimization: Refactored the anchor search by replacing the KMP algorithm with a sliding-window approach, which benchmarked about 2x faster in my stress tests.
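
To make the first three points concrete, here is a minimal Python sketch of how two-phase parsing with an indel-tolerant, sliding-window anchor search could look. It is an illustration only and not the code in this PR: the function names, the `max_dist` parameter, and the assumed read layout (barcode immediately after the anchor) are all hypothetical.

```python
# Illustrative sketch only; names and read layout are assumptions, not the PR's code.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def bounded_edit_distance(a, b, max_dist):
    """Levenshtein distance between a and b, or None if it exceeds max_dist."""
    if abs(len(a) - len(b)) > max_dist:
        return None
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        # Row minima never decrease, so we can stop early once over the bound.
        if min(cur) > max_dist:
            return None
        prev = cur
    return prev[-1] if prev[-1] <= max_dist else None


def find_anchor(read, anchor, max_dist=1):
    """Slide a window over the read; return (start, end, dist) of the best hit."""
    best = None
    n = len(anchor)
    for start in range(len(read)):
        # Window length varies by +/- max_dist to absorb insertions/deletions.
        for length in range(max(1, n - max_dist), n + max_dist + 1):
            end = start + length
            if end > len(read):
                break
            d = bounded_edit_distance(read[start:end], anchor, max_dist)
            if d is not None and (best is None or d < best[2]):
                best = (start, end, d)
                if d == 0:
                    return best
    return best


def parse_one(read, anchor, barcode_len, max_dist=1):
    """Return the barcode that follows the anchor, or None if parsing fails."""
    hit = find_anchor(read, anchor, max_dist)
    if hit is None or hit[1] + barcode_len > len(read):
        return None
    return read[hit[1]:hit[1] + barcode_len]


def parse_unstranded(read, anchor, barcode_len, max_dist=1):
    """Try both orientations and keep every one that parses."""
    results = []
    for name, seq in (("forward", read), ("reverse", revcomp(read))):
        barcode = parse_one(seq, anchor, barcode_len, max_dist)
        if barcode is not None:
            results.append((name, barcode))
    return results
```

Returning a list rather than a single result is what allows a read carrying linkers in both orientations to contribute two barcodes downstream.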

TODO:

  • Develop a Levenshtein-based barcode correction algorithm, as Nanopore barcodes are also highly susceptible to indels. I have implemented this in hic-tailor, but I don't have time to implement it for precellar now.
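
For discussion, a rough Python sketch of what Levenshtein-based correction against a barcode whitelist might look like. This is a generic illustration, not the hic-tailor implementation: `correct_barcode`, `max_dist`, and the rule of rejecting ambiguous matches are assumptions.

```python
# Generic sketch; not the hic-tailor implementation.
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (substitutions and indels)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def correct_barcode(observed, whitelist, max_dist=2):
    """Return the unique closest whitelist barcode within max_dist, else None."""
    best, best_d, tie = None, max_dist + 1, False
    for candidate in whitelist:
        d = levenshtein(observed, candidate)
        if d < best_d:
            best, best_d, tie = candidate, d, False
        elif d == best_d:
            # Two candidates are equally close: the match is ambiguous.
            tie = True
    return None if tie else best
```

A linear scan over the whitelist is fine for illustration; a real whitelist with many barcodes would likely need an index or pre-filter to keep this fast.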

This strategy retrieves 96% of barcodes in a typical single-cell Nanopore dataset.

@kaizhang
Member

@GilbertHan1011 @PPSherry Rui is working on this as well. You two can discuss how to best handle this.
