fix: Fixes issue with unsupported sequences from reference during encoding #3
Open
kazmiekr wants to merge 26 commits into broadinstitute:dev_refactor_invitae from
…e fasta would cause the tensor encoding to fail
fix: Fixes issue with unsupported sequences from reference during encoding
…es are potentially overwritten during iteration of the genes
fix: Ensure the version is exposed
fix: Fixes issue with the normalized chr string
fix: Issue with calculating score on predictions with multiple genes
feat: warn about single-exon transcripts
# Conflicts: # pangolin/legacy.py
…n-alt fix: skip variants with non-ACTG bases in alt seq
fix: Removes legacy code in favor of always using the batching path
chore: Bumps version
chore: Updates legal text
While running Pangolin on a larger set of variants, we found that it failed to encode the padded reference sequence from the fasta when that sequence contained unsupported characters.
For example, the reference sequence sometimes contained a `Y`, which caused a runtime error here: https://github.com/invitae/pangolin/blob/main/pangolin/utils.py#L151

This fix checks that every character in the reference sequence can be encoded (`ACTGN`) before the variant is batched for prediction. If any character is unsupported, no prediction is generated for that variant and no runtime error is raised.
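
To illustrate the kind of check described above, here is a minimal sketch. It is not the code from this PR: the names `SUPPORTED_BASES`, `is_encodable`, and `split_encodable` are hypothetical, and uppercasing the sequence before the check is an assumption about how the reference is normalized.

```python
from typing import Iterable, List, Tuple

# Alphabet named in the PR description; anything else (e.g. "Y") cannot be encoded.
SUPPORTED_BASES = frozenset("ACTGN")


def is_encodable(seq: str) -> bool:
    """Return True if seq is non-empty and contains only supported bases.

    Assumption: the check is case-insensitive (sequence is uppercased first).
    """
    return bool(seq) and set(seq.upper()) <= SUPPORTED_BASES


def split_encodable(
    variants: Iterable[Tuple[str, str]],
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split (variant_id, ref_seq) pairs into those safe to batch and skipped ids."""
    keep: List[Tuple[str, str]] = []
    skipped: List[str] = []
    for variant_id, ref_seq in variants:
        if is_encodable(ref_seq):
            keep.append((variant_id, ref_seq))
        else:
            # No prediction is produced for this variant; nothing is raised.
            skipped.append(variant_id)
    return keep, skipped
```

Filtering before batching means one bad reference window only drops its own variant rather than failing the whole batch at encoding time.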