fix: Fixes issue with unsupported sequences from reference during encoding #3

Open
kazmiekr wants to merge 26 commits into broadinstitute:dev_refactor_invitae from invitae:main

Conversation

@kazmiekr

While running Pangolin on a larger set of variants, we found that it failed to encode the padded reference sequence read from the FASTA when that sequence contained unsupported characters.

For example, the reference sequence sometimes contained a Y (the IUPAC ambiguity code for C or T), which caused a runtime error here: https://github.com/invitae/pangolin/blob/main/pangolin/utils.py#L151

This fix checks that every character in the reference sequence can be encoded (ACTGN) before the variant is batched for prediction. If any character cannot be encoded, the variant is skipped: no prediction is generated for it, and no runtime error is raised.
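
As a rough illustration of the kind of check described above, here is a minimal sketch. It is not the code from this PR: `is_encodable` and `split_encodable` are hypothetical names, the `(variant_id, ref_seq)` tuple layout is made up for the example, and upper-casing the sequence before checking is an assumption.

```python
# Hypothetical sketch; names and data layout are illustrative, not Pangolin's API.
ENCODABLE_BASES = frozenset("ACGTN")


def is_encodable(seq, allowed=ENCODABLE_BASES):
    """Return True if every base in seq is one of the encodable characters."""
    # Assumption: case is ignored, so the sequence is upper-cased before checking.
    return all(base in allowed for base in seq.upper())


def split_encodable(variants):
    """Split (variant_id, ref_seq) pairs into those to batch and those to skip."""
    to_batch, skipped = [], []
    for variant_id, ref_seq in variants:
        if is_encodable(ref_seq):
            to_batch.append((variant_id, ref_seq))
        else:
            # No prediction is produced for this variant, and nothing is raised.
            skipped.append(variant_id)
    return to_batch, skipped


if __name__ == "__main__":
    variants = [("var1", "ACGTNACGT"), ("var2", "ACGYACGT")]
    to_batch, skipped = split_encodable(variants)
    print("batched:", [v for v, _ in to_batch])
    print("skipped:", skipped)
```

Filtering before batching means a single ambiguous base only costs the prediction for that one variant instead of aborting the whole run.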

kazmiekr and others added 26 commits May 17, 2023 09:01
…e fasta would cause the tensor encoding to fail
fix: Fixes issue with unsupported sequences from reference during encoding
…es are potentially overwritten during iteration of the genes
fix: Ensure the version is exposed
fix: Fixes issue with the normalized chr string
fix: Issue with calculating score on predictions with multiple genes
# Conflicts:
#	pangolin/legacy.py
…n-alt

fix: skip variants with non-ACTG bases in alt seq
fix: Removes legacy code in favor of always using the batching path
2 participants