Transcript/Genome alignments (with gaps) - Necessary for RefSeq HGVS c./g. conversion #81

@davmlaw

Hi, this project looks good! Thanks!

I would like to use Tark as a source of transcripts for the Biocommons HGVS Python library.

RefSeq transcripts can differ from the genome sequence, so they can align to the genome build with indels.

For instance, NM_001205122.2 (ATG13) aligned to GRCh38 has a 2 bp deletion in exon 15 (the alignment is 509 bp match, 2 bp deletion, 1753 bp match). This is critical to know when converting between genomic (g.) and coding (c.) HGVS, so that you can adjust for these gaps.

I have already done this in my own project, cdot, which reads RefSeq/Ensembl GFF/GTF files. Ideally I would like to stop maintaining this myself and move over to Tark.

For example, https://cdot.cc/transcript/NM_001205122.2 has this alignment info (in Biocommons HGVS style):

[46672254, 46674518, 14, 1635, 3896, "M509 D2 M1753"]
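To illustrate why the gap string matters, here is a minimal Python sketch that parses it and maps an offset within the exon's transcript span to an offset within its genomic span. It is not cdot's or Biocommons HGVS's actual code: the function names are hypothetical, and it assumes (inferred from the numbers above) that `M` consumes both sequences, `D` consumes only the genome, `I` consumes only the transcript, that the genomic start is 0-based, that the transcript coordinates are 1-based inclusive, and that the exon is on the plus strand.

```python
import re

def parse_gap(gap):
    """Split a gap string such as 'M509 D2 M1753' into (op, length) pairs."""
    ops = []
    for token in gap.split():
        m = re.fullmatch(r"([MID])(\d+)", token)
        if m is None:
            raise ValueError(f"unrecognised gap token: {token!r}")
        ops.append((m.group(1), int(m.group(2))))
    return ops

def aligned_lengths(ops):
    """Return (genomic length, transcript length) implied by the operations."""
    genome_len = sum(n for op, n in ops if op in "MD")  # assumption: D consumes genome only
    tx_len = sum(n for op, n in ops if op in "MI")      # assumption: I consumes transcript only
    return genome_len, tx_len

def tx_to_genome_offset(ops, tx_offset):
    """Map a 0-based offset in the exon's transcript span to a 0-based offset
    in its genomic span (plus strand assumed). Returns None when the base lies
    in a transcript-only insertion and so has no genomic counterpart."""
    if tx_offset < 0:
        raise ValueError("offset must be non-negative")
    g = t = 0
    for op, n in ops:
        if op == "M":
            if tx_offset < t + n:
                return g + (tx_offset - t)
            g += n
            t += n
        elif op == "D":
            g += n
        else:  # "I"
            if tx_offset < t + n:
                return None
            t += n
    raise ValueError("offset is beyond the end of the alignment")

# The exon entry quoted above; field meanings are inferred, not documented here.
g_start, g_end, exon_index, tx_start, tx_end, gap = (
    46672254, 46674518, 14, 1635, 3896, "M509 D2 M1753")
ops = parse_gap(gap)
genome_len, tx_len = aligned_lengths(ops)
# Both spans agree with the coordinates under the assumptions stated above.
consistent = (genome_len == g_end - g_start) and (tx_len == tx_end - tx_start + 1)
```

With this entry, the genomic span is 2264 bases and the transcript span 2262, so every transcript base after the first 509 is shifted by 2 on the genome. A converter that ignores the gap would be off by 2 bp for those positions.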

As far as I can see, Tark doesn't have this yet:

https://tark.ensembl.org/api/transcript/?stable_id=NM_001205122&stable_id_version=2&expand_all=true

                {
                    "exon_id": 73193759,
                    "stable_id": "exon-NR_144423.2-19",
                    "stable_id_version": 1,
                    "assembly": "GRCh38",
                    "loc_start": 46672255,
                    "loc_end": 46674518,
                    "loc_strand": 1,
                    "loc_region": "11",
                    "loc_checksum": "F44BD3F6F8F8764182282A78AE315772F78ECCF8",
                    "exon_checksum": "55D9C6A38CC3510856809E31ED688BB19C01786A",
                    "exon_order": 15
                }
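As a rough sketch of how a client might use the endpoint above, the following Python builds the same query URL from a versioned accession and converts a Tark exon record into a cdot-style genomic span. The helper names are hypothetical, only the query parameters shown in the URL above are used, and the `loc_start - 1` conversion assumes Tark's `loc_start` is 1-based inclusive (inferred from comparing 46672255 here with 46672254 in the cdot entry).

```python
from urllib.parse import urlencode

TARK_TRANSCRIPT_ENDPOINT = "https://tark.ensembl.org/api/transcript/"

def tark_transcript_url(accession):
    """Build the transcript query URL for an accession like 'NM_001205122.2'."""
    stable_id, sep, version = accession.rpartition(".")
    if not sep or not version.isdigit():
        raise ValueError(f"expected a versioned accession, got {accession!r}")
    query = urlencode({
        "stable_id": stable_id,
        "stable_id_version": int(version),
        "expand_all": "true",
    })
    return f"{TARK_TRANSCRIPT_ENDPOINT}?{query}"

def tark_exon_to_span(exon):
    """Return (start, end) as a 0-based, half-open genomic span.

    Assumption: Tark's loc_start is 1-based inclusive, so subtracting 1
    gives the 0-based start used in the cdot entry."""
    return exon["loc_start"] - 1, exon["loc_end"]

# Subset of the exon record shown above.
exon = {"loc_start": 46672255, "loc_end": 46674518, "loc_strand": 1, "exon_order": 15}
url = tark_transcript_url("NM_001205122.2")
span = tark_exon_to_span(exon)
```

The coordinates can be recovered this way, but nothing in the record says where the 2 bp gap falls inside the exon, which is the missing piece this issue asks for.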

Could you please add these alignment strings to RefSeq transcript exons? Knowing about mismatches would also be useful.

I hope to write a JSON client for HGVS that will only be enabled for Ensembl to start with. Thanks!
