Transcript/Genome alignments (with gaps) - Necessary for RefSeq HGVS c./g. conversion #81

Hi, this project looks good! Thanks!

I would like to use Tark as a source of transcripts for the [Biocommons HGVS Python library](https://github.com/biocommons/hgvs).

RefSeq transcripts can differ from the genome sequence, so they can [align to the genome build with indels](https://hgvs.readthedocs.io/en/stable/examples/using-hgvs.html?highlight=gap#projecting-in-the-presence-of-a-genome-transcript-gap).

For instance, NM_001205122.2 (ATG13) aligned to GRCh38 has a 2 bp deletion in exon 15 (the alignment is 509 bp match, 2 bp deletion, 1753 bp match). This is critical to know when converting between genomic (g.) and coding (c.) HGVS, so that you can adjust for these gaps.

I have already done this in my own project, [cdot](https://github.com/SACGF/cdot), which reads RefSeq/Ensembl GFF/GTF files. Ideally I would like to stop maintaining this myself and move over to Tark.

E.g. https://cdot.cc/transcript/NM_001205122.2 has this alignment info (in Biocommons HGVS style):

```
[46672254, 46674518, 14, 1635, 3896, "M509 D2 M1753"]
```
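To illustrate why the gap string matters, here is a minimal Python sketch of how a client might use it. It assumes the exon array is laid out as `[genomic start (0-based), genomic end, exon index, transcript start (1-based), transcript end, gap]`, that `M` consumes both sequences, `D` consumes only the genome, and `I` consumes only the transcript, and it handles the plus strand only. The function names are hypothetical and are not part of cdot, Tark, or the HGVS library.

```python
import re

# Exon 15 of NM_001205122.2 on GRCh38, as shown above (layout is an assumption).
EXON = [46672254, 46674518, 14, 1635, 3896, "M509 D2 M1753"]


def parse_gap(gap):
    """Split a gap string such as 'M509 D2 M1753' into (op, length) pairs."""
    ops = []
    for token in gap.split():
        match = re.fullmatch(r"([MDI])(\d+)", token)
        if match is None:
            raise ValueError(f"unrecognised gap token: {token!r}")
        ops.append((match.group(1), int(match.group(2))))
    return ops


def consumed_lengths(ops):
    """Return (genome bases, transcript bases) covered by the alignment."""
    genome = sum(n for op, n in ops if op in "MD")
    transcript = sum(n for op, n in ops if op in "MI")
    return genome, transcript


def transcript_to_genomic(exon, tx_pos):
    """Map a 1-based transcript position in this exon to a 1-based genomic
    position (plus strand only). Returns None for inserted transcript bases."""
    g_start, _g_end, _index, tx_start, tx_end, gap = exon
    if not tx_start <= tx_pos <= tx_end:
        raise ValueError("position is outside this exon")
    g = g_start   # 0-based genomic offset of the next aligned base
    t = tx_start  # 1-based transcript position of the next aligned base
    for op, n in parse_gap(gap):
        if op == "M":
            if tx_pos < t + n:
                return g + (tx_pos - t) + 1
            g += n
            t += n
        elif op == "D":   # bases present in the genome, absent from the transcript
            g += n
        else:             # "I": bases present in the transcript only
            if tx_pos < t + n:
                return None
            t += n
    raise ValueError("gap string is shorter than the exon")
```

With this exon, the gap string accounts for 2264 genomic bases but only 2262 transcript bases, so any transcript position after the first 509 bases is shifted by 2 relative to a naive offset calculation.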

As far as I can see, Tark doesn't have this yet:

https://tark.ensembl.org/api/transcript/?stable_id=NM_001205122&stable_id_version=2&expand_all=true

```
{
    "exon_id": 73193759,
    "stable_id": "exon-NR_144423.2-19",
    "stable_id_version": 1,
    "assembly": "GRCh38",
    "loc_start": 46672255,
    "loc_end": 46674518,
    "loc_strand": 1,
    "loc_region": "11",
    "loc_checksum": "F44BD3F6F8F8764182282A78AE315772F78ECCF8",
    "exon_checksum": "55D9C6A38CC3510856809E31ED688BB19C01786A",
    "exon_order": 15
}
```

Could you please add these alignment strings to RefSeq transcript exons? Knowing about mismatches would also be beneficial.

I hope to write a JSON client for HGVS, which will only be enabled for Ensembl to start with. Thanks!
