Hi, this project looks good! Thanks!
I would like to use Tark as a source of transcripts for the Biocommons HGVS Python library.
RefSeq transcripts can differ from the genome sequence, so they may align to the genome build with indels.
For instance, NM_001205122.2 (ATG13) aligned to GRCh38 has a 2 bp deletion in exon 15 (the alignment is a 509 bp match, a 2 bp deletion, then a 1753 bp match). This is critical to know when converting between genomic (g.) and coding (c.) HGVS, so you can adjust for these gaps.
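To illustrate the adjustment, here is a minimal Python sketch that maps a position inside one exon from genome to transcript using a gap string. It assumes the GFF3 Gap convention (M = aligned in both sequences, D = bases present in the genome but missing from the transcript, I = the reverse); the function names are my own and are not part of HGVS or Tark.

```python
import re
from typing import List, Optional, Tuple

# One GFF3 Gap-style token: an operation letter followed by a length, e.g. "M509".
TOKEN = re.compile(r"([MID])(\d+)")


def parse_gap(gap: str) -> List[Tuple[str, int]]:
    """Split "M509 D2 M1753" into [("M", 509), ("D", 2), ("M", 1753)]."""
    ops = []
    for token in gap.split():
        m = TOKEN.fullmatch(token)
        if m is None:
            raise ValueError(f"unsupported gap token: {token!r}")
        ops.append((m.group(1), int(m.group(2))))
    return ops


def genome_to_transcript_offset(gap: str, g_offset: int) -> Optional[int]:
    """Map a 0-based offset from the exon's genomic start to a 0-based
    transcript offset within the same exon.

    Assumes M consumes genome and transcript, D consumes genome only and
    I consumes transcript only. Returns None if the genomic base falls
    inside a deletion (it has no transcript counterpart).
    """
    g = t = 0  # bases of genome / transcript consumed so far
    for op, n in parse_gap(gap):
        if op == "M":
            if g <= g_offset < g + n:
                return t + (g_offset - g)
            g += n
            t += n
        elif op == "D":
            if g <= g_offset < g + n:
                return None
            g += n
        else:  # "I": transcript-only bases, no genomic position lands here
            t += n
    raise ValueError("offset lies beyond the aligned region")


print(genome_to_transcript_offset("M509 D2 M1753", 511))  # 509
```

For this exon, genomic offsets 0 to 508 map one-to-one, offsets 509 and 510 fall in the deletion, and from offset 511 onwards every transcript position is shifted by 2.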
I already handle this in my own project, cdot, which reads RefSeq/Ensembl GFF/GTF files. Ideally I would like to stop maintaining this myself and move over to Tark.
E.g. https://cdot.cc/transcript/NM_001205122.2 has this alignment info (in Biocommons HGVS style):
[46672254, 46674518, 14, 1635, 3896, "M509 D2 M1753"]
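As a quick consistency check on that row, the sketch below totals the genome and transcript bases consumed by the gap string and compares them with the genomic span. It assumes the first two values are a 0-based, half-open genomic interval and the last value is the gap string; the middle three values are deliberately not interpreted here.

```python
import re

# Exon row copied from https://cdot.cc/transcript/NM_001205122.2.
exon = [46672254, 46674518, 14, 1635, 3896, "M509 D2 M1753"]


def gap_lengths(gap):
    """Return (genomic_length, transcript_length) consumed by a gap string."""
    genomic = transcript = 0
    for op, n in re.findall(r"([MID])(\d+)", gap):
        n = int(n)
        if op in "MD":  # aligned bases and deletions consume genome bases
            genomic += n
        if op in "MI":  # aligned bases and insertions consume transcript bases
            transcript += n
    return genomic, transcript


g_start, g_end, gap = exon[0], exon[1], exon[-1]
genomic_len, transcript_len = gap_lengths(gap)
print(g_end - g_start, genomic_len, transcript_len)  # 2264 2264 2262
```

The genomic span (2264 bp) agrees with M509 + D2 + M1753, while only 2262 bases of the transcript are aligned.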
As far as I can see, Tark doesn't have this yet:
https://tark.ensembl.org/api/transcript/?stable_id=NM_001205122&stable_id_version=2&expand_all=true
{
"exon_id": 73193759,
"stable_id": "exon-NR_144423.2-19",
"stable_id_version": 1,
"assembly": "GRCh38",
"loc_start": 46672255,
"loc_end": 46674518,
"loc_strand": 1,
"loc_region": "11",
"loc_checksum": "F44BD3F6F8F8764182282A78AE315772F78ECCF8",
"exon_checksum": "55D9C6A38CC3510856809E31ED688BB19C01786A",
"exon_order": 15
}
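To make the missing information concrete, here is a small sketch comparing the Tark exon above with the cdot row. It assumes Tark coordinates are 1-based inclusive and cdot's are 0-based half-open, which is what the two start values (46672255 vs 46672254) suggest. Without an alignment string, a client can only assume the exon is ungapped.

```python
import re

# Fields copied from the Tark exon shown above (only the ones used here).
tark_exon = {"assembly": "GRCh38", "loc_region": "11", "loc_strand": 1,
             "loc_start": 46672255, "loc_end": 46674518, "exon_order": 15}

# Genomic interval and gap string from the cdot exon row for NM_001205122.2.
cdot_start, cdot_end, cdot_gap = 46672254, 46674518, "M509 D2 M1753"


def tark_to_half_open(exon):
    """Convert Tark loc_start/loc_end to a 0-based, half-open interval.

    Assumption: Tark uses 1-based, inclusive coordinates.
    """
    return exon["loc_start"] - 1, exon["loc_end"]


start, end = tark_to_half_open(tark_exon)
assert (start, end) == (cdot_start, cdot_end)  # same exon, different conventions

# Without a gap string the only option is to assume transcript length == genomic length.
assumed_tx_len = end - start
# Transcript bases actually aligned: M (aligned) and I (transcript-only) operations.
actual_tx_len = sum(int(n) for op, n in re.findall(r"([MID])(\d+)", cdot_gap) if op in "MI")
print(assumed_tx_len, actual_tx_len)  # 2264 2262
```

The 2 bp difference is exactly the deletion, so every c. position after it in this exon would be off by 2 if the gap were ignored.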
Could you please add these alignment strings to RefSeq transcript exons? Knowing about mismatches would also be useful.
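On the client side, this is roughly how I would expect to consume such a field. The field name `alignment_gap` is purely hypothetical (Tark returns nothing like it today), and the fallback assumes 1-based inclusive `loc_start`/`loc_end`, as for the Tark exon above.

```python
from typing import Optional


def exon_gap(exon: dict) -> str:
    """Return a gap string for one exon record from a Tark-like JSON response.

    "alignment_gap" is a HYPOTHETICAL field name used for illustration only.
    When it is absent, fall back to an ungapped match over the whole exon,
    assuming loc_start/loc_end are 1-based inclusive.
    """
    gap: Optional[str] = exon.get("alignment_gap")
    if gap:
        return gap
    length = exon["loc_end"] - exon["loc_start"] + 1
    return f"M{length}"


without_gap = {"loc_start": 46672255, "loc_end": 46674518}
with_gap = dict(without_gap, alignment_gap="M509 D2 M1753")
print(exon_gap(without_gap))  # M2264
print(exon_gap(with_gap))     # M509 D2 M1753
```

As far as I understand the GFF3 Gap format, "M" covers both matches and mismatches, so mismatch information would have to be reported separately (for example with SAM-style "=" and "X" operations).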
I hope to write a JSON client for HGVS, which will initially only be enabled for Ensembl transcripts. Thanks!
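One small piece such a client would probably need: as far as I know, Biocommons HGVS parses CIGAR strings with the length before the operation (as stored in UTA), whereas the string above puts the operation first (GFF3 Gap style). A minimal conversion sketch, with a function name of my own choosing:

```python
import re


def gap_to_cigar(gap: str) -> str:
    """Convert "M509 D2 M1753" (operation then length) to "509M2D1753M"
    (length then operation).

    Operation letters are copied unchanged, so "M" still means aligned
    (match or mismatch) rather than "=" (exact match).
    """
    parts = []
    for token in gap.split():
        m = re.fullmatch(r"([MID])(\d+)", token)
        if m is None:
            raise ValueError(f"unsupported gap token: {token!r}")
        parts.append(f"{m.group(2)}{m.group(1)}")
    return "".join(parts)


print(gap_to_cigar("M509 D2 M1753"))  # 509M2D1753M
```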