Description
Hi Dave, I am currently working on a project in Alex Wagner's laboratory aimed at standardizing the output of different gene fusion detection algorithms. A central component of this work is a transcript-based model that represents the transcript junctions for each partner in a fusion (see our specification for further reference).
We currently use UTA to obtain this transcript data, but we have observed several cases where a fusion caller reports a gene from a bacterial artificial chromosome (BAC) clone as a fusion partner (e.g. RP5-899B16.3 and CTD-2055G21.1). We are considering using cdot alongside UTA to retrieve transcript data for gene symbols that may not exist in the most recent UTA release.
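A minimal sketch of the lookup order we have in mind, assuming both sources have already been loaded into plain dictionaries keyed by gene symbol. The function and index names are hypothetical, and no UTA or cdot API is called here:

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory indexes: gene symbol -> transcript accessions.
# In practice the primary index would come from UTA and the fallback index
# from cdot or an older GENCODE release.
SymbolIndex = Dict[str, List[str]]


def transcripts_for_symbol(
    symbol: str,
    primary: SymbolIndex,
    fallback: SymbolIndex,
) -> Tuple[Optional[str], List[str]]:
    """Return (source, transcript_ids) for a gene symbol.

    The primary index is consulted first; the fallback index is only used
    when the primary index has no transcripts for the symbol.
    """
    if primary.get(symbol):
        return "primary", list(primary[symbol])
    if fallback.get(symbol):
        return "fallback", list(fallback[symbol])
    # Neither source knows this symbol.
    return None, []
```

Returning the source alongside the transcripts makes it easy to flag fusion partners that could only be resolved from the fallback data.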
By processing earlier GENCODE GTF releases, such as version 38, we were able to extract the transcripts linked to these gene symbols. However, when we queried the matched transcripts with cdot, the `gene_name` attribute was `None`. For example, for the gene RP5-899B16.3 we observed:
```python
{'id': 'ENST00000666152.1',
 'chrom': 'NC_000006.12',
 'start': 139938863,
 'end': 139991094,
 'strand': '-',
 'cds_start': 139991094,
 'cds_end': 139991094,
 'gene_name': None,
 'exons': [[139938863, 139939458],
           [139978011, 139978404],
           [139978621, 139978873],
           [139990992, 139991094]]}
```

We were wondering why the `gene_name` attribute is `None`. Also, would cdot be appropriate for this use case (getting a list of transcripts associated with a gene symbol)? Thank you for your help.
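For reference, a minimal sketch of how a gene symbol to transcript mapping can be built directly from a GENCODE GTF, assuming the standard nine tab-separated columns with `gene_name` and `transcript_id` in the attribute column. This is an illustration of the GTF-based approach, not how cdot builds its data; the function names are hypothetical and the sample line in the usage note is made up:

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Matches quoted GTF attribute pairs such as: gene_name "RP5-899B16.3";
# Unquoted attributes (e.g. `level 2;`) are ignored.
_ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse the 9th GTF column into a dict (first value wins for repeated keys)."""
    attrs: Dict[str, str] = {}
    for key, value in _ATTR_RE.findall(field):
        attrs.setdefault(key, value)
    return attrs


def transcripts_by_gene_name(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map gene_name -> set of transcript IDs, using only 'transcript' rows."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue  # one row per transcript is enough; skip exons, CDS, etc.
        attrs = parse_attributes(cols[8])
        gene_name = attrs.get("gene_name")
        transcript_id = attrs.get("transcript_id")
        if gene_name and transcript_id:
            index[gene_name].add(transcript_id)
    return dict(index)
```

Usage (illustrative input only): passing a file handle from `gzip.open("gencode.v38.annotation.gtf.gz", "rt")` yields a dictionary such as `{"RP5-899B16.3": {"ENST00000666152.1", ...}}`, which could serve as the fallback index for symbols missing from UTA.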