Accessing transcript data for genes from bacterial artificial chromosomes (BACs) #89

@jarbesfeld

@davmlaw

Hi Dave, I am currently working on a project in Alex Wagner's laboratory that aims to standardize the output of different gene fusion detection algorithms. A central component of this work is a transcript-based model that represents the transcript junctions for each partner in a fusion (see our specification for further reference).

We currently use UTA to get this transcript data, but we have observed several cases where a reported fusion lists a gene from a bacterial artificial chromosome as a fusion partner (e.g. RP5-899B16.3 and CTD-2055G21.1). We are considering using cdot in addition to UTA to get transcript data for gene symbols that may not exist in the recent UTA release.

By processing earlier versions of the GENCODE GTF, such as version 38, we were able to extract the transcripts linked to these gene symbols. However, when we queried the matched transcripts using cdot, the gene_name attribute was None. For example, for the gene RP5-899B16.3 we observed:

{'id': 'ENST00000666152.1',
 'chrom': 'NC_000006.12',
 'start': 139938863,
 'end': 139991094,
 'strand': '-',
 'cds_start': 139991094,
 'cds_end': 139991094,
 'gene_name': None,
 'exons': [[139938863, 139939458],
  [139978011, 139978404],
  [139978621, 139978873],
  [139990992, 139991094]]}
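
For context, the GTF extraction step mentioned above can be sketched roughly as follows. This is a minimal illustration rather than our exact pipeline: it assumes the standard nine-column GTF layout with `gene_name` and `transcript_id` keys in the attribute column, the function name `transcripts_by_gene_name` is hypothetical, and the example rows are synthetic placeholders, not real GENCODE records.

```python
import re
from collections import defaultdict

# Matches `key "value"` pairs in the GTF attribute column (column 9).
# Unquoted attributes such as `level 2` are intentionally skipped.
_ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def transcripts_by_gene_name(gtf_lines):
    """Map each gene_name to a sorted list of its transcript IDs.

    Only rows whose feature type (column 3) is "transcript" are used.
    """
    mapping = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header / comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "transcript":
            continue
        attrs = dict(_ATTR_RE.findall(fields[8]))
        gene_name = attrs.get("gene_name")
        transcript_id = attrs.get("transcript_id")
        if gene_name and transcript_id:
            mapping[gene_name].add(transcript_id)
    return {gene: sorted(ids) for gene, ids in mapping.items()}


# Synthetic rows for illustration only (hypothetical names and coordinates).
example_rows = [
    "##format: gtf\n",
    'chrN\tEXAMPLE\ttranscript\t100\t900\t.\t-\t.\t'
    'gene_id "G1.1"; transcript_id "TX1.1"; gene_name "GENE_A"; level 2;\n',
    'chrN\tEXAMPLE\texon\t100\t200\t.\t-\t.\t'
    'gene_id "G1.1"; transcript_id "TX1.1"; gene_name "GENE_A";\n',
]
example_mapping = transcripts_by_gene_name(example_rows)
print(example_mapping.get("GENE_A", []))
```

For a real file, the same function could be given an open file handle (for instance one returned by `gzip.open(path, "rt")` for a compressed GTF), since it only iterates over lines.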

Why does the gene_name attribute return None for this transcript? Also, would cdot be appropriate for this use case (getting a list of transcripts associated with a gene symbol)? Thank you for your help.
