Question

Current transcript given a gene list

0

7.4 years ago

bioguy24 ▴ 230

Is there a way to get the current version of a transcript for each gene in a gene list?

For example, a file containing MECP2 would give the result below:

MECP2     NM_004992.3

I have tried a MySQL dump from UCSC and the LRG_RefSeqGene file, but both contain many duplicate entries, which leads to incorrect information. Thank you :).

ngs • 1.7k views

ADD COMMENT • link 7.3 years ago by bioguy24 ▴ 230

1

Have you tried looking it up in the GTF/GFF file?

ADD REPLY • link 7.4 years ago by Prasad ★ 1.6k

0

No, how would I do that? I am not familiar with that format. Thank you :).

ADD REPLY • link 7.3 years ago by bioguy24 ▴ 230

0

I think @Prasad meant that you would get the GTF/GFF annotation file for the genome of choice (which should be human in your case, based on past interactions).

I think the problem may not be "duplicate" entries but that these gene names are shared across multiple organisms. MECP2 seems to come up with 116 human entries in the Gene database at NCBI (and over 2000 for all organisms).

Another option would be to get the gene2refseq mapping file. Look for entries whose status is REVIEWED and then narrow down to the organism of choice.
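A rough sketch of that filter in Python, in case it helps. It assumes the file is tab-delimited with the 16 columns seen in the rows posted in this thread (tax ID in column 1, status in column 3, RNA accession in column 4, gene symbol in column 16); the function name `reviewed_refseq` is just made up for illustration. Collecting hits in a set also drops repeated rows for the same transcript.

```python
def reviewed_refseq(lines, genes, tax_id="9606"):
    """Return sorted, de-duplicated (symbol, accession) pairs for REVIEWED NM_ rows."""
    wanted = set(genes)
    hits = set()  # a set removes repeated rows for the same transcript
    for line in lines:
        if line.startswith("#"):  # skip a header line, if present
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 16:  # assumed column count; skip anything shorter
            continue
        if (cols[0] == tax_id and cols[2] == "REVIEWED"
                and cols[3].startswith("NM_") and cols[15] in wanted):
            hits.add((cols[15], cols[3]))
    return sorted(hits)


# The three MECP2 rows from this thread, rebuilt as tab-delimited lines.
ROWS = [
    ["9606", "4204", "REVIEWED", "NM_001110792.1", "160707949", "NP_001104262.1",
     "160707950", "NC_000023.11", "568815575", "154021799", "154097730", "-",
     "Reference GRCh38.p7 Primary Assembly", "-", "-", "MECP2"],
    ["9606", "4204", "REVIEWED", "NM_001316337.1", "938320030", "NP_001303266.1",
     "938320031", "NC_000023.11", "568815575", "154021799", "154097730", "-",
     "Reference GRCh38.p7 Primary Assembly", "-", "-", "MECP2"],
    ["9606", "4204", "REVIEWED", "NM_004992.3", "160707948", "NP_004983.1",
     "4826830", "NC_000023.11", "568815575", "154021799", "154097730", "-",
     "Reference GRCh38.p7 Primary Assembly", "-", "-", "MECP2"],
]
sample = ["\t".join(row) + "\n" for row in ROWS]

for symbol, accession in reviewed_refseq(sample, ["MECP2"]):
    print(symbol, accession, sep="\t")
```

For the real file you would pass an open file handle instead of `sample`, for example one from `gzip.open(path, "rt")` if your copy is gzipped, and the gene symbols read from your list.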

ADD REPLY • link 7.3 years ago by GenoMax 141k

score 0 · Answer 1 · 2016-12-28

Using the mapping file and trimming with awk, I get results that look like:

9606    4204    REVIEWED    NM_001110792.1    160707949    NP_001104262.1    160707950    NC_000023.11    568815575    154021799    154097730    -    Reference GRCh38.p7 Primary Assembly    -    -    MECP2
9606    4204    REVIEWED    NM_001316337.1    938320030    NP_001303266.1    938320031    NC_000023.11    568815575    154021799    154097730    -    Reference GRCh38.p7 Primary Assembly    -    -    MECP2
9606    4204    REVIEWED    NM_004992.3    160707948    NP_004983.1    4826830    NC_000023.11    568815575    154021799    154097730    -    Reference GRCh38.p7 Primary Assembly    -    -    MECP2

The last entry, NM_004992.3, contains all exons, while the first, NM_001110792.1, lacks exon 2. I have a list of ~700 similar genes and am trying to find an automated way of mapping each one to the correct NM_ accession. Maybe download the canonical transcripts for the 700 genes? Is this the best approach? Thank you :).
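As a first pass at automating this, something like the sketch below could at least separate genes with a single REVIEWED NM_ accession from genes like MECP2 that have several and still need a rule (or a canonical-transcript list) to pick one. It does not decide which transcript is correct; `group_by_gene` and `split_by_ambiguity` are hypothetical helper names, and the input is assumed to be (symbol, accession) pairs already pulled out of the mapping file.

```python
from collections import defaultdict


def group_by_gene(pairs):
    """Map each gene symbol to its distinct accessions, in first-seen order."""
    grouped = defaultdict(list)
    for symbol, accession in pairs:
        if accession not in grouped[symbol]:
            grouped[symbol].append(accession)
    return dict(grouped)


def split_by_ambiguity(grouped):
    """Separate genes with exactly one accession from genes with several."""
    unique = {g: accs[0] for g, accs in grouped.items() if len(accs) == 1}
    ambiguous = {g: accs for g, accs in grouped.items() if len(accs) > 1}
    return unique, ambiguous


# The MECP2 accessions from the rows above.
pairs = [
    ("MECP2", "NM_001110792.1"),
    ("MECP2", "NM_001316337.1"),
    ("MECP2", "NM_004992.3"),
]
unique, ambiguous = split_by_ambiguity(group_by_gene(pairs))

for gene, accessions in ambiguous.items():
    print(gene, "needs a choice between:", ", ".join(accessions))
```

Genes that land in `unique` can be mapped directly; only the `ambiguous` ones would need checking against a canonical set.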