Current transcript given gene list
7.4 years ago
bioguy24 ▴ 230

Is there a way to get the current version of a transcript for a genelist?

For example, a file containing

MECP2

should give the result below:

MECP2     NM_004992.3

I have tried a MySQL dump from UCSC and the LRG_RefSeqGene file. The problem is that both contain many duplicate entries, which leads to incorrect information. Thank you :).

ngs • 1.7k views

Have you tried looking it up in the GTF/GFF file?


No, how would I do that? I am not familiar with that format. Thank you :).


I think @Prasad meant that you would get the GTF/GFF annotation file for the genome of choice (which should be human in your case, based on past interactions).

I think the problem may not be "duplicate" entries but that these gene names are shared across multiple organisms. MECP2 comes up with 116 human entries in the Gene database at NCBI (and over 2000 for all organisms).

Another option would be to get the gene2refseq mapping file. Look for entries that say reviewed and then narrow down to the organism of choice.
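A minimal Python sketch of that filtering step, in case it helps. It assumes the standard tab-separated gene2refseq layout, with tax_id in column 1, status in column 3, the RNA accession in column 4 and the gene symbol in column 16. The function name `reviewed_transcripts` and the helper `make_row` are made up for illustration, and the sample rows carry only the fields the filter reads (the `NM_000000.0` row is a placeholder, not a real record). Accessions are collected in a set, so the same transcript listed on several rows is reported once.

```python
# Column positions (0-based) in gene2refseq; assumed from the file's header line.
TAX_ID, STATUS, RNA_ACC, SYMBOL = 0, 2, 3, 15


def reviewed_transcripts(lines, genes, tax_id="9606"):
    """Return {symbol: sorted REVIEWED NM_ accessions} for the requested genes."""
    wanted = set(genes)
    hits = {gene: set() for gene in wanted}
    for line in lines:
        if line.startswith("#"):  # skip the header/comment line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= SYMBOL:  # skip short or malformed rows
            continue
        if (
            fields[TAX_ID] == tax_id
            and fields[STATUS] == "REVIEWED"
            and fields[RNA_ACC].startswith("NM_")
            and fields[SYMBOL] in wanted
        ):
            hits[fields[SYMBOL]].add(fields[RNA_ACC])  # set drops repeated rows
    return {gene: sorted(accs) for gene, accs in hits.items()}


def make_row(rna_acc, tax_id="9606", status="REVIEWED", symbol="MECP2"):
    """Hypothetical helper: build a 16-column row with only the used fields set."""
    fields = ["-"] * 16
    fields[TAX_ID] = tax_id
    fields[STATUS] = status
    fields[RNA_ACC] = rna_acc
    fields[SYMBOL] = symbol
    return "\t".join(fields)


sample = [
    "#tax_id\tGeneID\tstatus",            # header line, ignored
    make_row("NM_001110792.1"),
    make_row("NM_004992.3"),
    make_row("NM_004992.3"),               # same transcript on a second row
    make_row("NM_000000.0", tax_id="10090", symbol="Mecp2"),  # placeholder non-human row
]

result = reviewed_transcripts(sample, ["MECP2"])
print(result)
```

For the real file, the same function should work on an open handle, for example `gzip.open("gene2refseq.gz", "rt")`, with the gene list read from your own file.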

7.3 years ago
bioguy24 ▴ 230

Using the mapping file and trimming with awk, I get results that look like:

9606    4204    REVIEWED    NM_001110792.1    160707949    NP_001104262.1    160707950    NC_000023.11    568815575    154021799    154097730    -    Reference GRCh38.p7 Primary Assembly    -    -    MECP2
9606    4204    REVIEWED    NM_001316337.1    938320030    NP_001303266.1    938320031    NC_000023.11    568815575    154021799    154097730    -    Reference GRCh38.p7 Primary Assembly    -    -    MECP2
9606    4204    REVIEWED    NM_004992.3       160707948    NP_004983.1       4826830      NC_000023.11    568815575    154021799    154097730    -    Reference GRCh38.p7 Primary Assembly    -    -    MECP2

The last entry, NM_004992.3, contains all exons, while the first, NM_001110792.1, lacks exon 2. I have a list of ~700 similar genes and am trying to find an automated way to map each one to the correct NM_ accession. Maybe download the canonical transcripts for the 700 genes? Is this the best approach? Thank you :).
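A rough sketch of what that automated step might look like, assuming a separate list of canonical accessions is available from some source (that list, the function name `pick_canonical`, and the gene `GENE_X` with its accessions are all hypothetical here). Accessions are compared without the version suffix, so the version currently in the mapping file is the one reported. Genes with no match or more than one match are set to None so they can be checked by hand.

```python
def strip_version(accession):
    """NM_004992.3 -> NM_004992"""
    return accession.split(".")[0]


def pick_canonical(candidates, canonical):
    """Map each gene to the one candidate found in the canonical list, else None."""
    canonical_bases = {strip_version(acc) for acc in canonical}
    chosen = {}
    for gene, accessions in candidates.items():
        matches = [acc for acc in accessions if strip_version(acc) in canonical_bases]
        # Keep only unambiguous hits; anything else needs a manual look.
        chosen[gene] = matches[0] if len(matches) == 1 else None
    return chosen


# Candidates per gene, as collected from the mapping file.
candidates = {
    "MECP2": ["NM_001110792.1", "NM_001316337.1", "NM_004992.3"],
    "GENE_X": ["NM_000001.1", "NM_000002.1"],  # hypothetical gene and accessions
}

# Hypothetical canonical list; the stored version may lag behind the mapping file.
canonical = ["NM_004992.2"]

chosen = pick_canonical(candidates, canonical)
for gene, accession in chosen.items():
    print(gene, accession if accession else "needs manual review", sep="\t")
```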
