Annotation issue converting Ensembl IDs to gene names
16 months ago
Learner • 160

Hello,

I asked a question earlier and got an answer that I liked (the question is here: https://www.biostars.org/p/293965/#294344). The problem is that about 3000 genes remain that I cannot annotate, whether I use the method described there or try to convert them via UniProt. I have been looking for a solution without success. Does anybody know how to convert them to gene names? I have posted a few of the IDs that I cannot convert below.

If there is no solution, then can you please explain why?

ENSG00000122718
ENSG00000130201
ENSG00000150076
ENSG00000150526
ENSG00000155640
ENSG00000166748
ENSG00000168260
ENSG00000168787
ENSG00000170590
ENSG00000170803
ENSG00000171484
ENSG00000172381
ENSG00000172774

RNA-Seq genome • 583 views

The problem is that these are retired gene identifiers. If you look them up HERE, you can map them. See the examples below.

ENSG00000166748 = AGBL1
ENSG00000170803 = OR2AG1


@genomax, are you aware of any way to annotate them programmatically? It is very hard to annotate 3000 genes one by one.


Why are you using old annotations? Did you align your data against hg19/GRCh37?


I am not sure what your ultimate aim is, but you are taking a leap of faith by assuming that results from data aligned to an old genome build will translate to the current genome build. For any new work you end up doing, you will likely need to use GRCh38 to be able to publish.

There are REST API endpoints for Ensembl archives. You may want to create a help ticket with Ensembl support if you want help using that API. There may also be past threads on Biostars related to this topic.
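As a starting point for the archive route, here is a minimal Python sketch, not a tested pipeline. It assumes the public server at https://rest.ensembl.org, that the `archive/id` endpoint accepts a batched POST with an `id` list, and that each returned record carries an `is_current` flag; the batch size of 1000 is also an assumption to check against the endpoint documentation. The helper names are made up for illustration.

```python
import json
import urllib.request

# Assumptions: public Ensembl REST server, batched POST to /archive/id.
SERVER = "https://rest.ensembl.org"
BATCH_SIZE = 1000  # assumed per-request limit; verify in the endpoint docs


def chunks(ids, size=BATCH_SIZE):
    """Split a list of IDs into consecutive batches of at most `size`."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def fetch_archive_records(ids):
    """POST one batch of stable IDs to /archive/id and return the decoded JSON."""
    request = urllib.request.Request(
        SERVER + "/archive/id",
        data=json.dumps({"id": ids}).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def split_current_and_retired(records):
    """Partition archive records into (current IDs, retired IDs).

    Assumes `is_current` is "1" (or 1/True) for live IDs and empty otherwise.
    """
    current, retired = [], []
    for record in records:
        if record.get("is_current") in ("1", 1):
            current.append(record["id"])
        else:
            retired.append(record["id"])
    return current, retired
```

To use it, loop over `chunks(all_ids)`, pass each batch to `fetch_archive_records`, and feed the result to `split_current_and_retired`. The retired IDs are the ones that need the coordinate-based approach discussed further down the thread.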


I had a similar issue last year. I spoke with Tomas at EBI and he directed me to the REST API also. Basically what happens is it gets the coords of the retired ENSG and then, using those coords, it grabs the new ENSG from the latest reference genome.

He highlighted one likely problem: some old IDs may overlap two new IDs, so which one to choose may be an issue.
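One possible way to handle the "two new IDs" case is to rank the candidates by how many bases they share with the old locus and then inspect any ties by hand. This is only a heuristic sketch, not something Ensembl recommends as far as I know. The function names and the `id`/`start`/`end` dictionary keys are assumptions for illustration.

```python
def overlap_length(start_a, end_a, start_b, end_b):
    """Number of bases shared by two inclusive intervals (0 if disjoint)."""
    return max(0, min(end_a, end_b) - max(start_a, start_b) + 1)


def rank_candidates(region_start, region_end, genes):
    """Rank candidate genes by shared bases with the old locus, largest first.

    `genes` is a list of dicts with hypothetical keys "id", "start", "end".
    Genes that do not overlap the region at all are dropped.
    """
    scored = []
    for gene in genes:
        shared = overlap_length(region_start, region_end,
                                gene["start"], gene["end"])
        if shared > 0:
            scored.append((shared, gene["id"]))
    # Sort by descending overlap, then by ID so the order is deterministic.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))
```

If the top two candidates have similar overlap, it is probably safer to report both than to pick one silently.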


@kennethcondon2007, can you please share how you did it? I am really confused and I don't know what to do to get their gene names :-(


Unfortunately I never had a chance to implement his advice, but here are the steps I wrote down so I knew where to start when I got back to it:

ENSEMBL REST API

REST API: MAPPING --> convert coords of one assembly to another

REST API: OVERLAP --> Retrieves features (e.g. gene IDs) that overlap a given region (warning: you may get more than one object for a region, but it should be rare)

Sorry I can't be more help.
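For anyone picking this up, the two steps noted above might look roughly like the following in Python. Treat it as an untested sketch: it assumes the public server at https://rest.ensembl.org, the `map/:species/:asm_one/:region/:asm_two` and `overlap/region/:species/:region` URL layouts, and the `mappings`/`mapped` and `external_name` response fields. All function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

SERVER = "https://rest.ensembl.org"  # assumption: public REST server


def map_url(chrom, start, end, source="GRCh37", target="GRCh38",
            species="human"):
    """URL for converting a region from one assembly to another."""
    region = f"{chrom}:{start}..{end}:1"
    return f"{SERVER}/map/{species}/{source}/{region}/{target}"


def overlap_url(chrom, start, end, species="human"):
    """URL for listing genes that overlap a region on the current assembly."""
    query = urllib.parse.urlencode({"feature": "gene"})
    return f"{SERVER}/overlap/region/{species}/{chrom}:{start}-{end}?{query}"


def get_json(url):
    """Fetch a URL and decode the JSON body (performs a network request)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def mapped_regions(map_response):
    """Extract (chrom, start, end) tuples from a /map response."""
    return [
        (m["mapped"]["seq_region_name"], m["mapped"]["start"], m["mapped"]["end"])
        for m in map_response.get("mappings", [])
    ]


def gene_names(overlap_response):
    """Map each overlapping gene ID to its display name (None if absent)."""
    return {g["id"]: g.get("external_name") for g in overlap_response}
```

The old coordinates have to come from somewhere first, for example a lookup of the retired ID against the GRCh37 archive, assuming the ID still exists there. After that, call `get_json(map_url(...))`, pass the result to `mapped_regions`, and for each mapped region call `get_json(overlap_url(...))` followed by `gene_names`. A region may map to several pieces or overlap more than one gene, so the output needs checking rather than blind use.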