Retrieve gene names from gene symbols using Entrez?
7.8 years ago

Hello,

Is it possible to use any of the Entrez tools to query a gene symbol and retrieve the gene name? For example, query "DRD1" and get "dopamine receptor D1" back as the result. If it can't be done with Entrez but can be done some other way, I would gladly use that instead!

Thank you in advance, you're all BioStars :)

entrez biopython gene symbol gene • 2.2k views
7.8 years ago

As is very often the case, the answer is probably Ensembl's BioMart :)

7.8 years ago
GenoMax 141k

It can probably be done using eUtils, or by parsing the correct gene name (there is more than one DRD1 gene) from this file.
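For anyone who wants to try the eUtils route, here is a minimal sketch using only the Python standard library. It is not tested against every case and rests on a few assumptions: the function names (`eutils_json`, `names_from_summary`, `gene_names`) are made up for this example, the search term copies the `[Orgn]`/`[Gene]` pattern used elsewhere in this thread, and the JSON keys (`esearchresult`/`idlist` for esearch, `result`/`uids`/`name`/`description` for esummary) reflect my understanding of the E-utilities JSON output, so check them against a real response before relying on them. It returns every match rather than only the first, since a symbol can map to more than one Gene record.

```python
import json
import urllib.parse
import urllib.request

# Base URL of the NCBI E-utilities (assumed endpoint; check the NCBI docs)
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_json(tool, **params):
    """Call one E-utility (e.g. 'esearch') and return its decoded JSON reply.

    Needs network access. NCBI also asks callers to identify themselves
    (email/tool parameters) and to respect its request-rate limits.
    """
    params["retmode"] = "json"
    url = EUTILS + tool + ".fcgi?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def names_from_summary(summary):
    """Return (gene_id, symbol, full_name) tuples from an esummary reply.

    Assumes the reply looks like {"result": {"uids": [...], "<uid>": {...}}}.
    """
    result = summary.get("result", {})
    return [
        (uid, result[uid].get("name", ""), result[uid].get("description", ""))
        for uid in result.get("uids", [])
    ]


def gene_names(symbol, organism="Mus musculus"):
    """Look up a gene symbol and return all matching (id, symbol, name) tuples."""
    term = f"{organism}[Orgn] AND {symbol}[Gene]"
    search = eutils_json("esearch", db="gene", term=term)
    ids = search["esearchresult"]["idlist"]
    if not ids:
        return []
    summary = eutils_json("esummary", db="gene", id=",".join(ids))
    return names_from_summary(summary)
```

Calling `gene_names("Drd1")` should then give a list with one tuple per matching Gene record, and an empty list if the symbol is not found. Keeping the parsing in `names_from_summary` separate from the network call makes it easy to check against a saved response.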

7.8 years ago

Thanks guys! Here's the program I ended up writing in case anyone wants to do the same thing!

from Bio import Entrez
import csv

REPLACED = 'This record was replaced with GeneID:'

def fetch_entry(gene_id):
    # Fetch the plain-text Gene summary that the line-based parsing below expects
    handle = Entrez.efetch(db="gene", id=gene_id, retmode="text")
    entry = handle.read()
    handle.close()
    # Depending on the Biopython version, read() may return bytes
    return entry.decode() if isinstance(entry, bytes) else entry

def gene_alias(gene_file):
    Entrez.email = "your email"
    # one gene symbol per line; blank lines are skipped
    with open(gene_file) as infile:
        genes = [line.strip() for line in infile if line.strip()]

    ids = []
    aliases = []

    for gene in genes:
        #retrieve gene ID
        handle = Entrez.esearch(db="gene", term="Mus musculus[Orgn] AND " + gene + "[Gene]")
        record = Entrez.read(handle)
        handle.close()

        if len(record["IdList"]) > 0:
            ids.append(record["IdList"][0])

            #retrieve aliases, following any "replaced with" notice to the current record
            entry_lines = fetch_entry(record["IdList"][0]).splitlines()
            replaced = [line for line in entry_lines if REPLACED in line]
            while replaced:
                new_id = replaced[0].split(REPLACED, 1)[1].strip()
                entry_lines = fetch_entry(new_id).splitlines()
                replaced = [line for line in entry_lines if REPLACED in line]

            # pad so the fixed line positions below cannot raise IndexError
            entry_lines += [''] * 5
            firstline = entry_lines[1]
            if gene.lower() == firstline[3:].lower():
                thirdline = entry_lines[3]
                fourthline = entry_lines[4]
                if thirdline[0:13] == 'Other Aliases':
                    aliases.append(thirdline[15:])
                elif fourthline == 'This record was discontinued.':
                    aliases.append(fourthline)
                else:
                    aliases.append('no aliases')
            else:
                aliases.append(firstline[3:])

        else:
            ids.append(gene + ' is not in Gene')
            aliases.append(gene + ' is not in Gene')

    rows = zip(genes, ids, aliases)
    # text mode with newline='' is what the csv module expects on Python 3
    with open('gene_aliases.csv', 'w', newline='') as thefile:
        writer = csv.writer(thefile)
        writer.writerow(['Gene', 'ID', 'Aliases'])
        for row in rows:
            writer.writerow(row)