Use Python (e.g. mygene, biomart, db2db) to convert GO Terms to all genes in pathway
8.5 years ago
jolespin ▴ 150

I want to use Python to get all genes associated with a GO term. I tried the Biomart Python API, but it's really awkward, and the Biostars thread "Retrieve All Genes Associated With A Go Term" is for R. I'm trying to use mygene, but GO isn't available in the scopes (the input terms), only in the fields (the output terms); see the Biostars thread "Gene Id Conversion Tool". I need to do this for ~6000 GO pathways. I got it to work using bioservices db2db, but I get timed out, even after setting a 5-second pause between searches. (If anyone wants to know how to use it, see http://pythonhosted.org/bioservices/references.html#module-bioservices.biodbnet; it's really useful.)

Does anyone know a tool in Python I can use to do this that won't time me out?
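Since a fixed 5-second pause still hit timeouts, one common alternative is to retry a failed request with exponentially growing pauses. A minimal sketch follows; `call_with_backoff` is a hypothetical helper (not part of bioservices or mygene), and it retries on any exception, which in practice you would likely narrow to your client's timeout error.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    max_tries: int = 5,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying on exceptions with exponentially growing pauses.

    Pauses between attempts are base_delay, 2*base_delay, 4*base_delay, ...
    If every attempt fails, the last exception is re-raised.
    """
    for attempt in range(max_tries):
        try:
            return func()
        except Exception:
            if attempt == max_tries - 1:
                raise  # out of retries: surface the original error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_tries must be at least 1")
```

For example, `call_with_backoff(lambda: fetch_go_term("GO:0023026"))`, where `fetch_go_term` stands for whatever single-term lookup is timing out (a hypothetical name).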


https://github.com/endrebak/biomartian

biomartian -d rnorvegicus_gene_ensembl  -i external_gene_name -o go_id | shuf -n 10
Lpcat1  GO:0005509
Klb GO:0005975
LOC498555   GO:0003735
Map3k12 GO:0046777
Hoxb1   GO:0045944
Cir1    GO:0006397
Rhoc    GO:0005525
Casr    GO:0060613
Cib1    GO:1900026
Onecut1 GO:0002064
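To go from GO term to genes, the gene/GO pairs above can be grouped by GO ID. A minimal sketch, assuming the output is two whitespace-separated columns (gene name, GO ID) as shown; `genes_by_go_term` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def genes_by_go_term(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group 'gene<whitespace>GO_id' lines into {GO_id: [genes...]}.

    Lines that do not have exactly two whitespace-separated fields are
    skipped. Genes keep their first-seen order; duplicates are dropped.
    """
    seen: Dict[str, Dict[str, None]] = defaultdict(dict)
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blanks, headers, or genes with no GO ID
        gene, go_id = fields
        seen[go_id][gene] = None  # dict keys act as an insertion-ordered set
    return {go_id: list(genes) for go_id, genes in seen.items()}
```

For example, save the biomartian output (without the `shuf` sample) to a file and pass its lines to `genes_by_go_term`.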
7.8 years ago
Newgene ▴ 370

You can use the mygene Python module to query GO terms for matching genes:

import mygene
mg = mygene.MyGeneInfo()

To query a single GO term:

mg.query('GO:0023026', size=1000)

By default, the query returns only the first ten matching genes; to get more, set a larger size, such as 1000. Note that some GO terms are associated with very large gene lists, which you probably don't want to retrieve in full (they aren't very useful anyway), so capping the returned list at 1000 genes is a reasonable setting (and also helps avoid timeouts).
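A sketch of pulling the gene symbols out of one `mg.query()` response and checking whether the size cap cut the list short. It assumes the response is a dict with `total` and `hits` keys and that each hit may carry a `symbol` field, as in MyGene.info's JSON output; `hit_symbols` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple


def hit_symbols(result: Dict[str, Any]) -> Tuple[List[str], bool]:
    """Return (gene symbols, truncated?) from one mg.query() response dict.

    'truncated' is True when the server reports more matches ('total')
    than were returned in 'hits', i.e. the size cap cut the list short.
    Hits without a 'symbol' field are skipped.
    """
    hits = result.get("hits", [])
    symbols = [hit["symbol"] for hit in hits if "symbol" in hit]
    truncated = result.get("total", len(hits)) > len(hits)
    return symbols, truncated
```

For example, `symbols, truncated = hit_symbols(mg.query('GO:0023026', fields='symbol', size=1000))`. If you really do need every gene, recent mygene releases document a `fetch_all=True` option on `query()` that pages through all hits; check the documentation for your installed version.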

With mygene, you can also query multiple GO terms in a batch:

mg.querymany(["GO:0023026", "GO:0002503"], scopes='go', size=1000)

In the returned result, each gene hit contains a "query" attribute holding the corresponding GO term.
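Because `querymany` returns one flat list, it can be convenient to regroup it per GO term. A minimal sketch, assuming each hit is a dict with a `query` key and that terms with no match come back flagged `notfound`; `group_by_query` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, List


def group_by_query(hits: Iterable[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Turn a flat querymany() hit list into {query_term: [gene symbols]}.

    Every queried term appears as a key; terms flagged 'notfound' map to
    an empty list. Hits without a 'symbol' field are skipped.
    """
    grouped: Dict[str, List[str]] = {}
    for hit in hits:
        symbols = grouped.setdefault(hit["query"], [])
        if hit.get("notfound"):
            continue  # term matched no genes; keep it with an empty list
        if "symbol" in hit:
            symbols.append(hit["symbol"])
    return grouped
```

For example, `group_by_query(mg.querymany(["GO:0023026", "GO:0002503"], scopes='go', size=1000))`.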

You may also want to limit the number of GO terms per batch so that you don't overload the server.
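For ~6000 GO terms, one simple way to follow that advice is to split the list into fixed-size batches and pause between them. A sketch; `chunks` and `query_in_batches` are hypothetical helpers, and the batch size of 100 and 1-second pause are arbitrary starting points, not limits documented by MyGene.info.

```python
import time
from typing import Any, Callable, Dict, Iterator, List, Sequence


def chunks(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def query_in_batches(
    terms: Sequence[str],
    run_batch: Callable[[Sequence[str]], List[Dict[str, Any]]],
    batch_size: int = 100,
    pause: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, Any]]:
    """Call run_batch on each chunk of terms, pausing between chunks,
    and concatenate the returned hit lists in order."""
    results: List[Dict[str, Any]] = []
    for i, batch in enumerate(chunks(terms, batch_size)):
        if i > 0:
            sleep(pause)  # give the server a break between batches
        results.extend(run_batch(batch))
    return results
```

For example, `query_in_batches(go_terms, lambda batch: mg.querymany(list(batch), scopes='go', size=1000))`, where `go_terms` is your list of GO IDs. The `sleep` parameter exists only so the pause can be swapped out, for instance in tests.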
